
Operator World Models for Reinforcement Learning - Pietro Novelli

Speaker: 
Pietro Novelli
Event date: 
Thursday, 6 March, 2025 - 15:00
Location: 
Room A5 DIAG
Contact: 
salzo@diag.uniroma1.it

Abstract 

Policy Mirror Descent (PMD) is a powerful and theoretically sound methodology for sequential decision-making. However, it is not directly applicable to Reinforcement Learning (RL) due to the inaccessibility of explicit action-value functions. We address this challenge by introducing a novel approach based on learning a world model of the environment using conditional mean embeddings. Leveraging tools from operator theory, we derive a closed-form expression of the action-value function in terms of the world model via simple matrix operations. Combining these estimators with PMD leads to POWR, a new RL algorithm for which we prove convergence rates to the global optimum. Preliminary experiments in finite and infinite state settings support the effectiveness of our method.
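The "simple matrix operations" claim can be illustrated in the tabular case. The sketch below is not the POWR implementation (which learns the world model via conditional mean embeddings in reproducing kernel Hilbert spaces); it is a minimal example, assuming a finite MDP with an already estimated transition tensor P and reward table r, of the closed-form policy evaluation that the operator-theoretic derivation generalizes. All names (action_value, P, r, pi) are illustrative.

```python
import numpy as np

def action_value(P, r, pi, gamma=0.9):
    """Closed-form Q^pi for a finite MDP.

    P  : (S, A, S) estimated transition probabilities
    r  : (S, A)    estimated rewards
    pi : (S, A)    policy, rows sum to 1
    """
    S, A, _ = P.shape
    # State-to-state transition matrix and reward under pi.
    P_pi = np.einsum("sap,sa->sp", P, pi)          # (S, S)
    r_pi = np.einsum("sa,sa->s", r, pi)            # (S,)
    # V^pi solves the Bellman equation (I - gamma * P_pi) V = r_pi.
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    # One-step look-ahead: Q^pi(s, a) = r(s, a) + gamma * E[V(s')].
    return r + gamma * np.einsum("sap,p->sa", P, V)

# Tiny example: 2 states, 2 actions, uniform policy.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(2), size=(2, 2))         # rows over s' sum to 1
r = rng.random((2, 2))
pi = np.full((2, 2), 0.5)
print(action_value(P, r, pi))
```

Given such Q estimates, a PMD-style method updates the policy by a mirror-descent step on Q; POWR's contribution is obtaining the analogous closed-form evaluation from an operator world model beyond the finite-state setting.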

Pietro Novelli is a physicist and a postdoctoral researcher at Istituto Italiano di Tecnologia, within the Computational Statistics & ML unit. He is currently working on machine learning for dynamical systems, reinforcement learning, machine learning for science, and statistical learning theory and optimization. Pietro's work has been presented at NeurIPS 2024.
