This paper presents the first framework (to the best of the authors' knowledge) to address time-varying objectives in finite-horizon Deep Reinforcement Learning (DeepRL), based on a switching control solution grounded in Bellman's principle of optimality. By augmenting the state space of the system with the time at which each state is visited, the DeepRL agent can solve problems in which its task changes dynamically within the same episode. To address the scalability problems caused by the state space augmentation, we propose a procedure that partitions the episode length into separate sub-problems, each solved by a specialised DeepRL agent. Unlike standard solutions, the proposed approach lets the DeepRL agents correctly estimate the value function at each time-step, and hence solve time-varying tasks. Numerical simulations validate the approach in a classic RL environment.
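To illustrate why the value function must depend on time when the task changes within an episode, the following minimal sketch applies Bellman's principle of optimality by backward induction on a small tabular problem. It is not the paper's DeepRL method: the chain environment, the reward switch at mid-episode, and the names `goal_at`, `step`, and `backward_induction` are all hypothetical and chosen only for illustration.

```python
def goal_at(t, horizon, n_states):
    # Hypothetical time-varying task: the goal is the left end of the chain
    # in the first half of the episode and the right end in the second half.
    return 0 if t < horizon // 2 else n_states - 1


def step(s, a, n_states):
    # Deterministic chain: a = -1 moves left, a = +1 moves right (clipped).
    return min(max(s + a, 0), n_states - 1)


def backward_induction(n_states, horizon):
    # V[t][s] is the optimal return from state s at time t; V[horizon] = 0.
    V = [[0.0] * n_states for _ in range(horizon + 1)]
    policy = [[0] * n_states for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):
        for s in range(n_states):
            best_a, best_q = None, float("-inf")
            for a in (-1, 1):
                s2 = step(s, a, n_states)
                r = 1.0 if s2 == goal_at(t, horizon, n_states) else 0.0
                q = r + V[t + 1][s2]  # Bellman backup over (time, state)
                if q > best_q:
                    best_a, best_q = a, q
            V[t][s] = best_q
            policy[t][s] = best_a
    return V, policy


V, policy = backward_induction(n_states=5, horizon=8)
# The same physical state calls for different actions at different times.
print(policy[0][0], policy[4][0])
```

In this toy problem the optimal action in state 0 is to stay left at time 0 and to move right at time 4, so a policy indexed by the state alone could not represent it. The value table has one row per time-step, which is the growth in problem size that the episode partitioning described above is meant to contain.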
Publication details
2021, INTERNATIONAL JOURNAL OF CONTROL, Pages 1-12
Bellman's principle of optimality and deep reinforcement learning for time-varying tasks (01a Journal article)
Giuseppi A., Pietrabissa A.
Research group: Networked Systems