Andrey Kolobov
Ph.D.
Principal researcher, reinforcement learning group, Microsoft Research -- Redmond.
Main areas of interest: planning, reinforcement learning, imitation learning, and representation learning methods for improving the efficiency of these approaches; application of autonomous decision-making methods to robotics and computer systems (web crawling, caching, resource allocation).
Talk title: Representation Learning for Reinforcement Learning
Abstract: Vanilla reinforcement learning (RL) regards solving sequential decision-making problems as a process of using bandit feedback to train a function that maps an RL agent's observations to actions. The past decade's experience of combining RL with deep neural networks (DNNs) and applying deep RL methods to environments with high-dimensional observations, e.g., robotics and videogames, highlights a formidable issue with this simplified RL paradigm. DNNs capture a very expressive function class; training them solely with the impoverished reward signal of a single RL task leads to overfitting, poor sample efficiency, and other difficulties. Representation learning methods for RL offer a promising solution to these challenges. They view the observation-action mapping as a composition of two functions: a complex, highly non-linear encoder that compresses observations into a representation -- a small set of features directly relevant to decision-making -- and a relatively simple policy that chooses actions based on the representation's features. While the policy itself may be learned via RL alone, training the representation can involve an intricate mixture of reinforcement, supervised, and self-supervised learning. This talk will provide a broad overview of RL-focused representation learning approaches. It will examine the role of representations in improving sample efficiency and generalization of RL agents in single-task and multitask settings, as well as briefly touch on meta-learning for RL. This exposition assumes only basic familiarity with RL concepts, as well as a very high-level idea of how deep RL algorithms such as PPO work.
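To make the encoder/policy decomposition mentioned in the abstract concrete, here is a minimal illustrative sketch (not taken from the talk) in PyTorch. The observation and representation dimensions, the discrete action space, and the latent forward-dynamics auxiliary loss are all assumptions chosen purely for illustration; real RL-focused representation learning methods differ in their specific objectives.

```python
# Illustrative sketch: observation-to-action mapping as policy(encoder(obs)),
# with an optional self-supervised auxiliary loss on the representation.
# All dimensions and the auxiliary objective below are assumptions, not from the talk.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Complex, non-linear map from high-dimensional observations to a small feature vector."""
    def __init__(self, obs_dim: int, repr_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, repr_dim),
        )

    def forward(self, obs):
        return self.net(obs)

class Policy(nn.Module):
    """Relatively simple head that maps the representation to action logits."""
    def __init__(self, repr_dim: int, n_actions: int):
        super().__init__()
        self.head = nn.Linear(repr_dim, n_actions)

    def forward(self, z):
        return self.head(z)

obs_dim, repr_dim, n_actions = 128, 32, 4  # hypothetical sizes
encoder, policy = Encoder(obs_dim, repr_dim), Policy(repr_dim, n_actions)

obs = torch.randn(16, obs_dim)    # a batch of observations
logits = policy(encoder(obs))     # the RL algorithm (e.g., PPO) samples actions from these logits

# The encoder can additionally be trained with a self-supervised objective,
# e.g., predicting the next latent state from the current one and the action
# (a simplified latent forward-dynamics loss, shown here only as an example).
next_obs = torch.randn(16, obs_dim)
actions = torch.randint(0, n_actions, (16,))
dynamics = nn.Linear(repr_dim + n_actions, repr_dim)
pred_next_z = dynamics(torch.cat([encoder(obs), F.one_hot(actions, n_actions).float()], dim=-1))
aux_loss = F.mse_loss(pred_next_z, encoder(next_obs).detach())
print("auxiliary representation loss:", float(aux_loss))
```

In such a setup the policy head is typically updated with the RL objective alone, while the encoder receives gradients from both the RL loss and auxiliary losses of this kind; the precise mixture is what the surveyed approaches vary.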