Reinforcement Learning
Reinforcement learning is one of the most natural formulations of the problem of acquiring knowledge through the interaction of an agent with its environment. However, many open problems remain concerning its efficiency and its application in robotics. Modern planners are usually given their sets of actions in advance at the symbolic level (in the form of special rules), but this approach does not work well for a real robot that may find itself in a completely new situation. Such a robot must recall its past experience in similar situations and compose a new plan from new actions that no one has supplied to it.
Hierarchical reinforcement learning
Aleksandr Panov
In this area, we develop new methods and algorithms and solve new applied problems, including problems involving robotic manipulators. We also actively participate in competitions. For example, at the end of 2019 our team won the international NeurIPS MineRL competition with the best solution for efficient demonstration-based reinforcement learning.

Pages of completed projects in this area:
NeurIPS MineRL 2019 competition
Alexey Skrynnik presents the first-place solution
Publications
  • Kadian A., Truong J., Gokaslan A., Clegg A., Wijmans E., Lee S., Savva M. Are We Making Real Progress in Simulated Environments? Measuring the Sim2Real Gap in Embodied Visual Navigation // arXiv:1912.06321
  • Nair S., Finn C. Hierarchical foresight: self-supervised learning of long-horizon tasks via visual subgoal generation // ICLR 2020. 2020. Link
  • Staroverov A., Panov A.I. Hierarchical Actor-Critic with Hindsight for Mobile Robot with Continuous State Space // Advances in Neural Computation, Machine Learning, and Cognitive Research III. Studies in Computational Intelligence / ed. Kryzhanovsky B. et al. Springer, 2020. Vol. 856. P. 62–70. Springer
  • Aksenov K., Panov A. Approximation Methods for Monte Carlo Tree Search // Proceedings of the Fourth International Scientific Conference “Intelligent Information Technologies for Industry” (IITI’19). IITI’19 2019. Advances in Intelligent Systems and Computing / ed. Kovalev S. et al. Springer International Publishing, 2020. Vol. 1156. P. 68–74. Springer
  • Gorodetskiy A., Shlychkova A., Panov A.I. Delta Schema Network in Model-based Reinforcement Learning // Artificial General Intelligence. AGI 2020. Lecture Notes in Computer Science / ed. Goertzel B. et al. Springer, 2020. Vol. 12177. P. 172–182. Springer
  • Younes A., Panov A.I. Toward Faster Reinforcement Learning for Robotics: Using Gaussian Processes // RAAI Summer School 2019. Lecture Notes in Computer Science / ed. Osipov G.S., Panov A.I., Yakovlev K.S. Springer, 2019. Vol. 11866. P. 160–174. Springer
  • Skrynnik A. et al. Hierarchical Deep Q-Network from Imperfect Demonstrations in Minecraft // NeurIPS 2019 Competition Track. 2019. P. 1–5. ArXiv
  • Kuzmin V., Panov A.I. Hierarchical Reinforcement Learning with Options and United Neural Network Approximation // Proceedings of the Third International Scientific Conference “Intelligent Information Technologies for Industry” (IITI’18). IITI’18 2018. Advances in Intelligent Systems and Computing / ed. Abraham A. et al. Springer, 2019. Vol. 874. P. 453–462. Springer
  • Skrynnik A., Panov A.I. Hierarchical Reinforcement Learning with Clustering Abstract Machines // Artificial Intelligence. RCAI 2019. Communications in Computer and Information Science / ed. Kuznetsov S.O., Panov A.I. Springer, 2019. Vol. 1093. P. 30–43. Springer
Presentations
  • Delta Schema Network - AGI-2020. Talk
  • MineRL Competition - 1st place. Slides
  • Clustering abstract machines - RCAI 2020 (КИИ-2020). Slides
  • Hindsight for robot navigation - Neuroinformatics-2020. Slides
Required skills for interns
  • Proficiency in Python
  • Technical English
  • Proficiency with modern deep learning frameworks
  • Experience administering computing clusters
Research project topics
  • Hierarchical reinforcement learning
  • Behavior planning in environments with actions that can be "rolled back"
  • Integration of case-based planning and reinforcement learning methods
  • Deep reinforcement learning for forming complex actions of a robotic manipulator
  • Reinforcement learning for navigation and map building
  • Reinforcement learning in multi-agent environments