Game Synthesis & Control

Teachers: Anca Muscholl and Guillaume Lagarde

This course is compulsory for VL and optional for AM. It is worth 6 ECTS.

Part 1 (Anca Muscholl):

The first part of the course is an introduction to game theory for verification and synthesis. The synthesis of open systems, or controllers, is based on the model of a reactive system that must interact with its environment. The two entities, system and environment, are viewed as two antagonistic players. Different types of games will be discussed: two-player games on finite arenas, games for controller synthesis, and distributed games.
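
To make the two-player view concrete, the sketch below (an illustration only, not course material) solves a reachability game on a finite arena by computing player 0's attractor of the target set; the arena encoding and all names are assumptions chosen for the example.

```python
# Minimal sketch: attractor computation for a reachability game on a finite arena.
# Vertices are split between player 0 and player 1; player 0 wants to reach `targets`.
# The arena encoding and function names are illustrative assumptions, not course code.

def attractor(vertices_p0, vertices_p1, edges, targets):
    """Return the set of vertices from which player 0 can force a visit to `targets`."""
    attr = set(targets)
    # Precompute predecessors and out-degrees for the backward fixed point.
    preds = {v: set() for v in vertices_p0 | vertices_p1}
    out_deg = {v: 0 for v in vertices_p0 | vertices_p1}
    for (u, v) in edges:
        preds[v].add(u)
        out_deg[u] += 1

    counter = dict(out_deg)          # successors not yet known to be winning (player-1 vertices)
    frontier = list(attr)
    while frontier:
        v = frontier.pop()
        for u in preds[v]:
            if u in attr:
                continue
            if u in vertices_p0:
                # Player 0 moves here: one edge into the attractor suffices.
                attr.add(u)
                frontier.append(u)
            else:
                # Player 1 moves here: all edges must lead into the attractor.
                counter[u] -= 1
                if counter[u] == 0:
                    attr.add(u)
                    frontier.append(u)
    return attr


if __name__ == "__main__":
    # Tiny example arena: player 0 owns {0, 1}, player 1 owns {2, 3}; target is {3}.
    v0, v1 = {0, 1}, {2, 3}
    edges = {(0, 1), (1, 2), (1, 3), (2, 0), (2, 3), (3, 3)}
    print(attractor(v0, v1, edges, {3}))
```

Player 0 wins exactly from the vertices in the returned attractor: at a player-0 vertex one edge into the set suffices, at a player-1 vertex every edge must lead into it.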

References:

  1. Automata, Logics, and Infinite Games (LNCS 2500), Chapter 2
  2. Tutorial on (distributed) games

Lecture notes

  1. Scribe notes by Théo Lacoste, 3/10/23

Exercises

  1. TD1
  2. TD2

Research articles

  1. Pawel Parys. Parity Games: Zielonka’s Algorithm in Quasi-Polynomial Time. MFCS 2019.

  2. Henrik Bjorklund, Sven Sandberg, Sergei Vorobyov. Memoryless determinacy of parity and mean payoff games: a simple proof. TCS 2003.

  3. Nathalie Bertrand, Miheer Dewaskar, Blaise Genest, Hugo Gimbert, Adwait Amit Godbole. Controlling a population. Log. Methods Comput. Sci. 15(3) (2019)

Part 2 (Guillaume Lagarde):

In this second part, we will explore the theoretical foundations of reinforcement learning, a powerful artificial intelligence framework that enables machines to acquire knowledge and make decisions by interacting with their environment. We will cover in detail fundamental concepts such as the multi-armed bandit problem, Markov decision processes, the trade-off between exploration and exploitation, Monte Carlo methods and Q-learning, as well as advanced techniques such as function approximation and Deep Q-learning. By the end of this course, you will have acquired the skills needed to understand AlphaGo, the first AI to defeat a top professional human player at the game of Go.
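
As a small taste of the tabular methods listed above, here is a minimal Q-learning sketch with epsilon-greedy exploration; the environment interface (`reset`, `step`, `actions`) and the hyperparameters are assumptions for illustration, not code from the course.

```python
# Minimal sketch of tabular Q-learning with epsilon-greedy exploration.
# The environment interface (reset/step/actions) and hyperparameters are illustrative assumptions.
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    q = defaultdict(float)                      # q[(state, action)] -> estimated value

    def greedy(state):
        return max(env.actions(state), key=lambda a: q[(state, a)])

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Exploration/exploitation trade-off: epsilon-greedy action choice.
            actions = env.actions(state)
            action = random.choice(actions) if random.random() < epsilon else greedy(state)
            next_state, reward, done = env.step(state, action)
            # Q-learning update: bootstrap on the best action in the next state.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in env.actions(next_state))
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```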

References:

  1. Reinforcement Learning: An Introduction by Sutton and Barto
  2. Foundations of Machine Learning by Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar
  3. DeepMind Reinforcement Learning Lecture Series 2021

Exercises

  • TD1: Implementation of value iteration, Monte Carlo, SARSA, Q-learning, and learning with linear function approximation (a value-iteration sketch follows this list).
  • TD2: Implementation of Monte Carlo Tree Search.
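
For a rough idea of what the value-iteration part of TD1 involves, here is a minimal sketch on an explicitly given MDP; the dictionary-based encoding of transitions and the toy example are assumptions chosen for illustration.

```python
# Minimal sketch of value iteration on a small, explicitly given MDP.
# The MDP encoding (transition dictionary) is an illustrative assumption.

def value_iteration(states, actions, transitions, gamma=0.9, tol=1e-8):
    """transitions[(s, a)] is a list of (probability, next_state, reward) triples."""
    v = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman optimality backup: best one-step lookahead over all actions.
            best = max(
                sum(p * (r + gamma * v[s2]) for p, s2, r in transitions[(s, a)])
                for a in actions(s)
            )
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            return v


if __name__ == "__main__":
    # Two-state toy MDP: from 'A' you can 'stay' (no reward) or 'go' to 'B' (reward 1).
    states = ["A", "B"]
    actions = lambda s: ["stay", "go"]
    transitions = {
        ("A", "stay"): [(1.0, "A", 0.0)],
        ("A", "go"):   [(1.0, "B", 1.0)],
        ("B", "stay"): [(1.0, "B", 0.0)],
        ("B", "go"):   [(1.0, "A", 0.0)],
    }
    print(value_iteration(states, actions, transitions))
```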

Research articles

  1. Schaul, T., Quan, J., Antonoglou, I., & Silver, D. Prioritized experience replay. arXiv preprint, 2015.

  2. Coquelin, P. A., & Munos, R. Bandit algorithms for tree search. arXiv preprint, 2007.

  3. Van Hasselt, H., Guez, A., & Silver, D. Deep reinforcement learning with double Q-learning. Proceedings of the AAAI conference on artificial intelligence, 2016.

  4. Mnih, V., Kavukcuoglu, K., Silver, D., et al. Playing Atari with deep reinforcement learning. arXiv preprint, 2013.

  5. Kaelbling, L. P., Littman, M. L., & Cassandra, A. R. Planning and acting in partially observable stochastic domains. Artificial intelligence, 1998.

  6. Pineau, J., Gordon, G., & Thrun, S. Point-based value iteration: An anytime algorithm for POMDPs. IJCAI, 2003.

  7. Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. Policy gradient methods for reinforcement learning with function approximation. Advances in neural information processing systems, 1999.

  8. Van Hasselt, H. Double Q-learning. Advances in neural information processing systems, 2010.

  9. Hessel, M., Modayil, J., Van Hasselt, H., et al. Rainbow: Combining improvements in deep reinforcement learning. Proceedings of the AAAI conference on artificial intelligence, 2018.

The following two papers are a bit more mathematical:

  1. Tsitsiklis, J. N., & Van Roy, B. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 1997.

  2. Auer, P., Cesa-Bianchi, N., & Fischer, P. Finite-time analysis of the multiarmed bandit problem. Machine learning, 2002.