Jun 17 – 21, 2024
ENSEEIHT
Europe/Paris timezone

Reinforcement learning in a prisoner's dilemma

Speaker

Artur Dolgopolov (Bielefeld University)

Description

I characterize the outcomes of a class of model-free reinforcement learning algorithms, such as stateless Q-learning, in a prisoner's dilemma. The behavior is studied in the limit where players stop experimenting after exploring their options sufficiently. A closed-form relationship between the learning rate and the game payoffs determines whether the players learn to cooperate or to defect. The findings have implications for algorithmic collusion and also apply to asymmetric learners with different experimentation rules.
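
As a rough illustration of the setting, the sketch below simulates two stateless Q-learners playing a repeated prisoner's dilemma. It is not the speaker's model or result. The payoff values, the learning rate `alpha`, the discount factor `gamma`, and the linearly decaying epsilon-greedy exploration schedule are all assumptions chosen for the example, as are the function names.

```python
import random

# Hypothetical payoffs for the row player; actions: 0 = cooperate, 1 = defect.
# The values satisfy the usual prisoner's dilemma ordering T > R > P > S.
PAYOFFS = {
    (0, 0): 3.0,  # R: both cooperate
    (0, 1): 0.0,  # S: I cooperate, opponent defects
    (1, 0): 5.0,  # T: I defect, opponent cooperates
    (1, 1): 1.0,  # P: both defect
}


def q_update(q, action, reward, alpha, gamma):
    """Stateless Q-learning update for the action that was played.

    Q(a) <- (1 - alpha) * Q(a) + alpha * (reward + gamma * max_a' Q(a'))
    Only the entry for `action` changes; a new list is returned.
    """
    updated = list(q)
    target = reward + gamma * max(q)
    updated[action] = (1 - alpha) * q[action] + alpha * target
    return updated


def choose(q, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(2)
    return 0 if q[0] >= q[1] else 1  # ties broken toward cooperation (assumption)


def simulate(alpha=0.1, gamma=0.9, rounds=5000, explore_rounds=1000, seed=0):
    """Run two independent learners; exploration decays linearly to zero (assumption)."""
    rng = random.Random(seed)
    q1, q2 = [0.0, 0.0], [0.0, 0.0]
    for t in range(rounds):
        epsilon = max(0.0, 1.0 - t / explore_rounds)
        a1 = choose(q1, epsilon, rng)
        a2 = choose(q2, epsilon, rng)
        q1 = q_update(q1, a1, PAYOFFS[(a1, a2)], alpha, gamma)
        q2 = q_update(q2, a2, PAYOFFS[(a2, a1)], alpha, gamma)
    return q1, q2


if __name__ == "__main__":
    q1, q2 = simulate()
    print("Player 1 Q-values (cooperate, defect):", q1)
    print("Player 2 Q-values (cooperate, defect):", q2)
```

Running the script with different values of `alpha` or different entries in `PAYOFFS` gives a feel for how the learned Q-values, and hence the greedy action once exploration stops, depend on the learning rate and the payoffs. The outcome of any single run depends on the seed and on these assumed parameters.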

Primary author

Artur Dolgopolov (Bielefeld University)
