Jun 17 – 21, 2024
ENSEEIHT
Europe/Paris timezone

Quantifying the likelihood of collusion by provably convergent reinforcement learning

Speaker

Janusz Meylahn (University of Twente)

Description

Recent advances in decentralized multi-agent reinforcement learning (MARL) have led to the development of algorithms that are provably convergent in a variety of Markov game subclasses. One of these is the Decentralized Q-learning (DQ) algorithm of Arslan and Yüksel (2017), which is provably convergent in weakly acyclic games. In this talk, I will present a new characterization of weak acyclicity and use it to show that the prisoner's dilemma with a memory of one period is weakly acyclic. This new characterization naturally leads to an identification of the basins of attraction of all possible strategy equilibria of the DQ algorithm. Since only a subset of these strategy equilibria leads to robust collusion, we can use this to quantify the likelihood of observing algorithmic collusion. In addition, I will discuss how fluctuations in the learning process, and the addition of a third, intermediate action to the prisoner's dilemma, affect the likelihood of collusion.
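
As a rough illustration of what "quantifying the likelihood of collusion" can mean in practice, the sketch below runs plain independent epsilon-greedy Q-learning on the memory-one prisoner's dilemma and counts how often the learned greedy policies sustain mutual cooperation. This is not the exploration-phase DQ algorithm of Arslan and Yüksel, and the payoff values, hyperparameters, and the simplified collusion check are illustrative assumptions only.

```python
import numpy as np

# Assumed payoff matrix for the row player of the prisoner's dilemma:
# actions 0 = cooperate, 1 = defect; standard values T=5, R=3, P=1, S=0.
PAYOFF = np.array([[3.0, 0.0],
                   [5.0, 1.0]])

def run_episode(steps=20000, alpha=0.1, gamma=0.95, eps=0.05, rng=None):
    """Two independent epsilon-greedy Q-learners on the memory-one
    prisoner's dilemma. Both agents observe the previous joint action
    (4 states), so each agent's Q-table has shape (4, 2)."""
    if rng is None:
        rng = np.random.default_rng()
    Q = [np.zeros((4, 2)), np.zeros((4, 2))]
    state = rng.integers(4)  # random initial joint action
    for _ in range(steps):
        acts = []
        for i in range(2):
            if rng.random() < eps:
                acts.append(int(rng.integers(2)))  # explore
            else:
                acts.append(int(np.argmax(Q[i][state])))  # exploit
        rewards = (PAYOFF[acts[0], acts[1]], PAYOFF[acts[1], acts[0]])
        nxt = 2 * acts[0] + acts[1]  # encode joint action as next state
        for i in range(2):
            # Standard one-step Q-learning update for each agent.
            td = rewards[i] + gamma * Q[i][nxt].max() - Q[i][state, acts[i]]
            Q[i][state, acts[i]] += alpha * td
        state = nxt
    return Q

def is_collusive(Q):
    """Crude check: both agents' greedy policies cooperate after mutual
    cooperation (state 0 encodes the previous joint action (C, C))."""
    return all(int(np.argmax(q[0])) == 0 for q in Q)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    runs = 100
    hits = sum(is_collusive(run_episode(rng=rng)) for _ in range(runs))
    print(f"estimated collusion frequency: {hits / runs:.2f}")
```

Running many independent learning trajectories and measuring which strategy equilibrium each one lands in is the empirical counterpart of the basin-of-attraction analysis described above.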

Primary author

Janusz Meylahn (University of Twente)
