Speaker
Prof. Vivek Borkar (Indian Institute of Technology Bombay)
Description
We consider multiagent Q-learning in which each agent has her
own reward function while all agents jointly influence the transition
mechanism. Relaxing exact optimality to a requirement of
'satisficing', modelled as driving the average costs to prescribed
acceptable regions, we propose a scheme that provably achieves this requirement.
Primary authors
Mr Keshav Patel Keval (Indian Institute of Technology Bombay)
Prof. Vivek Borkar (Indian Institute of Technology Bombay)