Description
Session chair: Konstantin Avrachenkov
We consider network utility maximization for job admission, routing, and scheduling in a queueing network with unknown job utilities as a type of multi-armed bandit problem. This "Backlogged Bandit" problem is a bandit learning problem with delayed feedback, where the delay is the end-to-end delay a job incurs while waiting in the queue at each node along its path through the network. While recent work has explored...
We study interpersonal trust in a population of agents, asking whether chance may decide whether the population ends up in a high-trust or a low-trust state. We model this as a discrete-time, random-matching stochastic coordination game. Agents are endowed with an exponential smoothing learning rule about the behaviour of their neighbours. We find that, with probability one in the long run the...
We consider multiagent Q-learning in which each agent has her own reward function, but all agents influence the transition mechanism. By relaxing exact optimality to a requirement of "satisficing", modelled as driving the average costs to prescribed acceptable regions, we propose a scheme that provably achieves this.