Description
We consider network utility maximization for job admission, routing, and scheduling in a queueing network with unknown job utilities as a type of multi-armed bandit problem. This "Backlogged Bandits" problem is a bandit learning problem with delayed feedback, arising from the end-to-end delay a job incurs waiting in the queue of each node along its path through the network. While recent work has explored techniques for learning under delayed feedback in this setting, such as the parallel-instance technique, we find that the celebrated drift-plus-penalty technique, classically used to optimize queueing networks, already adequately controls the feedback delay in some problem instances. Motivated by this observation, we focus on developing theoretical techniques to analyze this style of algorithm in the Backlogged Bandits framework. In this talk, we present our recent work on the special case of routing in a bipartite (single-hop) queueing network, and discuss the challenges of, and our progress toward, the general multi-hop case.
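For context, the drift-plus-penalty technique mentioned above can be sketched in its standard form (the notation here is illustrative, not taken from the talk): with a quadratic Lyapunov function over queue backlogs, each time slot the controller greedily trades off queue stability against utility.

```latex
% Quadratic Lyapunov function over queue backlogs Q_n(t):
L(Q(t)) = \frac{1}{2} \sum_{n} Q_n(t)^2
% One-slot conditional Lyapunov drift:
\Delta(t) = \mathbb{E}\left[\, L(Q(t+1)) - L(Q(t)) \,\middle|\, Q(t) \,\right]
% Drift-plus-penalty: in each slot, choose admission, routing, and
% scheduling actions to minimize
\Delta(t) - V\, \mathbb{E}\left[\, U(t) \,\middle|\, Q(t) \,\right]
% where U(t) is the utility earned in slot t, and the parameter V > 0
% trades off utility optimality against average queue backlog --
% and hence against the feedback delay experienced by waiting jobs.
```

The role of $V$ is what makes this technique relevant here: smaller backlogs mean jobs traverse the network faster, so utility feedback arrives sooner.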