Description
Organizers and chairs: Lei Ying and Weina Wang
We investigate an online learning and optimization problem in a queueing system with unknown arrival rates and an unknown service-time distribution. The service provider’s objective is to seek the optimal service fee
We consider the infinite-horizon, average reward restless bandit problem. For this problem, a central challenge is to find asymptotically optimal policies in a computationally efficient manner in the regime where the number of arms, N, grows large. Existing policies, including the renowned Whittle index policy, all rely on a uniform global attractor property (UGAP) assumption to achieve...