Speakers
Description
This talk introduces a novel online algorithm for tuning load balancers coupled with auto-scalers, considering bursty traffic arriving at finite queues with large buffer sizes and with large action space, and when the parameters of the queuing system are not known. When the policy of the queue is known, the problem can be modeled as a weakly coupled Markov Decision Process (MDP). LP-based policies can be used to solve weakly coupled Markov Decision Processes (MDP) and are known to be efficient heuristics. However, to compute such a heuristic, it is necessary to know the transition matrix of the underlying Markov chain. Moreover, a second challenge is that the number of variables and constraints may be too high to efficiently use the LP-based policy in practice. In this talk, we will present a new online algorithm to learn efficiently the LP-based policy. Our algorithm is based on a two-time scale algorithm, and MDP aggregation techniques. We can derive finite bounds between the errors made by our algorithms with the optimal policy.