Probability and Statistics

Small total-cost constraints in contextual bandits with knapsacks.

by Evgenii Chzhen

Salle de Conférence (LJAD)

Description

I will talk about recent developments in the literature on contextual bandits with knapsacks, a problem in which, at each round, the learner obtains a scalar reward and suffers vector-valued costs. The goal is to maximize the cumulative reward while ensuring that the cumulative costs remain below predetermined cost constraints.
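One way to write this problem formally is given below. The notation is introduced here for illustration and is an assumption, not taken from the talk: contexts x_t, actions a_t, rewards r_t in [0, 1], cost vectors c_t in [0, 1]^d, T rounds, and a total-cost budget B in R_+^d.

```latex
\max \; \mathbb{E}\Big[\sum_{t=1}^{T} r_t(x_t, a_t)\Big]
\quad \text{subject to} \quad
\sum_{t=1}^{T} c_t(x_t, a_t) \le B \quad \text{(componentwise)}.
```

In this notation, the size of the total-cost constraints refers to how the entries of B scale with T.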
In this setting, total-cost constraints had until now to be at least of order T^{3/4}, where T is the number of rounds, and were even typically assumed to scale linearly with T. After discussing the main technical challenge and the drawback of previous approaches, I will present a dual strategy based on projected-gradient-descent updates that can handle total-cost constraints of order T^{1/2}, up to poly-logarithmic terms. This strategy is direct and relies on a careful, adaptive tuning of the step size. The approach is inspired by parameter-free-type algorithms from the convex (online) optimization literature.
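The general shape of a dual strategy with projected-gradient updates can be illustrated with a generic Lagrangian sketch. This is a minimal, hypothetical illustration, not the speaker's algorithm. It assumes that the expected rewards and costs of every action are observed exactly for each context (no estimation), that action 0 is a zero-cost null action, that a single scalar budget B applies to every resource, that the multipliers live in the box [0, T/B]^d, and that the step size is fixed at 1/sqrt(T) rather than adaptively tuned. The names `dual_pgd_step`, `choose_action`, and `run_toy` are made up for this example. At each round the learner plays the action maximizing reward minus the multiplier-weighted cost, then takes a projected gradient step on the multipliers, which grow for resources consumed faster than the per-round share B/T.

```python
import numpy as np


def dual_pgd_step(lam, cost, per_round_budget, step_size, upper):
    """One projected-gradient step on the Lagrange multipliers lam.

    The Lagrangian r - <lam, c - B/T> has gradient (B/T - c) in lam, so a
    descent step moves lam by step_size * (c - B/T); clipping to
    [0, upper]^d is the Euclidean projection onto the box.
    """
    return np.clip(lam + step_size * (cost - per_round_budget), 0.0, upper)


def choose_action(lam, rewards, costs):
    """Return the action maximising reward minus the lam-weighted cost."""
    scores = rewards - costs @ lam
    return int(np.argmax(scores))


def run_toy(T=10_000, B=2_000.0, K=3, d=2, step_size=None, seed=0):
    """Toy simulation with hypothetical i.i.d. contexts and known means."""
    rng = np.random.default_rng(seed)
    if step_size is None:
        step_size = 1.0 / np.sqrt(T)  # fixed step size (assumption, not adaptive)
    per_round = B / T                  # per-round share of the budget
    upper = T / B                      # hypothetical bound on each multiplier
    lam = np.zeros(d)
    total_reward = 0.0
    total_cost = np.zeros(d)
    for _ in range(T):
        # Hypothetical context: action 0 is a null action with zero reward and cost.
        rewards = np.concatenate(([0.0], rng.uniform(0.0, 1.0, K)))
        costs = np.vstack((np.zeros(d), rng.uniform(0.0, 1.0, (K, d))))
        a = choose_action(lam, rewards, costs)
        # Safeguard for the toy: fall back to the null action near the budget.
        if np.any(total_cost + costs[a] > B):
            a = 0
        total_reward += rewards[a]
        total_cost += costs[a]
        lam = dual_pgd_step(lam, costs[a], per_round, step_size, upper)
    return total_reward, total_cost
```

For example, `run_toy(T=2000, B=400.0)` returns the cumulative reward and the per-resource cumulative cost; the guard forces the null action whenever a resource would exceed B, so the returned costs never exceed the budget. The strategy described in the abstract instead tunes the step size adaptively, which this sketch does not attempt.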