Reinforcement Learning for Stochastic Networks, Toulouse

Name: Reinforcement Learning for Stochastic Networks, Toulouse
Start: 2024-06-17T09:00:00+02:00
End: 2024-06-21T18:00:00+02:00
Location: ENSEEIHT

Europe/Paris timezone

Session

Parallel session: Policy gradient methods: optimization and convergence

Jun 17, 2024, 1:30 PM

A001

42. Computing the bias of stochastic approximation with constant step-size via Stein's method.

Nicolas Gast (Inria, Univ. Grenoble Alpes)

Jun 17, 2024, 1:30 PM

Stochastic approximation algorithms are quite popular in reinforcement learning, notably because they are powerful tools to study the convergence of algorithms based on stochastic gradient descent (like Q-learning or policy gradient). In this talk, I will focus on constant step-size stochastic approximation and present tools to compute its asymptotic bias, which is non-zero (both for Martingale...
27. Convergence for Natural Policy Gradient on Infinite-State Average-Reward Markov Decision Processes

Isaac Grosof (University of Illinois, Urbana-Champaign; Northwestern University)

Jun 17, 2024, 2:00 PM

Infinite-state Markov Decision Processes (MDPs) are essential in modeling and optimizing a wide variety of engineering problems. In the reinforcement learning (RL) context, a variety of algorithms have been developed to learn and optimize these MDPs. At the heart of many popular policy-gradient-based learning algorithms, such as natural actor-critic, TRPO, and PPO, lies the Natural Policy...
49. On the Global Convergence of Policy Based Methods in Average Reward Problems

Yashaswini Murthy (University of Illinois Urbana-Champaign)

Jun 17, 2024, 2:30 PM

In the context of average reward Markov Decision Processes (MDPs), traditional approaches for obtaining performance bounds based on discounted reward formulations fail to provide meaningful bounds due to their dependence on the horizon. This limitation arises because average reward problems can be viewed as discounted reward problems, with the discount factor approaching 1, effectively...
