Jun 17 – 21, 2024
ENSEEIHT
Europe/Paris timezone

Session

Parallel session: Policy gradient methods: optimization and convergence

Jun 17, 2024, 1:30 PM
A001

Description

Organizer and chair: Isaac Grosof


  1. Nicolas Gast (Inria, Univ. Grenoble Alpes)
    6/17/24, 1:30 PM

    Stochastic approximation algorithms are popular in reinforcement learning, notably because they are powerful tools for studying the convergence of algorithms based on stochastic gradient descent (like Q-learning or policy gradient). In this talk, I will focus on constant step-size stochastic approximation and present tools to compute its asymptotic bias, which is non-zero (both for martingale...

  2. Isaac Grosof (University of Illinois Urbana-Champaign; Northwestern University)
    6/17/24, 2:00 PM

    Infinite-state Markov Decision Processes (MDPs) are essential in modeling and optimizing a wide variety of engineering problems. In the reinforcement learning (RL) context, many algorithms have been developed to learn and optimize these MDPs. At the heart of many popular policy-gradient-based learning algorithms, such as natural actor-critic, TRPO, and PPO, lies the Natural Policy...

  3. Yashaswini Murthy (University of Illinois Urbana-Champaign)
    6/17/24, 2:30 PM

    In the context of average reward Markov Decision Processes (MDPs), traditional approaches for obtaining performance bounds based on discounted reward formulations fail to provide meaningful bounds due to their dependence on the horizon. This limitation arises because average reward problems can be viewed as discounted reward problems, with the discount factor approaching 1, effectively...

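The first abstract notes that constant step-size stochastic approximation has a non-zero asymptotic bias. The sketch below is a minimal illustration of that phenomenon under assumptions that are entirely hypothetical and not taken from the talk: a one-dimensional mean field f(x) = 1 - exp(x) with root x* = 0, i.i.d. Gaussian (martingale-difference) noise, and the hypothetical helper `constant_step_sa_mean`. In stationarity the recursion forces E[exp(x)] = 1, so by Jensen's inequality E[x] < 0 = x*. The iterates therefore settle around a point strictly below the root, and the offset shrinks as the step size decreases.

```python
import math
import random


def constant_step_sa_mean(alpha, n_steps=200_000, burn_in=20_000, sigma=1.0, seed=0):
    """Time-averaged iterate of x_{n+1} = x_n + alpha * (f(x_n) + noise_n).

    Hypothetical example: f(x) = 1 - exp(x) has its unique root at x* = 0
    with f'(0) = -1, and the noise is i.i.d. N(0, sigma^2), i.e. a
    martingale-difference sequence. The step size alpha stays constant.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    x = 0.0                    # start at the root x* = 0
    total = 0.0
    for n in range(n_steps):
        noise = rng.gauss(0.0, sigma)
        x += alpha * ((1.0 - math.exp(x)) + noise)
        if n >= burn_in:       # discard the transient before averaging
            total += x
    return total / (n_steps - burn_in)


if __name__ == "__main__":
    # A non-zero mean means the iterates are biased away from x* = 0.
    for alpha in (0.1, 0.05, 0.025):
        print(f"alpha={alpha}: mean iterate ~ {constant_step_sa_mean(alpha):+.4f}")
```

Only the sign of the bias follows rigorously from the Jensen argument. The observation that its magnitude scales roughly in proportion to alpha comes from a small-step heuristic, not from the talk.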