Orateur
Description
We consider a two-person zero-sum discounted stochastic game with random rewards and known transition probabilities. The players have opposite objectives and are interested in optimizing the expected discounted reward which they can obtain with a given confidence level when both the players play the worst possible move against each other.
We model such a game problem by defining the chance-constrained optimization problem of each player, and call this new formulation a chance-constrained stochastic game (CCSG). When the reward vector follows a multivariate elliptically symmetric distribution, the CCSG is equivalent to a minimax formulation. We consider CCSG with risk seeking and risk averse players separately.
We show that the risk-seeking problem is equivalent to a constrained optimization of a parameterized zero-sum stochastic game and the optimal payoff of player 1 and optimal cost of player 2 can be computed using algorithms based on a fixed point iteration. Later we can use the fixed point solutions to compute players' optimal strategies by solving linear programming problems.
We reformulate the risk averse problem as a discrete minimax problem. We propose an algorithm based on a linearization method and discuss its
convergence properties. Alternatively, we reformulate the risk averse problem as a second-order cone programming problem with bilinear constraints.
We illustrate the theoretical results with numerical experiments.