Infinite horizon average cost dynamic programming subject to total variation distance ambiguity
Access rights
Open access
A1 Original article in a scientific journal
This publication is imported from Aalto University research portal.
Date
2019-01-01
Language
en
Pages
30
2843-2872
Series
SIAM Journal on Control and Optimization, Volume 57, issue 4
Abstract
We analyze the per unit-time infinite horizon average cost Markov control model, subject to a total variation distance ambiguity on the conditional distribution of the controlled process. This stochastic optimal control problem is formulated as a minimax optimization problem in which the minimization is over the admissible set of control strategies, while the maximization is over the set of conditional distributions lying in a ball, with respect to the total variation distance, centered at a nominal distribution. We derive two new equivalent dynamic programming equations and a new policy iteration algorithm. The main feature of the new dynamic programming equations is that the optimal control strategies are insensitive to inaccuracies or ambiguities in the conditional distribution of the controlled process. The main feature of the new policy iteration algorithm is that the policy evaluation and policy improvement steps are performed using the maximizing conditional distribution, which is obtained via a water-filling solution that aggregates states together to form new states. Throughout the paper, we illustrate the new dynamic programming equations and the corresponding policy iteration algorithm with various examples.
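The water-filling construction of the maximizing conditional distribution mentioned in the abstract has a simple finite-state form. The following is a minimal Python sketch, not the authors' implementation: it assumes a finite state space, a nominal probability vector p, a value (cost-to-go) vector v, and a total variation radius R measured as the L1 distance between probability vectors, so that mass R/2 is shifted from the lowest-value states to the highest-value state without driving any probability negative. The function name and arguments are illustrative only.

```python
import numpy as np

def tv_maximizing_distribution(p, v, radius):
    """Sketch of the water-filling maximizer of q . v over the set of
    probability vectors q with ||q - p||_1 <= radius.

    Mass alpha = radius / 2 is added to the state with the largest value
    of v and removed from the smallest-value states, in increasing order
    of v, never taking more mass from a state than it holds.
    """
    p = np.asarray(p, dtype=float)
    v = np.asarray(v, dtype=float)
    q = p.copy()

    top = np.argmax(v)
    # Cannot move more mass to the top state than is held elsewhere.
    alpha = min(radius / 2.0, 1.0 - p[top])
    q[top] += alpha

    # Remove the same amount from the lowest-value states first.
    for i in np.argsort(v):
        if i == top:
            continue
        take = min(alpha, q[i])
        q[i] -= take
        alpha -= take
        if alpha <= 0.0:
            break
    return q

# Illustrative example: with p = [0.4, 0.4, 0.2], v = [1, 2, 3] and
# radius 0.4, mass 0.2 moves from the worst state to the best state,
# giving q = [0.2, 0.4, 0.4].
```

Under these assumptions the construction is a sketch of the inner maximization only; in the paper it is embedded in the policy evaluation and policy improvement steps of the policy iteration algorithm.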
Keywords
Average cost, Dynamic programming, Infinite horizon, Markov control models, Minimax, Policy iteration, Stochastic control, Total variation distance
Citation
Tzortzis, I, Charalambous, C D & Charalambous, T 2019, 'Infinite horizon average cost dynamic programming subject to total variation distance ambiguity', SIAM Journal on Control and Optimization, vol. 57, no. 4, pp. 2843-2872. https://doi.org/10.1137/18M1210514