23rd annual congress of the Société Française de Recherche Opérationnelle et d'Aide à la Décision

sciencesconf.org:roadef2022:378236

We consider the problem of identifying Blackwell optimal policies in deterministic finite Markov Decision Processes (d-MDPs). Specifically, we are interested in algorithms that learn the reward distributions by querying samples over time, stop almost surely, and return a Blackwell optimal policy with high probability. We characterize the class of MDPs for which such algorithms exist, and we give an algorithm that identifies Blackwell optimal policies with arbitrarily high probability.
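
For readers unfamiliar with the central notion, the standard textbook definition of Blackwell optimality can be written as follows. The notation here ($V_\gamma^{\pi}$, $\bar r$, $\gamma_0$, $\Pi$, $\mathcal{S}$) is an assumption made for illustration and is not taken from the abstract; $\bar r(s,a)$ denotes the mean of the reward distribution at state-action pair $(s,a)$.

```latex
% Discounted value of a policy \pi from state s, for a discount factor \gamma \in [0,1)
% (notation assumed for illustration; s_t, a_t is the trajectory generated by \pi from s_0 = s):
V_\gamma^{\pi}(s) \;=\; \mathbb{E}\!\left[\, \sum_{t \ge 0} \gamma^{t}\, \bar r(s_t, a_t) \;\middle|\; s_0 = s,\ \pi \right]

% A policy \pi^\ast is Blackwell optimal if it is discount optimal
% for every discount factor sufficiently close to 1:
\exists\, \gamma_0 \in [0,1) \ \text{such that}\quad
V_\gamma^{\pi^\ast}(s) \;\ge\; V_\gamma^{\pi}(s)
\qquad \forall\, \gamma \in [\gamma_0, 1),\ \forall\, s \in \mathcal{S},\ \forall\, \pi \in \Pi .
```

In a finite MDP at least one Blackwell optimal policy always exists; the identification problem in the abstract is harder because the reward distributions are unknown and must be estimated from samples.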

Type	:	Article
Topics	:	Session 1 "Programmation Dynamique Stochastique" of the GT COSMOS
Keywords	:	Reinforcement Learning ; Markov Decision Processes ; Blackwell Optimality
