Identification of Blackwell Policies for Deterministic MDPs
Victor Boone 1, Bruno Gaujal 2
1 : Université Grenoble Alpes
2 : Inria
L'Institut National de Recherche en Informatique et en Automatique (INRIA)

We consider the problem of identifying Blackwell optimal policies for deterministic finite Markov Decision Processes (d-MDPs). Specifically, we are interested in algorithms that learn reward distributions by querying samples over time, stop almost surely, and return a Blackwell optimal policy with high probability. We characterize the class of MDPs for which such algorithms exist and provide an algorithm that identifies Blackwell optimal policies with arbitrarily high probability.
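
For readers unfamiliar with the term, the standard textbook definition of Blackwell optimality can be sketched as follows. The notation ($V_\gamma^{\pi}$, $\gamma_0$, $r_t$, $s_t$) is an assumption introduced here for illustration and is not taken from the abstract.

```latex
% Discounted value of a policy \pi from state s, for a discount factor \gamma \in [0,1):
V_\gamma^{\pi}(s) \;=\; \mathbb{E}^{\pi}\!\left[\, \sum_{t \ge 0} \gamma^{t} r_t \;\middle|\; s_0 = s \right]

% A policy \pi^* is Blackwell optimal if it is discount-optimal
% for every discount factor sufficiently close to 1:
\exists\, \gamma_0 \in [0,1) \ \text{such that}\quad
V_\gamma^{\pi^*}(s) \;\ge\; V_\gamma^{\pi}(s)
\quad \text{for all } \gamma \in [\gamma_0, 1),\ \text{all states } s,\ \text{and all policies } \pi .
```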
