The Restless Bandits Problem with Partially Observable Markov Decision Processes
Mimi Zhang, SCSS
12-1pm 5th Dec 2017
Abstract
The restless bandits (RBs) problem can be interpreted as a system of independent Markov decision processes, with an additional constraint on the actions available to the system. In many problem domains, however, a decision maker has limited sensing capabilities that preclude them from recovering an exact Markovian state from perceptions. The current work generalizes the RBs framework to the setting in which the RBs' states are not perfectly known, so that their dynamics are modelled by partially observable Markov chains. The generalized framework is called the partially observable RBs problem. Two heuristic policies are proposed, based on two types of indices that are straightforward to calculate. The performance of the proposed heuristics is evaluated in a systematic computational study, which shows that they are consistently near-optimal.

