Abstract:
We extend the Markov Decision Process framework to Mean Field Game (MFG) and Mean Field Control (MFC) problems, and we generalize the Bellman optimality equation for Q-learning to this setting. By introducing two learning rates, one for the Q-matrix and one for the population distribution, we design a single algorithm that learns the optimal policy for either the MFG or the MFC problem, depending on the ratio of these two rates. Applications to problems in finance are also discussed.
Joint work with Andrea Angiuli and Mathieu Laurière.
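The two-learning-rate idea can be sketched in Python under loud assumptions. The toy environment (states on a line, move-left/stay/move-right actions, a crowd-averse reward), the function name `two_timescale_q_learning`, and all parameter values are hypothetical and not taken from the talk. The Q-table is updated with rate `rho_q` and the population estimate `mu` with rate `rho_mu`. `mu` is nudged toward the indicator of the newly visited state, so it remains a probability vector. The intuition assumed here is that a slowly updated `mu` approximates a best response to a frozen population (MFG-like), while a fast `mu` tracks the distribution generated by the current policy (MFC-like). The precise convergence conditions are those established in the authors' work, not in this sketch.

```python
import numpy as np

def two_timescale_q_learning(n_states=5, n_actions=3, gamma=0.9,
                             rho_q=0.1, rho_mu=0.01, eps=0.1,
                             n_steps=20_000, seed=0):
    """Hypothetical sketch: tabular Q-learning coupled with a population estimate mu.

    rho_q  -- learning rate for the Q-table
    rho_mu -- learning rate for the population distribution mu
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    mu = np.full(n_states, 1.0 / n_states)  # start from the uniform distribution
    x = int(rng.integers(n_states))

    for _ in range(n_steps):
        # Epsilon-greedy action from the current Q-table.
        if rng.random() < eps:
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmax(Q[x]))

        # Hypothetical dynamics: actions 0, 1, 2 move left, stay, move right (clipped).
        x_next = int(np.clip(x + (a - 1), 0, n_states - 1))

        # Hypothetical mean-field reward: stay near the middle, avoid crowded states.
        r = -abs(x - n_states // 2) - mu[x]

        # Q-update with learning rate rho_q (Bellman optimality target).
        td_target = r + gamma * Q[x_next].max()
        Q[x, a] += rho_q * (td_target - Q[x, a])

        # Population update with learning rate rho_mu: a convex combination of
        # the old estimate and the indicator of the visited state.
        one_hot = np.zeros(n_states)
        one_hot[x_next] = 1.0
        mu += rho_mu * (one_hot - mu)

        x = x_next

    return Q, mu
```

Only the ratio of the two rates differs between the calls below. In this sketch that ratio is the single switch between the two regimes:

```python
Q_game, mu_game = two_timescale_q_learning(rho_q=0.1, rho_mu=0.01)     # slow mu
Q_control, mu_control = two_timescale_q_learning(rho_q=0.01, rho_mu=0.1)  # fast mu
```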