Panloup, Fabien, Saadane, Sofiane and Gadat, Sébastien (2018) Regret bound for Narendra-Shapiro bandit algorithms. Stochastics. pp. 886-926.
This is the latest version of this item.
Preview |
Text
Download (1MB) | Preview |
Abstract
Narendra-Shapiro (NS) algorithms are bandit-type algorithms developed in the 1960s which have been deeply studied in infinite horizon but for which scarce non-asymptotic results exist. In this paper, we focus on a non-asymptotic study of the regret and address the following question: are Narendra-Shapiro bandit algorithms competitive from this point of view? In our main result, we obtain some uniform explicit bounds for the regret of (over)-penalized-NS algorithms. We also extend to the multi-armed case some convergence properties of penalized-NS algorithms towards a stationary Piecewise Deterministic Markov Process (PDMP). Finally, we establish some new sharp mixing bounds for these processes.
Item Type: | Article |
---|---|
Language: | English |
Date: | May 2018 |
Refereed: | Yes |
Uncontrolled Keywords: | Regret, Stochastic Bandit Algorithms, Piecewise Deterministic Markov Processes |
Subjects: | B- ECONOMIE ET FINANCE |
Divisions: | TSE-R (Toulouse) |
Site: | UT1 |
Date Deposited: | 22 May 2018 11:32 |
Last Modified: | 02 Apr 2021 15:57 |
OAI Identifier: | oai:tse-fr.eu:32571 |
URI: | https://publications.ut-capitole.fr/id/eprint/25888 |
Available Versions of this Item
-
Regret bound for Narendra-Shapiro bandit algorithms. (deposited 16 Mar 2015 14:56)
- Regret bound for Narendra-Shapiro bandit algorithms. (deposited 22 May 2018 11:32) [Currently Displayed]