Panloup, Fabien
, Saadane, Sofiane
and Gadat, Sébastien
(2018)
Regret bound for Narendra-Shapiro bandit algorithms.
Stochastics.
pp. 886-926.
This is the latest version of this item.
Preview |
Text
Download (1MB) | Preview |
Abstract
Narendra-Shapiro (NS) algorithms are bandit-type algorithms developed in the 1960s which have been deeply studied in infinite horizon but for which scarce non-asymptotic results exist. In this paper, we focus on a non-asymptotic study of the regret and address the following question: are Narendra-Shapiro bandit algorithms competitive from this point of view? In our main result, we obtain some uniform explicit bounds for the regret of (over)-penalized-NS algorithms. We also extend to the multi-armed case some convergence properties of penalized-NS algorithms towards a stationary Piecewise Deterministic Markov Process (PDMP). Finally, we establish some new sharp mixing bounds for these processes.
| Item Type: | Article |
|---|---|
| Language: | English |
| Date: | May 2018 |
| Refereed: | Yes |
| Uncontrolled Keywords: | Regret, Stochastic Bandit Algorithms, Piecewise Deterministic Markov Processes |
| Subjects: | B- ECONOMIE ET FINANCE |
| Divisions: | TSE-R (Toulouse) |
| Site: | UT1 |
| Date Deposited: | 22 May 2018 11:32 |
| Last Modified: | 02 Apr 2021 15:57 |
| OAI Identifier: | oai:tse-fr.eu:32571 |
| URI: | https://publications.ut-capitole.fr/id/eprint/25888 |
Available Versions of this Item
-
Regret bound for Narendra-Shapiro bandit algorithms. (deposited 16 Mar 2015 14:56)
- Regret bound for Narendra-Shapiro bandit algorithms. (deposited 22 May 2018 11:32) [Currently Displayed]

Tools
Tools
