Welcome to Toulouse Capitole Publications - Toulouse Capitole Publications

Deposit

NEED HELP?

ADVANCED SEARCH +

BROWSE

Auteur

Type Publication

Unité Recherche

LINKS

Tools

Panloup, Fabien, Saadane, Sofiane and Gadat, Sébastien (2018) Regret bound for Narendra-Shapiro bandit algorithms. Stochastics. pp. 886-926.

This is the latest version of this item.

Preview

Text
Download (1MB) | Preview

Official URL : http://tse-fr.eu/pub/32571

Identification Number : 10.1080/17442508.2018.1457675

Abstract

Narendra-Shapiro (NS) algorithms are bandit-type algorithms developed in the 1960s which have been deeply studied in infinite horizon but for which scarce non-asymptotic results exist. In this paper, we focus on a non-asymptotic study of the regret and address the following question: are Narendra-Shapiro bandit algorithms competitive from this point of view? In our main result, we obtain some uniform explicit bounds for the regret of (over)-penalized-NS algorithms. We also extend to the multi-armed case some convergence properties of penalized-NS algorithms towards a stationary Piecewise Deterministic Markov Process (PDMP). Finally, we establish some new sharp mixing bounds for these processes.

Item Type:	Article
Language:	English
Date:	May 2018
Refereed:	Yes
Uncontrolled Keywords:	Regret, Stochastic Bandit Algorithms, Piecewise Deterministic Markov Processes
Subjects:	B- ECONOMIE ET FINANCE
Divisions:	TSE-R (Toulouse)
Site:	UT1
Date Deposited:	22 May 2018 11:32
Last Modified:	14 Jan 2026 14:22
OAI Identifier:	oai:tse-fr.eu:32571
URI:	https://publications.ut-capitole.fr/id/eprint/25888

Available Versions of this Item

Regret bound for Narendra-Shapiro bandit algorithms. (deposited 16 Mar 2015 14:56)
- Regret bound for Narendra-Shapiro bandit algorithms. (deposited 22 May 2018 11:32) [Currently Displayed]

Altmetric

Stats for this item

Downloads

Downloads per month over past year