We derive a new analysis of Follow The Regularized Leader (FTRL) for online learning with delayed bandit feedback. By separating the cost of delayed feedback from that of bandit feedback, our analysis allows us to obtain new results in four important settings. We derive the first optimal (up to logarithmic factors) regret bounds for combinatorial semi-bandits with delay and adversarial Markov Decision Processes with delay (both known and unknown transition functions). Furthermore, we use our analysis to develop an efficient algorithm for linear bandits with delay achieving near-optimal regret bounds. In order to derive these results we show that FTRL remains stable across multiple rounds under mild assumptions on the regularizer.

A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs / Zierahn, Lukas; Van Der Hoeven, Dirk; Lancewicki, Tal; Rosenberg, Aviv; Cesa-Bianchi, Nicolo. - In: JOURNAL OF MACHINE LEARNING RESEARCH. - ISSN 1532-4435. - 195:(2025), pp. 1-37. (Intervento presentato al convegno The Thirty Sixth Annual Conference on Learning Theory tenutosi a Bangalore (IND) nel 12-15 July 2023).

A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs

Lukas Zierahn;
2025

Abstract

We derive a new analysis of Follow The Regularized Leader (FTRL) for online learning with delayed bandit feedback. By separating the cost of delayed feedback from that of bandit feedback, our analysis allows us to obtain new results in four important settings. We derive the first optimal (up to logarithmic factors) regret bounds for combinatorial semi-bandits with delay and adversarial Markov Decision Processes with delay (both known and unknown transition functions). Furthermore, we use our analysis to develop an efficient algorithm for linear bandits with delay achieving near-optimal regret bounds. In order to derive these results we show that FTRL remains stable across multiple rounds under mild assumptions on the regularizer.
File in questo prodotto:
File Dimensione Formato  
hoeven23a.pdf

accesso aperto

Tipologia: 2a Post-print versione editoriale / Version of Record
Licenza: Pubblico - Tutti i diritti riservati
Dimensione 473.54 kB
Formato Adobe PDF
473.54 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/2998193