This paper addresses the problem of learning to sparsify stochastic linear bandits, where a decision-maker sequentially selects actions from a high-dimensional space subject to a sparsity constraint on the number of nonzero elements in the action vector. The key challenge lies in minimizing cumulative regret while tackling the potential NP-hardness of finding optimal sparse actions due to the inherent combinatorial structure of the problem. We propose an adaptively phased exploration and exploitation algorithmic framework, utilizing ordinary least squares for parameter learning and specialized subroutines for sparse action selection. When the action set is a Euclidean ball, optimal sparse actions can be efficiently computed, enabling us to establish a O ̃(d√T ) regret, where d is the dimension of the action vector and T is the time horizon length. For general convex and compact action sets where finding optimal sparse actions is intractable, we employ a greedy sub-routine. For general strongly convex action sets, we derive a O ̃(d T ) α-regret; for general compact sets lacking strong convexity, we establish a O ̃(dT2/3) α-regret, where α pertains to the approximation ratio of the greedy algorithm. Finally, we validate the performance of our algorithms using extensive experiments including an application to recommendation system.

Learning to Sparsify Stochastic Linear Bandits / Wang, Zhengmiao; Chi, Ming; Liu, Zhi-Wei; Ye, Lintao; Chiasserini, Carla Fabiana. - (2026). ( 35th International Joint Conference on Artificial Intelligence - 29th European Conference on Artificial Intelligence (IJCAI-ECAI) Bremen (Ger) 15 - 21 August 2026).

Learning to Sparsify Stochastic Linear Bandits

Carla Fabiana Chiasserini
2026

Abstract

This paper addresses the problem of learning to sparsify stochastic linear bandits, where a decision-maker sequentially selects actions from a high-dimensional space subject to a sparsity constraint on the number of nonzero elements in the action vector. The key challenge lies in minimizing cumulative regret while tackling the potential NP-hardness of finding optimal sparse actions due to the inherent combinatorial structure of the problem. We propose an adaptively phased exploration and exploitation algorithmic framework, utilizing ordinary least squares for parameter learning and specialized subroutines for sparse action selection. When the action set is a Euclidean ball, optimal sparse actions can be efficiently computed, enabling us to establish a O ̃(d√T ) regret, where d is the dimension of the action vector and T is the time horizon length. For general convex and compact action sets where finding optimal sparse actions is intractable, we employ a greedy sub-routine. For general strongly convex action sets, we derive a O ̃(d T ) α-regret; for general compact sets lacking strong convexity, we establish a O ̃(dT2/3) α-regret, where α pertains to the approximation ratio of the greedy algorithm. Finally, we validate the performance of our algorithms using extensive experiments including an application to recommendation system.
File in questo prodotto:
File Dimensione Formato  
IJCAI_final_version.pdf

accesso riservato

Tipologia: 2. Post-print / Author's Accepted Manuscript
Licenza: Non Pubblico - Accesso privato/ristretto
Dimensione 702.1 kB
Formato Adobe PDF
702.1 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/3010832