Online scheduling has been an attractive field of research for over three decades. Some recent developments suggest that Reinforcement Learning (RL) techniques can effectively deal with online scheduling issues. Driven by an industrial application, in this paper we apply four of the most important RL techniques, namely Q-learning, Sarsa, Watkins’s Q(lambda), and Sarsa(lambda), to the online single-machine scheduling problem. Our main goal is to provide insights into how such techniques perform in the scheduling process. We will consider the minimization of two different and widely used objective functions: the total tardiness and the total earliness and tardiness of the jobs. The computational experiments show that Watkins’s Q(lambda) performs best in minimizing the total tardiness. At the same time, it seems that the RL approaches are not very effective in minimizing the total earliness and tardiness over large time horizons.

Reinforcement Learning Algorithms for Online Single-Machine Scheduling / Li, Yuanyuan; Fadda, Edoardo; Manerba, Daniele; Tadei, Roberto; Terzo, Olivier. - (2020), pp. 277-283. ((Intervento presentato al convegno FedCSIS [10.15439/2020F100].

Reinforcement Learning Algorithms for Online Single-Machine Scheduling.

Edoardo Fadda;Roberto Tadei;
2020

Abstract

Online scheduling has been an attractive field of research for over three decades. Some recent developments suggest that Reinforcement Learning (RL) techniques can effectively deal with online scheduling issues. Driven by an industrial application, in this paper we apply four of the most important RL techniques, namely Q-learning, Sarsa, Watkins’s Q(lambda), and Sarsa(lambda), to the online single-machine scheduling problem. Our main goal is to provide insights into how such techniques perform in the scheduling process. We will consider the minimization of two different and widely used objective functions: the total tardiness and the total earliness and tardiness of the jobs. The computational experiments show that Watkins’s Q(lambda) performs best in minimizing the total tardiness. At the same time, it seems that the RL approaches are not very effective in minimizing the total earliness and tardiness over large time horizons.
File in questo prodotto:
File Dimensione Formato  
2020_Fedcsis_RL_scheduling_V2.pdf

accesso aperto

Tipologia: 2. Post-print / Author's Accepted Manuscript
Licenza: PUBBLICO - Tutti i diritti riservati
Dimensione 1.44 MB
Formato Adobe PDF
1.44 MB Adobe PDF Visualizza/Apri
Fadda-Reinforcement.pdf

non disponibili

Tipologia: 2a Post-print versione editoriale / Version of Record
Licenza: Non Pubblico - Accesso privato/ristretto
Dimensione 462.37 kB
Formato Adobe PDF
462.37 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/2849704