Orchestrating job scheduling and topology reconfiguration in optical data center networks (ODCNs) is essential for meeting the intensive communication demands of emerging applications such as distributed machine learning (ML) workloads. However, this task involves the joint optimization of multi-dimensional resources, which simple rule-based policies can hardly address effectively. In this paper, we leverage the powerful state representation and self-learning capabilities of deep reinforcement learning (DRL) and propose a multi-step job scheduling algorithm for ODCNs. Our design decomposes a job request into an ordered sequence of virtual machines (VMs) and the bandwidth demands between them, and then trains a DRL agent to place the VMs sequentially. To do so, at each step we feed the agent the global bandwidth and IT resource utilization state, embedded with the previous VM allocation decisions, and reward the agent with both team and individual incentives. The team reward encourages the agent to jointly optimize the VM placements across multiple steps to pursue successful provisioning of the job request, while the individual reward favors advantageous local placement decisions, i.e., it prevents effective policies from being overwhelmed by a few subpar decisions. We also introduce a penalty on reconfiguration to balance performance gains against reconfiguration overheads. Simulation results under various ODCN configurations and job loads show that our proposal outperforms existing heuristic solutions, reducing the job-blocking probability and reconfiguration frequency by at least 7.35× and 4.59×, respectively.
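The reward design summarized above — a team incentive for provisioning the whole job, an individual incentive for good local placements, and a penalty charged on topology reconfiguration — can be illustrated with a minimal per-step sketch. All function names, weights, and signal definitions here are hypothetical illustrations, not taken from the paper:

```python
def step_reward(job_provisioned: bool, local_gain: float,
                reconfigured: bool, w_team: float = 1.0,
                w_ind: float = 0.5, reconfig_penalty: float = 0.2) -> float:
    """Hypothetical combination of the three reward components.

    job_provisioned: whether the whole job request was (or can still be)
        successfully provisioned (team signal).
    local_gain: a normalized score for the quality of the current VM
        placement decision (individual signal).
    reconfigured: whether this step triggered a topology reconfiguration.
    """
    team = w_team if job_provisioned else -w_team          # team incentive
    individual = w_ind * local_gain                        # individual incentive
    penalty = reconfig_penalty if reconfigured else 0.0    # reconfiguration cost
    return team + individual - penalty
```

The weights trade off global success against local placement quality; the reconfiguration penalty discourages the agent from rewiring the topology unless the expected gain outweighs the overhead.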
Deep reinforcement learning-aided multi-step job scheduling in optical data center networks / Liu, Che-Yu; Chen, Xiaoliang; Proietti, Roberto; Zhu, Zuqing; Yoo, S. J. Ben. - In: JOURNAL OF OPTICAL COMMUNICATIONS AND NETWORKING. - ISSN 1943-0620. - 17:9(2025). [10.1364/jocn.562531]
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/3008416
