Orchestrating job scheduling and topology reconfiguration in optical data center networks (ODCNs) is essential for meeting the intensive communication demands of emerging applications such as distributed machine learning (ML) workloads. However, this task involves the joint optimization of multi-dimensional resources, which simple rule-based policies can hardly address effectively. In this paper, we leverage the powerful state representation and self-learning capabilities of deep reinforcement learning (DRL) and propose a multi-step job scheduling algorithm for ODCNs. Our design decomposes a job request into an ordered sequence of virtual machines (VMs) and the bandwidth demands between them, and then trains a DRL agent to place the VMs sequentially. To do so, at each step we feed the agent the global bandwidth and IT resource utilization state, embedded with the previous VM allocation decisions, and reward the agent with both team and individual incentives. The team reward encourages the agent to jointly optimize the VM placements across multiple steps to pursue successful provisioning of the job request, while the individual reward favors advantageous local placement decisions, i.e., it prevents effective policies from being overwhelmed by a few subpar decisions. We also introduce a penalty on reconfiguration to balance performance gains against reconfiguration overheads. Simulation results under various ODCN configurations and job loads show that our proposal outperforms existing heuristic solutions, reducing the job-blocking probability and reconfiguration frequency by at least 7.35× and 4.59×, respectively.
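The reward design summarized above — a team incentive for provisioning the whole job, an individual incentive for good local placements, and a penalty charged on topology reconfiguration — can be illustrated with a minimal per-step sketch. All function names, weights, and signal definitions here are hypothetical illustrations, not taken from the paper:

```python
def step_reward(job_provisioned: bool, local_gain: float,
                reconfigured: bool, w_team: float = 1.0,
                w_ind: float = 0.5, reconfig_penalty: float = 0.2) -> float:
    """Hypothetical combination of the three reward components.

    job_provisioned: whether the whole job request was (or can still be)
        successfully provisioned (team signal).
    local_gain: a normalized score for the quality of the current VM
        placement decision (individual signal).
    reconfigured: whether this step triggered a topology reconfiguration.
    """
    team = w_team if job_provisioned else -w_team          # team incentive
    individual = w_ind * local_gain                        # individual incentive
    penalty = reconfig_penalty if reconfigured else 0.0    # reconfiguration cost
    return team + individual - penalty
```

The weights trade off global success against local placement quality; the reconfiguration penalty discourages the agent from rewiring the topology unless the expected gain outweighs the overhead.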
Deep reinforcement learning-aided multi-step job scheduling in optical data center networks / Liu, Che-Yu; Chen, Xiaoliang; Proietti, Roberto; Zhu, Zuqing; Yoo, S. J. Ben. - In: JOURNAL OF OPTICAL COMMUNICATIONS AND NETWORKING. - ISSN 1943-0620. - 17:9(2025). [10.1364/jocn.562531]
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/3008416
