Tresca, Luigi; Pulvirenti, Luca; Rolando, Luciano. "A Cutting-Edge Energy Management System for a Hybrid Electric Vehicle relying on Soft Actor–Critic Deep Reinforcement Learning." In: TRANSPORTATION ENGINEERING, vol. 19, 2025. ISSN 2666-691X. Electronic. DOI: 10.1016/j.treng.2025.100308

A Cutting-Edge Energy Management System for a Hybrid Electric Vehicle relying on Soft Actor–Critic Deep Reinforcement Learning

Tresca, Luigi;Pulvirenti, Luca;Rolando, Luciano
2025

Abstract

Thanks to its superior learning capabilities and its model-free nature, Reinforcement Learning (RL) is increasingly regarded as an effective solution for complex optimization tasks such as energy management in Hybrid Electric Vehicles (HEVs). In this paper, we implement a Soft Actor-Critic (SAC) agent on a digital twin of a plug-in Hybrid Electric Vehicle (pHEV) operating in charge-sustaining mode. We employ multi-cycle training, which significantly improves the SAC model's ability to generalize across diverse conditions. We first evaluate the SAC agent's capabilities on the Worldwide harmonized Light-duty vehicles Test Cycle (WLTC) by comparing its performance with the global optimum computed by Dynamic Programming (DP), with a local optimization strategy, namely the Equivalent Consumption Minimization Strategy (ECMS), and with a Double Deep Q-Learning (DDQL) algorithm. Furthermore, we test the agent across a broad range of driving cycles to assess its ability to generalize to scenarios beyond those used during training. Simulation results show that the SAC agent achieves results close to the optimal benchmark set by the DP, with CO2 emissions differing by only 3-4%.
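To make the approach concrete, below is a minimal, hypothetical sketch of how a SAC agent might be trained with multi-cycle episodes on a simplified charge-sustaining energy-management environment. It uses the SAC implementation from Stable-Baselines3; the HevEnergyEnv class, its toy fuel and state-of-charge (SOC) dynamics, and the reward weights are illustrative assumptions and do not reproduce the paper's digital twin.

```python
# Hypothetical sketch: SAC for HEV torque-split control with multi-cycle
# training. The environment is a stand-in for the paper's digital twin;
# all dynamics, constants and reward weights are illustrative assumptions.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import SAC

class HevEnergyEnv(gym.Env):
    """Toy charge-sustaining energy-management environment (assumed).

    State:  [vehicle speed, requested power, battery SOC]
    Action: engine power share in [0, 1] (remainder drawn from the battery)
    Reward: -fuel_use - soc_penalty, keeping SOC near its target
    """
    def __init__(self, cycles, soc_target=0.5):
        super().__init__()
        self.cycles = cycles          # list of (speed, power demand) traces
        self.soc_target = soc_target
        self.observation_space = spaces.Box(
            low=np.array([0.0, -50.0, 0.0], dtype=np.float32),
            high=np.array([40.0, 50.0, 1.0], dtype=np.float32),
        )
        self.action_space = spaces.Box(low=0.0, high=1.0, shape=(1,),
                                       dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        # Multi-cycle training: sample a different driving cycle per episode
        self.cycle = self.cycles[self.np_random.integers(len(self.cycles))]
        self.t, self.soc = 0, self.soc_target
        return self._obs(), {}

    def _obs(self):
        i = min(self.t, len(self.cycle) - 1)
        speed, power = self.cycle[i]
        return np.array([speed, power, self.soc], dtype=np.float32)

    def step(self, action):
        speed, power = self.cycle[self.t]
        engine_power = float(action[0]) * max(power, 0.0)  # kW from engine
        battery_power = power - engine_power               # kW from battery
        self.soc -= battery_power * 1e-3                   # crude SOC update
        self.soc = float(np.clip(self.soc, 0.0, 1.0))
        fuel = 0.08 * engine_power                         # toy fuel model
        reward = -fuel - 10.0 * (self.soc - self.soc_target) ** 2
        self.t += 1
        done = self.t >= len(self.cycle)
        return self._obs(), reward, done, False, {}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for the training driving cycles
    cycles = [list(zip(rng.uniform(0, 35, 300), rng.uniform(-20, 40, 300)))
              for _ in range(4)]
    env = HevEnergyEnv(cycles)
    model = SAC("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=5_000)
```

Sampling a fresh cycle at every reset is one simple way to realize the multi-cycle training the abstract describes: the agent never overfits to a single speed trace, which is what supports generalization to unseen cycles.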


Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2998285