Robust Adversarial Reinforcement Learning for Optimal Assembly Sequence Definition in a Cobot Workcell / Alessio, Alessandro; Aliev, Khurshid; Antonelli, Dario. - (2022), pp. 25-34. (Paper presented at the Manufacturing conference) [10.1007/978-3-030-99310-8_3].

Robust Adversarial Reinforcement Learning for Optimal Assembly Sequence Definition in a Cobot Workcell

Alessandro Alessio; Khurshid Aliev; Dario Antonelli
2022

Abstract

The fourth industrial revolution (I4.0) encourages automatic online monitoring of all products to achieve zero-defect, high-quality production. In this scenario, collaborative robotics, in which humans and robots share the same workspace, is a suitable solution that combines the precision of a robot with the ability and flexibility of a human. To improve human-robot collaboration, changeable human choices and even minor mistakes should be tolerated or corrected during work. This paper proposes a robust online optimization of the assembly sequence through Robust Adversarial Reinforcement Learning (RARL), in which an artificial agent deliberately tries to boycott the completion of the assembly. To demonstrate the applicability of robust human-robot collaborative assembly using adversarial RL, a grid-world environment modeled as a Markov Decision Process (MDP) is developed and a multi-agent RL approach is integrated. The results of the framework are promising: the robot successfully observes human activities thanks to the adopted penalty-reward system and the alternation of human and robot actions. When the human pursues the wrong terminal state, the robot blocks the wrong actions, so the human ends up following the right terminal state, which coincides with the robot's target.
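The interaction summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 1-D grid layout, the reward values, the fixed stochastic human model standing in for the adversarial agent, and the use of plain tabular Q-learning instead of a full RARL training loop are all assumptions made for illustration. All names (`env_step`, `sample_human_move`, `train`, `rollout`) are hypothetical.

```python
import random

# Hypothetical 1-D grid world (assumption, not the paper's layout):
# cell 0 is the wrong terminal state, the last cell is the robot's target.
N_CELLS = 7
START = 3
WRONG, RIGHT = 0, N_CELLS - 1
LEFT, RIGHT_MOVE = -1, +1     # moves proposed by the human
ALLOW, BLOCK = 0, 1           # responses available to the robot


def env_step(state, move, robot_action):
    """One human/robot turn pair: return (next_state, robot_reward, done)."""
    if robot_action == BLOCK:
        return state, -1.0, False      # assumed penalty: blocking costs time
    nxt = state + move
    if nxt == RIGHT:
        return nxt, 10.0, True         # assumed reward: right terminal state
    if nxt == WRONG:
        return nxt, -10.0, True        # assumed penalty: wrong terminal state
    return nxt, -0.1, False            # small cost per ordinary step


def sample_human_move(rng, p_wrong=0.5):
    """Fixed stochastic human model that drifts toward the wrong terminal."""
    return LEFT if rng.random() < p_wrong else RIGHT_MOVE


def greedy(q, obs):
    """Robot's greedy response to the observation (state, proposed move)."""
    return ALLOW if q[obs][ALLOW] >= q[obs][BLOCK] else BLOCK


def train(episodes=5000, alpha=0.1, gamma=0.95, eps=0.2, seed=0, max_steps=200):
    """Tabular Q-learning for the robot, which observes each human proposal."""
    rng = random.Random(seed)
    q = {(s, m): [0.0, 0.0] for s in range(N_CELLS) for m in (LEFT, RIGHT_MOVE)}
    for _ in range(episodes):
        state, move = START, sample_human_move(rng)
        for _ in range(max_steps):
            obs = (state, move)
            if rng.random() < eps:
                action = rng.choice((ALLOW, BLOCK))   # explore
            else:
                action = greedy(q, obs)               # exploit
            nxt, reward, done = env_step(state, move, action)
            nxt_move = sample_human_move(rng)
            target = reward if done else reward + gamma * max(q[(nxt, nxt_move)])
            q[obs][action] += alpha * (target - q[obs][action])
            if done:
                break
            state, move = nxt, nxt_move
    return q


def rollout(q, seed=1, max_steps=500):
    """Run one episode with the greedy robot policy; return the final cell."""
    rng = random.Random(seed)
    state = START
    for _ in range(max_steps):
        move = sample_human_move(rng)
        state, _, done = env_step(state, move, greedy(q, (state, move)))
        if done:
            break
    return state
```

Under these assumptions, calling `q = train()` and then `greedy(q, (1, LEFT))` shows whether the learned robot policy blocks the human move that would enter the wrong terminal state, and `rollout(q)` replays one episode with alternating human and robot turns.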
978-3-030-99309-2
978-3-030-99310-8
Files in this item:

MANUFACTURING_2022.pdf
Open Access from 27/03/2023
Type: 2. Post-print / Author's Accepted Manuscript
License: Public - All rights reserved
Size: 492.22 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2973328