Robust Adversarial Reinforcement Learning for Optimal Assembly Sequence Definition in a Cobot Workcell / Alessio, Alessandro; Aliev, Khurshid; Antonelli, Dario. - (2022), pp. 25-34. (Paper presented at the Manufacturing conference) [10.1007/978-3-030-99310-8_3].
Robust Adversarial Reinforcement Learning for Optimal Assembly Sequence Definition in a Cobot Workcell
Alessandro Alessio; Khurshid Aliev; Dario Antonelli
2022
Abstract
The fourth industrial revolution (I4.0) encourages automatic online monitoring of all products to achieve zero-defect, high-quality production. In this scenario, collaborative robots, which share the same workspace with humans, are a suitable solution that combines the precision of a robot with the ability and flexibility of a human. To improve human-robot collaboration, the human's changing choices, and even minor mistakes, should be tolerated or corrected during work. This paper proposes a robust online optimization of the assembly sequence through Robust Adversarial Reinforcement Learning (RARL), where an artificial agent deliberately tries to boycott the completion of the assembly. To demonstrate the applicability of robust human-robot collaborative assembly using adversarial RL, a grid-world environment modeled as a Markov Decision Process (MDP) is developed and a multi-agent RL approach is integrated. The results of the framework are promising: the robot successfully observes the human's activities thanks to the adopted penalty-reward system. Human and robot actions alternate, and when the human pursues the wrong terminal state, the robot blocks the wrong actions, so the human ends up following the right terminal state, which is the same as the robot's target.

File | Size | Format |
---|---|---|
MANUFACTURING_2022.pdf (Open Access from 27/03/2023; Type: 2. Post-print / Author's Accepted Manuscript; License: Public - All rights reserved) | 492.22 kB | Adobe PDF |
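The blocking mechanism described in the abstract can be illustrated with a toy example. The following is a minimal sketch and not the authors' environment or algorithm: the five-cell corridor, the reward values, the scripted "human" that heads for the wrong terminal unless its last move was blocked, and all names (`step`, `train`, `greedy_rollout`) are assumptions made for illustration. Only the robot learns here, with plain tabular Q-learning, whereas the paper uses a multi-agent RARL setup in which the adversary is itself an agent.

```python
import random

WRONG, START, RIGHT = 0, 2, 4      # hypothetical 1-D grid world: cells 0..4
ALLOW, BLOCK = 0, 1                # robot actions on each human proposal
MAX_STEPS = 50                     # cap so that endless blocking cannot loop forever


def step(pos, proposal, robot_action):
    """Filter one human proposal through the robot.

    Returns (next_pos, next_proposal, robot_reward, done).
    """
    if robot_action == BLOCK:
        # Blocked: the position is unchanged and the scripted human
        # retries in the other direction (an assumption of this sketch).
        return pos, +1, -1.0, False
    pos += proposal
    if pos == RIGHT:
        return pos, -1, 9.0, True      # step cost -1 plus +10 reward
    if pos == WRONG:
        return pos, -1, -11.0, True    # step cost -1 plus -10 penalty
    # After an allowed move the adversarial human heads for WRONG again.
    return pos, -1, -1.0, False


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning for the robot; state is (position, human proposal)."""
    rng = random.Random(seed)
    q = {}  # (pos, proposal) -> [Q(allow), Q(block)]
    for _ in range(episodes):
        pos, proposal = START, -1
        for _ in range(MAX_STEPS):
            values = q.setdefault((pos, proposal), [0.0, 0.0])
            if rng.random() < epsilon:
                action = rng.choice((ALLOW, BLOCK))       # explore
            else:
                action = ALLOW if values[ALLOW] >= values[BLOCK] else BLOCK
            nxt_pos, nxt_prop, reward, done = step(pos, proposal, action)
            target = reward
            if not done:
                nxt_values = q.setdefault((nxt_pos, nxt_prop), [0.0, 0.0])
                target += gamma * max(nxt_values)
            values[action] += alpha * (target - values[action])
            pos, proposal = nxt_pos, nxt_prop
            if done:
                break
    return q


def greedy_rollout(q):
    """Run one episode with the learned greedy policy; count blocked proposals."""
    pos, proposal, blocks = START, -1, 0
    for _ in range(MAX_STEPS):
        values = q.get((pos, proposal), [0.0, 0.0])
        action = ALLOW if values[ALLOW] >= values[BLOCK] else BLOCK
        blocks += action == BLOCK
        pos, proposal, _, done = step(pos, proposal, action)
        if done:
            break
    return pos, blocks


q_table = train()
final_pos, n_blocks = greedy_rollout(q_table)
print("final cell:", final_pos, "blocked proposals:", n_blocks)
```

In this sketch the per-step cost and the terminal penalty play the role of a penalty-reward system: the robot is expected to learn to block proposals that move toward the wrong terminal and to allow those that move toward the right one, so the episode ends in the cell that is also the robot's target.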
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2973328