Closed-loop dynamic control of a soft manipulator using deep reinforcement learning / Centurelli, A.; Arleo, L.; Rizzo, A.; Tolu, S.; Laschi, C.; Falotico, E. - In: IEEE ROBOTICS AND AUTOMATION LETTERS. - ISSN 2377-3766. - 7:2(2022), pp. 4741-4748. [10.1109/LRA.2022.3146903]

Closed-loop dynamic control of a soft manipulator using deep reinforcement learning

A. Centurelli; A. Rizzo
2022

Abstract

The research community in soft robotics has focused on developing innovative materials, but the design of control strategies applicable to these robotic platforms is still an open challenge. This is due to their highly nonlinear dynamics, which are difficult to model, and to the degree of stochasticity they often incorporate. Data-driven controllers based on neural networks have recently been explored as a viable solution for these manipulators. This letter presents a neural network-based closed-loop controller, trained by a deep reinforcement learning algorithm called Trust Region Policy Optimization (TRPO). The training takes place in simulation, using an approximation of the robot's forward dynamic model obtained with a Long Short-Term Memory (LSTM) network. The trained controller can follow different paths, executed at different velocities, in the workspace of the robot. The results demonstrate that the controller is effective both in normal working conditions and with a payload attached to the end-effector of the manipulator.
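The closed-loop structure described in the abstract (a controller that observes the current state and a target, acts, and is evaluated inside a learned model instead of on the physical robot) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: `surrogate_step` is a simple linear stand-in for the LSTM forward dynamic model, `policy` is a proportional feedback rule standing in for the TRPO-trained network, and the reward (negative tracking error), the circular path, and the gain value are hypothetical choices.

```python
import numpy as np


def surrogate_step(state, action, A, B):
    """Hypothetical stand-in for a learned forward dynamic model.

    The paper uses an LSTM approximation; a linear map is used here
    only so the sketch stays self-contained.
    """
    return A @ state + B @ action


def policy(state, target, gain):
    """Hypothetical stand-in for a trained controller.

    Closed loop: the action depends on the current state and the target.
    Actions are clipped to [-1, 1] to mimic bounded actuation.
    """
    return np.clip(gain * (target - state), -1.0, 1.0)


def rollout(path, A, B, gain=0.5):
    """Track `path` inside the surrogate model and collect rewards.

    The reward at each step is the negative Euclidean tracking error
    (an assumed reward shape, not the one from the paper).
    """
    state = path[0].copy()
    errors = []
    for target in path[1:]:
        action = policy(state, target, gain)
        state = surrogate_step(state, action, A, B)
        errors.append(np.linalg.norm(target - state))
    return state, -np.array(errors)


# Hypothetical 2D circular path with 50 waypoints.
t = np.linspace(0.0, 2.0 * np.pi, 50)
path = np.stack([np.cos(t), np.sin(t)], axis=1)

A = np.eye(2)  # assumed state transition of the stand-in model
B = np.eye(2)  # assumed action-to-state map of the stand-in model

final_state, rewards = rollout(path, A, B)
print("mean tracking error:", -rewards.mean())
```

In a reinforcement learning setup, rewards collected from rollouts like this one would be the signal used to update the controller parameters; here the controller is fixed, so the sketch only shows the interaction loop.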
Files in this record:

File: 2022_RAL_SoftRobots.pdf
Access: not available
Description: Version of record
Type: 2a Post-print, publisher's version / Version of Record
License: Not public - private/restricted access
Size: 3.77 MB
Format: Adobe PDF

File: 2022_RAL_SoftRobots_AcceptedPostPrint.pdf
Access: open access
Description: Author's post-print
Type: 2. Post-print / Author's Accepted Manuscript
License: Public - All rights reserved
Size: 3.75 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2957680