Closed-loop dynamic control of a soft manipulator using deep reinforcement learning / Centurelli, A.; Arleo, L.; Rizzo, A.; Tolu, S.; Laschi, C.; Falotico, E. - In: IEEE ROBOTICS AND AUTOMATION LETTERS. - ISSN 2377-3766. - 7:2(2022), pp. 4741-4748. [10.1109/LRA.2022.3146903]
Closed-loop dynamic control of a soft manipulator using deep reinforcement learning
A. Centurelli; A. Rizzo
2022
Abstract
Research in soft robotics has focused largely on developing innovative materials, while the design of control strategies for these robotic platforms remains an open challenge. The difficulty stems from their highly nonlinear dynamics, which are hard to model, and from the degree of stochasticity they often exhibit. Data-driven controllers based on neural networks have recently been explored as a viable solution for these manipulators. This letter presents a neural network-based closed-loop controller trained with a deep reinforcement learning algorithm, Trust Region Policy Optimization (TRPO). Training takes place in simulation, using an approximation of the robot's forward dynamic model obtained with a Long Short-Term Memory (LSTM) network. The trained controller can follow different paths, executed at different velocities, within the workspace of the robot. The results demonstrate that the controller is effective both in normal working conditions and with a payload attached to the end-effector of the manipulator.

File | Access | Description | Type | License | Size | Format
---|---|---|---|---|---|---
2022_RAL_SoftRobots.pdf | Restricted access | Version of record | 2a Post-print, publisher's version / Version of Record | Not public: private/restricted access | 3.77 MB | Adobe PDF
2022_RAL_SoftRobots_AcceptedPostPrint.pdf | Open access | Author's post-print | 2. Post-print / Author's Accepted Manuscript | Public: all rights reserved | 3.75 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2957680