Thesis etd-11032022-122638
Thesis type
Master's thesis
Author
VEZZI, FRANCESCO
URN
etd-11032022-122638
Title
Explosive Motion Acquisition via Deep Reinforcement Learning for a Bio-Inspired Quadruped
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
INGEGNERIA ROBOTICA E DELL'AUTOMAZIONE
Supervisors
Advisor: Prof. Bicchi, Antonio
Supervisor: Prof. Della Santina, Cosimo
Keywords
- deep learning
- deep reinforcement learning
- elastic
- evolutionary strategy
- explosive motion
- jump
- learning
- locomotion
- quadruped
- reinforcement learning
- soft
- springs
Defense session start date
24/11/2022
Availability
Thesis not available for consultation
Abstract
The aim of this thesis is to use Deep Reinforcement Learning to train a bio-inspired quadruped to acquire explosive motion skills that could be used for navigation in challenging natural environments.
Two variants of the jumping task are considered: jumping in place and jumping forward.
The proposed training method combines an evolutionary strategy with deep reinforcement learning algorithms in two steps. In the first step, ARS (Augmented Random Search) is used with a sparse reward and a deterministic policy. In the second step, a simplified imitation-learning approach is used to train a more complex neural network with PPO (Proximal Policy Optimization), which is subsequently retrained with a task-related reward function.
In the end, the agent learns to jump and land softly, starting from and returning to a default posture, and exploits the compliant elements (joint-level parallel springs) for better performance.
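The abstract describes the pipeline but gives no implementation details, so the following is a minimal, illustrative sketch of the first stage only: basic Augmented Random Search over a deterministic linear policy. The environment factory `make_env()`, the gym-style `reset`/`step` interface, and all hyperparameter values are assumptions for illustration, not taken from the thesis; the sparse jumping reward is assumed to be computed inside the environment.

```python
import numpy as np

def ars_train(make_env, obs_dim, act_dim, n_iters=200, n_dirs=16, n_top=8,
              step_size=0.02, noise_std=0.03, horizon=400):
    """Basic ARS over a deterministic linear policy (illustrative values only)."""
    theta = np.zeros((act_dim, obs_dim))   # linear policy: action = theta @ obs

    def rollout(params):
        env = make_env()                   # hypothetical factory for the quadruped env
        obs = env.reset()
        total = 0.0
        for _ in range(horizon):
            obs, reward, done, _ = env.step(params @ obs)
            total += reward                # sparse jump reward assumed inside the env
            if done:
                break
        return total

    for _ in range(n_iters):
        # sample random search directions and evaluate +/- perturbations
        deltas = np.random.randn(n_dirs, act_dim, obs_dim)
        r_plus = np.array([rollout(theta + noise_std * d) for d in deltas])
        r_minus = np.array([rollout(theta - noise_std * d) for d in deltas])
        # keep only the best directions and scale the step by the reward std
        top = np.argsort(np.maximum(r_plus, r_minus))[-n_top:]
        sigma = np.concatenate([r_plus[top], r_minus[top]]).std() + 1e-8
        update = sum((r_plus[k] - r_minus[k]) * deltas[k] for k in top)
        theta += step_size / (n_top * sigma) * update
    return theta
```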
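For the second stage, the abstract mentions a simplified imitation-learning step that trains a more complex network before PPO retraining with a task-related reward. One plausible reading is behaviour cloning of the ARS policy; the sketch below shows that supervised step under this assumption. The network architecture, optimizer, and the idea of regressing onto the teacher's actions over logged states are illustrative choices, not details from the thesis; PPO fine-tuning would then start from the cloned weights.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Hypothetical 'more complex' student network (MLP) for the PPO stage."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)

def clone_ars_teacher(student, theta, states, epochs=100, lr=1e-3):
    """Behaviour cloning: regress the student onto the linear ARS teacher's
    actions on a batch of logged states (assumed interpretation of the
    'simplified imitation learning' step)."""
    teacher_actions = states @ torch.as_tensor(theta, dtype=torch.float32).T
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(student(states), teacher_actions)
        loss.backward()
        opt.step()
    return student
```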
Files
Thesis not available for consultation.