Thesis etd-02062024-200937
Thesis type
Master's thesis
Author
LI, MALIO
URN
etd-02062024-200937
Title
Policy Ensemble with Indirect Interaction
Department
INFORMATICA
Degree programme
INFORMATICA
Supervisors
Supervisor: Prof. Lomonaco, Vincenzo
Supervisor: Dott. Piccoli, Elia
Keywords
- policy ensemble
- reinforcement learning
Thesis defense date
23/02/2024
Availability
Full
Abstract
Reinforcement Learning (RL) is a branch of Machine Learning that teaches agents how to act optimally in a given environment.
Recently, thanks to rapid advances in Deep Learning, many sophisticated algorithms that exploit Neural Networks have been developed, and in many tasks, such as games, RL agents can outperform the best human players in the world.
Despite this potential, some complex tasks remain unsolved or extremely difficult to handle, for example autonomous driving, where the observation space is huge.
Currently, the dominant strategy in RL is to train a single policy to solve a given task, possibly with knowledge transfer from previous policies to speed up training.
This thesis aims to simplify the given problem by breaking it into several sub-tasks, training optimal sub-policies on these simpler tasks, and finally using a Master Policy that learns, without any prior knowledge, how to combine the sub-policies to solve the original complex task.
Each sub-policy can be seen as a skill that the agent can exploit.
Since the proposed method differs substantially from classical RL algorithms, metrics beyond raw reward are used to compare the Master Policy against other state-of-the-art methods.
Finally, the thesis presents studies of how the simple weighted ensemble behaves with different skill sets, and of how important each skill is in different road scenarios.
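The combination described above can be illustrated with a minimal sketch: sub-policies act as fixed skills, and a master component assigns them softmax weights whose weighted sum yields the final action. All names (`sub_policy_factory`, `master_weights`, `ensemble_action`), the linear sub-policies, and the placeholder scores are illustrative assumptions, not the thesis implementation, where the weights would come from a trained Master Policy network.

```python
import numpy as np

def sub_policy_factory(weight_matrix):
    """Each 'skill' maps an observation to an action (linear here for brevity)."""
    def policy(obs):
        return weight_matrix @ obs
    return policy

def master_weights(obs, n_skills):
    """Stand-in for the Master Policy head: softmax over skill scores.
    A real implementation would compute the scores from `obs` with a network."""
    scores = np.arange(n_skills, dtype=float)  # placeholder scores
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

def ensemble_action(obs, skills):
    """Weighted combination of the sub-policies' actions."""
    w = master_weights(obs, len(skills))            # (n_skills,)
    actions = np.stack([pi(obs) for pi in skills])  # (n_skills, action_dim)
    return w @ actions                              # (action_dim,)

# Three toy skills acting on a 4-dim observation, producing 2-dim actions.
obs = np.ones(4)
skills = [sub_policy_factory(np.eye(2, 4) * s) for s in (0.5, 1.0, 2.0)]
action = ensemble_action(obs, skills)
```

Training the master to output the weights while keeping the sub-policies frozen is what lets each skill stay specialized to its simpler sub-task.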
File
File name | Size
---|---
Master_Thesis_6.pdf | 1.72 Mb