ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-06282022-104831


Thesis type
Master's thesis
Author
SILVESTRI, GIULIO
URN
etd-06282022-104831
Title
A New Algorithm for Lexicographic Multi-Objective Reinforcement Learning
Department
INFORMATION ENGINEERING
Degree programme
ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING
Supervisors
supervisor Prof. Cococcioni, Marco
co-supervisor Prof. Lazzerini, Beatrice
co-supervisor Ing. Fiaschi, Lorenzo
Keywords
  • alpha theory
  • multi-objective reinforcement learning
  • non-archimedean scalarization
  • reinforcement learning
Exam session start date
22/07/2022
Availability
Thesis not available for consultation
Abstract
Reinforcement Learning (RL) implementations have achieved great results in recent years, but most of the problems considered have a single goal.
Real-world problems typically possess multiple, sometimes conflicting, objectives to be optimized, for which classic single-objective RL techniques are difficult to apply or do not work at all.
In the field of RL, few proposed methods are designed to deal with multiple objectives.
These are called Multi-Objective Reinforcement Learning (MORL) algorithms.
Some of these MORL algorithms manage multiple rewards by learning every possible trade-off, which requires a long training time and makes them unappealing for online learning.
Another set of MORL methods requires the user to provide some input, assuming a priori knowledge of the environment, either to transform the set of rewards into a single scalar so that classic RL methods can be applied, or to establish a reward threshold for each objective so that the objectives can be optimized in lexicographic order.
These two sets of algorithms are called multi-policy and single-policy approaches, respectively.
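To make the role of user-supplied thresholds concrete, here is a minimal sketch of threshold-based lexicographic action selection, written in the spirit of thresholded lexicographic Q-learning. The function name, its interface and the clipping rule are illustrative assumptions; this is not the method proposed in this thesis.

```python
from typing import List, Sequence


def thresholded_lexicographic_action(
    q_values: Sequence[Sequence[float]],
    thresholds: Sequence[float],
) -> int:
    """Choose an action by filtering objectives in priority order (sketch).

    q_values[a][i] is the estimated return of action a on objective i, with
    objective 0 the most important. thresholds[i] is the user-supplied level
    above which objective i counts as "good enough"; one threshold is needed
    for every objective except the last, which is simply maximized.
    """
    n_objectives = len(q_values[0])
    if len(thresholds) != n_objectives - 1:
        raise ValueError("need one threshold per objective except the last")

    candidates: List[int] = list(range(len(q_values)))
    for i, tau in enumerate(thresholds):
        # Values above the threshold are clipped, so all "good enough"
        # actions tie and are passed on to the next objective.
        clipped = {a: min(q_values[a][i], tau) for a in candidates}
        best = max(clipped.values())
        candidates = [a for a in candidates if clipped[a] == best]

    # Among the surviving actions, maximize the last objective.
    last = n_objectives - 1
    return max(candidates, key=lambda a: q_values[a][last])


q = [[5.0, 1.0], [9.0, 3.0], [2.0, 10.0]]
print(thresholded_lexicographic_action(q, [4.0]))  # → 1
print(thresholded_lexicographic_action(q, [1.0]))  # → 2
```

With a threshold of 4.0 on the first objective, actions 0 and 1 both count as satisfactory and action 1 wins on the second objective; lowering the threshold to 1.0 makes all three actions tie on the first objective, so action 2 is chosen instead. The chosen behavior thus depends directly on a parameter the user must set in advance.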
In this work, we propose a parameterless method for solving MORL problems when the user can provide a lexicographic ordering of the objectives.
This is achieved by exploiting a scalarization of the rewards based on non-Archimedean quantities, the so-called non-Archimedean scalarization.
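One common form of non-Archimedean scalarization maps a reward vector (r_1, ..., r_n), listed from highest to lowest priority, to r_1 + η·r_2 + ... + η^(n-1)·r_n, where η is a positive infinitesimal (in the Alpha Theory setting, one natural choice is η = 1/α, with α the theory's infinite number). The sketch below assumes a simplified, truncated representation that stores only the coefficients of the powers of η; the class NonArch and the helpers scalarize and discounted_return are hypothetical names, and this is not the thesis's own implementation.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True, order=True)
class NonArch:
    """Value c0 + c1*eta + c2*eta**2 + ... with eta a positive infinitesimal.

    Sketch only: finitely many non-negative powers of eta, and all values in a
    computation are assumed to have the same number of coefficients.
    order=True compares the coeffs tuples lexicographically, which matches the
    real ordering: the first differing coefficient decides the sign of the
    difference, because each higher power of eta is infinitely smaller.
    """
    coeffs: Tuple[float, ...]

    def __add__(self, other: "NonArch") -> "NonArch":
        return NonArch(tuple(a + b for a, b in zip(self.coeffs, other.coeffs)))

    def scale(self, k: float) -> "NonArch":
        # Multiplication by a finite real number, e.g. a discount factor.
        return NonArch(tuple(k * c for c in self.coeffs))


def scalarize(rewards: Sequence[float]) -> NonArch:
    # Objective i (0-based, highest priority first) gets weight eta**i.
    return NonArch(tuple(float(r) for r in rewards))


def discounted_return(reward_vectors: Sequence[Sequence[float]], gamma: float) -> NonArch:
    # Ordinary discounted sum, carried out on scalarized (non-Archimedean) rewards.
    g = scalarize([0.0] * len(reward_vectors[0]))
    for t, r in enumerate(reward_vectors):
        g = g + scalarize(r).scale(gamma ** t)
    return g


# Scalarized Q-values of three actions (primary objective listed first).
q = [scalarize([5.0, 1.0]), scalarize([5.0, 3.0]), scalarize([4.9, 100.0])]
print(max(range(len(q)), key=lambda a: q[a]))  # → 1
print(discounted_return([[1.0, 0.0], [0.0, 2.0]], 0.5))  # → NonArch(coeffs=(1.0, 1.0))
```

With finite weights 1 and 0.01, a weighted sum would instead favor action 2 (4.9 + 0.01 × 100 = 5.9, against 5.03 for action 1). The infinitesimal weight lets the second objective act only as a tie-breaker for the first, without any weight or threshold to tune.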
The non-Archimedean scalarization has already been exploited to solve Lexicographic Multi-Objective Problems (LMOPs) in fields such as Evolutionary Optimization and Linear Programming.
We adopt the Alpha Theory framework as our non-standard reference model.
We implemented and tested RL agents that use these non-Archimedean quantities in order to demonstrate the effectiveness of our method and to compare it with existing approaches.