ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-06282022-104831


Thesis type
Master's thesis
Author
SILVESTRI, GIULIO
URN
etd-06282022-104831
Title
A New Algorithm for Lexicographic Multi-Objective Reinforcement Learning
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING
Supervisors
Supervisor Prof. Cococcioni, Marco
Co-supervisor Prof. Lazzerini, Beatrice
Co-supervisor Ing. Fiaschi, Lorenzo
Keywords
  • alpha theory
  • reinforcement learning
  • non-archimedean scalarization
  • multi-objective reinforcement learning
Defense session date
22/07/2022
Availability
Thesis not available for consultation
Abstract
Reinforcement Learning (RL) implementations have achieved great results in recent years, but the majority of the problems considered have a single goal.
Real-world problems typically possess multiple, sometimes conflicting, objectives to be optimized, for which classic single-objective RL techniques are difficult to apply or do not work at all.
In the field of RL, only a few methods have been proposed to deal with multiple objectives.
These are called Multi-Objective Reinforcement Learning (MORL) algorithms.
Some MORL algorithms manage multiple rewards by learning every possible tradeoff, which requires long training times and makes them unappealing for online learning.
Another set of MORL methods requires the user to provide input that assumes a priori knowledge of the environment, either to transform the set of rewards into a single scalar, so that classic RL methods can be applied, or to establish a reward threshold for each objective, so that the objectives can be optimized in lexicographic order.
These two sets of algorithms are respectively called multi-policy and single-policy approaches.
In this work, we propose a parameterless method for solving MORL problems when the user is able to provide a lexicographic ordering over the objectives.
This result is achieved by exploiting a scalarization of the rewards based on non-Archimedean quantities, the so-called non-Archimedean scalarization.
Non-Archimedean scalarization has already been exploited to solve Lexicographic Multi-Objective Problems (LMOPs) in fields such as Evolutionary Optimization and Linear Programming.
The Alpha Theory framework is adopted as our non-standard reference model.
To prove the effectiveness of our method and to compare it with existing approaches, we implemented and tested RL agents that use these non-Archimedean quantities.
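The idea of the non-Archimedean scalarization can be illustrated with a minimal sketch. The code below is a hypothetical illustration, not the thesis's implementation: Alpha Theory arithmetic is reduced to a fixed-length tuple of coefficients of the infinite unit alpha, the finite unit 1, and the infinitesimal unit eta = 1/alpha, so that comparing scalarized reward vectors becomes a lexicographic comparison of tuples. The class name `NAScalar` and the function `scalarize` are made up for this example.

```python
# Hypothetical sketch of a non-Archimedean scalarization for lexicographic MORL.
# A quantity a*alpha + b + c*eta (with alpha infinitely large, eta = 1/alpha)
# is stored as the coefficient tuple (a, b, c), most significant unit first.
from dataclasses import dataclass


@dataclass(frozen=True)
class NAScalar:
    coeffs: tuple  # coefficients of (alpha, 1, eta)

    def __add__(self, other):
        # Addition is component-wise on the coefficients.
        return NAScalar(tuple(a + b for a, b in zip(self.coeffs, other.coeffs)))

    def scale(self, k):
        # Multiplication by an ordinary (finite) real number.
        return NAScalar(tuple(k * a for a in self.coeffs))

    def __lt__(self, other):
        # Since alpha dominates any finite quantity, which in turn dominates
        # any infinitesimal, ordering reduces to lexicographic tuple comparison.
        return self.coeffs < other.coeffs


def scalarize(rewards):
    """Map a reward vector (most important objective first) to a single
    non-Archimedean scalar: rewards[0]*alpha + rewards[1] + rewards[2]*eta."""
    return NAScalar(tuple(rewards))


# Any improvement of a higher-priority objective dominates arbitrarily large
# gains on lower-priority ones, exactly as lexicographic ordering requires:
print(scalarize((1.0, 0.0, 0.0)) > scalarize((0.0, 100.0, 100.0)))
```

Because the scalarized values support addition, scaling, and comparison, standard value-based updates (e.g. a Q-learning maximization over actions) can in principle be carried out on them unchanged, which is what makes this kind of scalarization attractive for single-policy lexicographic MORL.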
Files