ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-01172025-141428


Thesis type
Master's thesis
Author
LIU, CHANG
URN
etd-01172025-141428
Title
Evaluating Retrieval-Augmented Generation Systems on Edge Devices
Department
INGEGNERIA DELL'INFORMAZIONE
Degree programme
ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING
Supervisors
supervisor Prof. Tonellotto, Nicola
supervisor Prof. Vallati, Carlo
supervisor Dr. Pezzuti, Francesca
Keywords
  • Edge Devices
  • Information Retrieval
  • Large Language Models
  • Quantization
  • Retrieval-Augmented Generation
Defense session start date
21/02/2025
Availability
Not available for consultation
Release date
21/02/2028
Abstract
This study investigates the performance of Retrieval-Augmented Generation (RAG) systems in unconstrained and edge environments, focusing on the impact of three key effects (Accuracy Saturation, Lost in the Middle, and the Power of Noise) and the role of quantization in these effects.
We explore how RAG systems perform when deployed on edge devices with limited computational resources, memory constraints, and offload-speed limitations.
Using the BERGEN library for modular RAG system implementation and llama.cpp for efficient edge inference, we conducted experiments that incorporated dataset filtering and structured input prompts.
Our results indicate that the quantization level significantly affects all three effects. More aggressive quantization (fewer bits per weight) accelerates the "Accuracy Saturation" effect, reduces model performance on gold documents placed in the middle of the context, and diminishes the "Power of Noise" effect, which disappears entirely at 1-bit quantization.
These findings suggest that quantization is a critical factor in RAG system performance, more so than the deployment environment, underscoring the need for optimized quantization strategies, especially for resource-constrained edge devices.
Future research should focus on improving quantization methods and exploring additional techniques such as knowledge distillation and pruning.
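
As a rough illustration of the kind of structured input prompt the abstract refers to, the Python sketch below shows one way to place a gold document at a chosen position among other retrieved documents (to probe "Lost in the Middle") and to mix in randomly sampled irrelevant documents (to probe the "Power of Noise"). It is not taken from the thesis or from BERGEN: the function names `build_context` and `format_prompt`, their parameters, and the prompt template are all hypothetical and chosen only for illustration.

```python
import random


def build_context(gold_doc, distractors, gold_position,
                  n_noise=0, noise_pool=(), seed=0):
    """Return a document list with the gold document at `gold_position`.

    Hypothetical helper: `distractors` are retrieved but non-gold documents,
    `noise_pool` holds irrelevant documents to sample from.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    docs = list(distractors)
    # Optionally append randomly sampled irrelevant ("noise") documents.
    docs += rng.sample(list(noise_pool), n_noise)
    if not 0 <= gold_position <= len(docs):
        raise ValueError("gold_position out of range")
    docs.insert(gold_position, gold_doc)
    return docs


def format_prompt(question, docs):
    """Render a numbered-document prompt (assumed template, not the thesis's)."""
    lines = [f"Document [{i + 1}]: {d}" for i, d in enumerate(docs)]
    lines.append(f"Question: {question}")
    lines.append("Answer:")
    return "\n".join(lines)
```

Sweeping `gold_position` from 0 to the number of other documents, and repeating the sweep for each quantized model variant, would give one accuracy-by-position curve per quantization level under these assumptions.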