Evaluating Retrieval-Augmented Generation Systems on Edge Devices
Department
INFORMATION ENGINEERING
Degree programme
ARTIFICIAL INTELLIGENCE AND DATA ENGINEERING
Supervisors
Supervisor: Prof. Tonellotto, Nicola
Supervisor: Prof. Vallati, Carlo
Supervisor: Dr. Pezzuti, Francesca
Keywords
Edge Devices
Information Retrieval
Large Language Models
Quantization
Retrieval-Augmented Generation
Graduation session start date
21/02/2025
Availability
Not available for consultation
Release date
21/02/2028
Abstract
This study investigates the performance of Retrieval-Augmented Generation (RAG) systems in unconstrained and edge environments, focusing on three key effects (Accuracy Saturation, Lost in the Middle, and the Power of Noise) and on how quantization influences them. We explore how RAG systems perform when deployed on edge devices with limited computational resources, memory constraints, and offload-speed limitations. Using the BERGEN library for modular RAG system implementation and llama.cpp for efficient edge inference, we conducted experiments that incorporated dataset filtering and structured input prompts. Our results indicate that the quantization level has a significant impact on all three effects. More aggressive quantization (lower bit widths) accelerates the Accuracy Saturation effect, reduces model performance when the gold document sits in the middle of the context, and weakens the Power of Noise effect, which disappears entirely at 1-bit quantization. These findings suggest that quantization matters more for RAG system performance than the deployment environment does, underscoring the need for optimized quantization strategies, especially for resource-constrained edge devices. Future research should focus on improving quantization methods and exploring additional techniques such as knowledge distillation and pruning.
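As an illustration of the kind of structured input prompt and gold-document positioning that a Lost in the Middle experiment relies on, the following is a minimal Python sketch. It is not taken from the thesis and does not use the BERGEN or llama.cpp APIs; the function names (`place_gold`, `build_prompt`, `position_sweep`) and the prompt wording are hypothetical, chosen only to show how the position of the gold document among distractors could be varied.

```python
from typing import Iterator, List, Tuple


def place_gold(gold: str, distractors: List[str], position: int) -> List[str]:
    """Return the context list with the gold document inserted at `position`."""
    if not 0 <= position <= len(distractors):
        raise ValueError("position must be between 0 and len(distractors)")
    return distractors[:position] + [gold] + distractors[position:]


def build_prompt(question: str, documents: List[str]) -> str:
    """Format a structured prompt: numbered documents, then the question.

    The template below is an assumption, not the one used in the thesis.
    """
    doc_block = "\n".join(
        f"Document [{i + 1}]: {doc}" for i, doc in enumerate(documents)
    )
    return (
        "Answer the question using only the documents below.\n\n"
        f"{doc_block}\n\n"
        f"Question: {question}\nAnswer:"
    )


def position_sweep(
    question: str, gold: str, distractors: List[str]
) -> Iterator[Tuple[int, str]]:
    """Yield (position, prompt) pairs, one per possible gold position."""
    for pos in range(len(distractors) + 1):
        yield pos, build_prompt(question, place_gold(gold, distractors, pos))
```

Each prompt produced by `position_sweep` could then be sent to a model at a given quantization level, and accuracy compared across positions to see whether middle positions score lower than the first and last ones.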