ETD

Digital archive of theses defended at the Università di Pisa

Thesis etd-09292025-141733


Thesis type
Master's thesis
Author
ADNAN, MUHAMMAD
URN
etd-09292025-141733
Title
Evolution of Purchase Behaviours through Embeddings
Department
INFORMATICA
Degree programme
DATA SCIENCE AND BUSINESS INFORMATICS
Supervisors
supervisor Nanni, Mirco
Keywords
  • clustering
  • Embeddings
  • MDS
  • pivot
  • t-SNE
  • UMAP
Session start date
17/10/2025
Availability
Full
Abstract
This thesis investigates the use of embedding models to analyze product stability and consumer behaviour in large-scale retail distribution. Using transaction data from a supermarket, products were represented as vectors in an embedding space, capturing patterns of co-occurrence in customer shopping baskets. The study focuses on how these embeddings evolve over time and whether they provide reliable signals for understanding both stable and unstable purchasing trends.

The methodology is structured in three main stages. First, multiple embedding models were trained and compared using similarity and distance measures such as cosine distance, dot product, Euclidean distance, and the Jaccard index. These comparisons revealed that while global relationships between products remain relatively stable, local neighbourhoods can fluctuate due to temporal dynamics or model initialization. Second, a pivot-based representation was introduced to provide a more robust framework for temporal analysis. Representative pivot products were selected through the farthest-first traversal algorithm, and their stability was evaluated with an error metric and trajectory visualizations using dimensionality reduction techniques (t-SNE, UMAP, and MDS). This analysis highlighted that while many products retain stable embeddings, others display strong variability caused by seasonality, promotions, or sparse sales. Finally, clustering experiments were performed across different levels of the product taxonomy (SETTORE, REPARTO, and CATEGORIA) as well as directly on embedding spaces. These results showed that stability and instability are unevenly distributed across categories, with fresh food and seasonal products showing greater variability than personal care or household items.
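
The pivot selection step mentioned above can be illustrated with a minimal sketch of farthest-first traversal. It is not the thesis implementation: the function names, the toy 2-D vectors, the choice of Euclidean distance, and the idea of describing a product by its distances to the pivots are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math

def farthest_first_traversal(points, k, start=0):
    """Greedily pick k indices: each new pick is the point whose
    distance to its nearest already-chosen point is largest."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [start]
    # min_dist[i] = Euclidean distance from point i to its nearest chosen pivot
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        # Ties are broken in favour of the lowest index.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen

def pivot_representation(vector, pivots):
    """Assumed pivot-based profile: distances from a vector to each pivot."""
    return [math.dist(vector, q) for q in pivots]

# Toy 2-D "embeddings" (hypothetical values, not taken from the thesis).
embeddings = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
pivot_ids = farthest_first_traversal(embeddings, k=3)
pivots = [embeddings[i] for i in pivot_ids]
profile = pivot_representation(embeddings[1], pivots)
```

Under these assumptions, a product's profile depends only on its distances to the pivots, so profiles computed from models trained on different periods could be compared even when the raw embedding spaces are rotated relative to each other. Whether the thesis uses this exact construction is not stated in the abstract.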

The findings demonstrate that embeddings offer a powerful tool for uncovering hidden structures in retail data and for monitoring the evolution of products over time. Stable products can serve as reliable anchors for representation, while unstable products highlight dynamic aspects of consumer behaviour. Beyond academic insights, these methods have practical potential for applications in recommendation systems, demand forecasting, and retail strategy.