Thesis etd-01222025-195831
Thesis type
Ph.D. thesis
Author
MORELLI, DAVIDE
URN
etd-01222025-195831
Title
LEVERAGING ARTIFICIAL INTELLIGENCE AND COMPUTER VISION FOR THE FASHION DOMAIN
Scientific disciplinary sector
INF/01 - Computer Science
Degree program
NATIONAL PH.D. IN ARTIFICIAL INTELLIGENCE
Supervisors
Tutor: Prof. Rita Cucchiara
Supervisor: Prof. Marcella Cornia
Keywords
- deepfake detection
- image generation
- virtual try-on
Defense date
19/02/2025
Availability
Full
Abstract
This Ph.D. thesis presents several advancements in image-based virtual try-on, fashion image editing, and related tasks within the fashion industry.
For the virtual try-on task, this thesis introduces Dress Code, a new large-scale dataset that surpasses existing datasets in size, image quality, and garment category coverage. It also proposes novel methods based on Generative Adversarial Networks (GANs) and diffusion models, exploring different architectures and techniques to improve generation quality.
For the image generation task, the Multimodal Garment Designer architecture is introduced as the first latent diffusion model for human-centric fashion image editing conditioned on multimodal inputs, showing promise in mimicking designers' creative processes. The thesis then extends this work with the Ti-MGD model, which adds the ability to condition generation on fabric texture.
For the consumer-to-shop clothes retrieval task, the research proposes a novel loss function to improve retrieval performance. It then investigates cross-modal retrieval techniques, proposing a CLIP-based method tailored to the fashion domain.
For the deepfake detection task, the research identifies common low-level features in diffusion-based deepfakes and proposes a method to disentangle semantic and perceptual information. It also introduces COCOFake, a large collection of generated images for deepfake detection studies.
Files

| File name | Size |
|---|---|
| thesis.pdf | 151.24 MB |