DiMViS: Diffusion-based Multi-View Synthesis / Di Giacomo, Giuseppe; Franzese, Giulio; Cerquitelli, Tania; Chiasserini, Carla Fabiana; Michiardi, Pietro. - (2024). (Paper presented at the ICML 2024 Workshop on Structured Probabilistic Inference & Generative Modeling, 2nd SPIGM @ ICML, held in Vienna, Austria, July 2024).
DiMViS: Diffusion-based Multi-View Synthesis
Giuseppe Di Giacomo; Tania Cerquitelli; Carla Fabiana Chiasserini
2024
Abstract
Multi-view observations offer a broader perception of the real world than observations acquired from a single viewpoint. Existing multi-view 2D diffusion models for novel view synthesis typically rely on a single conditioning reference image, and only a few methods accommodate multiple reference images, explicitly conditioning the generation process through tailored attention mechanisms. In contrast, we introduce DiMViS, a novel method that enables conditional generation in multi-view settings by means of a joint diffusion model. DiMViS builds on a pre-trained diffusion model and combines it with an innovative masked diffusion process that implicitly learns the underlying conditional data distribution, which endows our method with the ability to produce multiple images given a flexible number of reference views. Our experimental evaluation shows that DiMViS outperforms current state-of-the-art methods while achieving reference-to-target and target-to-target visual consistency.
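The sketch below illustrates, in a simplified form, the masked joint diffusion idea the abstract describes: reference views are kept clean while target views are noised, and the denoiser is trained jointly over all views with a loss restricted to the masked (target) views, so the conditional distribution is learned implicitly and any number of views can serve as references. This is a minimal sketch based only on the abstract; the function names, denoiser interface, noise schedule, and tensor shapes (`masked_diffusion_step`, `denoiser`, `alpha_bar`) are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch of a masked joint diffusion training step for multi-view generation.
# All names and shapes are illustrative assumptions, not DiMViS's actual code.
import torch


def masked_diffusion_step(denoiser, views, ref_mask, t, alpha_bar):
    """
    denoiser:  callable (x, t) -> noise prediction, jointly over all views
    views:     (B, V, C, H, W) latents for all V views (references + targets)
    ref_mask:  (B, V) boolean, True where the view is a clean reference
    t:         (B,) integer diffusion timesteps
    alpha_bar: (T,) cumulative noise schedule
    """
    noise = torch.randn_like(views)
    a = alpha_bar[t].view(-1, 1, 1, 1, 1)                  # broadcast over views/pixels
    noisy = a.sqrt() * views + (1 - a).sqrt() * noise      # standard DDPM forward process
    keep = ref_mask.view(*ref_mask.shape, 1, 1, 1)         # (B, V, 1, 1, 1)
    x_in = torch.where(keep, views, noisy)                 # reference views stay clean
    pred = denoiser(x_in, t)                               # joint denoising over all views
    # Loss only on target (masked-out) views: conditioning on references is implicit.
    target_mask = (~keep).expand_as(views).float()
    loss = ((pred - noise) ** 2 * target_mask).sum() / target_mask.sum().clamp(min=1.0)
    return loss


if __name__ == "__main__":
    # Toy usage with a trivial placeholder "denoiser" that predicts zeros.
    B, V, C, H, W, T = 2, 4, 3, 8, 8, 1000
    alpha_bar = torch.linspace(0.999, 0.001, T)
    views = torch.randn(B, V, C, H, W)
    ref_mask = torch.zeros(B, V, dtype=torch.bool)
    ref_mask[:, :2] = True                                 # first 2 views act as references
    t = torch.randint(0, T, (B,))
    loss = masked_diffusion_step(lambda x, t: torch.zeros_like(x), views, ref_mask, t, alpha_bar)
    print(float(loss))
```

Because the mask can mark any subset of views as references, the same model supports a flexible number of conditioning images at inference time, which is the property the abstract highlights.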
| File | Size | Format |
|---|---|---|
| Poster___Multi_View_Latent_Diffusion.pdf (open access; type: 1. Preprint / submitted version [pre-review]; license: Public - All rights reserved) | 2.73 MB | Adobe PDF |
https://hdl.handle.net/11583/2989596