
DiMViS: Diffusion-based Multi-View Synthesis / DI GIACOMO, Giuseppe; Franzese, Giulio; Cerquitelli, Tania; Chiasserini, Carla Fabiana; Michiardi, Pietro. - Print. - (2024). (Paper presented at the ICML 2024 Workshop on Structured Probabilistic Inference & Generative Modeling, 2nd SPIGM @ ICML, held in Vienna, Austria, in July 2024).

DiMViS: Diffusion-based Multi-View Synthesis

Giuseppe Di Giacomo; Giulio Franzese; Tania Cerquitelli; Carla Fabiana Chiasserini; Pietro Michiardi
2024

Abstract

Multi-view observations offer a broader perception of the real world than observations acquired from a single viewpoint. While existing multi-view 2D diffusion models for novel view synthesis typically rely on a single conditioning reference image, only a few methods accommodate multiple reference images, and they do so by explicitly conditioning the generation process through tailored attention mechanisms. In contrast, we introduce DiMViS, a novel method enabling conditional generation in multi-view settings by means of a joint diffusion model. DiMViS builds on a pre-trained diffusion model, combining it with an innovative masked diffusion process to implicitly learn the underlying conditional data distribution, which endows our method with the ability to produce multiple images given a flexible number of reference views. Our experimental evaluation demonstrates DiMViS's superior performance compared to current state-of-the-art methods, while achieving both reference-to-target and target-to-target visual consistency.
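The masked-diffusion conditioning idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the variance-preserving schedule, and the flattened per-view latents are all our own assumptions, chosen only to show how clamping reference views to their clean values during the forward process lets a joint model learn the conditional distribution of targets given references implicitly.

```python
import numpy as np

def masked_diffusion_step(views, ref_mask, t, noise_schedule):
    """One forward-diffusion step that noises only the target views.

    views:          (V, D) array, one flattened latent per view
    ref_mask:       (V,) boolean, True where the view is a clean reference
    t:              diffusion time in [0, 1]
    noise_schedule: maps t -> (alpha_t, sigma_t)
    """
    alpha_t, sigma_t = noise_schedule(t)
    eps = np.random.randn(*views.shape)
    noised = alpha_t * views + sigma_t * eps
    # Clamp reference views to their clean values: the denoiser always
    # sees them un-noised, so conditioning is implicit rather than
    # injected through tailored attention layers.
    return np.where(ref_mask[:, None], views, noised)

def vp_schedule(t):
    # Toy variance-preserving schedule (hypothetical choice).
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)

views = np.random.randn(4, 8)                     # 4 views, 8-dim latents
ref_mask = np.array([True, False, False, False])  # first view is a reference
out = masked_diffusion_step(views, ref_mask, t=0.5, noise_schedule=vp_schedule)
```

Because the mask is an input rather than a fixed architectural choice, any subset of the views can serve as references at sampling time, which matches the "flexible number of reference views" claim in the abstract.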
Files in this item:
File: Poster___Multi_View_Latent_Diffusion.pdf
Access: open access
Type: 1. Preprint / submitted version [pre-review]
License: Public - All rights reserved
Size: 2.73 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2989596