
Joint SAR–Optical Image Compression with Tunable Progressive Attentive Fusion / Valsesia, D.; Bianchi, T. - In: REMOTE SENSING. - ISSN 2072-4292. - 17:13 (2025). [10.3390/rs17132189]

Joint SAR–Optical Image Compression with Tunable Progressive Attentive Fusion

Valsesia D.; Bianchi T.
2025

Abstract

Remote sensing tasks, such as land cover classification, are increasingly becoming multimodal problems, in which information from multiple imaging devices, complementing each other, can be fused. In particular, synergies between optical and synthetic aperture radar (SAR) imagery are widely recognized to be beneficial in a variety of tasks. At the same time, archiving multimodal imagery for global coverage poses significant storage requirements due to the multitude of available sensors and their increasingly high resolutions. In this paper, we exploit redundancies between the SAR and optical imaging modalities to create a joint encoding that improves storage efficiency. A novel neural network design with progressive attentive fusion modules is proposed for joint compression. The model is also promptable at test time with a desired tradeoff between the input modalities, enabling flexibility in the fidelity of the joint representation to each of them. Moreover, we show how end-to-end optimization of the joint compression model, including its modality tradeoff prompt, allows for better accuracy on downstream tasks that leverage multimodal inference when a rate constraint must be met.
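The record does not detail the architecture of the progressive attentive fusion modules or the exact form of the modality tradeoff prompt. Purely as a loose illustration of the idea described in the abstract, the toy numpy sketch below fuses SAR and optical feature maps with per-channel attention weights biased by a scalar prompt `lam`; the gating mechanism, the `lam` convention (0 favors SAR, 1 favors optical), and all function names are assumptions, not the paper's actual design.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attentive_fusion(f_sar, f_opt, lam):
    """Toy modality fusion (illustrative assumption, not the paper's module).

    f_sar, f_opt: (C, H, W) feature maps from the two modality branches.
    lam in [0, 1]: modality tradeoff prompt; 0 favors SAR, 1 favors optical
    (hypothetical convention).
    """
    # per-channel summaries via global average pooling (toy gating signal)
    g_sar = f_sar.mean(axis=(1, 2))
    g_opt = f_opt.mean(axis=(1, 2))
    # bias the attention logits with the tradeoff prompt
    logits = np.stack([g_sar + np.log(1.0 - lam + 1e-6),
                       g_opt + np.log(lam + 1e-6)], axis=0)
    w = softmax(logits, axis=0)  # (2, C) modality attention weights, sum to 1
    fused = w[0][:, None, None] * f_sar + w[1][:, None, None] * f_opt
    return fused, w

# usage: at lam = 0.5 both modalities compete on equal footing; pushing lam
# toward 1 shifts the per-channel weights toward the optical branch
rng = np.random.default_rng(0)
f_sar = rng.normal(size=(4, 8, 8))
f_opt = rng.normal(size=(4, 8, 8))
fused, w = attentive_fusion(f_sar, f_opt, 0.5)
```

In the paper the prompt is itself end-to-end optimized together with the compression model; here it is just a fixed scalar, which is enough to show how a single knob can steer the fidelity of the fused representation toward either modality.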
Files in this record:
File: remotesensing-17-02189.pdf
Access: open access
Type: 2a Post-print editorial version / Version of Record
License: Creative Commons
Size: 3.94 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3007408