MESA: A Dynamical Attention-based Pre-processing Pipeline for High-throughput Event-based Computer Vision Tasks / Bich, Philippe; Prono, Luciano; Boretti, Chiara; Pareschi, Fabio; Rovatti, Riccardo; Setti, Gianluca. - In: IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS. II, EXPRESS BRIEFS. - ISSN 1549-7747. - Print. - (In press).
MESA: A Dynamical Attention-based Pre-processing Pipeline for High-throughput Event-based Computer Vision Tasks
Philippe Bich; Luciano Prono; Chiara Boretti; Fabio Pareschi; Riccardo Rovatti; Gianluca Setti
In press
Abstract
Dynamic Vision Sensors (DVS) offer a unique advantage in capturing changes in luminance asynchronously, providing the high temporal resolution and efficiency that make them particularly suitable for applications such as egocentric vision and autonomous driving. However, the sparse and asynchronous nature of DVS data poses challenges for traditional non-recurrent deep learning models, such as convolutional neural networks (CNNs) and transformer-based architectures. Classical methods, such as time surfaces and voxel grids, convert event-based data into a form suitable for frame-based Deep Neural Networks (DNNs). While effective, these methods often sacrifice the fine-grained temporal details intrinsic to DVS data, especially when high-throughput predictions are required, which can diminish the advantages of DVS in capturing fast-moving or transient phenomena. We aim to contribute to addressing this issue and propose a dynamic pre-processing pipeline called Memory of Events through Spatial Attention (MESA), which enhances currently used event-based data representations. This is achieved by storing events in a memory tensor with pixel-wise adaptive forgetting factors generated in real time by a spatial-attention module. Tested on multiple computer vision tasks, this method improves the performance of state-of-the-art non-recurrent DNNs at minimal computational cost. In particular, with MESA, the accuracy of MobileViT-v2s on CIFAR10-DVS improves by more than 15%, and with DETR-ResNet50, the mAP on the PEDRo object detection dataset is three times higher than the baseline achieved with time surfaces alone. Furthermore, when estimating pupil position on the 3ET+ dataset with MobileNet-v3s, MESA reduces the Euclidean distance error by 36% compared to using time surfaces alone.
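To make the abstract's core mechanism concrete, the sketch below gives a minimal, hypothetical PyTorch rendition of a MESA-style memory update; it is not the authors' implementation. An incoming event frame is blended into a persistent memory tensor using per-pixel forgetting factors predicted by a small spatial-attention module. The module architecture, channel counts, and the exact leaky-integration rule are illustrative assumptions inferred from the abstract.

```python
import torch
import torch.nn as nn

class MESAMemory(nn.Module):
    """Minimal sketch of a MESA-style event memory (illustrative, not the paper's code).

    A persistent memory tensor is updated with each incoming event frame using
    pixel-wise forgetting factors produced by a small spatial-attention module,
    as described at a high level in the abstract.
    """

    def __init__(self, channels: int = 2):
        super().__init__()
        # Hypothetical spatial-attention module: maps the current memory and
        # the incoming event frame to a per-pixel forgetting factor in (0, 1).
        self.attention = nn.Sequential(
            nn.Conv2d(2 * channels, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(8, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, memory: torch.Tensor, events: torch.Tensor) -> torch.Tensor:
        # memory, events: (B, C, H, W) event frames (e.g. C=2 polarity channels)
        forget = self.attention(torch.cat([memory, events], dim=1))  # (B, 1, H, W)
        # Assumed leaky-integration rule: pixels with forget close to 1 retain
        # their history, while pixels with forget close to 0 are overwritten.
        return forget * memory + events
```

Under these assumptions, the resulting memory tensor plays the same role as a time surface or voxel grid: it is a dense frame that can be fed directly to a standard non-recurrent DNN.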
| File | Description | Type | License | Size | Format |
|---|---|---|---|---|---|
| FINAL VERSION.pdf (open access) | Author's version | 2. Post-print / Author's Accepted Manuscript | Public - All rights reserved | 840.59 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/3003607
