
Optimization of Face-Gait Person Identification Pipelines for Edge-Deployed Cognitive Agents / Boscolo, Federico; De Mola, Grazia; Lamberti, Fabrizio. - Electronic. - (In press). (IEEE COMPSAC, Madrid, Spain, July 7-10, 2026).

Optimization of Face-Gait Person Identification Pipelines for Edge-Deployed Cognitive Agents

Boscolo, Federico; Lamberti, Fabrizio
In press

Abstract

Cognitive robotic systems require low-latency, reliable perception to support safe and context-aware interaction with humans. Multimodal biometrics (e.g., face and gait) improve robustness under real-world variations, but the resulting pipelines are often too compute-intensive and memory-heavy for edge deployment. This paper investigates whether multimodal redundancy can enable aggressive optimization of person recognition stacks without excessive system-level degradation. We analyze a modular face+gait pipeline and apply architectural exploration together with post-training compression, namely quantization and pruning, targeting the dominant bottleneck (face recognition). Experiments on CASIA-B under a Rank-1 protocol over near-frontal views show that while face-only accuracy degrades significantly under compression (0.971 to 0.825), score-level fusion remains comparatively stable (0.986 to 0.966) while improving throughput from 49.8 to 75.2 FPS and reducing the deployed footprint from 295.2 MB to 111.9 MB. These results indicate that multimodal fusion can serve as a resilience mechanism, enabling lightweight and optimized perception modules for real-time cognitive agents.
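The score-level fusion the abstract describes can be illustrated with a minimal sketch. This is not the paper's implementation: the weighted-sum rule, the equal weights, and the toy similarity scores are assumptions chosen only to show how a degraded face score can be compensated by an unaffected gait score under a Rank-1 protocol.

```python
import numpy as np

def score_level_fusion(face_scores, gait_scores, w_face=0.5, w_gait=0.5):
    """Weighted-sum score-level fusion of per-modality similarity scores.
    The weights are illustrative assumptions, not values from the paper."""
    return w_face * np.asarray(face_scores) + w_gait * np.asarray(gait_scores)

def rank1_identify(scores):
    """Rank-1 decision: return the gallery index with the highest score."""
    return int(np.argmax(scores))

# Toy gallery of 3 identities; scores are hypothetical similarities in [0, 1].
face = np.array([0.42, 0.61, 0.35])   # face model degraded by compression
gait = np.array([0.40, 0.88, 0.52])   # gait modality left uncompressed
fused = score_level_fusion(face, gait)
print(rank1_identify(fused))
```

Even when the compressed face model alone ranks identities less reliably, the fused score keeps the correct identity on top, which is the resilience effect the abstract reports.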
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3010607
Warning: the displayed data have not been validated by the university.