ShortcutCatcher: Making Traffic Classification Reliable / Zhao, Y., Boffa, M., Vassio, L., Mellia, M. - In: Proceedings of the ACM on Networking. - ISSN 2834-5509. - 4 (2026), pp. 1-15. [10.1145/3808671]
ShortcutCatcher: Making Traffic Classification Reliable
Yuqi Zhao; Matteo Boffa; Luca Vassio; Marco Mellia
2026
Abstract
Machine learning has given encrypted traffic classification new momentum. Yet, once deployed, models often fail because of hidden shortcut features, i.e., spurious correlations learned from training data that do not hold in new environments. Prior work has shown their negative impact through costly manual intervention. Here, we present ShortcutCatcher, an automated, model-agnostic framework that detects and mitigates shortcuts with the help of explainable AI. The key idea is to contrast model behaviour on two datasets: a large training dataset and a separate verification dataset that differs in scenario but shares the same feature schema. ShortcutCatcher integrates feature explanation with cross-scenario evaluation in a closed loop, iteratively removing the critical features that would not be valid in deployment. Across multiple encrypted traffic classification tasks and model architectures, ShortcutCatcher uncovers shortcut dependencies and improves cross-scenario generalisation by up to three times over standard training. In addition, ShortcutCatcher exposes dataset limitations where collection artefacts act as silent shortcuts that have so far gone unnoticed, allowing us to finally reveal realistic performance without assuming that the underlying task is intrinsically easy.
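The closed loop described in the abstract (explain the model, evaluate it on the verification scenario, remove a critical feature, repeat) can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid classifier, the permutation-importance explanation, the synthetic features (`genuine`, `shortcut`, `noise`), the function names, and the `min_gain` stopping rule are all assumptions chosen to keep the example self-contained.

```python
import random

def make_dataset(n, shortcut_holds, rng):
    """Synthetic flows; 'shortcut' tracks the label only when shortcut_holds."""
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        artefact = y if shortcut_holds else rng.randint(0, 1)
        rows.append({
            "genuine": y + rng.gauss(0, 0.4),               # noisy but valid signal
            "shortcut": 5 * artefact + rng.gauss(0, 0.1),   # collection artefact
            "noise": rng.gauss(0, 1),                       # uninformative
        })
        labels.append(y)
    return rows, labels

def fit(rows, labels, feats):
    """Nearest-centroid stand-in model: per-class mean of each active feature."""
    cents = {}
    for c in set(labels):
        members = [r for r, y in zip(rows, labels) if y == c]
        cents[c] = {f: sum(r[f] for r in members) / len(members) for f in feats}
    return cents

def predict(cents, row):
    return min(cents, key=lambda c: sum((row[f] - m) ** 2 for f, m in cents[c].items()))

def accuracy(cents, rows, labels):
    return sum(predict(cents, r) == y for r, y in zip(rows, labels)) / len(rows)

def top_feature(cents, rows, labels, feats, rng):
    """Permutation importance: the feature whose shuffling costs the most accuracy."""
    base = accuracy(cents, rows, labels)
    drops = {}
    for f in feats:
        col = [r[f] for r in rows]
        rng.shuffle(col)
        shuffled = [dict(r, **{f: v}) for r, v in zip(rows, col)]
        drops[f] = base - accuracy(cents, shuffled, labels)
    return max(drops, key=drops.get)

def catch_shortcuts(train, verify, feats, rng, min_gain=0.02):
    """Assumed rule: drop the most important feature while doing so
    improves verification accuracy by at least min_gain."""
    active, removed = list(feats), []
    model = fit(*train, active)
    score = accuracy(model, *verify)
    while len(active) > 1:
        cand = top_feature(model, *train, active, rng)
        trial = [f for f in active if f != cand]
        trial_model = fit(*train, trial)
        trial_score = accuracy(trial_model, *verify)
        if trial_score < score + min_gain:
            break  # removing the top feature no longer helps across scenarios
        active, removed = trial, removed + [cand]
        model, score = trial_model, trial_score
    return active, removed, score

rng = random.Random(0)
train = make_dataset(1000, True, rng)    # shortcut holds in the training scenario
verify = make_dataset(400, False, rng)   # same schema, shortcut broken
features = ["genuine", "shortcut", "noise"]
baseline = accuracy(fit(*train, features), *verify)
kept, removed, final = catch_shortcuts(train, verify, features, rng)
print("removed:", removed, "baseline:", round(baseline, 2), "final:", round(final, 2))
```

In this toy setting the model trained on all features leans on `shortcut` and performs near chance on the verification data, while the loop removes that feature and then stops, because removing `genuine` next would hurt verification accuracy. A real deployment would swap in the actual classifier and an explainability method suited to it.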
| File | Description | Type | Licence | Size | Format |
|---|---|---|---|---|---|
| 3808671.pdf | Open Access | 2a Post-print, publisher's version / Version of Record | Creative Commons | 792.37 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/3011672
