Pastor, Eliana; Poeta, Eleonora; Panisson, André; Perotti, Alan; Ciravegna, Gabriele. Beyond Input Attribution: A Hands-On Tutorial to Concept-Based Explainable AI and Mechanistic Interpretability. Vol. 2 (2025), pp. 6247-6248. (Paper presented at the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, held in Toronto, ON, Canada, August 3-7, 2025.) DOI: 10.1145/3711896.3737606.

Beyond Input Attribution: A Hands-On Tutorial to Concept-Based Explainable AI and Mechanistic Interpretability

Eliana Pastor; Eleonora Poeta; André Panisson; Alan Perotti; Gabriele Ciravegna
2025

Abstract

As deep learning systems become pervasive, the demand for trustworthy and transparent AI continues to grow. Traditional feature attribution methods, however, often lack robustness and alignment with human reasoning. This tutorial moves beyond feature attribution by introducing participants to two complementary interpretability paradigms: Concept-Based Explainable AI (C-XAI) and Mechanistic Interpretability. C-XAI provides explanations grounded in high-level, human-interpretable concepts, bridging the gap between model reasoning and human understanding. In parallel, mechanistic interpretability, a rapidly emerging field, focuses on reverse-engineering neural networks to uncover and disentangle the internal mechanisms that give rise to human-understandable representations. Through interactive coding sessions and hands-on exercises, attendees will gain practical experience implementing, evaluating, and comparing a variety of C-XAI and mechanistic interpretability techniques. By the end of the tutorial, participants will be equipped with a modern interpretability toolbox and a deeper understanding of how to apply these techniques in real-world scenarios.
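The tutorial's own notebooks are not part of this record, but a minimal sketch can illustrate the kind of technique each paradigm covers. For C-XAI, a common building block is the concept activation vector (CAV): a linear probe is trained to separate activations of concept examples from random examples, and its weight direction then serves as a concept direction in activation space (TCAV-style). All data, the layer width, and the gradients below are synthetic stand-ins, not the tutorial's actual materials.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins: hidden-layer activations for examples of a
# human concept (e.g., "striped") vs. random examples.
rng = np.random.default_rng(0)
concept_acts = rng.normal(loc=1.0, size=(100, 64))
random_acts = rng.normal(loc=0.0, size=(100, 64))

# Linear probe separating concept from random activations.
X = np.vstack([concept_acts, random_acts])
y = np.array([1] * 100 + [0] * 100)
probe = LogisticRegression(max_iter=1000).fit(X, y)

# The CAV is the normalized probe weight vector: a direction in
# activation space pointing from "random" toward "concept".
cav = probe.coef_[0] / np.linalg.norm(probe.coef_[0])

# TCAV-style score: fraction of examples whose class-logit gradient
# (w.r.t. these activations) projects positively onto the CAV.
# Real gradients would come from the model; random ones stand in here.
grads = rng.normal(size=(50, 64))
tcav_score = float(np.mean(grads @ cav > 0))
print(f"TCAV score: {tcav_score:.2f}")

On the mechanistic side, one widely used tool for disentangling internal mechanisms is the sparse autoencoder (SAE), trained to reconstruct a model's activations through an overcomplete, sparsity-penalized feature layer. The PyTorch sketch below uses illustrative dimensions and random activations only.

import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder whose hidden features are pushed toward
    sparsity, so each may align with an interpretable direction."""
    def __init__(self, d_model=64, d_hidden=256):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        feats = torch.relu(self.encoder(x))  # sparse feature activations
        return self.decoder(feats), feats

sae = SparseAutoencoder()
acts = torch.randn(8, 64)  # stand-in for a layer's activations
recon, feats = sae(acts)
# Training objective: reconstruction error plus an L1 sparsity penalty.
loss = nn.functional.mse_loss(recon, acts) + 1e-3 * feats.abs().mean()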
ISBN: 979-8-4007-1454-2
Files in this record:

KDD_2025___Hands_on.pdf (open access)
Type: 2. Post-print / Author's Accepted Manuscript
License: Public - All rights reserved
Size: 385.98 kB
Format: Adobe PDF

3711896.3737606.pdf (restricted access)
Type: 2a. Post-print, publisher's version / Version of Record
License: Non-public - Private/restricted access
Size: 834.95 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3003464