Execution-Aware Code Documentation through Semantic Data Differences / Fantino, Giacomo. - Electronic. - (In press). (IEEE/ACM 48th International Conference on Software Engineering - Doctoral Symposium (DS) track, Rio de Janeiro (BRA), April 12-18, 2026) [10.1145/3774748.3787660].

Execution-Aware Code Documentation through Semantic Data Differences

Giacomo Fantino
In press

Abstract

Automated code documentation has gained increasing importance as AI-assisted software engineering tools become integrated into the software lifecycle. However, most approaches rely solely on static features of code, overlooking its dynamic behavior, particularly in data-centric programming environments where code meaning is deeply tied to its effect on data. This research explores how execution-aware code documentation can be achieved by introducing semantic data differences, a structured symbolic representation of how code execution transforms data. By capturing these runtime semantics, this work aims to extend code understanding beyond syntax and lexical features. I propose a multimodal modeling framework that combines static code analysis with post-execution semantics. In support of this framework, I design a data collection pipeline that executes real-world data-centric notebooks, logs variable-level changes, and abstracts them into a grammar of semantic data differences for training and evaluation. This ongoing doctoral research aims to enable documentation systems that describe not only how code is written, but also what code does to data, a step toward more transparent, reproducible, and intelligent data-centric software.
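To make the idea of a semantic data difference concrete, the following is a minimal sketch, not the pipeline described in the paper. It executes one notebook-style cell, snapshots table-like variables before and after, and abstracts the change into symbolic operations. The abstract does not specify the grammar, so the operation names (`ADD_COLUMN`, `DROP_COLUMN`, `CAST_COLUMN`, `ADD_ROWS`, `REMOVE_ROWS`, `NEW_TABLE`, `DEL_TABLE`) and the helper names are hypothetical, and a plain dict of column lists stands in for a real DataFrame.

```python
import copy


def is_table(value):
    """Toy stand-in for a DataFrame: a dict mapping column names to lists."""
    return isinstance(value, dict) and all(isinstance(c, list) for c in value.values())


def snapshot(namespace):
    """Deep-copy every table-like variable so later mutation cannot alter the record."""
    return {k: copy.deepcopy(v) for k, v in namespace.items()
            if not k.startswith("__") and is_table(v)}


def n_rows(table):
    """Row count, taken from the first column (0 for an empty table)."""
    return len(next(iter(table.values()), []))


def table_diff(name, before, after):
    """Abstract a before/after pair into symbolic operations (hypothetical vocabulary)."""
    ops = []
    for col in sorted(after.keys() - before.keys()):
        ops.append(("ADD_COLUMN", name, col))
    for col in sorted(before.keys() - after.keys()):
        ops.append(("DROP_COLUMN", name, col))
    for col in sorted(before.keys() & after.keys()):
        old = sorted({type(v).__name__ for v in before[col]})
        new = sorted({type(v).__name__ for v in after[col]})
        if old != new:
            ops.append(("CAST_COLUMN", name, col, "|".join(old), "|".join(new)))
    delta = n_rows(after) - n_rows(before)
    if delta:
        ops.append(("ADD_ROWS" if delta > 0 else "REMOVE_ROWS", name, abs(delta)))
    return ops


def run_cell(code, namespace):
    """Execute one cell and return the symbolic diff of its table variables."""
    before = snapshot(namespace)
    exec(code, namespace)  # assumption: the cell is trusted, as in an offline pipeline
    after = snapshot(namespace)
    ops = [("NEW_TABLE", var) for var in sorted(after.keys() - before.keys())]
    ops += [("DEL_TABLE", var) for var in sorted(before.keys() - after.keys())]
    for var in sorted(before.keys() & after.keys()):
        ops.extend(table_diff(var, before[var], after[var]))
    return ops


ns = {"df": {"age": ["31", "n/a", "45"], "city": ["Turin", "Rome", "Milan"]}}
cell = """
keep = [i for i, a in enumerate(df["age"]) if a.isdigit()]
df = {c: [vals[i] for i in keep] for c, vals in df.items()}
df["age"] = [int(a) for a in df["age"]]
del df["city"]
"""
print(run_cell(cell, ns))
```

For this cell the sketch reports that column `city` was dropped, that `age` changed from `str` to `int`, and that one row was removed. A documentation model could take such a sequence as an additional input alongside the cell's source code, which is the kind of post-execution signal the abstract argues static features alone do not provide.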
Files in this item:
Doctoral_Symposium_ICSE.pdf
Access: open access
Type: 1. Preprint / submitted version [pre-review]
License: Creative Commons
Size: 467.52 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3008170