Execution-Aware Code Documentation through Semantic Data Differences / Fantino, Giacomo. - Electronic. - (In press). (IEEE/ACM 48th International Conference on Software Engineering - Doctoral Symposium (DS) track, Rio de Janeiro (BRA), April 12-18, 2026) [10.1145/3774748.3787660].
Execution-Aware Code Documentation through Semantic Data Differences
Giacomo Fantino
In press
Abstract
Automated code documentation has gained importance as AI-assisted software engineering tools become integrated into the software lifecycle. However, most approaches rely solely on static features of code and overlook its dynamic behavior, particularly in data-centric programming environments, where the meaning of code is deeply tied to its effect on data. This research explores how execution-aware code documentation can be achieved by introducing semantic data differences, a structured symbolic representation of how code execution transforms data. By capturing these runtime semantics, this work aims to extend code understanding beyond syntactic and lexical features. I propose a multimodal modeling framework that combines static code analysis with post-execution semantics. To support this framework, I design a data collection pipeline that executes real-world data-centric notebooks, logs variable-level changes, and abstracts them into a grammar of semantic data differences for training and evaluation. This ongoing doctoral research aims to enable documentation systems that describe not only how code is written but also what it does to data, a step toward more transparent, reproducible, and intelligent data-centric software.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| Doctoral_Symposium_ICSE.pdf | Open access | Preprint / submitted version (pre-review) | Creative Commons | 467.52 kB | Adobe PDF |
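The abstract describes a pipeline that logs variable-level changes during notebook execution and abstracts them into a grammar of semantic data differences. That grammar is not specified on this page, so the Python sketch below is purely illustrative. The operation names (`ADD_COLUMN`, `DROP_COLUMN`, `ROWS_ADDED`, `ROWS_REMOVED`, `CHANGE_TYPE`, `VALUES_CHANGED`), the `DataDiff` record and the `semantic_diff` function are all hypothetical. A table snapshot is also simplified to a dict of column lists rather than a pandas DataFrame. The sketch compares a variable's state before and after a cell runs and emits symbolic edits instead of a raw cell-by-cell diff.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical simplification: a table snapshot maps column name -> list of
# cell values, standing in for a DataFrame captured before/after a cell runs.
Table = dict[str, list[Any]]


@dataclass(frozen=True)
class DataDiff:
    """One symbolic edit in a toy (hypothetical) semantic-diff grammar."""
    op: str            # operation name, e.g. "ADD_COLUMN"
    target: str        # column name, or "*" for row-level edits
    detail: Any = None  # optional argument (type label, count, ...)


def n_rows(table: Table) -> int:
    # Assumes all columns have equal length; a table with no columns has 0 rows.
    return len(next(iter(table.values()), []))


def column_type(values: list[Any]) -> str:
    # Coarse type label: sorted Python type names of the non-None cells.
    names = sorted({type(v).__name__ for v in values if v is not None})
    return "|".join(names) or "empty"


def semantic_diff(before: Table, after: Table) -> list[DataDiff]:
    """Abstract a before/after pair of snapshots into symbolic edits."""
    diffs: list[DataDiff] = []
    # Schema-level edits, sorted so the output order is deterministic.
    for col in sorted(after.keys() - before.keys()):
        diffs.append(DataDiff("ADD_COLUMN", col, column_type(after[col])))
    for col in sorted(before.keys() - after.keys()):
        diffs.append(DataDiff("DROP_COLUMN", col))
    # Row-level edits: only the change in row count is recorded.
    rows_before, rows_after = n_rows(before), n_rows(after)
    if rows_after > rows_before:
        diffs.append(DataDiff("ROWS_ADDED", "*", rows_after - rows_before))
    elif rows_after < rows_before:
        diffs.append(DataDiff("ROWS_REMOVED", "*", rows_before - rows_after))
    # Column-level edits for columns present in both snapshots.
    for col in (c for c in before if c in after):
        old_t, new_t = column_type(before[col]), column_type(after[col])
        if old_t != new_t:
            diffs.append(DataDiff("CHANGE_TYPE", col, f"{old_t}->{new_t}"))
        # Cell comparison only makes sense when rows still line up.
        if rows_after == rows_before:
            changed = sum(a != b for a, b in zip(before[col], after[col]))
            if changed:
                diffs.append(DataDiff("VALUES_CHANGED", col, changed))
    return diffs


def render(d: DataDiff) -> str:
    # Serialize an edit as OP(target[, detail]).
    args = d.target if d.detail is None else f"{d.target}, {d.detail}"
    return f"{d.op}({args})"


if __name__ == "__main__":
    # A cell that fills a missing price, casts prices to float, and adds a flag.
    before = {"price": [10, 20, None], "city": ["TO", "MI", "RM"]}
    after = {"price": [10.0, 20.0, 0.0], "city": ["TO", "MI", "RM"],
             "is_north": [True, True, False]}
    print([render(d) for d in semantic_diff(before, after)])
    # ['ADD_COLUMN(is_north, bool)', 'CHANGE_TYPE(price, int->float)', 'VALUES_CHANGED(price, 1)']
```

Because Python compares `10 == 10.0` as equal, the cast alone is reported as a type change, and only the filled `None` counts as a changed value. Emitting a handful of symbolic operations keeps the representation compact regardless of data size, which is one plausible reason to pair it with static code features. Whether the author's grammar is structured this way is not stated on this page.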
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/11583/3008170
