Modern automatic comment generation tools often fail to capture data-centric workflow intents, because syntax alone provides weak signals of how code transforms data. We instead capture semantic data differences, symbolic descriptions of post-execution data transformations, and integrate them with code through a dual-encoder architecture. We evaluate this approach on a dataset of executed Python notebooks pairing code, effect sequences, and human comments. When no transformation is detected, the full pipeline outperforms the baseline across both automatic metrics and human evaluation. When transformations are present, the baseline remains competitive, though some simplified variants surpass it on specific metrics. We make available all software and data to encourage replication and further studies in this area. While our experiments focus on comment generation, our core contribution is broader: we introduce execution-aware embeddings and argue for their applicability to a variety of downstream tasks.
From Execution to Embedding: Enriching Code Representations with Data Difference Signals for Comment Generation / Fantino, Giacomo; Vetro', Antonio; Torchiano, Marco; Cappelluti, Federica. - Electronic. - (In press). (IEEE/ACM 48th International Conference on Software Engineering - New Ideas and Emerging Results (NIER) track, Rio de Janeiro (BRA), April 12-18, 2026) [10.1145/3786582.3786826].
From Execution to Embedding: Enriching Code Representations with Data Difference Signals for Comment Generation
Giacomo Fantino; Antonio Vetro'; Marco Torchiano; Federica Cappelluti
In press
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| NIER.pdf | Open access | 2. Post-print / Author's Accepted Manuscript | Creative Commons | 502.02 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/3006348
