From Execution to Embedding: Enriching Code Representations with Data Difference Signals for Comment Generation / Fantino, Giacomo; Vetro', Antonio; Torchiano, Marco; Cappelluti, Federica. - Electronic. - (In press). (IEEE/ACM 48th International Conference on Software Engineering, New Ideas and Emerging Results (NIER) track, Rio de Janeiro (BRA), April 12-18, 2026) [10.1145/3786582.3786826].

From Execution to Embedding: Enriching Code Representations with Data Difference Signals for Comment Generation

Giacomo Fantino; Antonio Vetro'; Marco Torchiano; Federica Cappelluti
In press

Abstract

Modern automatic comment generation tools often fail to capture the intent of data-centric workflows, because syntax alone provides only weak signals of how code transforms data. We instead capture semantic data differences, that is, symbolic descriptions of the data transformations observed after execution, and integrate them with code through a dual-encoder architecture. We evaluate this approach on a dataset of executed Python notebooks that pairs code, effect sequences, and human-written comments. When no transformation is detected, the full pipeline outperforms the baseline on both automatic metrics and human evaluation. When transformations are present, the baseline remains competitive, though some simplified variants surpass it on specific metrics. We make all software and data available to encourage replication and further study in this area. While our experiments focus on comment generation, our core contribution is broader: we introduce execution-aware embeddings and argue for their applicability to a variety of downstream tasks.
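To make the notion of a semantic data difference more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a table can be represented as a list of row dictionaries and compares the state before and after a notebook cell runs, emitting a sequence of symbolic effect tokens. The function name `data_diff` and the token names (`ADD_COLUMN`, `DROP_COLUMN`, `REMOVE_ROWS`, `ADD_ROWS`, `FILL_MISSING`, `NO_CHANGE`) are hypothetical and chosen only for illustration; the abstract does not specify the actual effect vocabulary.

```python
from typing import Any

# Assumption: a table is modelled as a list of row dictionaries.
Table = list[dict[str, Any]]


def data_diff(before: Table, after: Table) -> list[str]:
    """Return hypothetical symbolic tokens describing how `after` differs from `before`."""
    effects: list[str] = []

    # Column-level changes can only be inferred when both tables have rows.
    if before and after:
        cols_before = list(before[0].keys())
        cols_after = list(after[0].keys())
        for col in cols_after:
            if col not in cols_before:
                effects.append(f"ADD_COLUMN({col})")
        for col in cols_before:
            if col not in cols_after:
                effects.append(f"DROP_COLUMN({col})")

    # Row-count changes.
    if len(after) < len(before):
        effects.append(f"REMOVE_ROWS({len(before) - len(after)})")
    elif len(after) > len(before):
        effects.append(f"ADD_ROWS({len(after) - len(before)})")

    # Missing values filled in place (only meaningful when no rows were added or removed).
    if before and after and len(before) == len(after):
        for col in before[0]:
            if col in after[0]:
                missing_before = sum(row[col] is None for row in before)
                missing_after = sum(row[col] is None for row in after)
                if missing_after < missing_before:
                    effects.append(f"FILL_MISSING({col})")

    return effects or ["NO_CHANGE"]


# Example: state around a cell that drops incomplete rows and adds a derived column.
cell_source = 'df = df.dropna()\ndf["total"] = df["price"] * df["qty"]'
before = [{"price": 2.0, "qty": 3}, {"price": None, "qty": 1}]
after = [{"price": 2.0, "qty": 3, "total": 6.0}]

# The code and its effect sequence form the two inputs a dual encoder would consume.
model_input = {"code": cell_source, "effects": data_diff(before, after)}
print(model_input["effects"])
```

In a setup like the one the abstract describes, the code text and the effect sequence would each be passed to a separate encoder; the `NO_CHANGE` token corresponds to the case where no transformation is detected.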
Files in this item:
File: NIER.pdf
Access: open access
Type: 2. Post-print / Author's Accepted Manuscript
License: Creative Commons
Size: 502.02 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3006348