From Execution to Embedding: Enriching Code Representations with Data Difference Signals for Comment Generation

Fantino, Giacomo; Vetro', Antonio; Torchiano, Marco; Cappelluti, Federica

doi:10.1145/3786582.3786826

Modern automatic comment generation tools often fail to capture the intent of data-centric workflows, because syntax alone provides weak signals of how code transforms data. We instead capture semantic data differences (symbolic descriptions of the data transformations observed after execution) and integrate them with code through a dual-encoder architecture. We evaluate this approach on a dataset of executed Python notebooks that pairs code, effect sequences, and human-written comments. When no transformation is detected, the full pipeline outperforms the baseline on both automatic metrics and human evaluation. When transformations are present, the baseline remains competitive, although some simplified variants surpass it on specific metrics. We release all software and data to encourage replication and further studies in this area. While our experiments focus on comment generation, our core contribution is broader: we introduce execution-aware embeddings and argue for their applicability to a variety of downstream tasks.
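
To make the notion of a semantic data difference more concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes that the table snapshots taken before and after a notebook cell runs are hypothetical column-oriented dicts (column name mapped to a list of values), and the effect vocabulary it emits (`ADD_COLUMN`, `DROP_COLUMN`, `ROWS`, `MODIFY_COLUMN`) is an illustrative assumption, not the paper's schema. An empty result corresponds to the case where no transformation is detected.

```python
from typing import Dict, List

# Hypothetical snapshot format (an assumption): column name -> column values.
Table = Dict[str, List[object]]


def n_rows(table: Table) -> int:
    """Row count of a column-oriented table (0 if it has no columns)."""
    return len(next(iter(table.values()))) if table else 0


def data_diff(before: Table, after: Table) -> List[str]:
    """Return symbolic effect tokens describing how `before` became `after`.

    The token names are illustrative only; they are not the paper's effect schema.
    """
    effects: List[str] = []
    # Columns present only after execution were created by the cell.
    for col in after:
        if col not in before:
            effects.append(f"ADD_COLUMN({col})")
    # Columns present only before execution were removed by the cell.
    for col in before:
        if col not in after:
            effects.append(f"DROP_COLUMN({col})")
    rows_before, rows_after = n_rows(before), n_rows(after)
    if rows_before != rows_after:
        # A row-count change suggests filtering, deduplication, or appending.
        effects.append(f"ROWS({rows_before}->{rows_after})")
    else:
        # Same row count: flag shared columns whose values changed in place.
        for col in before:
            if col in after and before[col] != after[col]:
                effects.append(f"MODIFY_COLUMN({col})")
    return effects


# Usage: a cell that drops a zero-quantity row and derives a `total` column.
before = {"price": [10, 20, 30], "qty": [1, 0, 2]}
after = {"price": [10, 30], "qty": [1, 2], "total": [10, 60]}
print(data_diff(before, after))   # ['ADD_COLUMN(total)', 'ROWS(3->2)']
print(data_diff(before, before))  # []
```

When the row count changes, the sketch skips value-level comparison because rows can no longer be aligned by position. A token sequence like the one above is the kind of symbolic signal that could be encoded separately from the code tokens in a dual-encoder setup.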

From Execution to Embedding: Enriching Code Representations with Data Difference Signals for Comment Generation / Fantino, Giacomo; Vetro', Antonio; Torchiano, Marco; Cappelluti, Federica. - ELECTRONIC. - (In press). (IEEE/ACM 48th International Conference on Software Engineering - New Ideas and Emerging Results (NIER) track, Rio de Janeiro (BRA), April 12-18, 2026) [10.1145/3786582.3786826].

Product year: In press

Publication type: 4.1 Contribution in conference proceedings

Files in this record:

NIER.pdf (open access). Type: 2. Post-print / Author's Accepted Manuscript. License: Creative Commons. Size: 502.02 kB. Format: Adobe PDF.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3006348

PORTO @ Archivio Istituzionale della Ricerca

From Execution to Embedding: Enriching Code Representations with Data Difference Signals for Comment Generation

Giacomo Fantino;Antonio Vetro';Marco Torchiano;Federica Cappelluti

In corso di stampa

Abstract

Scheda breve Scheda completa Scheda completa (DC)

Pubblicazioni consigliate

Informazioni

Conferma cancellazione

Scheda breve

Scheda completa

Scheda completa (DC)