An Analysis of the Test Repair Capability of LLMs in Android GUI Testing / Fedriga, Alessandro; Fulcini, Tommaso; Coppola, Riccardo; Amalfitano, Domenico; Distante, Damiano; Ricca, Filippo. - ELECTRONIC. - (In press). (Paper presented at the 18th International Conference on the Quality of Information and Communications Technology, held in Lisbon (POR), 3-5 September 2025).
An Analysis of the Test Repair Capability of LLMs in Android GUI Testing
Alessandro Fedriga;Tommaso Fulcini;Riccardo Coppola;Filippo Ricca
In press
Abstract
Automated GUI testing is a crucial activity in modern Android development, yet its nature is notoriously fragile, especially when the GUI is built using dynamic frameworks like Jetpack Compose. Minor UI changes frequently break tests, flooding continuous integration pipelines with false positives, and burdening developers with costly repairs. To reduce test repair effort, we evaluate a developer-in-the-loop approach leveraging a Large Language Model, GitHub Copilot with Claude 3.7 Sonnet, as a zero-shot repair agent within Android Studio. By analyzing IDE context, this method updates broken selectors, adjusts test oracles, and maintains test semantics after GUI changes. We empirically evaluate the approach using the Bitwarden mobile app for Android, an open-source project containing 1083 GUI tests. We analyzed the test suite across two recent application versions, reporting failures and using the LLM to repair the failing tests. Our evaluation investigates test fragility, effectiveness of zero-shot LLM-based repairs, additional benefits from retrying prompts, and improvements from brief developer interactions. Results show that a single zero-shot prompt recovers a significant proportion of failing tests, reducing manual maintenance efforts. Another retry provided minimal additional benefits, whereas brief developer interactions considerably enhanced recovery rates. Our findings indicate that integrating LLM-driven techniques substantially eases the maintenance burden of GUI test suites, ensuring robustness against rapid UI evolution in Jetpack Compose applications.

File | Access | Type | License | Size | Format |
---|---|---|---|---|---|
Quatic_LLM_test_repair (3).pdf | Restricted | 2. Post-print / Author's Accepted Manuscript | Not public - private/restricted access | 502.74 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/3002251