WebUI-95: A Large-Scale Dataset of Normalized Web Interfaces via UI-to-Code Generation / Calò, Tommaso; De Russis, Luigi. - Electronic. - (In press). (Conference on Human Factors in Computing Systems, Barcelona (ESP), 13–17 April 2026).

WebUI-95: A Large-Scale Dataset of Normalized Web Interfaces via UI-to-Code Generation

Calò, Tommaso; De Russis, Luigi
In press

Abstract

Large-scale web UI datasets are essential resources for training and evaluating machine learning models in user interface research. However, data scraped from real-world websites consists of framework boilerplate, build artifacts, and deeply nested elements, all by-products of software engineering practices. These complexities hinder the learning of design patterns and complicate code understanding and manipulation. Recent advances in vision-language modeling and verifiable rewards have enabled a new generation of UI-to-code systems that produce high-fidelity reproductions of web interfaces from screenshots. In this work, we apply a state-of-the-art UI-to-code model to the WebUI dataset. Our analysis shows that this transformation achieves 94% visual fidelity while reducing code size by approximately 95% and improving predicted visual quality. We release the dataset and generation pipeline to support research in layout modeling, code generation, and design.
Files in this item:
File: WebUI-95.pdf
Access: open access
Type: 2. Post-print / Author's Accepted Manuscript
License: Creative Commons
Size: 2.93 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3008329