WebUI-95: A Large-Scale Dataset of Normalized Web Interfaces via UI-to-Code Generation / Calò, Tommaso; De Russis, Luigi. - Electronic. - (2026), pp. 1-5. (CHI '26: CHI Conference on Human Factors in Computing Systems, Barcelona (ESP), 13–17 April 2026) [10.1145/3772363.3799359].

WebUI-95: A Large-Scale Dataset of Normalized Web Interfaces via UI-to-Code Generation

Calò, Tommaso; De Russis, Luigi
2026

Abstract

Large-scale web UI datasets are essential resources for training and evaluating machine learning models in user interface research, yet real-world web scraping produces data dominated by framework boilerplate, build artifacts, and deeply nested elements that result from software engineering practices. These complexities hinder the learning of design patterns and complicate code understanding and manipulation. Recent advances in vision-language modeling and verifiable reward have enabled a new generation of UI-to-code systems that produce high-fidelity reproductions of web interfaces from screenshots. In this work, we apply a state-of-the-art UI-to-code model to the WebUI dataset. Our analysis shows that this transformation achieves 94% visual fidelity while reducing code size by approximately 95% and improving predicted visual quality. We release the dataset and generation pipeline to support research in layout modeling, code generation, and design.
ISBN: 979-8-4007-2281-3
Files in this item:
File: 3772363.3799359.pdf
Access: open access
Type: 2a Post-print, publisher's version / Version of Record
License: Creative Commons
Size: 2.93 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3008329