URLGEN – Towards Automatic URL Generation Using GANs / Valentim, Rodolfo; Drago, Idilio; Trevisan, Martino; Mellia, Marco. - In: IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT. - ISSN 1932-4537. - ELECTRONIC. - 20:3(2023), pp. 3734-3746. [10.1109/TNSM.2022.3225311]
URLGEN – Towards Automatic URL Generation Using GANs
Valentim, Rodolfo; Drago, Idilio; Trevisan, Martino; Mellia, Marco
2023
Abstract
URLs play an essential role on the Internet, allowing access to Web resources. Automatically generating URLs is helpful in various tasks, such as application debugging, API testing, and blocklist creation for security applications. Current testing suites deeply embed experts’ domain knowledge to generate suitable URLs, resulting in an ad-hoc solution for each given application. These tools thus require heavy manual intervention, with the expensive coding of rules that are hard to maintain. Here we introduce URLGEN, a system that uses Generative Adversarial Networks (GANs) to tackle the automatic URL generation problem. URLGEN is designed for web API testing and generates URL samples for an application without any system expertise, complementing the existing tools. It leverages Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) architectures, augmented by an embedding layer that simplifies the URL learning and generation process. We show that URLGEN learns to generate new valid URLs from samples of real URLs without requiring any domain knowledge, following a purely data-driven approach. We compare the GAN architecture of URLGEN against other design options and show that the LSTM architecture can better capture the correlation among URL characters, outperforming previously proposed solutions. Finally, we show that the URLGEN approach can be extended to other scenarios, which we illustrate with two use cases, i.e., cybersquatting domain prediction and URL classification.

File | Access | Type | License | Size | Format
---|---|---|---|---|---
TNSM3225311.pdf | Open access | 2. Post-print / Author's Accepted Manuscript | Public - All rights reserved | 1.77 MB | Adobe PDF
Valentim-URLGEN.pdf | Restricted access | 2a. Post-print publisher's version / Version of Record | Non-public - Private/restricted access | 3.18 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2973517