Mediapipe based Preprocessed VGGFace2 Dataset

Shah, Syed Taimoor Hussain; Shah, Syed Adil Hussain; Zamir, Ammara; Qayyum, Kainat; Shah, Syed Baqir Hussain; Fatima, Syeda Maryam; Deriu, Marco Agostino

doi:10.5281/ZENODO.15078557

VGGFace2 Dataset and Face Mesh Preprocessing

Introduction

The VGGFace2 dataset is a large-scale face recognition dataset containing over 3.31 million images of 9,131 identities, with an average of 362 images per identity. The dataset is designed to include extensive variations in pose, age, illumination, ethnicity, and profession, making it one of the most diverse and challenging face recognition datasets available. For more details, please refer to the original publication:

VGGFace2: A dataset for recognizing faces across pose and age - DOI: 10.48550/arXiv.1710.08092

Preprocessing Using MediaPipe 3D Face Mesh

We applied the MediaPipe 3D face mesh algorithm to this dataset to detect faces and remove all background elements, including hair. The preprocessing strictly retained the facial landmarks, so that only the essential facial features were preserved. Because the model was trained exclusively on this landmark-based facial data, this approach significantly improved its accuracy and generalization.

Training and Performance

The preprocessed data was used to train an Xception model, which produced highly accurate results thanks to the strictly landmark-based facial representation. The model also performed robustly under explainable-AI analysis, indicating that eliminating unnecessary background elements contributed positively to its efficiency and reliability.
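To illustrate the kind of background removal described under "Preprocessing Using MediaPipe 3D Face Mesh", here is a minimal Python sketch. It is not the authors' pipeline: the record does not say how the face region was derived from the landmarks, so the convex hull used here is an assumption, and the function names (`detect_landmarks`, `face_mask`, `apply_mask`) are hypothetical. `detect_landmarks` assumes the legacy `mediapipe.solutions.face_mesh` API; the geometry below it is pure Python and runs without MediaPipe.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def detect_landmarks(rgb_image) -> List[Point]:
    """Return face-mesh landmarks in pixel coordinates for the first face found.

    Assumes an RGB uint8 array of shape (height, width, 3) and the legacy
    `mediapipe.solutions.face_mesh` API. Returns [] when no face is detected.
    """
    import mediapipe as mp  # lazy import: the geometry below works without it

    height, width = rgb_image.shape[:2]
    with mp.solutions.face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1) as mesh:
        result = mesh.process(rgb_image)
    if not result.multi_face_landmarks:
        return []
    # Landmark x/y are normalized to [0, 1]; scale them to pixel coordinates.
    return [(lm.x * width, lm.y * height) for lm in result.multi_face_landmarks[0].landmark]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; hull vertices in a consistent winding order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(hull: Sequence[Point], p: Point) -> bool:
    """True if p lies inside or on the boundary of the convex polygon `hull`."""
    n = len(hull)
    return n >= 3 and all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def face_mask(landmarks: Sequence[Point], width: int, height: int) -> List[List[int]]:
    """Binary mask as rows of 0/1: 1 where pixel (x, y) falls inside the landmark hull."""
    hull = convex_hull(landmarks)
    return [[1 if inside_hull(hull, (x, y)) else 0 for x in range(width)]
            for y in range(height)]


def apply_mask(image_rows, mask, fill=0):
    """Replace every pixel outside the mask with `fill` (nested-list image)."""
    return [[px if keep else fill for px, keep in zip(row, mask_row)]
            for row, mask_row in zip(image_rows, mask)]
```

In practice the mask would be built from `detect_landmarks(image)` and applied to the image array; the nested-list form is used here only to keep the sketch free of dependencies. A tighter outline than the convex hull (for example, a face-oval contour) may match the published preprocessing more closely.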
Citation

If you use this dataset or the preprocessed version in your work, please cite both of the following:

VGGFace2 Dataset:

@article{Cao2018VGGFace2,
  title={VGGFace2: A dataset for recognizing faces across pose and age},
  author={Cao, Qiong and Shen, Li and Xie, Weidi and Parkhi, Omkar M and Zisserman, Andrew},
  journal={arXiv preprint arXiv:1710.08092},
  year={2018}
}

DOI: [10.48550/arXiv.1710.08092](https://doi.org/10.48550/arXiv.1710.08092)

Preprocessed Dataset using MediaPipe:

@dataset{Shah2025_MediaPipe_FaceMesh,
  title={MediaPipe-based 3D Face Mesh Preprocessed VGGFace2 Dataset},
  author={Shah, Syed Taimoor Hussain and Shah, Syed Adil Hussain and Zamir, Ammara and Qayyum, Kainat and Shah, Syed Baqir Hussain and Fatima, Syeda Maryam and Deriu, Marco Agostino},
  year={2025},
  doi={10.5281/zenodo.15078557}
}

DOI: [10.5281/zenodo.15078557](https://doi.org/10.5281/zenodo.15078557)

Contact

For any questions or further details, please feel free to contact us.

Syed Taimoor Hussain Shah
PolitoBIOMed Lab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Turin, Italy
Email: taimoor.shah@polito.it
ORCID: 0000-0002-6010-6777

PORTO @ Archivio Istituzionale della Ricerca