Human-centric generative models are becoming increasingly popular, giving rise to various innovative tools and applications, such as talking face videos conditioned on text or audio prompts. The core of these capabilities lies in powerful pretrained foundation models, trained on large-scale, high-quality datasets. However, many advanced methods rely on in-house data subject to various constraints, and other current studies fail to generate high-resolution face videos, which is mainly attributed to the significant lack of large-scale, high-quality face video datasets. In this paper, we introduce a human face video dataset, DH-FaceVid-1K. Our collection spans 1200 hours in total, encompassing 270,043 video samples from over 20,000 individuals. Each sample includes corresponding speech audio, facial keypoints, and text annotations. Compared to other publicly available datasets, ours distinguishes itself through its multi-ethnic coverage and high-quality comprehensive individual attributes. We establish multiple face video generation models supporting tasks such as Text-to-Video and Image-to-Video generation. In addition, we develop comprehensive benchmarks to validate the scaling law when using different proportions of our dataset. Our primary aim is to contribute a face video dataset, particularly addressing the underrepresentation of Asian faces in existing curated datasets and thereby enriching the global spectrum of face-centric data and mitigating demographic biases.
TL;DR: We introduce DH-FaceVid-1K, a large-scale, high-quality multi-ethnic face video dataset with comprehensive attributes, enabling diverse generation tasks and benchmarks while mitigating demographic biases.
Overview of DH-FaceVid-1K Dataset}. It consists of 270,043 video clips along with corresponding spoken audio and annotations, featuring more than 20,000 unique identities and over 1,200 hours of facial video footage captured under various environmental conditions and lighting scenarios. Notably, 83% of the dataset represents Asian individuals, addressing the significant shortage of open-source Asian face video datasets.
These video samples are sourced from crowd-sourcing platforms. To prevent data misuse, users must adhere to the relevant licensing agreements to access these video data. Our licensing agreement can be found here.
To prevent misuse of the DH-FaceVid-1K dataset, we require you to submit information for review and approval before granting download access. Please fill out the form here.
Once approved, we will send you download instructions within 1-2 days. After receiving the download instruction email, you can click the download link in the email and follow the instructions on the page to complete the dataset download process. If you encounter any issues during the download or do not receive the email within a reasonable time, please contact us at our email address fenghe021209@gmail.com.
Comprehensive attribute list of DH-FaceVid-1K, including ethnicities, appearance details, emotions, actions, and lighting conditions.
@inproceedings{Di2024FaceVid1KAL,
title={FaceVid-1K: A Large-Scale High-Quality Multiracial Human Face Video Dataset},
author={Donglin Di and He Feng and Wenzhang Sun and Yongjia Ma and Hao Li and Wei Chen and Xiaofei Gou and Tonghua Su and Xun Yang},
year={2024},
url={https://api.semanticscholar.org/CorpusID:273233717}
}
If you want to use this fully-responsive and easy-to-adapt homepage template, you can download it from the github repository.