DH-FaceVid-1K: A Large-Scale High-Quality Dataset for Face Video Generation

¹Li Auto, ²Harbin Institute of Technology, ³University of New South Wales, ⁴University of Science and Technology of China

Abstract

Human-centric generative models are becoming increasingly popular, giving rise to various innovative tools and applications, such as talking face videos conditioned on text or audio prompts. The core of these capabilities lies in powerful pretrained foundation models, trained on large-scale, high-quality datasets. However, many advanced methods rely on in-house data subject to various constraints, and other current studies fail to generate high-resolution face videos, which is mainly attributed to the significant lack of large-scale, high-quality face video datasets. In this paper, we introduce a human face video dataset, DH-FaceVid-1K. Our collection spans 1200 hours in total, encompassing 270,043 video samples from over 20,000 individuals. Each sample includes corresponding speech audio, facial keypoints, and text annotations. Compared to other publicly available datasets, ours distinguishes itself through its multi-ethnic coverage and high-quality comprehensive individual attributes. We establish multiple face video generation models supporting tasks such as Text-to-Video and Image-to-Video generation. In addition, we develop comprehensive benchmarks to validate the scaling law when using different proportions of our dataset. Our primary aim is to contribute a face video dataset, particularly addressing the underrepresentation of Asian faces in existing curated datasets and thereby enriching the global spectrum of face-centric data and mitigating demographic biases.

TL;DR: We introduce DH-FaceVid-1K, a large-scale, high-quality multi-ethnic face video dataset with comprehensive attributes, enabling diverse generation tasks and benchmarks while mitigating demographic biases.

Dataset Overview


Overview of the DH-FaceVid-1K dataset. It consists of 270,043 video clips along with corresponding spoken audio and annotations, featuring more than 20,000 unique identities and over 1,200 hours of facial video footage captured under various environmental conditions and lighting scenarios. Notably, 83% of the dataset represents Asian individuals, addressing the significant shortage of open-source Asian face video datasets.
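
As a rough sanity check on the figures above, the following minimal sketch derives a few per-clip and per-identity averages. It uses only the numbers quoted in the overview and treats "more than 20,000" and "over 1,200 hours" as if they were exact, which is an assumption, so the results are approximations rather than official statistics. The function and key names are illustrative only.

```python
# Figures quoted in the dataset overview. The "over"/"more than" values are
# treated as exact here (an assumption), so every result is approximate.
NUM_CLIPS = 270_043
TOTAL_HOURS = 1_200
NUM_IDENTITIES = 20_000
ASIAN_SHARE = 0.83


def derived_stats(num_clips=NUM_CLIPS, total_hours=TOTAL_HOURS,
                  num_identities=NUM_IDENTITIES, asian_share=ASIAN_SHARE):
    """Return rough averages implied by the headline dataset figures."""
    return {
        # Mean clip length in seconds: total footage divided by clip count.
        "avg_clip_seconds": total_hours * 3600 / num_clips,
        # Mean number of clips per identity.
        "clips_per_identity": num_clips / num_identities,
        # Approximate number of clips showing Asian individuals.
        "asian_clips": round(asian_share * num_clips),
    }


stats = derived_stats()
for name, value in stats.items():
    print(f"{name}: {value:,.2f}")
```

Under these assumptions, the average clip is about 16 seconds long, each identity contributes roughly 13.5 clips, and about 224,000 clips show Asian individuals.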

Download DH-FaceVid-1K

220k samples / 1.2k hours duration / ~4.01 TB

🔥 Please Note

These video samples are sourced from crowd-sourcing platforms. To prevent data misuse, users must agree to and comply with the relevant licensing agreements before accessing the video data. Our licensing agreement can be found here.

To prevent misuse of the DH-FaceVid-1K dataset, we require you to submit information for review and approval before granting download access. Please fill out the form here.

Once your application is approved, we will send you download instructions within 1-2 days. After receiving the email, click the download link in it and follow the instructions on the page to download the dataset. If you encounter any issues during the download, or do not receive the email within a reasonable time, please contact us at fenghe021209@gmail.com.

Face Video Datasets Comparison

Compared with other face video datasets, DH-FaceVid-1K has a larger data volume, competitive quality, and richer attribute annotations.


Statistics

Distributions of general appearances, hair colors, emotions, actions, ethnicity, and age.


Collection Pipeline


Comprehensive Attribute List

Comprehensive attribute list of DH-FaceVid-1K, including ethnicities, appearance details, emotions, actions, and lighting conditions.


Sampled Videos

Please note that, to ensure smooth page loading, all videos have been resized to 256×256.

Diverse and high-quality Asian face videos.

Multi-ethnic face videos.

Face videos covering a wide range of age distributions.

Face videos covering various head poses.

Face videos covering various emotions.

BibTeX

@inproceedings{Di2024FaceVid1KAL,
      title={FaceVid-1K: A Large-Scale High-Quality Multiracial Human Face Video Dataset},
      author={Donglin Di and He Feng and Wenzhang Sun and Yongjia Ma and Hao Li and Wei Chen and Xiaofei Gou and Tonghua Su and Xun Yang},
      year={2024},
      url={https://api.semanticscholar.org/CorpusID:273233717}
    }
