Basic Concepts
ConsistentID is an innovative method that maintains identity consistency and captures diverse facial details through multimodal fine-grained prompts, using only a single facial image while ensuring high fidelity.
Summary
The paper introduces ConsistentID, a novel method for diverse identity-preserving portrait generation under fine-grained multimodal facial prompts, using only a single reference image.
The key components of ConsistentID are:
- Multimodal Facial Prompt Generator:
  - Fine-grained Multimodal Feature Extractor: Combines facial features, corresponding facial descriptions, and overall facial context to enhance precision in facial details.
  - Overall Facial ID Feature Extractor: Injects overall identity information into the generation process.
- ID-Preservation Network:
  - Optimized through the facial attention localization strategy to preserve identity consistency in facial regions, preventing the blending of identity information from different facial areas.
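To make the idea of facial attention localization concrete, here is a minimal sketch of one simple attention-localization penalty. It is an assumption, not the paper's exact loss: it supposes each facial-region token (e.g. "eyes", "mouth") has a cross-attention map and a matching binary region mask, and it penalizes the share of that token's attention falling outside its own region. The function name `attention_localization_loss` and the flattened-list layout are hypothetical.

```python
from typing import List, Sequence


def attention_localization_loss(
    attn_maps: Sequence[Sequence[float]],
    masks: Sequence[Sequence[float]],
    eps: float = 1e-8,
) -> float:
    """Hypothetical attention-localization penalty (not ConsistentID's exact loss).

    attn_maps[k]: flattened (H*W) cross-attention map of the k-th facial-region
                  token; values assumed non-negative.
    masks[k]:     matching flattened binary region mask (0 or 1).
    Returns the mean, over tokens, of the fraction of attention that falls
    OUTSIDE the token's own region: 0 = perfectly localized, 1 = fully leaked.
    """
    if len(attn_maps) != len(masks) or not attn_maps:
        raise ValueError("need exactly one mask per attention map")
    penalties: List[float] = []
    for attn, mask in zip(attn_maps, masks):
        total = sum(attn) + eps
        # Share of this token's attention that lands inside its own region.
        inside = sum(a * m for a, m in zip(attn, mask)) / total
        penalties.append(1.0 - inside)
    return sum(penalties) / len(penalties)


# Toy 4x4 image flattened to 16 pixels: "eyes" = top half, "mouth" = bottom half.
eyes_mask = [1.0] * 8 + [0.0] * 8
mouth_mask = [0.0] * 8 + [1.0] * 8
masks = [eyes_mask, mouth_mask]

aligned = [eyes_mask, mouth_mask]  # each token attends to its own region
swapped = [mouth_mask, eyes_mask]  # identity details leak into the wrong region

print(round(attention_localization_loss(aligned, masks), 6))  # 0.0
print(round(attention_localization_loss(swapped, masks), 6))  # 1.0
```

Minimizing a term like this during training pushes each region token's attention toward its own area, which is one plausible way to keep identity details of different facial areas from blending.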
To facilitate training, the authors introduce the Fine-Grained ID (FGID) dataset, a comprehensive dataset with over 500,000 facial images and detailed textual descriptions of facial features and regions.
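The pairing of each image with region-level descriptions can be pictured as a simple record. The sketch below is illustrative only: the class name, field names, file path, and the way region descriptions are joined into a prompt are all assumptions, not FGID's actual schema or ConsistentID's prompt format.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FineGrainedFaceSample:
    """Hypothetical record for an FGID-style sample (field names are assumptions)."""

    image_path: str
    caption: str  # overall description of the portrait
    region_descriptions: Dict[str, str] = field(default_factory=dict)

    def fine_grained_prompt(self) -> str:
        """Append region-level descriptions to the global caption in a fixed order."""
        parts = [self.caption.rstrip(".")]
        for region in sorted(self.region_descriptions):
            parts.append(f"{region}: {self.region_descriptions[region]}")
        return "; ".join(parts)


sample = FineGrainedFaceSample(
    image_path="faces/000001.jpg",  # hypothetical path
    caption="A woman smiling outdoors.",
    region_descriptions={"mouth": "wide smile", "eyes": "large brown eyes"},
)
print(sample.fine_grained_prompt())
# A woman smiling outdoors; eyes: large brown eyes; mouth: wide smile
```

Sorting the region keys keeps the prompt deterministic regardless of insertion order, which makes such records easier to compare and cache.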
Experimental results demonstrate that ConsistentID achieves exceptional precision and diversity in personalized facial generation, surpassing existing methods on the MyStyle dataset. The method also maintains a fast inference speed during generation despite the introduction of more multimodal fine-grained identity information.
Statistics
ConsistentID achieves a CLIP-I score of 76.7, DINO score of 78.5, FaceSim score of 77.2, and FGIS score of 81.4 on the MyStyle dataset.
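Scores such as CLIP-I, DINO, and FaceSim are commonly computed as the cosine similarity between embeddings of generated images and the reference image, scaled to 0-100; the encoders differ (a CLIP image encoder, a DINO ViT, a face-recognition model). The sketch below assumes that convention and uses toy 2-D vectors in place of real encoder outputs; FGIS is not covered because its definition is not given here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def embedding_similarity_score(
    generated: Sequence[Sequence[float]], reference: Sequence[float]
) -> float:
    """Mean cosine similarity of generated-image embeddings to the single
    reference embedding, scaled to 0-100 (assumed reporting convention)."""
    if not generated:
        raise ValueError("no generated embeddings")
    sims = [cosine_similarity(g, reference) for g in generated]
    return 100.0 * sum(sims) / len(sims)


# Toy embeddings standing in for real encoder outputs.
reference = [1.0, 0.0]
generated = [[1.0, 0.0], [0.0, 1.0]]
print(embedding_similarity_score(generated, reference))  # 50.0
```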
ConsistentID takes 16 seconds per inference, slightly faster than PhotoMaker (17 seconds) but slower than IP-Adapter (13 seconds).
Quotes
"ConsistentID comprises two key components: a multimodal facial prompt generator that combines facial features, corresponding facial descriptions and the overall facial context to enhance precision in facial details, and an ID-preservation network optimized through the facial attention localization strategy, aimed at preserving ID consistency in facial regions."
"To facilitate training of ConsistentID, we present a fine-grained portrait dataset, FGID, with over 500,000 facial images, offering greater diversity and comprehensiveness than existing public facial datasets."