Core Concepts
The paper proposes ReGenNet as a benchmark for human action-reaction synthesis, focusing on asymmetric, dynamic, synchronous, and detailed human interactions.
Summary
The paper introduces ReGenNet as a benchmark for human action-reaction synthesis. It addresses the limited exploration of dynamic human-human interactions and proposes a model that generates a reactor's motion conditioned on a given actor's motion. The paper discusses the challenges of modeling human-human interactions and presents datasets with actor-reactor annotations. The proposed ReGenNet model uses a diffusion-based approach with a Transformer decoder architecture and an explicit interaction loss to predict realistic human reactions. Extensive experiments demonstrate that the model can generate instant and plausible reactions, even without knowledge of the actor's intentions. The study also evaluates generalization to unseen actor motions and viewpoint changes.
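The summary does not spell out the form of the explicit interaction loss. As a minimal sketch only, assume it penalizes errors in the distances between actor joints and reactor joints; the function names, the joint representation as `(x, y, z)` tuples, and the mean-squared-error form are all illustrative assumptions, not the paper's definition.

```python
import math

# Hypothetical joint representation: one (x, y, z) tuple per joint.
Joint = tuple[float, float, float]


def pairwise_distances(actor: list[Joint], reactor: list[Joint]) -> list[list[float]]:
    """Euclidean distance from every actor joint to every reactor joint."""
    return [[math.dist(a, r) for r in reactor] for a in actor]


def interaction_loss(
    actor: list[Joint],
    reactor_pred: list[Joint],
    reactor_true: list[Joint],
) -> float:
    """Mean squared error between predicted and ground-truth
    actor-to-reactor joint distances (an assumed loss form)."""
    d_pred = pairwise_distances(actor, reactor_pred)
    d_true = pairwise_distances(actor, reactor_true)
    squared_errors = [
        (p - t) ** 2
        for row_pred, row_true in zip(d_pred, d_true)
        for p, t in zip(row_pred, row_true)
    ]
    return sum(squared_errors) / len(squared_errors)


# Single-frame toy example with one joint per person.
actor = [(0.0, 0.0, 0.0)]
reactor_true = [(1.0, 0.0, 0.0)]
print(interaction_loss(actor, reactor_true, reactor_true))      # 0.0
print(interaction_loss(actor, [(3.0, 0.0, 0.0)], reactor_true))  # 4.0
```

A loss of this kind is computed on the relative geometry of the two people rather than on the reactor's pose alone, so a reaction that looks plausible in isolation but is misplaced relative to the actor is still penalized.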
Outline:
- Introduction
  - Current focus on generative models for digital humans interacting with environments.
  - Lack of exploration in dynamic human-human interactions.
- Human Action-Reaction Synthesis
  - Challenges in modeling asymmetric, dynamic, synchronous, and detailed human interactions.
  - Proposal of ReGenNet for generating reactions conditioned on actions.
- Experiment
  - Evaluation on NTU120-AS, InterHuman-AS, Chi3D-AS datasets.
  - Comparison with state-of-the-art models in online setting.
- Generalization Experiments
  - Evaluation of model's generalization to viewpoint changes.
- Ablation Study
  - Analysis of different module designs, loss components, number of decoder layers, DDIM sampling timesteps.
- Extension to other settings
  - Evaluation in offline and constrained settings.
- Qualitative evaluation
  - Visualization of generated human reactions from Chi3D-AS and NTU120-AS datasets.
Statistics
An 8-layer Transformer encoder architecture is used for the offline setting.
The number of DDIM sampling timesteps is set to 5, which gives the best FID score with low latency.
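To illustrate what sampling with 5 DDIM timesteps means in practice, here is a minimal sketch of how a short sampling schedule is commonly taken as an evenly strided subsequence of the training timesteps. The training length of 1000 steps and the function name `ddim_timesteps` are assumptions for illustration; the summary does not state the training schedule or the exact spacing used.

```python
def ddim_timesteps(num_train_steps: int, num_sample_steps: int) -> list[int]:
    """Return an evenly strided subsequence of training timesteps,
    ordered from most noisy to least noisy."""
    if not 1 <= num_sample_steps <= num_train_steps:
        raise ValueError("num_sample_steps must be in [1, num_train_steps]")
    stride = num_train_steps // num_sample_steps
    steps = list(range(0, num_train_steps, stride))[:num_sample_steps]
    return steps[::-1]  # denoising runs from high t down to low t


# Assumed 1000 training steps, 5 sampling steps as in the statistic above.
print(ddim_timesteps(1000, 5))  # [800, 600, 400, 200, 0]
```

With this schedule the denoiser runs only 5 times per generated reaction instead of once per training timestep, which is where the latency saving comes from.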
Quotes
"Extensive experiments show that our method can generate instant and plausible human reactions compared to the baselines."
"Our contributions can be summarized as follows: analyzing asymmetric, dynamic nature of human-human interactions."