Core Concepts
Knowledge augmentation enhances gait video analysis for neurodegenerative diseases.
Abstract
The paper presents a knowledge augmentation strategy for gait video analysis in neurodegenerative diseases, built on a Vision Language Model (VLM). It aims to improve classification of diagnostic groups and assessment of gait impairment through collective learning across different modalities. The method outperforms state-of-the-art models on video-based classification tasks and can also decode what it has learned into natural language descriptions.
Structure:
- Abstract: Introduces the knowledge augmentation strategy for gait video analysis.
- Introduction: Discusses the limitations of current clinical assessments and the need for video-based analysis.
- Method: Details the approach utilizing three modalities to enhance VLM accuracy.
- Dataset and Preprocessing: Describes the dataset used and preprocessing methods applied.
- VLM Fine-Tuning: Explains how the VLM is fine-tuned with visual and knowledge-aware prompts.
- Contrastive Learning: Discusses contrastive learning with numerical text embeddings.
- Experiments and Results: Presents results from classification tests, ablation studies, and comparison with state-of-the-art models.
- Conclusion: Summarizes the findings of the study.
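The contrastive learning step named in the outline can be illustrated with a generic sketch. This summary does not give the paper's loss function, so the code below assumes a CLIP-style symmetric contrastive (InfoNCE) loss between video embeddings and text embeddings. The function names, the temperature value of 0.07, and the toy two-dimensional embeddings are all hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_logits(video_embs, text_embs, temperature=0.07):
    """Temperature-scaled cosine similarity between every video/text pair."""
    videos = [l2_normalize(e) for e in video_embs]
    texts = [l2_normalize(e) for e in text_embs]
    return [
        [sum(a * b for a, b in zip(v, t)) / temperature for t in texts]
        for v in videos
    ]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(video_embs, text_embs, temperature=0.07):
    """Average of the video-to-text and text-to-video cross-entropy terms."""
    logits = cosine_logits(video_embs, text_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


# Toy example: two videos, each paired with the text at the same index.
videos = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = symmetric_contrastive_loss(videos, aligned_texts)
swapped_loss = symmetric_contrastive_loss(videos, swapped_texts)
```

In this toy setup the loss is small when each video embedding points in the same direction as its paired text embedding and large when the pairs are swapped, which is the behavior a contrastive objective is meant to reward.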
Stats
Our model significantly outperformed other strong SOTA methods while using a dataset of slightly over 100 videos.
In the ablation studies, combining knowledge-aware prompt tuning (KAPT) with numerical text embeddings (NTE) yielded the best performance.