This work proposes a text-based video generation framework that lets users independently control camera movement and object movement.
The core message of this paper is that strengthening the interaction between spatial and temporal features is crucial for high-quality text-to-video generation. The authors propose a Swapped spatiotemporal Cross-Attention (Swap-CA) mechanism that alternates the "query" role between the spatial and temporal blocks, so that the two feature streams mutually reinforce each other.
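To make the swapping concrete, here is a minimal PyTorch sketch of one pair of cross-attention passes in which the query role alternates between spatial and temporal tokens. The module names, residual connections, head count, and token layout are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Thin wrapper: Q comes from `query`, K/V from `context`."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, query, context):
        out, _ = self.attn(query, context, context)
        return out

class SwapCABlock(nn.Module):
    """One swapped pair: spatial queries temporal, then temporal queries spatial."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.spatial_as_query = CrossAttention(dim, heads)
        self.temporal_as_query = CrossAttention(dim, heads)

    def forward(self, spatial_feat, temporal_feat):
        # Pass 1: spatial tokens act as queries over temporal tokens.
        spatial_feat = spatial_feat + self.spatial_as_query(spatial_feat, temporal_feat)
        # Pass 2 (roles swapped): temporal tokens query the updated spatial tokens.
        temporal_feat = temporal_feat + self.temporal_as_query(temporal_feat, spatial_feat)
        return spatial_feat, temporal_feat

if __name__ == "__main__":
    B, N_s, N_t, D = 2, 256, 16, 320   # batch, spatial tokens, temporal tokens, channels (assumed sizes)
    s, t = torch.randn(B, N_s, D), torch.randn(B, N_t, D)
    s2, t2 = SwapCABlock(D)(s, t)
    print(s2.shape, t2.shape)          # (2, 256, 320) and (2, 16, 320)
```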
Cross-attention guidance can enable zero-shot control over object shape, position, and movement in text-to-video diffusion models, despite the limitations of current models.
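As an illustration of how such guidance can be applied at sampling time, the sketch below nudges the current latent so that the cross-attention map of a chosen object token concentrates inside a target spatial region. The `get_token_attention` helper, the loss form, and the guidance scale are assumptions for illustration, not a specific model's API.

```python
import torch

def attention_guidance_step(latent, target_mask, get_token_attention, scale=50.0):
    """
    latent:              (B, C, F, H, W) video latent at the current denoising step
    target_mask:         (H_a, W_a) binary mask marking where the object should appear
    get_token_attention: assumed helper that runs the denoiser and returns the averaged
                         cross-attention map (B, F, H_a, W_a) for the object's text token
    """
    latent = latent.detach().requires_grad_(True)
    attn = get_token_attention(latent)                               # (B, F, H_a, W_a)
    attn = attn / (attn.flatten(2).sum(-1)[..., None, None] + 1e-8)  # normalize per frame
    # Energy: attention mass that falls outside the desired region.
    loss = (attn * (1.0 - target_mask)).sum()
    grad = torch.autograd.grad(loss, latent)[0]
    # Shift the latent against the gradient before continuing the sampler.
    return (latent - scale * grad).detach()
```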
Our novel grid diffusion models efficiently generate high-quality videos from text by reducing the temporal dimension of videos to the image dimension, enabling the use of various image-based methods for video tasks.
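The key operation behind this dimensionality reduction is rearranging a clip's frames into a single grid image (and back), so that an image-space model can process the whole clip at once. The sketch below shows that rearrangement only; the grid size and frame count are arbitrary choices for illustration.

```python
import torch

def frames_to_grid(video, rows, cols):
    """video: (B, F, C, H, W) with F == rows * cols  ->  grid image: (B, C, rows*H, cols*W)"""
    B, F, C, H, W = video.shape
    assert F == rows * cols
    grid = video.reshape(B, rows, cols, C, H, W)
    return grid.permute(0, 3, 1, 4, 2, 5).reshape(B, C, rows * H, cols * W)

def grid_to_frames(grid, rows, cols):
    """grid image: (B, C, rows*H, cols*W)  ->  video: (B, rows*cols, C, H, W)"""
    B, C, GH, GW = grid.shape
    H, W = GH // rows, GW // cols
    video = grid.reshape(B, C, rows, H, cols, W)
    return video.permute(0, 2, 4, 1, 3, 5).reshape(B, rows * cols, C, H, W)

if __name__ == "__main__":
    clip = torch.randn(1, 4, 3, 64, 64)            # 4 frames tiled as a 2x2 grid
    g = frames_to_grid(clip, 2, 2)                 # (1, 3, 128, 128)
    assert torch.allclose(grid_to_frames(g, 2, 2), clip)
```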