OpenAI introduces Sora, a text-to-video model that generates minute-long, high-definition videos from text prompts; the report highlights both its capabilities and its limitations, and frames Sora as a step toward models that understand and simulate the real world on the path to Artificial General Intelligence.
I4VGEN is a novel inference pipeline that improves the quality of pre-trained text-to-video diffusion models by incorporating image information during inference, eliminating the need for additional training and addressing the non-zero terminal SNR issue.
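The image-guided inference idea in the I4VGEN summary can be illustrated with a minimal sketch: initialize every frame's latent from a single anchor-image latent and noise it to an intermediate diffusion step (SDEdit-style). This is an illustrative sketch of image-anchored initialization in general, not I4VGEN's actual algorithm; all shapes and the `alpha_bar_t` value are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 16 frames of 4-channel 32x32 latents.
F, C, H, W = 16, 4, 32, 32

image_latent = rng.standard_normal((C, H, W))  # latent of a single anchor image

# Image-anchored initialization: broadcast the image latent across all
# frames, then noise it to an intermediate diffusion step t (SDEdit-style).
alpha_bar_t = 0.3  # assumed cumulative signal retention at step t
noise = rng.standard_normal((F, C, H, W))
video_latent_t = (np.sqrt(alpha_bar_t) * image_latent[None] +
                  np.sqrt(1.0 - alpha_bar_t) * noise)

print(video_latent_t.shape)  # (16, 4, 32, 32)
```

Denoising would then start from `video_latent_t` rather than pure noise, so every frame shares structure from the anchor image.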
CogVideoX introduces a novel approach to text-to-video generation, leveraging diffusion transformers, a 3D Variational Autoencoder (VAE), and an expert transformer to produce high-resolution, long-duration videos with coherent narratives and realistic motion.
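A 3D (spatio-temporal) VAE like the one mentioned above compresses the video along both the frame axis and the spatial axes before diffusion runs in latent space. The shape arithmetic below is a sketch; the 4x temporal / 8x spatial factors and the causal first-frame handling are assumptions for illustration, not figures taken from the CogVideoX paper.

```python
# Illustrative shape arithmetic for a 3D (spatio-temporal) VAE.
T, H, W = 49, 480, 720   # example input video: frames, height, width
ct, cs = 4, 8            # assumed temporal and spatial compression factors

latent_T = 1 + (T - 1) // ct  # causal 3D VAEs often keep the first frame whole
latent_H, latent_W = H // cs, W // cs
print(latent_T, latent_H, latent_W)  # 13 60 90
```

Running diffusion over a 13x60x90 latent grid instead of 49x480x720 pixels is what makes long, high-resolution generation tractable.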
MotionAura is a text-to-video framework built on a 3D vector-quantized VAE and spectral transformers; it generates high-quality, temporally consistent videos from text prompts and supports sketch-guided video inpainting.
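The core operation of a vector-quantized VAE, as used in MotionAura's latent space, is mapping each continuous latent vector to its nearest entry in a learned codebook. Here is a toy nearest-neighbor lookup, a generic sketch and not MotionAura's model:

```python
import numpy as np

def vector_quantize(z: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each latent vector in z (N, D) to its nearest codebook entry (K, D)
    under L2 distance: the core lookup of a vector-quantized VAE."""
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K) distances
    idx = d.argmin(axis=1)                                     # nearest code index
    return codebook[idx]

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
z = np.array([[0.1, -0.1], [0.9, 1.2]])
print(vector_quantize(z, codebook))  # [[0. 0.] [1. 1.]]
```

During training, a straight-through estimator passes gradients through this non-differentiable lookup; the discrete code indices are what the generative transformer models.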
HARIVO is a single-stage method for generating diverse, high-quality videos from text prompts: it builds on frozen, pre-trained text-to-image diffusion models and adds architectural components and loss functions that enforce temporal consistency.
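One simple way to encode a temporal-consistency objective like the one HARIVO's summary mentions is to penalize large feature changes between consecutive frames. The loss below is a generic temporal-smoothness penalty for illustration, not HARIVO's actual loss:

```python
import numpy as np

def temporal_consistency_loss(frame_feats: np.ndarray) -> float:
    """Mean squared difference between adjacent frames' features (T, D):
    a generic temporal-smoothness penalty, not HARIVO's published loss."""
    diffs = frame_feats[1:] - frame_feats[:-1]
    return float(np.mean(diffs ** 2))

# Toy features that drift by 1.0 per frame -> each adjacent diff is 1.0.
feats = np.stack([np.full((8,), float(t)) for t in range(4)])
print(temporal_consistency_loss(feats))  # 1.0
```

In practice such a term is weighted against the main denoising loss so motion is smoothed without being suppressed entirely.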