Core Concepts
The authors propose a pipeline for contrastive language-audio pretraining that improves audio representations by pairing audio data with natural language descriptions.
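To make the contrastive idea concrete, here is a minimal sketch of a symmetric contrastive loss over a batch of paired audio and text embeddings. It assumes a CLIP-style objective (cross-entropy over cosine similarities in both directions), which the summary above does not spell out. The function names, the temperature value of 0.07, and the use of NumPy are illustrative choices, not the authors' implementation.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def contrastive_loss(audio_emb, text_emb, temperature=0.07):
    # audio_emb, text_emb: arrays of shape (batch, dim); row i of each is a matched pair.
    # temperature is a hypothetical fixed value chosen for illustration.
    a = l2_normalize(audio_emb)
    t = l2_normalize(text_emb)
    logits = a @ t.T / temperature           # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])         # matched pairs sit on the diagonal
    loss_a2t = -log_softmax(logits)[idx, idx].mean()    # audio -> text direction
    loss_t2a = -log_softmax(logits.T)[idx, idx].mean()  # text -> audio direction
    return 0.5 * (loss_a2t + loss_t2a)
```

In this sketch the loss is small when each audio embedding is closest to its own caption and grows when the pairing is scrambled, which is the signal that pulls matched audio and text together during pretraining.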
Stats
"LAION-Audio-630K is a large collection of 633,526 audio-text pairs."
"AudioCaps dataset contains about 55K training samples of audio-text pairs."
Quotes
"We release LAION-Audio-630K, currently the largest public audio caption dataset of 633,526 audio-text pairs."
"Our model achieves superior performance in text-to-audio retrieval task."