Core Concepts
Large web-crawled multimodal datasets require efficient data filtering to improve visual representation learning.
Stats
T-MARS outperforms CLIP filtering by 6.5% on ImageNet and 4.7% on VTAB.
Quotes
"Data curation at web scale raises unique challenges compared to the standard classification regime."
"Our scaling trends show that good-quality data filtering holds even more significance at large scales."