Core Concepts
The authors introduce the DADIT dataset, containing 30M tweets from 20k Italian Twitter users with demographic labels, to compare prediction methods for gender and age.
Summary
The study introduces the DADIT dataset and shows that tweet content is important for demographic classification. Several models are compared, and XLM-based classifiers show significant improvements. The findings underline the value of text-rich datasets like DADIT for accurate user classification.
Statistics
The DADIT dataset contains 30M tweets from 20k Italian Twitter users.
The XLM-based classifier improves upon M3 by up to 53% F1.
The fine-tuned XLM achieves an F1-score nearly twice as high as that of competing methods.
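
For readers unfamiliar with the metric, the F1 figures above can be read with the help of a minimal sketch like the one below. It assumes per-class counts of true positives, false positives, and false negatives. The function names and the example counts are hypothetical and are not taken from the DADIT study. The summary does not state whether the 53% improvement is relative or absolute, nor which F1 averaging is used, so the sketch shows a macro average and a relative improvement purely as illustrations.

```python
from typing import Iterable, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for one class: the harmonic mean of precision and recall."""
    if tp == 0:
        # No true positives means precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(counts: Iterable[Tuple[int, int, int]]) -> float:
    """Unweighted mean of per-class F1 scores (one common averaging choice)."""
    scores = [f1_score(tp, fp, fn) for tp, fp, fn in counts]
    return sum(scores) / len(scores)


def relative_improvement(new: float, baseline: float) -> float:
    """Improvement of `new` over `baseline`, as a fraction of the baseline."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical (tp, fp, fn) counts for two classes, for illustration only.
    example_counts = [(8, 2, 2), (6, 4, 4)]
    print(f"macro F1: {macro_f1(example_counts):.3f}")
    print(f"relative improvement: {relative_improvement(0.6, 0.4):.1%}")
```

Under the relative reading, a baseline F1 of 0.4 and a new F1 of 0.6 would count as a 50% improvement. Under an absolute reading, the same pair would be a gain of 20 points.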