Basic concepts
Linguistic features impact cross-lingual transfer performance and representation spaces in multilingual models.
Statistics
"The model has 12 attention heads and 12 transformer blocks with a hidden size of 768."
"The dataset contains 392,702 train, 2,490 validation, and 5,010 test samples."
"Full model fine-tuning on a single language took about 2.5 hours on a single NVIDIA® V100 GPU."
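The figures quoted above imply a few derived quantities. The following is a minimal sketch of that arithmetic, assuming the hidden size is divided evenly across attention heads (the usual Transformer convention, not stated in the quoted text). All variable names are illustrative.

```python
# Figures quoted in the statistics above.
NUM_HEADS = 12
NUM_BLOCKS = 12
HIDDEN_SIZE = 768
SPLITS = {"train": 392_702, "validation": 2_490, "test": 5_010}

# Assumption: the hidden size is split evenly across attention heads.
assert HIDDEN_SIZE % NUM_HEADS == 0
head_dim = HIDDEN_SIZE // NUM_HEADS

# Total dataset size and the share of each split.
total_samples = sum(SPLITS.values())
split_share = {name: count / total_samples for name, count in SPLITS.items()}

print(f"transformer blocks: {NUM_BLOCKS}")
print(f"per-head dimension: {head_dim}")
print(f"total samples: {total_samples}")
for name, share in split_share.items():
    print(f"{name}: {share:.2%}")
```

Under that assumption each head works on a 64-dimensional slice of the hidden state, and the three splits sum to 400,202 samples, almost all of them in the training split.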
Quotes
"Our findings suggest an inter-correlation between language distance, representation space impact, and transfer performance."
"Selective layer freezing during fine-tuning may help reduce the transfer performance gap to distant languages."
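To make the idea of selective layer freezing concrete, here is a minimal sketch of how one might choose which parameters to hold fixed. It is not the authors' procedure: the quoted text does not say which layers to freeze, so the choice of the lowest four blocks is purely illustrative. The parameter names are hypothetical and assume the `encoder.layer.<index>.` naming used by Hugging Face BERT-style encoders.

```python
import re

# Assumption: parameter names follow the "encoder.layer.<index>." pattern
# used by Hugging Face BERT-style encoders; adjust the regex otherwise.
LAYER_PATTERN = re.compile(r"encoder\.layer\.(\d+)\.")


def layer_index(param_name):
    """Return the transformer block index found in a parameter name, or None."""
    match = LAYER_PATTERN.search(param_name)
    return int(match.group(1)) if match else None


def select_frozen(param_names, frozen_layers):
    """Return the parameter names that belong to the blocks in frozen_layers."""
    frozen_layers = set(frozen_layers)
    return [name for name in param_names if layer_index(name) in frozen_layers]


# Hypothetical parameter names for a 12-block encoder with a classifier head.
param_names = ["embeddings.word_embeddings.weight"]
for i in range(12):
    param_names.append(f"encoder.layer.{i}.attention.self.query.weight")
    param_names.append(f"encoder.layer.{i}.output.dense.weight")
param_names.append("classifier.weight")

# Illustrative choice only: freeze the lowest four blocks, train the rest.
frozen = select_frozen(param_names, frozen_layers=range(4))
print(len(frozen), "of", len(param_names), "parameter tensors frozen")
```

In a PyTorch setup, the selected names would typically be matched against `model.named_parameters()` and the corresponding tensors given `requires_grad = False` before fine-tuning. Embeddings and the classifier head are left trainable here because they carry no block index.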