
Scaling Efficiency of Speech Language Models Lags Behind Text-Based Large Language Models

Core Concepts
The linguistic performance of speech language models scales up to three orders of magnitude more slowly than that of text-based large language models as compute increases.
The authors trained over 50 speech language models (SLMs) with varying parameter counts and data budgets to study their scaling behavior. They found that the test loss of SLMs follows scaling power laws similar to those observed in text-based large language models (LLMs).

They also established a strong correlation between the test loss of neural language models and downstream syntactic and semantic performance metrics, which allowed them to model how linguistic performance scales for both SLMs and LLMs. The results show that the linguistic performance of SLMs, on both syntactic (BLiMP) and semantic (Topic Cloze, Story Cloze) metrics, scales up to three orders of magnitude more slowly with compute than that of LLMs. This suggests that SLMs will require substantially more compute to match the linguistic proficiency of their text-based counterparts.

Finally, the authors explored synthetic data (sTinyStories) and coarser speech tokenization as ways to boost the semantic understanding of SLMs. The synthetic data improved semantic performance, while the coarser tokenization was detrimental to downstream performance.
For a given increase in compute ΔC that yields an improvement ΔQ in an LLM's syntactic performance (BLiMP), an SLM requires 10^3.14 ΔC to achieve the same ΔQ. For Topic Cloze and Story Cloze, the corresponding exponents are 1.56 and 2.7, i.e., 10^1.56 ΔC and 10^2.7 ΔC respectively.
"The linguistic performance of SLMs scales up to three orders of magnitude more slowly than that of text-based LLMs as compute increases."

"We establish a strong correlation between the test loss of neural LMs and the downstream metrics commonly used to evaluate their syntactic and semantic abilities."
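The compute-equivalence arithmetic above can be illustrated with a toy saturating power law of the form Q(C) = Q∞ − (C0/C)^α, which has the general shape of the fits described in the summary. The parameter values below are made up for illustration and are not taken from the paper:

```python
def score(c, q_inf, c0, alpha):
    """Illustrative saturating power law: Q(C) = q_inf - (c0 / C)**alpha."""
    return q_inf - (c0 / c) ** alpha

def compute_for_score(q, q_inf, c0, alpha):
    """Invert the power law: the compute C needed to reach score q (< q_inf)."""
    assert q < q_inf, "score must lie below the saturation value"
    return c0 / (q_inf - q) ** (1.0 / alpha)

# Hypothetical fits: the LLM curve rises faster (larger alpha) than the SLM curve.
llm = dict(q_inf=1.0, c0=1e3, alpha=0.30)
slm = dict(q_inf=1.0, c0=1e3, alpha=0.20)

target = score(1e9, **llm)                # score the LLM reaches at C = 1e9 (arbitrary units)
c_slm = compute_for_score(target, **slm)  # compute the SLM needs for the same score
factor = c_slm / 1e9                      # ~1e3 with these made-up fits
```

With these hypothetical parameters the SLM needs roughly 10^3 times the compute to match the LLM's score, mirroring the "three orders of magnitude" gap reported above.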

Key Insights Distilled From

by Santiago Cue... at 04-02-2024
Scaling Properties of Speech Language Models

Deeper Inquiries

How can the information density per context window of SLMs be increased to improve their scaling efficiency relative to LLMs?

To increase the information density per context window of Speech Language Models (SLMs), and thereby improve their scaling efficiency relative to Large Language Models (LLMs), several strategies can be employed:

Hierarchical representations: Organizing speech tokens hierarchically lets SLMs encode and retain more linguistic features within a limited context window.

Attention mechanisms: Improving the attention mechanism's ability to capture long-range dependencies allows SLMs to focus on the most relevant parts of the input speech and extract more information from each window.

Multi-task learning: Training SLMs on several related tasks that demand different levels of linguistic understanding encourages a more comprehensive representation of language within the context window.

Adaptive context windows: Dynamically adjusting the window size to the complexity of the input speech lets SLMs adapt to different linguistic contexts and optimize information density.

External knowledge: Integrating external linguistic resources, such as ontologies or semantic databases, enriches the information available to SLMs within the window and deepens their understanding of language.
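A related lever, mentioned in the summary, is coarser speech tokenization. It can be sketched as a BPE-style merge over discrete unit sequences: replacing the most frequent adjacent pair of units with a new unit shortens sequences, so more speech content fits into a fixed context window. This is a minimal sketch, not the authors' exact procedure, and the unit IDs are hypothetical:

```python
from collections import Counter

def most_frequent_pair(seqs):
    """Count adjacent pairs of discrete speech units across all sequences."""
    pairs = Counter()
    for seq in seqs:
        pairs.update(zip(seq, seq[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(seq, pair, new_unit):
    """Replace every non-overlapping occurrence of `pair` with `new_unit`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_unit)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

# Toy corpus of discrete speech-unit IDs (hypothetical values).
corpus = [[7, 3, 9, 7, 3, 5], [7, 3, 2]]
pair = most_frequent_pair(corpus)                    # most frequent adjacent pair
coarse = [merge_pair(s, pair, 100) for s in corpus]  # 100 = new merged unit ID
```

Each merge round shortens the sequences, raising information density per window; note, however, that the summary reports coarser tokenization was detrimental to downstream performance in the paper's experiments.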

What are the implications of the lower saturation values of linguistic performance metrics for SLMs compared to LLMs?

The lower saturation values of linguistic performance metrics for Speech Language Models (SLMs) compared to Large Language Models (LLMs) have several implications:

Limited linguistic proficiency: Lower saturation values indicate that SLMs may never reach the linguistic proficiency of LLMs, limiting their ability to model complex linguistic structures and relationships.

Scaling challenges: The slower approach to saturation suggests that matching LLM performance would require substantially more compute, raising concerns about training efficiency and cost-effectiveness.

Semantic understanding: Lower saturation on semantic metrics implies that SLMs may struggle with the nuances of meaning, hurting performance on tasks that require deep semantic comprehension.

Model generalization: SLMs with lower saturation values may generalize linguistic patterns poorly across contexts and domains, limiting their versatility in real-world applications.

Training complexity: Raising these saturation values may require more sophisticated training strategies and model architectures; addressing this is crucial for improving the overall performance and scalability of SLMs.

How would the scaling efficiency of SLMs change if they were to leverage transfer learning from text-based LLMs, as proposed in recent work?

If Speech Language Models (SLMs) were to leverage transfer learning from text-based Large Language Models (LLMs), several changes in scaling efficiency could be expected:

Improved performance: Pre-trained language representations and linguistic knowledge transferred from text LLMs can improve performance on speech tasks and raise scaling efficiency.

Faster convergence: The pre-trained representations provide a strong foundation for learning speech-specific features, accelerating training.

Enhanced generalization: The broad linguistic knowledge encoded in LLMs helps SLMs adapt to diverse speech data and improve across domains.

Reduced data requirements: SLMs may need less speech data for training because they reuse knowledge already acquired from text, which matters in resource-constrained settings.

Cross-modal applications: Transfer from text-based LLMs positions SLMs well for cross-modal tasks such as speech recognition, natural language understanding, and speech synthesis, broadening their applicability.
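The warm-start idea can be sketched as follows: copy the transformer body from a text LLM checkpoint and re-initialize only the token embeddings and output head for the (typically much smaller) discrete speech-unit vocabulary. This is a hypothetical sketch using plain Python containers; the weight names `tok_embeddings` and `lm_head`, the layer layout, and the init scale are assumptions, not the cited work's actual code:

```python
import random

def warm_init_slm(text_lm, speech_vocab, d_model, rng):
    """Warm-start sketch (hypothetical): copy the transformer body from a
    text LLM checkpoint; re-initialize embeddings/head for speech units."""
    reinit = {"tok_embeddings", "lm_head"}
    # Deep-copy every body weight unchanged.
    slm = {name: [row[:] for row in w] for name, w in text_lm.items()
           if name not in reinit}
    # Fresh small-variance init over the speech-unit vocabulary.
    for name in reinit:
        slm[name] = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
                     for _ in range(speech_vocab)]
    return slm

rng = random.Random(0)
d_model = 8
# Toy "checkpoint": matrices stored as nested lists, text vocab of 100.
text_lm = {
    "tok_embeddings": [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(100)],
    "layer0.attn":    [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(d_model)],
    "lm_head":        [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(100)],
}
slm = warm_init_slm(text_lm, speech_vocab=50, d_model=d_model, rng=rng)
```

The body weights carry over the text LLM's linguistic knowledge, while only the speech-facing layers start from scratch, which is the intuition behind the faster convergence and reduced data requirements discussed above.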