A Legal Framework for Natural Language Processing Model Training in Portugal
Stats
260 million people across five continents speak Portuguese as an official language.
Portuguese is considered a mid-resourced language: unlabeled data is plentiful, but labeled data is comparatively scarce.
The majority of Portuguese NLP resources are produced by Brazilian research teams.
Until recently, BERTimbau was the only Portuguese large language model (LLM). More complex architectures have since emerged, such as Albertina PT, Sabiá, Gervásio, and Glória.
Quotes
"The pace at which new LLMs are currently being developed largely surpasses the pace at which new regulations are introduced."
"The capabilities revealed by SOTA NLP models were accompanied by ethical and legal concerns among prominent NLP researchers, big-tech CEOs, politicians, and economists who appealed for regulation and higher ethical standards during the development of NLP solutions."
"The novelty of this subject translates into a lack of literature about the topic. The absence would be even worse if we focused exclusively on the Portuguese legal landscape."