Core Concepts
Large Language Models can extract sentiment factors from Chinese financial news texts, and these factors can in turn inform and enhance quantitative trading strategies in the Chinese stock market.
Abstract
The researchers propose a comprehensive benchmark and a standardized back-testing framework to objectively evaluate how well various Large Language Models (LLMs) extract sentiment factors from Chinese financial news texts. They apply three distinct LLMs to sentiment extraction from a dataset of 394,426 Chinese news summaries covering 5,021 publicly traded companies: a generative model (ChatGPT), a Chinese-language pre-trained model (Erlangshen-RoBERTa), and a financial-domain fine-tuned model (Chinese FinBERT).
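Before per-news sentiment scores can drive a trading strategy, they must be aggregated into one factor value per stock per date. The sketch below shows that aggregation step only; the field names, the score convention (model output in [-1, 1]), and the equal-weight averaging are illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict
from statistics import mean

def aggregate_sentiment(scored_news):
    """Average per-news sentiment scores into one factor value per stock-date.

    scored_news: list of dicts like
        {"ticker": "600519", "date": "2023-01-05", "score": 0.8}
    where "score" is assumed to be a model output in [-1, 1].
    Returns {(ticker, date): factor_value}.
    """
    buckets = defaultdict(list)
    for item in scored_news:
        buckets[(item["ticker"], item["date"])].append(item["score"])
    # Equal-weight mean over all news items for that stock on that day;
    # a production pipeline might instead weight by recency or source quality.
    return {key: mean(scores) for key, scores in buckets.items()}

# Hypothetical scored news items (tickers and scores are made up).
news = [
    {"ticker": "600519", "date": "2023-01-05", "score": 0.8},
    {"ticker": "600519", "date": "2023-01-05", "score": 0.4},
    {"ticker": "000001", "date": "2023-01-05", "score": -0.2},
]
factors = aggregate_sentiment(news)
```

A stock with two news items scored 0.8 and 0.4 gets a factor value of 0.6 for that day, which can then be ranked cross-sectionally against other stocks.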
The researchers then construct investment portfolios and run stock trading simulation back-tests based on the derived sentiment factors, evaluating the performance using metrics such as annual excess return, risk-adjusted return, and win rate. The results show that the Erlangshen sentiment factor, derived from the Erlangshen-110M-Sentiment model, outperforms the other factors across all metrics, demonstrating a strong correlation between the Erlangshen sentiment factor values and portfolio excess returns.
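The back-test loop described above can be sketched as: each rebalance period, rank stocks by factor value, hold the top bucket, and score the resulting excess-return series. Everything in this sketch (the top-20% rule, equal weighting, and the input data layout) is an illustrative assumption rather than the paper's exact protocol.

```python
def backtest_top_bucket(periods, top_frac=0.2):
    """periods: list of dicts, one per rebalance period:
        {"factors": {ticker: value}, "returns": {ticker: ret}, "benchmark": ret}
    Go long the top `top_frac` of stocks by factor, equal-weighted.
    Returns (per-period excess returns vs. the benchmark, win rate).
    """
    excess = []
    for p in periods:
        # Rank tickers by factor value, highest sentiment first.
        ranked = sorted(p["factors"], key=p["factors"].get, reverse=True)
        k = max(1, int(len(ranked) * top_frac))
        held = ranked[:k]
        port_ret = sum(p["returns"][t] for t in held) / k
        excess.append(port_ret - p["benchmark"])
    # Win rate = fraction of periods beating the benchmark; annual excess
    # return would compound `excess` over a year's worth of periods.
    win_rate = sum(e > 0 for e in excess) / len(excess)
    return excess, win_rate

# Two made-up rebalance periods with five hypothetical tickers.
periods = [
    {"factors": {"A": 1.0, "B": 0.5, "C": -0.5, "D": -1.0, "E": 0.0},
     "returns": {"A": 0.05, "B": 0.01, "C": -0.01, "D": -0.03, "E": 0.00},
     "benchmark": 0.01},
    {"factors": {"A": -1.0, "B": 2.0, "C": 0.1, "D": 0.3, "E": -0.2},
     "returns": {"A": 0.02, "B": -0.02, "C": 0.01, "D": 0.00, "E": 0.01},
     "benchmark": 0.00},
]
excess, win_rate = backtest_top_bucket(periods)
```

With five stocks and `top_frac=0.2`, each period holds the single highest-factor stock; a well-behaved sentiment factor should show both a positive mean excess return and a win rate above 50% under this kind of protocol.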
These findings highlight the importance of language-specific considerations and targeted methodologies when applying LLMs to sentiment factor extraction in Chinese financial texts. The researchers demonstrate that a comparatively smaller LLM, with strategic and extensive pre-training tailored to the Chinese language, can achieve superior performance within the benchmark, emphasizing the significance of adapting LLMs to language nuances rather than relying solely on model size.
Stats
394,426 Chinese news summaries covering 5,021 publicly traded companies were used for sentiment extraction.
Quotes
"The company continues to promote the upgrading of its traditional filter business product structure, demonstrating its resilience in a fiercely competitive market. Looking ahead, we are optimistic about the company's solid foundation in the mobile optics business and expect forward-looking layouts such as HUD and AR to open up a second growth curve."
"The rapid advancement of Large Language Models (LLMs) has led to extensive discourse regarding their potential to boost the return of quantitative stock trading strategies."
"To ensure successful implementations of these LLMs into the analysis of Chinese financial texts and the subsequent trading strategy development within the Chinese stock market, we provide a rigorous and encompassing benchmark as well as a standardized back-testing framework aiming at objectively assessing the efficacy of various types of LLMs in the specialized domain of sentiment factor extraction from Chinese news text data."
"By constructing such a comparative analysis, we invoke the question of what constitutes the most important element for improving a LLM's performance on extracting sentiment factors."