The authors present a novel NLP system for the automatic detection of relevant financial events in unstructured textual sources. The system comprises the following key steps:
Multi-paragraph topic segmentation: The TextTiling algorithm is used to group closely related text into coherent segments.
Co-reference resolution: The neural co-reference model of Clark et al. (2016) is employed to replace referring expressions, such as pronouns, with the entities they refer to, improving the performance of the subsequent LDA stage.
Tag processing: Financial terms are detected, homogenized and replaced with appropriate tags to prepare the input for the LDA stage.
Relevant text detection with LDA topic modelling: LDA is used to differentiate between relevant and less relevant information in the text segments. A topic score ρ is defined to represent the density of significant tags, and an LDA score threshold δ is introduced to improve the precision of the LDA algorithm in detecting relevant text.
Temporal analysis: Dependency and proximity analyses are performed to extract temporal features, which are then used to train a Linear Support Vector Classifier (SVC) to estimate the temporality (past, present, future) of a segment.
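To make the tag-processing step and the idea of a tag-density score more concrete, the following is a minimal sketch only. The lexicon `TAG_LEXICON`, the tag names and the functions `tag_segment` and `tag_density` are hypothetical and do not come from the paper. The exact definition of ρ is not given in this summary, so the density computed here (tags divided by tokens in a segment) is an assumed stand-in. The LDA model and the threshold δ are not reproduced.

```python
import re

# Hypothetical term-to-tag lexicon (an assumption for illustration);
# the lexicon actually used by the authors is not given in this summary.
TAG_LEXICON = {
    r"\b(shares?|stocks?|equit(?:y|ies))\b": "<STOCK>",
    r"\b(usd|dollars?|euros?|eur)\b": "<CURRENCY>",
    r"\b(rise|rises|rose|gain|gains|gained)\b": "<UP>",
    r"\b(fall|falls|fell|drop|drops|dropped)\b": "<DOWN>",
}

# A token counts as a tag if it is <UPPERCASE>, optionally followed by punctuation.
TAG_TOKEN = re.compile(r"<[A-Z]+>[.,;:!?]*")


def tag_segment(text: str) -> str:
    """Lower-case the text and replace known financial terms with tags."""
    tagged = text.lower()
    for pattern, tag in TAG_LEXICON.items():
        tagged = re.sub(pattern, tag, tagged)
    return tagged


def tag_density(tagged_text: str) -> float:
    """Assumed stand-in for a density score: tag tokens / all tokens."""
    tokens = tagged_text.split()
    if not tokens:
        return 0.0
    n_tags = sum(1 for tok in tokens if TAG_TOKEN.fullmatch(tok))
    return n_tags / len(tokens)


if __name__ == "__main__":
    segment = tag_segment("Shares rose 5% on Monday.")
    print(segment)
    print(tag_density(segment))
```

Homogenizing surface variants ("shares", "stock", "equities") into one tag reduces vocabulary sparsity, which is the stated reason for running this step before LDA.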
The proposed solution was evaluated on a data set of 2,158 financial news items manually labeled by NLP researchers. The results show that the system outperformed a rule-based baseline, with ROUGE-L values of 0.662 and 0.982 for the identification of relevant text and predictions/forecasts, respectively.
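For readers unfamiliar with the metric, ROUGE-L scores a candidate text against a reference by the length of their longest common subsequence (LCS) of tokens. The sketch below is illustrative and makes assumptions not stated in this summary: whitespace tokenization, lower-casing, and a balanced F-measure (`beta=1.0`). The function names `lcs_length` and `rouge_l` are hypothetical, and the authors' exact evaluation settings may differ.

```python
def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> float:
    """LCS-based F-measure; beta=1.0 (balanced F1) is an assumption."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


if __name__ == "__main__":
    extracted = "the company expects profits to rise"
    gold = "the company expects higher profits to rise next year"
    print(rouge_l(extracted, gold))
```

In this example all six candidate tokens appear in order in the nine-token reference, so precision is 1, recall is 6/9, and the balanced F-measure is 0.8.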
Source: arxiv.org