Key Concepts
Large Language Models enhance machine learning for traffic incident severity classification.
Summary
The study evaluates how Large Language Models (LLMs) can improve machine learning pipelines for traffic incident management. It uses language models to extract features from accident reports and tests how well those features predict severity levels. The research compares different combinations of language models and machine learning algorithms, and shows that combining language-model features with traditional report data improves incident severity classification.
I. Abstract:
- Evaluates impact of Large Language Models on enhancing machine learning processes for managing traffic incidents.
- Compares combinations of language models and machine learning algorithms.
- Demonstrates benefits of incorporating features from language models with traditional data.
II. Introduction:
- Rise in vehicular traffic leads to increased accidents, necessitating effective Traffic Incident Management Systems (TIMS).
- Classifying accident severity is crucial but challenging due to the stochastic nature of accidents.
- Large Language Models offer an opportunity to augment conventional machine learning approaches.
III. Methodology:
- Explores combining LLM and ML models with full-text representation for traffic accident modeling.
- Three scenarios evaluated: Baseline Accident Report Features, NLP Features, Combination of Baseline and NLP Features.
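The three scenarios can be pictured as three feature matrices built from the same accidents. The following is a minimal sketch, not the study's implementation: the function name `build_scenarios`, the toy values, and the feature dimensions are all assumptions made for illustration.

```python
from typing import Dict, List

Matrix = List[List[float]]


def build_scenarios(report: Matrix, nlp: Matrix) -> Dict[str, Matrix]:
    """Return one feature matrix per scenario, one row per accident.

    `report` holds baseline accident report features and `nlp` holds
    language-model features for the same accidents, in the same order.
    """
    if len(report) != len(nlp):
        raise ValueError("both feature sets must describe the same accidents")
    # The combined scenario concatenates the two feature vectors per accident.
    combined = [r + e for r, e in zip(report, nlp)]
    return {"baseline": report, "nlp": nlp, "combined": combined}


# Hypothetical toy data: 2 accidents, 3 report features, 4 text features.
report_features = [[0.2, 1.0, 3.0], [0.7, 0.0, 1.0]]
text_features = [[0.1, -0.3, 0.5, 0.9], [0.4, 0.2, -0.1, 0.0]]

scenarios = build_scenarios(report_features, text_features)
for name, matrix in scenarios.items():
    print(name, len(matrix), "rows x", len(matrix[0]), "columns")
```

Each matrix would then be passed to the same set of classifiers, so that differences in performance can be attributed to the features rather than to the model.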
IV. Results:
- Performance comparison shows that combining report and language features improves severity classification accuracy.
- XGBoost and Random Forest demonstrate competitive performance.
- Different language models show varying performance across datasets.
V. Case Study & Experiment Setup:
- Evaluation was conducted on a high-performance computing system, measuring metrics such as total batch processing time, tokenization time, and model inference time.
- BERT and RoBERTa models exhibit the highest overall speed.
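The timing metrics named above can be collected with a small wrapper around the tokenization and inference steps. This is a sketch under stated assumptions: `time_batch`, `toy_tokenize`, and `toy_infer` are hypothetical names, and the toy functions stand in for a real tokenizer and language model, which the summary does not specify in code.

```python
import time
from typing import Callable, Dict, List, Sequence


def time_batch(
    texts: Sequence[str],
    tokenize: Callable[[Sequence[str]], List[List[int]]],
    infer: Callable[[List[List[int]]], List[List[float]]],
) -> Dict[str, float]:
    """Measure tokenization, inference, and total time for one batch, in seconds."""
    start = time.perf_counter()
    tokens = tokenize(texts)
    after_tokenize = time.perf_counter()
    infer(tokens)
    end = time.perf_counter()
    return {
        "tokenization_s": after_tokenize - start,
        "inference_s": end - after_tokenize,
        "total_batch_s": end - start,
    }


# Stand-ins for a real tokenizer and model (assumptions for illustration only).
def toy_tokenize(texts: Sequence[str]) -> List[List[int]]:
    return [[len(word) for word in text.split()] for text in texts]


def toy_infer(token_batches: List[List[int]]) -> List[List[float]]:
    return [[float(sum(t)), float(len(t))] for t in token_batches]


timings = time_batch(
    ["Two-car collision on the highway", "Stalled truck blocking right lane"],
    toy_tokenize,
    toy_infer,
)
print(timings)
```

Using `time.perf_counter` rather than wall-clock time avoids distortions from system clock adjustments; on GPU hardware a real measurement would also need to wait for asynchronous work to finish before reading the clock.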
VI. General Comparison for Area: USA, Description only:
- BERT outperforms other LLMs in feature extraction relevant to incident severity classification.
- Random Forest and XGBoost are most effective in utilizing LLM-extracted features for severity classification.
- NLP Features extracted from incident description field prove nearly as effective as Report-only features.
VII. Use of PCA for Dimensionality Reduction:
- Fast ML models such as XGBoost are used to handle high-dimensional data efficiently.
- Principal Component Analysis (PCA) is employed to reduce the dimensionality of the feature space and mitigate the challenges of high-dimensional inputs.
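As an illustration of the PCA step, the sketch below projects high-dimensional embeddings onto their leading principal components using a singular value decomposition. The embedding width (768), the number of retained components (32), and the random data are assumptions; the summary does not state the values used in the study.

```python
import numpy as np


def pca_reduce(X: np.ndarray, n_components: int) -> np.ndarray:
    """Project the rows of X onto the first `n_components` principal components."""
    # PCA operates on mean-centered data.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


# Hypothetical stand-in for language-model embeddings: 200 reports x 768 dimensions.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 768))

reduced = pca_reduce(embeddings, n_components=32)
print(reduced.shape)
```

The reduced matrix can replace the raw embeddings as classifier input, trading a small loss of information for faster training.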
Statistics
Our primary goal is to investigate the potential of LLMs in feature extraction from textual accident reports. By 'feature extraction', we refer to the process of selecting and encoding information from raw accident report data to represent the properties of an accident.
This comparison was quantified using the F1-score over uniformly sampled data sets to obtain balanced severity classes.
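A minimal sketch of this evaluation idea follows: draw the same number of records from each severity class, then score predictions with F1. The helper names, the per-class sample size, and the choice of macro averaging are assumptions; the summary only states that F1 was computed over uniformly sampled, class-balanced data.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def balanced_sample(
    records: Sequence[str], labels: Sequence[int], per_class: int, seed: int = 0
) -> List[Tuple[str, int]]:
    """Draw `per_class` records uniformly at random from every severity class."""
    rng = random.Random(seed)
    by_class: Dict[int, List[str]] = defaultdict(list)
    for record, label in zip(records, labels):
        by_class[label].append(record)
    sampled: List[Tuple[str, int]] = []
    for label in sorted(by_class):
        if len(by_class[label]) < per_class:
            raise ValueError(f"class {label} has fewer than {per_class} records")
        sampled.extend((r, label) for r in rng.sample(by_class[label], per_class))
    return sampled


def f1_per_class(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[int, float]:
    """F1 = 2*TP / (2*TP + FP + FN), computed separately for each class."""
    scores: Dict[int, float] = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical data: severity 1 is rare, severity 2 is common.
records = [f"report {i}" for i in range(10)]
labels = [1, 1, 2, 2, 2, 2, 2, 3, 3, 3]
sample = balanced_sample(records, labels, per_class=2)
print(sample)

print(macro_f1([1, 1, 2, 2, 3, 3], [1, 2, 2, 2, 3, 1]))
```

With balanced classes, the macro average and the support-weighted average of per-class F1 coincide, so the choice between them matters less than it would on the raw, imbalanced data.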
The ability to use text representation right away, while achieving acceptable prediction performance, instead of feature engineering (e.g., normalization of values, label encoding, functional feature transformations, date interpretation) is interesting to traffic management authorities and data analysts in transportation.
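The contrast between the two approaches can be sketched on a single record. In the example below, `to_text` serializes a record directly into a string that a language model could consume, while `engineer` performs the kind of manual steps listed above (normalization, label encoding, date interpretation). The field names, the weather codes, and the scaling constant are hypothetical and are not taken from the study.

```python
from datetime import datetime
from typing import Dict, List


def to_text(record: Dict[str, object]) -> str:
    """Serialize a record as 'field: value' pairs, with no feature engineering."""
    return "; ".join(f"{key}: {value}" for key, value in record.items())


def engineer(
    record: Dict[str, object], weather_codes: Dict[str, int], max_temp_f: float
) -> List[float]:
    """Hand-crafted numeric features for the same record."""
    start = datetime.strptime(str(record["start_time"]), "%Y-%m-%d %H:%M:%S")
    return [
        float(record["temperature_f"]) / max_temp_f,  # normalization of values
        float(weather_codes[str(record["weather"])]),  # label encoding
        float(start.hour),                             # date interpretation: hour of day
        float(start.weekday()),                        # date interpretation: Monday = 0
    ]


# Hypothetical accident record.
record = {
    "start_time": "2021-03-14 17:45:00",
    "weather": "Rain",
    "temperature_f": 50.0,
}

text_input = to_text(record)
numeric_input = engineer(record, weather_codes={"Clear": 0, "Rain": 1}, max_temp_f=100.0)

print(text_input)
print(numeric_input)
```

The text path needs no per-field logic and tolerates new or missing fields, whereas the engineered path requires a decision for every column, which is the convenience the passage above points to.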
Quotes
"The ability of LLMs to understand and process unstructured textual data presents a significant opportunity."
"Our proposed method has cross-domain application potential."