The paper presents a two-stage approach for efficiently identifying policyholders who may be using their vehicles for commercial delivery, a use that standard personal auto insurance policies do not cover.
The first stage trains a supervised machine learning model to classify individual trips as either "delivery" or "non-delivery" based on GPS and accelerometer data. However, because the data are heavily imbalanced and the classifier has a high false positive rate, the trip-level classifications alone are not sufficient to reliably identify delivery drivers.
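The effect of class imbalance on trip-level results can be illustrated with Bayes' rule. The sketch below is not taken from the paper: the function name `flagged_trip_precision` and the rates in the usage note are hypothetical values chosen only to show the arithmetic.

```python
def flagged_trip_precision(true_positive_rate, false_positive_rate, prevalence):
    """Share of flagged trips that really are delivery trips (Bayes' rule).

    All three inputs are hypothetical illustration values, not figures
    reported in the paper.
    """
    # Probability that a random trip is a delivery trip AND gets flagged.
    flagged_and_delivery = true_positive_rate * prevalence
    # Probability that a random trip is NOT a delivery trip but gets flagged.
    flagged_and_other = false_positive_rate * (1.0 - prevalence)
    return flagged_and_delivery / (flagged_and_delivery + flagged_and_other)
```

Under these assumed numbers, if 1% of trips are delivery trips and the classifier catches 90% of them while wrongly flagging 5% of ordinary trips, `flagged_trip_precision(0.9, 0.05, 0.01)` is roughly 0.15: most flagged trips are false alarms, which is why a single flagged trip says little about a policyholder.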
The second stage introduces a novel Bayesian mixture model that aggregates the trip-level classifications at the policyholder level. The model assumes that policyholders fall into two groups: a majority with a low rate of positive trip classifications and a minority with a much higher rate. After the mixture parameters are learned with Markov chain Monte Carlo (MCMC) inference, the approach assigns each policyholder a posterior probability of belonging to the minority (delivery-driving) group, given that policyholder's observed trip classifications.
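The posterior membership probability can be sketched for a simplified case. The code below assumes a binomial likelihood for the number of flagged trips and treats the mixture parameters as fixed, known values; the paper learns its parameters by MCMC and its exact likelihood may differ. The names `posterior_minority`, `pi`, `p_low`, and `p_high` are hypothetical.

```python
import math

def log_binom_pmf(k, n, p):
    """Log-probability of k flagged trips out of n at per-trip flag rate p."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)

def posterior_minority(k, n, pi, p_low, p_high):
    """P(policyholder is in the high-rate group | k of n trips flagged).

    Assumptions (not from the paper): binomial likelihood, and fixed values
    for the minority share `pi` and the group flag rates `p_low`, `p_high`,
    each strictly between 0 and 1.
    """
    log_high = math.log(pi) + log_binom_pmf(k, n, p_high)
    log_low = math.log1p(-pi) + log_binom_pmf(k, n, p_low)
    # Subtract the maximum before exponentiating to avoid underflow.
    m = max(log_high, log_low)
    w_high = math.exp(log_high - m)
    w_low = math.exp(log_low - m)
    return w_high / (w_high + w_low)
```

With assumed values `pi=0.01`, `p_low=0.05`, and `p_high=0.5`, a policyholder with 50 flagged trips out of 100 gets a posterior very close to 1, while one with 5 flagged trips out of 100 gets a posterior very close to 0. Aggregating many noisy trip labels in this way separates the two groups far more sharply than any single trip can.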
This posterior probability is then converted to a priority score, which is used to rank policyholders for manual investigation by underwriters. Over a 1-year trial period, the top 0.9% of policyholders identified by the model were reviewed, and 99.4% of them were confirmed to be engaged in delivery driving. Compared with manual searching, this is a significant improvement in how efficiently underwriters' time is allocated.
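The ranking step can be sketched as follows. The summary does not say how the posterior is converted to a priority score, so this example assumes the score is simply a number per policyholder where higher means more urgent; the function name `select_for_review` and the sample ids are hypothetical.

```python
import math

def select_for_review(scores, fraction):
    """Return the ids of the highest-scoring policyholders.

    `scores` maps a policyholder id to a priority score (assumed here to be
    "higher is more urgent"); `fraction` is the share of the book to review.
    """
    n_review = math.ceil(len(scores) * fraction)
    # Sort ids by score, highest first, and keep the top n_review.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_review]
```

For example, with the made-up scores `{"a": 0.1, "b": 0.9, "c": 0.5, "d": 0.2}` and `fraction=0.5`, the function returns `["b", "c"]`. Reviewing only a small top slice is what lets underwriters concentrate on the cases most likely to be confirmed.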
The paper also discusses potential future improvements, such as incorporating newly acquired labeled data to refine the model, and exploring extensions to detect other types of driving behavior of interest to telematics insurers.
Key insights distilled from a paper by Mark McLeod,... at arxiv.org, 04-23-2024
https://arxiv.org/pdf/2404.14276.pdf