The researchers developed an automated approach to summarize and analyze privacy policies and terms of service documents. They collected a dataset of over 21,000 annotations from the Terms of Service; Didn't Read (ToS;DR) platform, which provides community-based reviews and summaries of policy documents.
The key highlights of the study are:
They performed multi-class text classification on sentences, with a label space of 246 cases (key concepts) and 5 document types (Terms of Service, Privacy Policy, Cookie Policy, Data Policy, and Other Policy). This allowed them to extract and categorize the key concepts from the policy documents.
They compared the performance of transformer-based models (RoBERTa and PrivBERT) and conventional models (Linear SVM and Random Forest) on the classification tasks. RoBERTa achieved the best overall performance with an F1-score of 0.74 for the case classification task.
Leveraging the best-performing RoBERTa model, the researchers highlighted redundancies and potential GDPR guideline violations by identifying overlaps in the key concepts between privacy policies and terms of service documents.
The analysis revealed that privacy policies are encroaching on content better suited to terms of service, which suggests that the terminology and content of the two document types are not clearly separated.
The researchers proposed that their automated approach can help regulators, customers, and authors by objectively quantifying and highlighting the overlap in policy documents, and that it provides a foundation for a practical tool to analyze new, previously unseen policy documents.
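As a rough illustration of what "quantifying the overlap" between the two document types could look like, the sketch below groups sentence-level predictions by document type and compares the sets of key concepts (cases) found in privacy policies and in terms of service. The function name `case_overlap`, the example case labels, and the use of the Jaccard index are all assumptions made for illustration; the summary does not state which overlap measure the researchers used.

```python
from collections import defaultdict


def case_overlap(predictions):
    """Compare the cases predicted for privacy policies and terms of service.

    `predictions` is an iterable of (doc_type, case_label) pairs, one per
    classified sentence. Returns the set of shared cases and their Jaccard
    index (an assumed measure, not necessarily the one used in the study).
    """
    cases_by_type = defaultdict(set)
    for doc_type, case in predictions:
        cases_by_type[doc_type].add(case)

    privacy = cases_by_type["Privacy Policy"]
    terms = cases_by_type["Terms of Service"]

    shared = privacy & terms
    union = privacy | terms
    # Guard against an empty input, where the ratio is undefined.
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard


# Hypothetical classifier output; the case labels are made up for this example.
predictions = [
    ("Privacy Policy", "data_retention"),
    ("Privacy Policy", "third_party_sharing"),
    ("Privacy Policy", "account_termination"),
    ("Terms of Service", "account_termination"),
    ("Terms of Service", "governing_law"),
]

shared, jaccard = case_overlap(predictions)
print(sorted(shared), jaccard)  # ['account_termination'] 0.25
```

In this toy input, one of the four distinct cases appears in both document types, so the overlap is 0.25. A case such as account termination showing up in a privacy policy is the kind of encroachment the analysis describes.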
Key insights distilled from the source by Shikha Sonej... at arxiv.org, 04-23-2024
https://arxiv.org/pdf/2404.13087.pdf