Core Concepts
Patent SBERT-adapt-ub outperforms existing models on patent similarity, highlighting the importance of the training phase in embedding models.
Abstract
This paper compares static and contextual embedding models for patent similarity calculation. It introduces a new model, Patent SBERT-adapt-ub, which achieves the best performance. The study uses patent interferences as a ground-truth benchmark dataset. Results show that contextual embeddings do not always outperform static ones, because the training phase has a decisive impact.
Abstract:
Compares performance of static and contextual embeddings.
Introduces Patent SBERT-adapt-ub with superior results.
Utilizes patent interferences as benchmark dataset.
Introduction:
Patents are increasingly analyzed with natural language processing (NLP).
Growing research on information retrieval from patents.
Textual similarity among patents is crucial for understanding innovation patterns.
Related Work:
Evolution from keyword-based approaches to neural networks.
Various models used for patent classification and similarity.
Comparison of different architectures for patent analysis.
Data:
Creation of a triplets dataset using PatentsView data.
Dataset about patent interferences used as ground-truth benchmark.
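The triplet setup behind the dataset can be sketched as follows. This is a minimal illustration, not the paper's actual data or hyperparameters: the 3-d vectors stand in for encoded patent texts, and the margin value is an assumption.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_loss(anchor, positive, negative, margin=0.5):
    # Hinge-style loss: the anchor should be closer (in cosine terms)
    # to the positive patent than to the negative one, by at least `margin`.
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))

# Toy 3-d vectors standing in for encoded patent texts.
anchor   = np.array([1.0, 0.0, 0.0])
positive = np.array([0.9, 0.1, 0.0])  # related patent
negative = np.array([0.0, 1.0, 0.0])  # unrelated patent

print(triplet_loss(anchor, positive, negative))  # well-separated triplet -> 0.0
```

During training, a model that ranks the unrelated patent above the related one incurs a positive loss and is pushed to correct the ordering.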
Models and Experiments:
Comparison of Word2vec with TF-IDF weighting, Doc2vec, and SBERT models.
Introduction of two original SBERT models trained on triplets dataset.
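A static baseline of the kind compared here can be sketched as a TF-IDF weighted average of word vectors. The toy vocabulary, vector dimensions, and IDF values below are invented for illustration and are not the paper's trained model:

```python
import numpy as np

# Toy 2-d "word vectors" standing in for a trained Word2vec model.
word_vecs = {
    "battery":   np.array([1.0, 0.0]),
    "electrode": np.array([0.8, 0.2]),
    "method":    np.array([0.1, 0.9]),
}

# Toy IDF weights: rarer, more informative terms get higher weight.
idf = {"battery": 2.0, "electrode": 2.5, "method": 0.3}

def doc_embedding(tokens):
    """TF-IDF weighted average of static word vectors for one document."""
    vec, total = np.zeros(2), 0.0
    for t in tokens:
        if t in word_vecs:
            w = idf.get(t, 1.0)  # each occurrence adds weight, so TF is implicit
            vec += w * word_vecs[t]
            total += w
    return vec / total if total else vec

emb = doc_embedding(["battery", "electrode", "method"])
print(emb.round(3))
```

Because every word contributes a fixed vector regardless of context, such a model is "static"; its quality depends heavily on how extensively the word vectors were trained.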
Analysis:
Evaluation of the models' performance on the patent similarity task.
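The evaluation logic can be sketched as a nearest-neighbor check: given a patent known (from an interference) to claim the same invention as another, a good model should rank that counterpart above unrelated patents. The embeddings below are toy values, not model output:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings: patent "A" is known (from an interference) to claim the
# same invention as "B"; "C" and "D" are unrelated distractors.
query = np.array([1.0, 0.2, 0.0])        # patent A
candidates = {
    "B": np.array([0.9, 0.3, 0.1]),      # interference counterpart
    "C": np.array([0.0, 1.0, 0.0]),
    "D": np.array([0.1, 0.0, 1.0]),
}

# Rank candidates by similarity; a good model puts the counterpart first.
ranking = sorted(candidates, key=lambda k: cosine(query, candidates[k]),
                 reverse=True)
print(ranking[0])
```

Aggregating this check over all interference pairs yields a ranking-based score for each embedding model.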
Conclusion:
The contextual model performs best overall, but contextual embeddings are not always superior to static ones.
Domain adaptation enhances model performance significantly.
Stats
Patent SBERT-adapt-ub outperforms the current state-of-the-art in patent similarity calculation.
Large static models can be comparable to contextual ones when trained extensively.
Sentence Transformers' domain adaptation leads to the best performance.
Quotes
"Patents provide their owners with a legal right to exclude others from making, using, selling..."
"Textual similarity among patents is essential for mapping innovation patterns."