
Comparative Analysis of Patent Embedding Models for Similarity Calculation


Core Concepts
Patent SBERT-adapt-ub achieves the best performance on patent similarity, highlighting the importance of the training phase in embedding models.
Abstract
This paper compares static and contextual embedding models for patent similarity calculation. It introduces a new model, Patent SBERT-adapt-ub, which achieves the best performance, and uses patent interferences as a ground-truth benchmark dataset. Results show that contextual embeddings do not always outperform static ones; the training phase has a decisive impact.

Abstract: Compares the performance of static and contextual embeddings; introduces Patent SBERT-adapt-ub, which yields superior results; uses patent interferences as the benchmark dataset.
Introduction: Patent analysis increasingly relies on natural language processing (NLP); research on information retrieval from patents is growing; textual similarity is crucial for understanding innovation.
Related Work: Traces the evolution from keyword-based approaches to neural networks; surveys models used for patent classification and similarity; compares different architectures for patent analysis.
Data: A triplets dataset is built from PatentsView data; a dataset of patent interferences serves as the ground-truth benchmark.
Models and Experiments: Compares TF-IDF-weighted Word2vec, Doc2vec, and SBERT models; introduces two original SBERT models trained on the triplets dataset.
Analysis: Evaluates the models' performance on the patent similarity task.
Conclusion: The contextual model performs best, but contextual embeddings are not always superior to static ones; domain adaptation significantly enhances performance.
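To make the static-embedding baseline concrete, here is a minimal sketch of TF-IDF-weighted averaging of word vectors followed by cosine similarity, the general idea behind the Word2vec TF-IDF model compared in the paper. The word vectors and token lists below are illustrative toys, not the paper's trained embeddings or data.

```python
import math
from collections import Counter

# Toy word vectors standing in for trained Word2vec embeddings
# (the paper trains real embeddings on patent text; these values are illustrative).
WORD_VECS = {
    "battery":   [0.9, 0.1, 0.0],
    "electrode": [0.8, 0.2, 0.1],
    "charge":    [0.7, 0.3, 0.0],
    "gene":      [0.0, 0.9, 0.4],
    "protein":   [0.1, 0.8, 0.5],
}

def tfidf_weights(doc_tokens, corpus):
    """TF-IDF weight per token: term frequency times smoothed inverse document frequency."""
    n_docs = len(corpus)
    tf = Counter(doc_tokens)
    weights = {}
    for tok in tf:
        df = sum(1 for d in corpus if tok in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        weights[tok] = tf[tok] * idf
    return weights

def doc_embedding(doc_tokens, corpus):
    """TF-IDF-weighted average of the word vectors (a static document embedding)."""
    w = tfidf_weights(doc_tokens, corpus)
    dim = len(next(iter(WORD_VECS.values())))
    vec, total = [0.0] * dim, 0.0
    for tok in doc_tokens:
        if tok in WORD_VECS:
            for i in range(dim):
                vec[i] += w[tok] * WORD_VECS[tok][i]
            total += w[tok]
    return [v / total for v in vec] if total else vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

patents = [
    ["battery", "electrode", "charge"],  # electrochemistry patent
    ["battery", "charge"],               # related electrochemistry patent
    ["gene", "protein"],                 # unrelated biotech patent
]
embs = [doc_embedding(p, patents) for p in patents]
# Related patents should score higher than unrelated ones.
print(cosine(embs[0], embs[1]) > cosine(embs[0], embs[2]))
```

Contextual models like SBERT replace the fixed per-word vectors with sentence-level representations, but the final similarity step (cosine between document embeddings) is the same.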
Stats
Patent SBERT-adapt-ub outperforms the current state of the art in patent similarity calculation. Large static models can be comparable to contextual ones when trained extensively. Domain adaptation of Sentence Transformers leads to the best performance.
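The domain adaptation credited here is typically driven by a triplet objective: the model is pushed to embed an anchor patent closer to a related ("positive") patent than to an unrelated ("negative") one. Below is a minimal sketch of the triplet margin loss on toy 2-D embeddings; the vectors and margin value are illustrative assumptions, not the paper's actual training setup.

```python
import math

def l2(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss: zero once the anchor is closer to the positive
    than to the negative by at least `margin`, positive otherwise."""
    return max(0.0, l2(anchor, positive) - l2(anchor, negative) + margin)

# Toy embeddings (illustrative values, not trained vectors).
anchor   = [1.0, 0.0]   # some focal patent
positive = [0.9, 0.1]   # a related patent (e.g. from the same interference)
negative = [0.0, 1.0]   # an unrelated patent

# Constraint already satisfied: the positive is much closer than the negative.
print(triplet_loss(anchor, positive, negative))  # → 0.0
# Swapping positive and negative violates the constraint, giving a positive loss.
print(triplet_loss(anchor, negative, positive))
```

During fine-tuning, this loss is minimized over many such triplets, which is what pulls domain-related patents together in the embedding space.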
Quotes
"Patents provide their owners with a legal right to exclude others from making, using, selling..."
"Textual similarity among patents is essential for mapping innovation patterns."

Key Insights Distilled From

by Grazia Sveva... at arxiv.org 03-26-2024

https://arxiv.org/pdf/2403.16630.pdf
A comparative analysis of embedding models for patent similarity

Deeper Inquiries

How can the findings of this study be applied practically in industries dealing with patents?

The findings of this study provide valuable insights into the effectiveness of different embedding models for calculating patent similarity. Industries dealing with patents can apply these findings to enhance their patent analytics processes. By using advanced NLP techniques like static and contextual embeddings, companies can improve technology mapping, innovation pattern analysis, and patent quality evaluation. The domain adaptation of models like SBERT demonstrated superior performance in measuring patent similarity, indicating its potential application in automating tasks related to patent classification and infringement detection.

Could there be biases or limitations in using domain-specific word embeddings like those discussed here?

While domain-specific word embeddings offer significant advantages in capturing technical nuances and jargon present in patents, they are not without biases or limitations. One potential limitation is the need for extensive training data to ensure accurate representations, which may not always be readily available for niche domains. Biases could also arise from the training data itself if it reflects certain patterns or perspectives inherent in the dataset. Additionally, contextually trained models might struggle with polysemy or ambiguity common in patent texts, leading to challenges in accurately representing complex technical concepts.

How might advancements in NLP technology impact future research on textual similarities in patents?

Advancements in NLP technology are poised to revolutionize future research on textual similarities in patents by enabling more sophisticated analyses and applications. Improved language models and embedding techniques will enhance the accuracy and efficiency of measuring patent similarity, facilitating faster identification of relevant prior art during examination processes. Furthermore, developments such as fine-tuning pre-trained models on domain-specific datasets will enable researchers to create specialized tools tailored specifically for analyzing patents across various technological fields. Overall, advancements in NLP technology will drive innovation within the intellectual property landscape by streamlining workflows and enhancing decision-making processes related to patents.