
Unveiling Terrorizer: Algorithm for Company Name Consolidation in Patents


Core Concepts
The Terrorizer algorithm harmonizes company names in patents using NLP and network theory.
Summary
The article introduces Terrorizer, an algorithm for disambiguating company names in patents. It combines NLP, network theory, and rule-based techniques to consolidate variants of the same company name. The algorithm consists of parsing, matching, and filtering stages, plus a knowledge augmentation phase. Evaluation on four datasets shows a higher F1 score than existing algorithms. Hyperparameters are optimized with the Optuna framework to maximize the F1 score.
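To make the F1 evaluation concrete, here is a minimal sketch of one common way to score a name consolidation against a labelled ground truth: pairwise F1 over pairs of names placed in the same group. Whether the paper computes F1 in exactly this way is an assumption, and the function names and sample data are hypothetical.

```python
from itertools import combinations


def same_cluster_pairs(labels):
    """Return the set of unordered name pairs that share a cluster label."""
    names = sorted(labels)
    return {(a, b) for a, b in combinations(names, 2) if labels[a] == labels[b]}


def pairwise_f1(predicted, truth):
    """Pairwise F1: a pair is a true positive if both labelings group it together."""
    pred_pairs = same_cluster_pairs(predicted)
    true_pairs = same_cluster_pairs(truth)
    tp = len(pred_pairs & true_pairs)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_pairs)
    recall = tp / len(true_pairs)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground truth: three IBM variants form one entity, Intel another.
truth = {"IBM CORP": 0, "INTL BUSINESS MACHINES": 0, "IBM": 0, "INTEL CORP": 1}
# Hypothetical prediction that misses one variant and wrongly merges Intel.
predicted = {"IBM CORP": "a", "INTL BUSINESS MACHINES": "b", "IBM": "a", "INTEL CORP": "a"}

score = pairwise_f1(predicted, truth)
print(round(score, 3))
```

A hyperparameter search such as the one described above would treat a score like this as the objective to maximize.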
Statistics
"Our final result is a reduction in the initial set of names of over 42%." "We use Terrorizer on a set of 325’917 companies’ names who are assignees of patents granted by the USPTO from 2005 to 2022."
Quotes
"We use Terrorizer on a set of 325’917 companies’ names who are assignees of patents granted by the USPTO from 2005 to 2022."

Key Insights Distilled From

by Grazia Sveva... : arxiv.org 03-20-2024

https://arxiv.org/pdf/2403.12083.pdf
Presenting Terrorizer

Deeper Inquiries

How does the Terrorizer algorithm handle multinational corporations with diverse name variations?

The Terrorizer algorithm handles multinational corporations with diverse name variations by combining natural language processing (NLP), network theory, and rule-based techniques. It addresses spelling variations, spelling mistakes, business and legal extensions, added geographical indications, and the identification of subsidiaries and acronyms, all of which are common in company names.

Parsing Phase: The algorithm parses the company names and augments them with information from web searches, which corrects spelling mistakes and gathers additional data.

Matching Phase: For each pair of names, a matching score is calculated from conditions such as common tokens, domain similarity, text from URLs, and cosine similarity between the names.

Filtering Phase: Community detection identifies groups of related names within the network created in the matching phase. False positives are removed using bridgeness centrality, which eliminates edges connecting different communities.

For multinational corporations specifically:

The algorithm considers shared locations where companies have transacted patents to strengthen connections between related entities.

It uses community detection to group name variants that belong to the same entity across different regions or subsidiaries.

It tunes its hyperparameters with Bayesian optimization methods such as the Tree-structured Parzen Estimator (TPE), which fine-tunes its performance on complex cases involving multinational corporations.
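The matching and grouping steps can be illustrated with a deliberately simplified sketch. It is not the paper's implementation: it scores pairs only by cosine similarity over name tokens (no web search, domain, or URL signals), and it groups names by connected components using union-find, a plain stand-in for the community detection and bridgeness-based filtering described above. The suffix list, the threshold of 0.5, and all function names are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Illustrative (not exhaustive) legal extensions to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc", "gmbh"}


def tokens(name):
    """Lowercase the name, split on non-alphanumerics, drop legal extensions."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return [w for w in words if w not in LEGAL_SUFFIXES]


def cosine(a, b):
    """Cosine similarity between two token lists treated as count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_names(names, threshold=0.5):
    """Link pairs scoring at least `threshold`, then return connected components."""
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    toks = [tokens(n) for n in names]
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if cosine(toks[i], toks[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return sorted(sorted(g) for g in groups.values())


names = ["Acme Robotics Inc", "ACME ROBOTICS CORP.", "Acme Robotics GmbH", "Globex Ltd"]
groups = group_names(names)
print(groups)
```

Connected components merge anything reachable through a chain of matches, so a single spurious edge can fuse two unrelated companies. That weakness is the kind of false positive the filtering phase is meant to remove by cutting edges that bridge different communities.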

How can entity linking algorithms like Terrorizer impact other fields beyond patent analysis?

Entity linking algorithms like Terrorizer can have significant implications beyond patent analysis:

Business Intelligence: In fields such as market research or competitive intelligence, accurate entity resolution can provide insight into market trends, competitor activities, and partnerships.

Financial Analysis: Better disambiguation of company names makes it possible to track investments by companies across sectors or regions accurately.

Healthcare Data Management: Healthcare data systems need to link patient records accurately for treatment or research studies, and entity linking supports that task.

Social Media Analytics: Entity linking can help social media platforms link user profiles accurately for targeted advertising or for content personalization based on user behavior patterns.

Academic Research: Better author disambiguation in academic publications lets researchers track citations more effectively and understand collaboration networks.

Overall, entity linking algorithms help organize unstructured data across many domains, which supports better decision-making and deeper insight into complex relationships among entities.

What implications does the reduction in unique assignee names have on patent analysis?

The reduction in unique assignee names produced by algorithms like Terrorizer has several implications for patent analysis:

1. Improved Accuracy: Consolidating name variations under one entity identifier reduces duplication errors, which leads to more accurate analytics.

2. Enhanced Insights: A cleaner dataset gives a clearer view of trends, such as innovation hotspots among companies or technology domains, that multiple entries for a single entity may have obscured.

3. Efficient Resource Allocation: Researchers spend less time cleaning datasets manually, leaving more time for analysis and speeding up decision-making.

4. Better Comparisons: With standardized names across all patents assigned to an organization, its portfolio is easier to compare against competitors' portfolios, which provides benchmarking opportunities.

5. Regulatory Compliance: Standardized naming conventions make it easier for organizations operating globally to meet regulatory compliance requirements consistently.
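The effect on per-assignee counts can be shown with a small sketch. The patent records and the name-to-entity mapping below are invented for illustration; in practice the mapping would come from a consolidation step such as Terrorizer.

```python
from collections import Counter

# Hypothetical patent records: (patent id, raw assignee name).
patents = [
    ("P1", "Acme Robotics Inc"),
    ("P2", "ACME ROBOTICS CORP."),
    ("P3", "Acme Robotics GmbH"),
    ("P4", "Globex Ltd"),
    ("P5", "Globex Ltd"),
]

# Hypothetical output of a consolidation step: raw name -> canonical entity.
canonical = {
    "Acme Robotics Inc": "Acme Robotics",
    "ACME ROBOTICS CORP.": "Acme Robotics",
    "Acme Robotics GmbH": "Acme Robotics",
    "Globex Ltd": "Globex",
}

# Patent counts per assignee before and after consolidation.
raw_counts = Counter(name for _, name in patents)
merged_counts = Counter(canonical.get(name, name) for _, name in patents)

# Share of unique names removed by consolidation.
reduction = 1 - len(merged_counts) / len(raw_counts)

print(raw_counts.most_common(1))     # largest assignee by raw name
print(merged_counts.most_common(1))  # largest assignee after consolidation
print(f"{reduction:.0%} fewer unique names")
```

On the raw names the largest assignee appears to be Globex with two patents, while after consolidation Acme Robotics leads with three. This is the kind of ranking distortion that duplicate names introduce.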