Core Concepts
The Terrorizer algorithm harmonizes company names in patents using NLP and network theory.
Abstract
The article introduces the Terrorizer algorithm to address the challenge of disambiguating company names in patents. It combines NLP, network theory, and rule-based techniques to consolidate variants of the same company name. The algorithm consists of parsing, matching, and filtering stages, plus a knowledge augmentation phase. Evaluation on four datasets shows that it outperforms existing methods.
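The parse-then-match flow described above can be illustrated with a minimal sketch. It is not the authors' implementation. The legal-suffix list, the use of `difflib.SequenceMatcher` similarity as the matching rule, the 0.9 threshold, and the helper names `parse`, `match`, and `harmonize` are all hypothetical. The network-theory element is modeled as a graph whose edges are matched name pairs, with each connected component (found via union-find) treated as one company. The filtering and knowledge augmentation stages are not modeled here.

```python
import difflib
import re

# Hypothetical legal-form suffixes; the source does not say which tokens Terrorizer strips.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc", "gmbh", "ag", "sa", "plc", "company"}


def parse(name: str) -> str:
    """Parsing step (sketch): lowercase, drop punctuation and legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def match(a: str, b: str, threshold: float = 0.9) -> bool:
    """Matching step (sketch): string similarity at or above a hypothetical threshold."""
    return difflib.SequenceMatcher(None, a, b).ratio() >= threshold


def harmonize(names: list[str], threshold: float = 0.9) -> list[set[str]]:
    """Treat matches as graph edges and return the connected components."""
    parent = list(range(len(names)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    parsed = [parse(n) for n in names]
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if match(parsed[i], parsed[j], threshold):
                parent[find(i)] = find(j)  # add edge i-j by merging components

    groups: dict[int, set[str]] = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), set()).add(name)
    return list(groups.values())
```

For example, `harmonize(["IBM Corp.", "IBM Corporation", "International Business Machines"])` groups the first two names but leaves the third on its own, because string similarity in this sketch cannot link an acronym to its expansion.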
Introduction to the problem of disambiguating company names in patents.
Description of the Terrorizer algorithm leveraging NLP and network theory.
Validation results showing higher F1 scores than existing algorithms.
Hyperparameter optimization with the Optuna framework to maximize the F1 score.
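The key points above mention F1-based validation and Optuna tuning. The sketch below shows one common way to score a name clustering, pairwise F1 over same-cluster pairs (an assumption, since the source does not define its F1 variant), and tunes a single hypothetical similarity threshold against labeled groups. To stay dependency-free it swaps in a plain grid search for Optuna; with Optuna, the same scoring would sit inside an objective that draws the threshold via `trial.suggest_float` and is passed to `study.optimize` on a study created with `direction="maximize"`. The `cluster` function is a toy stand-in, not Terrorizer.

```python
import difflib
import itertools


def same_cluster_pairs(clusters):
    """Return every unordered pair of names that share a cluster."""
    return {
        pair
        for cluster in clusters
        for pair in itertools.combinations(sorted(cluster), 2)
    }


def pairwise_f1(predicted, gold):
    """Pairwise F1 (assumed metric): precision and recall over same-cluster pairs."""
    pred, true = same_cluster_pairs(predicted), same_cluster_pairs(gold)
    tp = len(pred & true)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(true)
    return 2 * precision * recall / (precision + recall)


def cluster(names, threshold):
    """Toy stand-in: single-linkage grouping on case-insensitive difflib similarity."""
    clusters = []
    for name in names:
        # Existing clusters holding at least one sufficiently similar name.
        hits = [
            c for c in clusters
            if any(difflib.SequenceMatcher(None, name.lower(), m.lower()).ratio() >= threshold for m in c)
        ]
        clusters = [c for c in clusters if c not in hits] + [set().union({name}, *hits)]
    return clusters


def tune_threshold(names, gold, grid):
    """Grid search (swapped in for Optuna): return the threshold with the highest F1."""
    scored = [(pairwise_f1(cluster(names, t), gold), t) for t in grid]
    best_f1, best_t = max(scored)
    return best_t, best_f1
```

A random or Bayesian search such as Optuna's becomes worthwhile once several hyperparameters interact; for a single threshold, a grid is enough to show the idea.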
Stats
"Our final result is a reduction in the initial set of names of over 42%."
"We use Terrorizer on a set of 325’917 companies’ names who are assignees of patents granted by the USPTO from 2005 to 2022."
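As a quick consistency check on the two figures above, assuming the reported reduction of over 42% is measured on the 325,917 USPTO assignee names:

```latex
N_{\text{final}} < 325\,917 \times (1 - 0.42) = 189\,031.86
\quad\Rightarrow\quad
N_{\text{removed}} = 325\,917 - N_{\text{final}} > 136\,885.14
```

So at most 189,031 distinct names would remain, and at least 136,886 names would be merged into other variants.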