Key Concepts
The method improves automatic speech recognition (ASR) of specialized terminology by exploiting the difference between a word's frequency in general text and its frequency in lecture material, where the lecture words are extracted with optical character recognition (OCR).
Summary
The content describes a method for improving automatic speech recognition (ASR) systems, particularly their recognition of specialized terminology in lecture audio. The key aspects are:
Defining three metrics to analyze word frequencies:
Normal Frequency (NF): The frequency of a word in general contexts, taken from a Large Text Dataset (LTD), here the Google Web Trillion Word Frequency Dataset.
Lecture Frequency (LF): The frequency of a word in a lecture context, calculated as the number of times the word occurs among the words extracted via OCR, divided by the total number of OCR-extracted words.
Relative Frequency (RF): The ratio of LF to NF, indicating how much more frequently a word appears in lectures compared to general contexts.
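The three metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the toy counts, and the choice to compute NF as a word's LTD count divided by the LTD's total size are assumptions. Words missing from the LTD are simply skipped here, since their RF is undefined without further handling.

```python
from collections import Counter


def lecture_frequency(ocr_words):
    """LF: each word's count among the OCR words divided by the total OCR word count."""
    counts = Counter(ocr_words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def normal_frequency(word, ltd_counts, ltd_total):
    """NF (assumed form): the word's count in the LTD divided by the LTD's total count."""
    return ltd_counts.get(word, 0) / ltd_total


def relative_frequency(ocr_words, ltd_counts, ltd_total):
    """RF = LF / NF for every OCR word that also appears in the LTD."""
    rf = {}
    for word, lf in lecture_frequency(ocr_words).items():
        nf = normal_frequency(word, ltd_counts, ltd_total)
        if nf > 0:  # RF is undefined when the word is absent from the LTD
            rf[word] = lf / nf
    return rf


# Toy data, invented for illustration only.
ocr_words = ["the", "phoneme", "the", "phoneme", "model"]
ltd_counts = {"the": 500, "phoneme": 1, "model": 10}
ltd_total = 1000

rf = relative_frequency(ocr_words, ltd_counts, ltd_total)
for word in sorted(rf, key=rf.get, reverse=True):
    print(word, round(rf[word], 2))
```

In this toy example a common word such as "the" ends up with an RF below 1, while a term that is rare in general text but frequent on the slides gets a large RF, which is the signal the method relies on.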
Improving the original method proposed in Jung's previous research:
Method 1: When calculating NF, if a word extracted by OCR is not found in the Large Text Dataset (LTD), its count is replaced with the minimum count value in the OCR dataset, rather than setting it to zero.
Method 2: All RF values less than 1 are replaced with 1 so that the RF data follow a power law.
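A rough sketch of how the two adjustments might be applied together is shown below. It follows the description above literally (the fallback count is the minimum count in the OCR data), and, as before, it assumes NF is a count divided by the LTD's total size. The function name and toy data are hypothetical.

```python
from collections import Counter


def improved_relative_frequency(ocr_words, ltd_counts, ltd_total):
    """RF with Method 1 (fallback count for unseen words) and Method 2 (floor at 1)."""
    ocr_counts = Counter(ocr_words)
    if not ocr_counts:
        return {}
    ocr_total = sum(ocr_counts.values())

    # Method 1: fallback is the minimum count value in the OCR dataset,
    # used instead of zero when a word is missing from the LTD.
    fallback_count = min(ocr_counts.values())

    rf = {}
    for word, count in ocr_counts.items():
        lf = count / ocr_total
        ltd_count = ltd_counts.get(word, 0)
        if ltd_count == 0:
            ltd_count = fallback_count
        nf = ltd_count / ltd_total  # assumed form of NF
        # Method 2: every RF value below 1 is replaced with 1.
        rf[word] = max(1.0, lf / nf)
    return rf


# Toy data, invented for illustration only; "mfcc" is absent from the LTD.
ocr_words = ["the", "the", "the", "phoneme", "phoneme", "mfcc"]
ltd_counts = {"the": 600, "phoneme": 2}
ltd_total = 1000

improved_rf = improved_relative_frequency(ocr_words, ltd_counts, ltd_total)
for word, value in improved_rf.items():
    print(word, round(value, 2))
```

With the fallback, a word missing from the LTD receives a finite RF that still depends on how often it appears in the OCR text, and the floor keeps common words at exactly 1.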
Experiments and data analysis:
The existing method was found to have drawbacks, as it uniformly assigned high RF values to words not found in the LTD, reducing the reliability and accuracy of the RF values.
The improved methods, particularly Method 1, were shown to yield more reliable RF values that align better with a power law, giving the approach a stronger theoretical foundation.
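The summary does not say how agreement with a power law was assessed. One common rough diagnostic, shown here purely as an assumption, is to sort the RF values by rank and fit a straight line to log(value) against log(rank): a power law appears as a nearly straight line with a negative slope. This is a quick visual-style check, not a rigorous statistical test.

```python
import math


def loglog_fit(values):
    """Least-squares fit of log(value) against log(rank); returns (slope, r_squared)."""
    ranked = sorted((v for v in values if v > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive values")

    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(v) for v in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)

    slope = sxy / sxx
    # If all values are identical the line is flat and fits trivially.
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, r_squared


# Synthetic values that follow an exact power law with exponent 1.5.
synthetic_rf = [1000.0 / rank ** 1.5 for rank in range(1, 51)]
slope, r_squared = loglog_fit(synthetic_rf)
print(round(slope, 3), round(r_squared, 3))
```

On real RF data, an r_squared close to 1 would suggest the values are roughly power-law distributed, while a long flat run of identical values (for example, many words sharing one uniformly assigned RF) would pull the fit away from a straight line.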
The core idea is to leverage the differences in word frequencies between general contexts and lecture contexts, as determined through OCR, to improve the performance of ASR systems in recognizing specialized terminology.
Statistics
The content does not provide any specific numerical data or metrics to extract. The focus is on the theoretical foundations and methodological improvements of the proposed approach.
Quotes
The content does not contain any direct quotes that are particularly striking or that support its key arguments.