Evaluating the Effectiveness of Vocabulary Trimming Techniques for Improving Inference Efficiency in Large Language Models
Vocabulary trimming techniques based on language heuristics can reduce the memory usage of small language models by up to 50% and improve generation speed by up to 25%, but their effectiveness diminishes for larger models and certain languages.
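As a rough illustration of what trimming a vocabulary involves, the sketch below keeps only the embedding rows for a chosen set of token ids and builds a map from old ids to new ids. It is a minimal toy under stated assumptions, not the method evaluated here: the names `trim_vocabulary`, `embedding_savings`, `toy_embeddings`, and `toy_keep_ids` are hypothetical, the embedding values are made up, and the set of kept ids stands in for the output of whatever language heuristic is used.

```python
from typing import Dict, List, Sequence, Tuple


def trim_vocabulary(
    embeddings: Sequence[Sequence[float]],
    keep_ids: Sequence[int],
) -> Tuple[List[List[float]], Dict[int, int]]:
    """Keep only the embedding rows listed in keep_ids.

    Returns the trimmed matrix and a mapping from old token id to new token id.
    """
    kept = sorted(set(keep_ids))  # deduplicate and fix a stable order
    old_to_new = {old: new for new, old in enumerate(kept)}
    trimmed = [list(embeddings[old]) for old in kept]
    return trimmed, old_to_new


def embedding_savings(vocab_size: int, kept_size: int) -> float:
    """Fraction of embedding rows removed by trimming (not of the whole model)."""
    return 1.0 - kept_size / vocab_size


# Toy data: 6 tokens with 2-dimensional embeddings (made-up values).
toy_embeddings = [[float(i), float(i) * 0.5] for i in range(6)]
# Assumption: a language heuristic has already selected these token ids.
toy_keep_ids = [0, 2, 5]

trimmed, old_to_new = trim_vocabulary(toy_embeddings, toy_keep_ids)
print(len(trimmed), old_to_new)
print(embedding_savings(len(toy_embeddings), len(trimmed)))
```

Note that the fraction returned by `embedding_savings` covers embedding rows only. The effect on total model memory depends on how much of the model the embedding matrix accounts for.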