Echotune: A Modular Extractor Leveraging Variable-Length Speech Features for Improved Automatic Speech Recognition
We introduce Echo Multi-Scale Attention (Echo-MSA), a module that improves how accurately variable-length speech features are represented in automatic speech recognition (ASR). It does so with dynamic attention mechanisms that adapt to differences in the complexity and duration of the speech input.
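To make "multi-scale attention over variable-length speech" concrete, the following is a minimal sketch, not the Echo-MSA design. It assumes a generic scheme in which self-attention is computed over several local window sizes, and the branch outputs are fused per frame with content-dependent gates. Everything in the sketch is a hypothetical illustration: the function names `local_attention` and `multi_scale_attention`, the default `windows=(1, 4, 16)`, the gating rule, and the use of raw frames as queries, keys, and values (identity projections in place of learned ones).

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def local_attention(frames: List[Vector], window: int) -> List[Vector]:
    """Scaled dot-product self-attention restricted to +/- `window` neighboring frames.

    Assumption: queries, keys and values are the raw frames (identity projections);
    a trained module would use learned projections instead.
    """
    dim = len(frames[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for t, query in enumerate(frames):
        # Clip the window at the utterance boundaries, so any length works.
        lo, hi = max(0, t - window), min(len(frames), t + window + 1)
        neighbors = frames[lo:hi]
        weights = softmax([dot(query, key) * scale for key in neighbors])
        outputs.append([sum(w * v[i] for w, v in zip(weights, neighbors)) for i in range(dim)])
    return outputs


def multi_scale_attention(frames: List[Vector], windows: Sequence[int] = (1, 4, 16)) -> List[Vector]:
    """Fuse several attention scales with per-frame, content-dependent gates.

    Hypothetical gating rule: softmax over the similarity between each branch
    output and the input frame. This is not Echo-MSA's published mechanism.
    """
    dim = len(frames[0])
    scale = 1.0 / math.sqrt(dim)
    branches = [local_attention(frames, w) for w in windows]
    fused = []
    for t, frame in enumerate(frames):
        gates = softmax([dot(branch[t], frame) * scale for branch in branches])
        fused.append([sum(g * branch[t][i] for g, branch in zip(gates, branches)) for i in range(dim)])
    return fused


if __name__ == "__main__":
    random.seed(0)
    for length in (5, 40):  # two utterances of different durations
        utterance = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(length)]
        out = multi_scale_attention(utterance)
        print(length, len(out), len(out[0]))
```

Small windows capture short-range acoustic detail, while large windows capture longer context. The per-frame gates let each position weight these scales differently, which is one simple way an attention module can adapt to input of varying length and complexity.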