The paper proposes a Deep Space Separable Distillation Network (DSSDN) for efficient acoustic scene classification. The key contributions are:
Frequency axis cutting: The network performs high-low frequency decomposition on the log-mel spectrogram, reducing computational complexity while maintaining model performance.
Lightweight operators: Three new lightweight operators are designed: Separable Convolution (SC), Orthonormal Separable Convolution (OSC), and Separable Partial Convolution (SPC). These operators extract features efficiently for acoustic scene classification tasks.
Network architecture: The DSSDN architecture is built using the proposed DSSDB (Deep Space Separable Distillation Block) as the basic module, which stacks the DSSO (Deep Space Separable Operator) blocks. The channel splicing technique is used to fuse information from high-level and low-level networks.
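The frequency axis cutting described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name split_frequency_bands, the midpoint cut, and the toy 8-bin spectrogram are assumptions made for illustration, since this summary does not state where the paper places the cut.

```python
def split_frequency_bands(log_mel, cut_bin=None):
    """Split a log-mel spectrogram into low and high frequency bands.

    log_mel is a list of mel-bin rows, each row a list of frame values,
    so the outer index is the frequency axis.
    """
    n_bins = len(log_mel)
    if cut_bin is None:
        # Assumption: cut at the midpoint of the frequency axis.
        cut_bin = n_bins // 2
    if not 0 < cut_bin < n_bins:
        raise ValueError("cut_bin must fall strictly inside the frequency axis")
    # Copy rows so the two bands can be processed independently.
    low = [row[:] for row in log_mel[:cut_bin]]
    high = [row[:] for row in log_mel[cut_bin:]]
    return low, high


# Toy spectrogram: 8 mel bins x 4 frames (hypothetical values).
toy = [[float(b * 10 + t) for t in range(4)] for b in range(8)]
low_band, high_band = split_frequency_bands(toy)
print(len(low_band), len(high_band))
```

Each band covers half of the frequency axis, so a branch that processes one band sees a smaller input than a network fed the full spectrogram, which is one plausible reading of how the decomposition reduces computation.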
Experiments show that the proposed DSSDN-Large, DSSDN-Middle, and DSSDN-Small models improve performance by 9.8% over popular deep learning methods while using fewer parameters and less computation.
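To give a sense of why separable operators shrink the parameter count, the sketch below compares a standard 2D convolution with the common depthwise-separable factorization (a depthwise k x k convolution followed by a pointwise 1 x 1 convolution). This is an assumption for illustration: the summary does not define the paper's SC, OSC, or SPC operators precisely, and the channel sizes here are hypothetical. Biases are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (no bias)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv plus a pointwise 1 x 1 conv (no bias)."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
standard = standard_conv_params(64, 128, 3)
separable = depthwise_separable_params(64, 128, 3)
print(standard, separable)  # 73728 8768
```

For this hypothetical layer the separable form needs roughly 8.4 times fewer weights than the standard convolution.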
Source: ShuQi Ye, Yua... at arxiv.org, 05-07-2024
https://arxiv.org/pdf/2405.03567.pdf