Jiang, X.-H., Ai, Y., Zheng, R.-C., Du, H.-P., Lu, Y.-X., & Ling, Z.-H. (2024). MDCTCodec: A Lightweight MDCT-based Neural Audio Codec towards High Sampling Rate and Low Bitrate Scenarios. arXiv preprint arXiv:2411.00464.
This paper introduces MDCTCodec, a new neural audio codec designed to address the challenges of high-quality audio compression at high sampling rates and low bitrates. The authors aim to demonstrate MDCTCodec's superiority over existing codecs in terms of decoded audio quality, training and generation efficiency, and model size.
MDCTCodec uses the Modified Discrete Cosine Transform (MDCT) spectrum as its core coding object. A modified ConvNeXt v2 network performs encoding and decoding, a Residual Vector Quantizer (RVQ) discretizes the latent representation, and a novel Multi-Resolution MDCT-based Discriminator (MR-MDCTD) drives adversarial training. The model is evaluated on the VCTK dataset using objective quality metrics (log-spectral distance, STOI, ViSQOL), efficiency measures (real-time factor, training time, model size), and subjective ABX preference tests.
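The residual vector quantization step can be sketched as follows. This is a minimal NumPy illustration of the general RVQ technique, not the paper's implementation: each stage picks the nearest codeword to the residual left by the previous stage, so later codebooks refine the approximation at low extra bit cost. The codebook sizes and contents here are random placeholders.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Greedy residual vector quantization.

    Each codebook quantizes the residual left by the previous
    stage; the indices list is what a codec would transmit.
    """
    residual = x.astype(float)
    indices = []
    quantized = np.zeros_like(residual)
    for cb in codebooks:
        # nearest codeword to the current residual (L2 distance)
        dists = np.linalg.norm(cb - residual, axis=1)
        i = int(np.argmin(dists))
        indices.append(i)
        quantized += cb[i]
        residual -= cb[i]
    return indices, quantized

def rvq_decode(indices, codebooks):
    # reconstruction is simply the sum of the selected codewords
    return sum(cb[i] for cb, i in zip(codebooks, indices))

# toy usage with random placeholder codebooks (not trained ones)
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(16, 4)) for _ in range(3)]
x = rng.normal(size=4)
indices, quantized = rvq_encode(x, codebooks)
```

In a trained codec the codebooks are learned jointly with the encoder and decoder; with three 16-entry codebooks, each frame vector costs only 3 × 4 = 12 bits, which is the mechanism that lets RVQ reach low bitrates.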
MDCTCodec presents a compelling solution for high-quality audio compression in high sampling rate and low bitrate scenarios. Its efficiency, lightweight nature, and superior performance make it a promising candidate for various applications, including speech large models.
This research advances the field of neural audio codecs by introducing a novel architecture and training strategy that effectively tackle the challenges of high-fidelity audio compression at low bitrates. MDCTCodec's lightweight design and efficiency carry significant implications for practical deployment in real-world applications.
While MDCTCodec demonstrates impressive performance, further research could explore its application in lower latency scenarios and its integration with downstream tasks like speech synthesis and speech recognition.