SNAC is a neural audio codec built on multi-scale residual vector quantization, in which the quantizers operate at different temporal resolutions. By matching the inherent hierarchical structure of audio signals, it compresses audio more efficiently than existing codecs, particularly at lower bitrates.
ESC is a lightweight, parameter-efficient neural speech codec that combines cross-scale residual vector quantization with efficient Swin Transformer blocks. It achieves high audio quality, outperforming existing state-of-the-art codecs in reconstruction quality at lower computational complexity.
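
Both summaries rely on residual vector quantization applied at more than one scale. The following is a minimal sketch of the multi-scale idea described for SNAC, not either paper's implementation. It assumes average pooling to reach each coarser resolution and frame repetition to return to the full rate. The codebooks are random rather than learned, and every name in it (`encode`, `decode`, `STRIDES`, and so on) is hypothetical. Each stage quantizes what the previous stages left over, and coarser stages emit fewer codes per frame, which is where the bitrate saving would come from.

```python
import random

def avg_pool(frames, stride):
    """Average each run of `stride` consecutive frames (length must divide evenly)."""
    dim = len(frames[0])
    return [
        [sum(f[d] for f in frames[i:i + stride]) / stride for d in range(dim)]
        for i in range(0, len(frames), stride)
    ]

def upsample(frames, stride):
    """Repeat every frame `stride` times to return to the full frame rate."""
    return [list(f) for f in frames for _ in range(stride)]

def nearest(codebook, vec):
    """Index of the codebook entry closest to `vec` in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((c - v) ** 2 for c, v in zip(codebook[k], vec)),
    )

def encode(frames, codebooks, strides):
    """Quantize the running residual at each temporal scale, coarsest first."""
    residual = [list(f) for f in frames]
    codes = []
    for codebook, stride in zip(codebooks, strides):
        pooled = avg_pool(residual, stride)            # move to the coarser scale
        idx = [nearest(codebook, v) for v in pooled]   # one code per pooled frame
        quantized = upsample([codebook[k] for k in idx], stride)
        residual = [
            [r - q for r, q in zip(rf, qf)]
            for rf, qf in zip(residual, quantized)
        ]
        codes.append(idx)
    return codes

def decode(codes, codebooks, strides):
    """Sum the upsampled code vectors from every scale."""
    n_frames = len(codes[0]) * strides[0]
    dim = len(codebooks[0][0])
    out = [[0.0] * dim for _ in range(n_frames)]
    for idx, codebook, stride in zip(codes, codebooks, strides):
        up = upsample([codebook[k] for k in idx], stride)
        out = [[o + u for o, u in zip(of, uf)] for of, uf in zip(out, up)]
    return out

def sq_error(a, b):
    """Total squared error between two frame sequences."""
    return sum((x - y) ** 2 for fa, fb in zip(a, b) for x, y in zip(fa, fb))

# Illustrative sizes only; none of these values come from the papers.
rng = random.Random(0)
DIM, N_FRAMES, CODEBOOK_SIZE = 4, 8, 16
STRIDES = [4, 2, 1]  # coarsest scale first

frames = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_FRAMES)]
# Random codebooks standing in for learned ones. Each includes the zero
# vector, so a stage can never increase the residual energy.
codebooks = [
    [[0.0] * DIM]
    + [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(CODEBOOK_SIZE - 1)]
    for _ in STRIDES
]

codes = encode(frames, codebooks, STRIDES)
recon = decode(codes, codebooks, STRIDES)
codes_per_scale = [len(c) for c in codes]  # 2, 4 and 8 codes for 8 frames
```

With 8 input frames and strides of 4, 2 and 1, the three stages emit 2, 4 and 8 codes respectively, 14 in total, whereas three full-rate quantizers would emit 24. A trained codec would learn the codebooks jointly with an encoder and decoder network, which this sketch omits entirely.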