Core Concepts
WMCodec is an end-to-end neural speech codec that jointly optimizes compression-reconstruction and watermark embedding-extraction, enabling robust authenticity verification through deep cross-modal feature integration.
Summary
The paper proposes WMCodec, a neural speech codec for authenticity verification. It targets a limitation of earlier approaches, which embed numerical watermarks separately from the codec and so leave the watermark exposed to degradation by codec compression.
Key highlights:
- WMCodec integrates the watermark embedding and extraction processes into the end-to-end training of the speech codec, mitigating the adverse effects of codec compression on the watermark.
- The paper introduces an Attention Imprint Unit (AIU) that leverages cross-attention to enable deeper fusion of watermark and speech features, improving the accuracy and capacity of watermark extraction.
- Experiments on the LibriTTS dataset show that WMCodec outperforms strong baselines, namely AudioSeal with Encodec and reinforced TraceableSpeech, in both watermark imperceptibility and extraction accuracy, especially at lower bitrates.
- At a bandwidth of 6 kbps with a watermark capacity of 16 bps, WMCodec maintains over 99% extraction accuracy under common attacks, demonstrating its robustness and practicality.
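The cross-attention fusion mentioned in the AIU highlight above can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes the speech frames act as queries and the watermark tokens as keys and values, omits all learned projections, and adds the attended watermark to each frame as a residual. The function name `cross_attention_fuse` and the toy inputs are hypothetical.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(speech_frames, watermark_tokens):
    """Single-head cross-attention without learned weights (illustrative only).

    Assumption: queries are speech frames, keys and values are watermark
    tokens, and the attended watermark is added residually to each frame.
    """
    dim = len(speech_frames[0])
    scale = math.sqrt(dim)
    fused = []
    for frame in speech_frames:
        # Scaled dot-product score between this frame and every watermark token.
        scores = [
            sum(q * k for q, k in zip(frame, token)) / scale
            for token in watermark_tokens
        ]
        weights = softmax(scores)
        # Weighted sum of watermark tokens, one value per feature dimension.
        attended = [
            sum(w * token[i] for w, token in zip(weights, watermark_tokens))
            for i in range(dim)
        ]
        # Residual connection keeps the original speech features.
        fused.append([f + a for f, a in zip(frame, attended)])
    return fused


# Toy data: 3 speech frames and 2 watermark tokens, all 4-dimensional.
speech = [[0.1, 0.3, -0.2, 0.5], [0.0, -0.4, 0.2, 0.1], [0.7, 0.1, 0.0, -0.3]]
watermark = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
fused = cross_attention_fuse(speech, watermark)
```

Each fused frame keeps the shape of the input frame, so the function can be applied repeatedly. That is one plausible reading of the "iterative" AIU described in the paper, though the actual layer structure is not specified in this summary.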
Statistics
At a bandwidth of 3 kbps, WMCodec 4@16 achieves a PESQ score of 2.606, STOI of 0.898, and MOS of 4.152 ± 0.20, outperforming AudioSeal with Encodec.
At a bandwidth of 6 kbps, WMCodec 4@16 achieves a PESQ score of 3.187, STOI of 0.936, and MOS of 4.434 ± 0.15, outperforming both AudioSeal with Encodec and reinforced TraceableSpeech.
At a bandwidth of 12 kbps, WMCodec 4@16 achieves a PESQ score of 3.558, STOI of 0.953, and MOS of 4.535 ± 0.12, outperforming AudioSeal with Encodec.
Quotes
"WMCodec is the first neural speech codec to jointly train compression-reconstruction and watermark embedding-extraction in an end-to-end manner, optimizing both imperceptibility and extractability of the watermark."
"We design an iterative Attention Imprint Unit (AIU) for deeper feature integration of watermark and speech, reducing the impact of quantization noise on the watermark."