Core Concepts
MagLive uses the distinct magnetic field patterns produced by human speech and by loudspeakers to detect voice liveness on smartphones, achieving high accuracy and robustness against a range of spoofing attacks.
Abstract
The paper introduces MagLive, a novel voice liveness detection system for smartphones that leverages near-field magnetic sensing. The key idea is to discern the distinctive variations in magnetic fields generated by human speech versus loudspeakers.
The paper first provides background on the magnetic effects of speakers and presents motivating examples showing the unique magnetic signatures of human voices versus loudspeakers. It then outlines the MagLive system, which comprises four modules: data capture, data preprocessing, feature extraction, and authentication.
The data capture module records voice and magnetometer data simultaneously and estimates the distance to the sound source to account for attenuation of the magnetic signal. The data preprocessing module denoises the magnetometer data and segments it using voice cues. The feature extraction module uses CNN-based submodels and a self-attention mechanism to derive robust magnetic field patterns, and applies supervised contrastive learning so that the learned features do not depend on the user, the device, or the spoken content. Finally, the authentication module uses a binary classifier to distinguish human from non-human voice samples.
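The preprocessing step described above (denoise the magnetometer trace, then cut it to the span where speech occurs) can be illustrated with a minimal sketch. This is not the paper's implementation: the summary does not say which denoising filter or voice-activity method MagLive uses, so a centered moving average and a simple amplitude threshold stand in for them here. All names (moving_average, voiced_span, segment_magnetometer), the frame length, and the threshold are hypothetical.

```python
def moving_average(samples, window=5):
    """Smooth a 1-D trace with a centered moving average (stand-in denoiser)."""
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        smoothed.append(sum(samples[lo:hi]) / (hi - lo))
    return smoothed


def voiced_span(audio, frame_len, threshold):
    """Return (start, end) audio sample indices covering all voiced frames.

    A frame counts as voiced when its mean absolute amplitude exceeds
    `threshold` (an assumed, simplistic voice cue). Returns None if no
    frame is voiced.
    """
    voiced_starts = []
    for start in range(0, len(audio) - frame_len + 1, frame_len):
        frame = audio[start:start + frame_len]
        if sum(abs(x) for x in frame) / frame_len > threshold:
            voiced_starts.append(start)
    if not voiced_starts:
        return None
    return voiced_starts[0], voiced_starts[-1] + frame_len


def segment_magnetometer(mag, audio, audio_rate, mag_rate,
                         frame_len=100, threshold=0.1):
    """Denoise `mag`, then keep only the samples aligned with voiced audio.

    `audio_rate` and `mag_rate` are integer sampling rates in Hz; both
    streams are assumed to start at the same instant.
    """
    span = voiced_span(audio, frame_len, threshold)
    if span is None:
        return []
    # Convert audio sample indices to magnetometer indices (integer math).
    lo = span[0] * mag_rate // audio_rate
    hi = min(len(mag), -(-span[1] * mag_rate // audio_rate))  # ceiling division
    return moving_average(mag)[lo:hi]
```

For example, with audio sampled at 1000 Hz and a magnetometer at 100 Hz, speech occupying audio samples 200 to 400 maps to magnetometer samples 20 to 40. The two sensors run at very different rates, which is why the alignment is done on sample indices scaled by the rate ratio rather than on raw indices.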
Comprehensive experiments demonstrate MagLive's effectiveness, achieving a balanced accuracy of 99.01% and an equal error rate of 0.77% in detecting various spoofing attacks, including replay, speech synthesis, and voice conversion. MagLive also exhibits strong robustness across different users, devices, voice content, environments, and user postures, making it a practical and secure voice liveness detection solution for smartphones.
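The two reported metrics can be made concrete with a small sketch. Balanced accuracy is the mean of the true-positive rate and the true-negative rate; the equal error rate (EER) is the error level at the decision threshold where the false acceptance rate and the false rejection rate coincide. The code below is an illustration only, not the paper's evaluation code: the function names are hypothetical, label 1 is assumed to mean a live human voice and 0 a spoofed one, and the EER is approximated by sweeping thresholds over the observed scores.

```python
def balanced_accuracy(labels, predictions):
    """Mean of true-positive rate and true-negative rate (1 = live, 0 = spoof)."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


def equal_error_rate(labels, scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    A sample is accepted as live when score >= threshold. The function
    returns the mean of FAR and FRR at the threshold where they are closest.
    """
    live = [s for y, s in zip(labels, scores) if y == 1]
    spoof = [s for y, s in zip(labels, scores) if y == 0]
    best_gap, best_eer = None, None
    for t in sorted(set(scores)):
        far = sum(1 for s in spoof if s >= t) / len(spoof)  # spoof accepted
        frr = sum(1 for s in live if s < t) / len(live)     # live rejected
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer
```

Balanced accuracy is the more informative choice when live and spoofed samples are unevenly represented, because plain accuracy can be inflated by the majority class.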
Stats
The magnetic field changes caused by loudspeakers are distinct from those produced by human speech.
Magnetic field variations differ not just between humans and loudspeakers, but also among different individuals and spoofing devices.
Quotes
"MagLive leverages differences in magnetic field patterns generated by different speakers (i.e., humans or loudspeakers) when speaking for liveness detection."
"MagLive features minimal operational constraints and maintains its effectiveness across diverse environmental conditions, setting a new standard in voice liveness detection technology."