Disentangling Speaker Information from Speech Representation using Variable-Length Soft Pooling
The paper's core idea is to remove speaker information from speech representations by exploiting the structured nature of speech: it predicts segment boundaries and applies variable-length soft pooling over the segments those boundaries define.
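To make "variable-length soft pooling based on predicted boundaries" concrete, here is a minimal sketch in plain Python. It is an illustrative assumption, not the paper's method. Each frame gets a boundary probability, and the running sum of those probabilities acts as a soft segment coordinate. Each frame is then split between the two nearest integer segment indices by linear interpolation, using triangular weights. Every segment is the weighted mean of the frames assigned to it. The function name `soft_pool` and this weighting scheme are hypothetical. With hard 0/1 boundaries, the procedure reduces to ordinary mean pooling within each segment.

```python
import math
from typing import List


def soft_pool(frames: List[List[float]], boundary_probs: List[float]) -> List[List[float]]:
    """Hypothetical soft pooling of frame features into variable-length segments.

    boundary_probs[t] is an assumed probability that a new segment starts at frame t.
    The cumulative sum of these probabilities gives each frame a soft segment
    coordinate; frames are assigned to integer segment indices with triangular
    (linear-interpolation) weights, and each segment is a weighted mean.
    """
    if len(frames) != len(boundary_probs):
        raise ValueError("need exactly one boundary probability per frame")
    if not frames:
        return []
    dim = len(frames[0])

    # Soft segment coordinate of each frame: running total of boundary mass.
    positions = []
    total = 0.0
    for p in boundary_probs:
        total += p
        positions.append(total)

    num_segments = int(math.floor(positions[-1])) + 1
    pooled = []
    for k in range(num_segments):
        # Triangular weight: 1 when the frame sits exactly at index k, 0 at distance >= 1.
        weights = [max(0.0, 1.0 - abs(pos - k)) for pos in positions]
        mass = sum(weights)
        if mass == 0.0:
            continue  # no frame falls near this segment index; skip it
        segment = [sum(w * f[d] for w, f in zip(weights, frames)) / mass for d in range(dim)]
        pooled.append(segment)
    return pooled


# Hard boundary at frame 2: frames 0-1 form one segment, frames 2-4 another.
print(soft_pool([[1.0], [3.0], [10.0], [20.0], [30.0]], [0.0, 0.0, 1.0, 0.0, 0.0]))
# → [[2.0], [20.0]]
```

Because the weights are piecewise linear in the boundary probabilities, the same computation written in an autodiff framework would let gradients reach a boundary predictor. Whether the paper relies on this property is not stated here.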