Spoken language is an information-rich medium that combines words with paralinguistic information such as emotion and prosody. In conversation, this preserves a human element that many other channels, such as writing or social media, lack. However, a voice is also a distinctive biomarker, and there are many situations in which speakers may want to hide their identity: for example, when sharing sensitive content, or when protecting personal information that their voice may reveal, such as their geographical background or ethnicity.
In this project, we develop a system for voice anonymization based on voice conversion (VC), which changes the vocal identity of an utterance so that it sounds like another person while leaving its linguistic and prosodic content unchanged. Using a state-of-the-art deep neural network VC model, we can transform any speech utterance to sound like any target speaker, given a sample of that speaker's speech. We further explore how listening to speech anonymized in this way affects people's perception of the conveyed content, from the perspectives of both the listener and the original speaker.