To Help Computers Detect Who's Talking, These Scientists Figured Out How Humans Do It

D-brief | By Yuen Yiu | January 26, 2019 5:13 AM
Humans can easily pick out one voice from many. (Credit: Aaron Ama/Shutterstock)

(Inside Science) -- If your phone rings and you answer it without looking at the caller ID, it's quite possible that before the person on the other end finishes saying “hello,” you already know it is your mother. You could also tell within a second whether she was happy, sad, angry or concerned. Humans can naturally recognize and identify other humans by their voices. A new study published in The Journal of the Acoustical Society of America explored exactly how humans do this. The results may help researchers design more efficient voice recognition software in the future.

The Complexity of Speech

“It's a crazy problem for our auditory system to solve -- to figure out how many sounds there are, what they are and where they are,” said Tyler Perrachione, a neuroscientist and linguist at Boston University who was not involved in the study. Facebook now has little trouble identifying faces in photos, even when a face is shown from different angles or under different lighting. According to Perrachione, today’s voice recognition software is far more limited by comparison, and that may be related to our incomplete understanding of how humans identify voices.

“We humans have different speaker models for different individuals,” said Neeraj Sharma, a psychologist at Carnegie Mellon University in Pittsburgh and the lead author of the new study. “When you listen to a conversation, you switch between different models in your brain, so you can understand each speaker better.”

People develop speaker models in their brains as they are exposed to different voices, taking into account subtle differences in features such as cadence and timbre. By naturally switching and adapting between speaker models based on who is talking, people learn to identify and understand different speakers.

“Right now, voice recognition systems don't focus on the speaker aspect -- they basically use the same speaker model to analyze everything,” said Sharma. “For example, when you talk to Alexa, she uses the same speaker model to analyze my speech versus your speech.”

So if you have a rather thick Alabamian accent, Alexa may think you are saying “cane” when you are trying to say “can’t.”

“If we can understand how humans use speaker-dependent models, then maybe we can teach a machine system to do it,” said Sharma.

Listen and Say ‘When’

In the new study, Sharma and his colleagues designed an experiment in which a group of human volunteers listened to audio clips of two similar voices speaking in turn and were asked to identify the exact moment one speaker took over from the other. This allowed the researchers to explore how certain audio features related to the volunteers' reaction times and false alarm rates, and to begin deciphering which cues humans listen for to detect a change of speaker.

“Currently, we don't have a lot of different experiments that allow us to study talker identification or voice recognition, so this experiment design is actually quite clever,” said Perrachione.

When the researchers ran the same test on several state-of-the-art voice recognition systems, including a commercially available one developed by IBM, they found that, as expected, the human volunteers consistently outperformed all of the tested software.

Sharma said the team plans to look at the brain activity of people listening to different voices using electroencephalography, or EEG, a noninvasive method for monitoring brain activity. “That may help us to further analyze how the brain responds when there is a speaker change,” he said.

[This story was originally published on Inside Science.]