Use Voice Onset Time to explain what it means when we say that we perceive speech sounds categorically. Be sure to talk about discrimination.
When someone speaks a language you don't speak, you of course don't understand the meaning of what they say. But did you know that you actually hear different sounds than a native speaker of that language does? Take for example the /l/ and /r/ sounds of English. While native English speakers easily distinguish these two sounds, Japanese speakers are generally unable to hear the difference. The English words 'lag' and 'rag' would sound the same to them. Likewise, there are sounds that English speakers are unable to distinguish, such as the dental /t/ and retroflex /ʈ/ sounds of Hindi. Why is that? The reason has to do, in part, with a trick of the brain called categorical perception.
Categorical perception helps us detect the differences between things when we need to tell them apart, and it masks the differences when we need to treat things as the same. The Japanese language doesn't differentiate between the /l/ and /r/ sounds, so through categorical perception, the brains of native Japanese speakers have learned to treat the two sounds as the same, literally hearing one sound whichever of the two is spoken. But because those sounds are differentiated in English (changing the /l/ in 'lag' to an /r/ changes the meaning of the word), native English speakers have learned to hear a difference between the two.
Definition of Categorical Perception
Categorical perception occurs when items that range along a continuum are perceived as being either more or less similar to each other than they really are because of the way they are categorized. For example, if items falling within a certain range along that continuum belong to a single category, they will be perceived as being more similar to each other than items outside of that range.
Categorical Perception of Speech Sounds
Categorical perception was first demonstrated with speech sounds (such as /ga/, /pa/, and /du/). Researchers were trying to make sense of the fact that humans understand speech at all, given how much variability there is in how different speakers pronounce the same sounds.
For example, when your friend says the word 'dog', it is going to be pronounced slightly differently from how your mom says the word 'dog'; yet each time, you will still hear the same 'dog' despite the variability. What's even more interesting is that you likely never even notice the variability.
Some speech sounds differ only by voice onset time (VOT). VOT is the time it takes for the vocal cords to start vibrating after the release of certain consonant sounds. For example, the sound /da/ differs from /ta/ only in the amount of time it takes for the vocal cords to vibrate after the consonant is released. Give it a try. Make the sound /da/ and then the sound /ta/. Did you notice how it takes longer for the 'ahh' sound to come out with /ta/ than it does with /da/? In fact, the VOT for /da/ ranges from about 0 to 30 milliseconds (msec), and the VOT for /ta/ runs from about 50 to 80 msec. Take a look at Figure 1. You can see that if the VOT is between 0 and 30 msec, people report that they hear /da/, but if the VOT is 50 to 80 msec, they report hearing /ta/.
Here in Figure 1, notice how there is a range of VOTs for each speech sound. What researchers find is that it doesn't really matter what the VOT is within that range; listeners will hear the same sound regardless. A /da/ with a VOT of 10 msec sounds the same as a /da/ with a VOT of 30 msec. But increase that VOT to 50 msec, and listeners suddenly report hearing /ta/ instead. This is where discrimination comes in: two sounds that differ by the same physical amount of VOT are heard as identical when they fall within a single category, but as clearly different when they straddle the category boundary.
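To make that concrete, here is a minimal sketch in Python (not part of the lesson itself) of an idealized categorical listener. It assumes a hypothetical hard boundary at 40 msec, a value chosen to sit between the /da/ range (0 to 30 msec) and the /ta/ range (50 to 80 msec) described above; real listeners show a steep but not perfectly sharp boundary.

    def perceived_category(vot_msec):
        """Label a syllable by its voice onset time (VOT).
        Assumes a hypothetical category boundary at 40 msec,
        between the /da/ range (0-30) and the /ta/ range (50-80)."""
        return "/da/" if vot_msec < 40 else "/ta/"

    def discriminable(vot_a, vot_b):
        """Idealized categorical discrimination: two tokens sound
        different only if they receive different category labels."""
        return perceived_category(vot_a) != perceived_category(vot_b)

    # Within-category pair: 20 msec apart, both heard as /da/.
    print(perceived_category(10), perceived_category(30), discriminable(10, 30))
    # -> /da/ /da/ False

    # Across-boundary pair: also 20 msec apart, heard as /da/ vs /ta/.
    print(perceived_category(30), perceived_category(50), discriminable(30, 50))
    # -> /da/ /ta/ True

Notice that both pairs differ by exactly 20 msec, yet only the pair that crosses the boundary is discriminated. That is the signature of categorical perception: discrimination is poor within a category and sharp at the boundary between categories.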