Eye Can’t Hear You

If you had to choose between trusting your eyes, your ears, or your common sense, which would you choose? I made this choice not long ago when I was scrolling through Instagram and came across a video that made me doubt my senses. I rewatched it three times, thinking, “How is this even real?” It wasn’t until I realized the video was from an account called @badlipreading that my doubts began to make sense. The account is known for dubbing ridiculous dialogue over videos, creating a perceptually seamless experience for its audience while changing the context and meaning of what was actually said.

I was quite stunned at how well the dubbing had tricked me. The only indication that something was off was the completely nonsensical nature of what was being said in the video. It raises the question of how we process speech and, more specifically, how our senses interact to create a harmonious perceptual experience and how they can be fooled.

Bad Lip Reading Example

The Brain during Visual Speech

What occurred in the @badlipreading post was an instance of fake auditory input influencing the audience’s visual perception of how the lips were moving. This phenomenon appears in the scientific literature too, where studies have found that pairing a sound with an ambiguous, mismatched lipread stimulus prompts participants to interpret and label the lip movements in accordance with the sound that was played¹. It seems that if one sense is vague enough, like the ambiguous lip movements that were shown, we rely on other senses, like our hearing, to fill in the gaps.

In some ways, the bad lip reading video was an example of the McGurk effect in reverse. You may not know the McGurk effect by name, but you may be familiar with the concept. It describes an audiovisual speech illusion in which a person perceives a different syllable depending on the visual stimulus that is paired with the sound. Most famously, the sound “ba” is paired with a visual stimulus of someone saying “ga.” When the two are paired, the listener mistakes the sound for “da” instead of correctly hearing the syllable “ba”².

The McGurk Effect

One neuroimaging study sought to find out which regions of the brain are involved in the McGurk effect. To untangle the web of sensory input during this audiovisual illusion, researchers introduced a time delay, presenting the auditory stimulus either before or after the visual stimulus. Participants were asked to record what they heard when the sound “aba” was paired with a visual stimulus of someone saying “ava.” Unsurprisingly, when the sound was played 400 ms before the mismatched visual stimulus, participants tended to identify the sound correctly as “aba.”

However, activation in the brain didn’t stop at regions involved in auditory processing. You might imagine that when the sound was played shortly before the mismatched visual stimulus, participants’ brains would ignore the incorrect visual stimulus or weigh it less heavily. Surprisingly, though, when the auditory stimulus was introduced 400 ms early, the occipital lobe (responsible for processing visual input) showed higher activation than when the auditory stimulus was played in synchrony with or after the visual stimulus³. This suggests that the part of our brain responsible for hearing is not an island, but is instead integrated with other sensory systems, communicating with and influencing the patterns of activation in our visual system. It is not known exactly why researchers found more activation in the occipital lobe when the auditory stimulus was played first. It may be that the brain detected an inconsistency in the sensory experience, requiring the visual system to work extra hard to perceive the audio and visual input as a whole. Alternatively, the auditory stimulus “aba” could be changing how the brain perceives the visual stimulus of someone saying “ava,” similar to how our brains get confused when we watch bad lip reading videos.

The @badlipreading video that fooled me.

These studies demonstrate how our brains attempt to find congruence in our sensory experience. In particular, the timing with which each sense receives information can play an important role in how the overall perceptual picture is formed. Perhaps tricksters can use this to make even better lip dubbing videos in the future, creating posts where the dubbed dialogue slightly precedes the visual onset of what is being said and producing an even more compelling illusion as a result.

Although these factors partially explain the trickery in the video, many of us are not lip reading gurus, so what else plays into the lip dubbing illusion? One explanation may be the role of gesturing in communication. Perhaps unknowingly, the content creators often give their fake dialogue a pitch and inflection that pair seamlessly with the body language of the people they are dubbing. As our brains integrate this slew of auditory and visual input, it’s easy to be fooled because the sounds match what we would expect in real life.

Knowing that our senses depend on one another can explain some of the experiences we have. Perhaps it gives insight into why some people can’t tolerate watching poorly dubbed foreign-language films⁴. The lack of unity between what the audience sees and hears creates an unpleasant sensory experience as the brain fights to decide which information is more important: what it sees or what it hears.

The COVID-19 pandemic revealed another everyday instance of our using more than one sense to hear. For example, many people have likely encountered a waiter in a loud restaurant lowering their mask to communicate with a customer. Not only does lowering the mask remove a physical barrier that muffles the sound, it also allows your vision to aid in understanding what you’re hearing.

Ultimately, why do we use more than one sense to hear? One prevailing theory is that it’s more reliable. Using more than one sensory modality to understand the world around you is beneficial because it allows one sense to compensate if another is experiencing some sort of interference¹. Based on my own experience, I would say this theory has some merit, because I certainly let what I was hearing compensate for my poor lip reading skills while watching @badlipreading’s post.

1 Baart, Martijn, and Jean Vroomen. “Do You See What You Are Hearing? Cross-Modal Effects of Speech Sounds on Lipreading.” Neuroscience Letters, vol. 471, no. 2, 2010, pp. 100–103, https://doi.org/10.1016/j.neulet.2010.01.019.

2 McGurk, Harry, and John MacDonald. “Hearing Lips and Seeing Voices.” Nature, vol. 264, no. 5588, 1976, pp. 746–748, https://doi.org/10.1038/264746a0.

3 Jones, Jeffery A., and Daniel E. Callan. “Brain Activity during Audiovisual Speech Perception: An fMRI Study of the McGurk Effect.” NeuroReport, vol. 14, no. 8, 2003, pp. 1129–1133, https://doi.org/10.1097/00001756-200306110-00006.

4 Riniolo, Todd C., and Lesley J. Capuana. “Directly Comparing Subtitling and Dubbing Using Netflix: Examining Enjoyment Issues in the Natural Setting.” Current Psychology, vol. 41, no. 7, 2020, pp. 4252–4258, https://doi.org/10.1007/s12144-020-00948-1.