Research Helps Put Words in Our Mouths
As COVID-19 overtook the world, people had to navigate the intricacies and frustrations of communicating with masks as speakers and listeners. While most of us rely on speechreading to some degree during conversations, the loss of those visual cues particularly affects those with hearing loss.
In the midst of the pandemic, electrical and computer engineering professor Mohammad Imtiaz learned about students with hearing issues who struggled because they couldn’t read lips hidden by masks.
He wondered: could artificial intelligence help solve the problem?
“The idea is that you can give data to the system, which then tries to learn from this data and formulate a program to solve what you’re looking for,” Imtiaz said. “It’s trying to mimic as closely as possible what humans are capable of doing.”
He decided machine learning, a subset of artificial intelligence, offered a possible answer. Machine learning algorithms learn from data and build models to solve problems without a human explicitly programming them.
“Research shows 40% of English words can be lip-read, so can we recover this loss of 40% due to masks? I could take (video of) a person talking normally, without a mask, and I can take another video (of them) saying the same things but with a mask on.
“These together are known as training data, which is used to train the machine-learning algorithm. The algorithm learns by itself how lips should be moving when a person speaks. Once fully trained it can predict lip movement of a speaker wearing a mask.”
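The article doesn’t publish the team’s code, but the pairing Imtiaz describes can be sketched in a few lines. In this hypothetical example, each frame of the masked recording becomes a training input, and the lip information from the matching unmasked frame becomes the target the algorithm learns to predict; the function name and data shapes are illustrative, not the project’s actual implementation.

```python
# Hypothetical sketch: align two recordings of the same speech, one
# masked and one unmasked, into supervised (input, target) pairs.
def make_training_pairs(masked_frames, unmasked_lip_landmarks):
    """Pair each masked frame with the lip landmarks from the
    corresponding unmasked frame."""
    if len(masked_frames) != len(unmasked_lip_landmarks):
        raise ValueError("recordings must be aligned frame by frame")
    return list(zip(masked_frames, unmasked_lip_landmarks))

# Toy stand-ins for real video frames and (x, y) landmark coordinates.
masked = ["frame0_masked", "frame1_masked"]
landmarks = [[(48, 60), (54, 60)], [(48, 62), (54, 62)]]

pairs = make_training_pairs(masked, landmarks)
```

Trained on many such pairs, the model learns the mapping from a masked face to the hidden lip movement, which is what lets it make predictions for speakers it has never seen unmasked.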
He enlisted three students — seniors Luz Schwalb, Madison Derringer and Nicholas Muntean — for the two-semester capstone project. They used videos from YouTube and TED Talks, digitally superimposing masks on the speakers and analyzing the resulting material.
Work continued after the students graduated in May 2021, with the three fitting the project around their post-graduation jobs.
“This is a great success story — students were not only enthusiastic about their senior project but continued to maintain their connection to us in their professional life,” Imtiaz said. Describing where the project stands today, he added:
“(It’s) where we can actually deal with lip landmarks and we can see lips moving with the video feed of a person wearing a mask.”
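The article doesn’t specify how the lip landmarks are represented, but a common convention (assumed here) is a set of (x, y) pixel coordinates around the mouth, from which simple features such as how far the lips are open can be derived per frame. The landmark names and values below are illustrative only.

```python
import math

# Hypothetical lip-landmark representation: named (x, y) pixel points
# around the mouth. From predicted landmarks, per-frame features such
# as mouth opening can be computed and animated over the video feed.
def mouth_opening(landmarks):
    """Euclidean distance between the inner-lip top and bottom points."""
    (x1, y1) = landmarks["inner_top"]
    (x2, y2) = landmarks["inner_bottom"]
    return math.hypot(x2 - x1, y2 - y1)

# Toy landmarks for one frame of a masked speaker.
frame_landmarks = {"inner_top": (52.0, 58.0), "inner_bottom": (52.0, 64.0)}
opening = mouth_opening(frame_landmarks)  # 6.0 pixels in this toy frame
```

Tracking a feature like this across frames is one way a viewer could "see lips moving" even though the real lips are hidden behind a mask.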
Parts of the project, such as superimposing speakers’ heads, faces and masks, are still in the works. Another goal is to create an app, possibly with a web link that would allow real-time use on a smartphone.
Imtiaz believes the project’s value isn’t limited to the pandemic era.
“Think about hospital environments where a doctor has a patient who has hearing impairment, or a hazardous environment where you still need a mask,” he said. “So this project ... it's going to stay relevant.”