Wow! Google DeepMind AI beats human experts in lip-reading tests
Google’s DeepMind artificial intelligence program may be
best known for building AlphaGo, which beat one of the
world’s best Go players, but the technology has numerous applications in the
field of science and could prove especially helpful to the hearing impaired.
Researchers from
Oxford University and DeepMind teamed up to create an AI system trained on
5,000 hours of BBC videos containing 118,000 sentences. It managed to
outperform a professional lip-reader who provides services for UK courts.
When shown a
random sample of 200 videos from BBC broadcasts, the human lip-reader was able
to decipher less than a quarter of the spoken words. But when the AI system was
tested using the same data set, it deciphered almost half the words and could
make out entire complex phrases.
Additionally,
the machine was able to annotate 46 percent of the words without error, whereas
the professional only managed around 12 percent. Most of the AI’s mistakes were
minor, like missing the ‘s’ from the end of words.
Two weeks ago, another deep learning
lip-reading system, called LipNet, was unveiled at the University of Oxford. LipNet
also beat a human at accurately reading lips, though the
data set used in that instance, called GRID, contained only 51 unique words,
whereas the BBC data contains nearly 17,500, according to New Scientist.
GRID also used
well-lit videos of people facing the camera and reading three seconds’ worth of
words. After being shown 29,000 videos, LipNet had an error rate of just 6.6
percent, while humans who were tested on 300 similar videos had an average
error rate of 47.7 percent.
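The error rates quoted throughout the article are, in speech-recognition work, typically word error rates (WER): the word-level edit distance between the system’s transcript and the reference, divided by the reference length. The papers do not spell out their exact scoring in this article, so the following is only a minimal illustrative sketch of the standard metric, not the researchers’ own evaluation code:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting every reference word
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# A dropped trailing 's' -- the kind of minor mistake described above --
# counts as a single substitution out of four words:
print(word_error_rate("the dogs bark loudly", "the dog bark loudly"))  # 0.25
```

Under this metric, a system that gets almost half the words right on hard broadcast footage, as the BBC-trained model did, still dramatically outperforms a 12-percent human baseline.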
Researchers say
the system could find use in mobile technologies, virtual assistants, and
general speech recognition tasks. It could also be invaluable in helping deaf
and hearing-impaired people understand others.
"A machine
that can lip read opens up a host of applications: 'dictating' instructions or
messages to a phone in a noisy environment; transcribing and redubbing archival
silent films; resolving multi-talker simultaneous speech; and, improving the
performance of automated speech recognition in general," wrote the
researchers in their paper.