LipNet

LipNet is a deep neural network for visual speech recognition. It was created by Yannis Assael, Brendan Shillingford, Shimon Whiteson, and Nando de Freitas, researchers from the University of Oxford. The technique, described in a paper published in November 2016,[1] decodes text from the movement of a speaker's mouth. Traditional visual speech recognition approaches separated the problem into two stages: designing or learning visual features, and prediction. LipNet was the first end-to-end sentence-level lipreading model, learning spatiotemporal visual features and a sequence model simultaneously.[2] Audio-visual speech recognition has enormous practical potential, with applications in improved hearing aids, in medicine (for example, improving the recovery and wellbeing of critically ill patients),[3] and in speech recognition in noisy environments,[4] such as Nvidia's autonomous vehicles.[5]
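
The model described in the paper combines spatiotemporal (3D) convolutions with recurrent layers and connectionist temporal classification (CTC), so that the visual features and the sequence model are learned jointly from sentence-level transcripts. A minimal PyTorch-style sketch of that general pattern is shown below; the layer sizes, module choices, and the name EndToEndLipreader are illustrative assumptions rather than the published configuration.

# Sketch of an end-to-end lipreading pipeline of the kind LipNet popularised:
# 3D convolutions extract spatiotemporal visual features from mouth video,
# a recurrent layer models the sequence, and CTC allows training on sentence
# transcripts without frame-level alignment. Sizes are illustrative only.
import torch
import torch.nn as nn

class EndToEndLipreader(nn.Module):
    def __init__(self, vocab_size=28):  # e.g. 26 letters + space + CTC blank
        super().__init__()
        # Spatiotemporal feature extractor: 3D convolutions over (time, H, W)
        self.frontend = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=(3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=(3, 5, 5), stride=(1, 1, 1), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
        )
        # Recurrent sequence model over the per-frame features
        self.gru = nn.GRU(input_size=64, hidden_size=128, num_layers=2,
                          bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * 128, vocab_size)

    def forward(self, video):                # video: (batch, 3, time, H, W)
        feats = self.frontend(video)         # (batch, C, time, H', W')
        feats = feats.mean(dim=(3, 4))       # spatial pooling -> (batch, C, time)
        feats = feats.transpose(1, 2)        # (batch, time, C)
        out, _ = self.gru(feats)
        return self.classifier(out)          # per-frame character logits

# At training time the per-frame logits would be log-softmaxed and passed to
# nn.CTCLoss with the target character sequence, so features and the sequence
# model are optimised jointly from sentence transcripts.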

References

  1. ^ Assael, Yannis M.; Shillingford, Brendan; Whiteson, Shimon; de Freitas, Nando (2016-12-16). "LipNet: End-to-End Sentence-level Lipreading". arXiv:1611.01599 [cs.LG].
  2. ^ "AI that lip-reads 'better than humans'". November 8, 2016 – via www.bbc.com.
  3. ^ "Home Elementor". Liopa.
  4. ^ Vincent, James (November 7, 2016). "Can deep learning help solve lip reading?". The Verge.
  5. ^ Quach, Katyanna. "Revealed: How Nvidia's 'backseat driver' AI learned to read lips". www.theregister.com.