Decoder-Encoder LSTM for Lip Reading

Journal article


Fenghour, Souheil, Chen, Daqing and Xiao, Perry (2019). Decoder-Encoder LSTM for Lip Reading. Proceedings of the 2019 8th International Conference on Software and Information Engineering. https://doi.org/10.1145/3328833.3328845
Authors: Fenghour, Souheil; Chen, Daqing; Xiao, Perry
Abstract

The success of automated lip reading has been constrained by the inability to distinguish between homopheme words: words that have different characters, and are intrinsically different, yet produce the same lip movements (e.g. "time" and "some"). Different phonemes (units of sound) can often produce exactly the same viseme, the visual equivalent of a phoneme. Through the use of a Long Short-Term Memory network with word embeddings, we can distinguish between homopheme words, i.e. words that produce identical lip movements. The neural network architecture achieved a character accuracy rate of 77.1% and a word accuracy rate of 72.2%.
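The homopheme problem the abstract describes can be illustrated with a minimal sketch (not the paper's code): if several words share one viseme sequence, a viseme-level lookup alone yields multiple candidates, which is why a sequence model such as an LSTM over word embeddings is needed to pick the right word. The viseme labels below are hypothetical placeholders; only the "time"/"some" pairing comes from the abstract.

```python
from collections import defaultdict

# Hypothetical viseme transcriptions; "time" and "some" share one sequence,
# mirroring the homopheme example given in the abstract.
viseme_of = {
    "time": ("V1", "V2", "V3"),
    "some": ("V1", "V2", "V3"),
    "word": ("V4", "V5", "V6"),
}

# Invert the mapping: each viseme sequence -> all candidate words.
candidates = defaultdict(set)
for word, seq in viseme_of.items():
    candidates[seq].add(word)

# An ambiguous sequence returns several candidates; disambiguation must
# come from context, e.g. an LSTM conditioned on word embeddings.
print(sorted(candidates[("V1", "V2", "V3")]))  # ['some', 'time']
```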

Year: 2019
Journal: Proceedings of the 2019 8th International Conference on Software and Information Engineering
Publisher: ACM
Digital Object Identifier (DOI): https://doi.org/10.1145/3328833.3328845
Publication dates
Online: 09 Apr 2019
Print: 09 Apr 2019
Publication process dates
Deposited: 25 Mar 2019
Accepted: 23 Mar 2019
Accepted author manuscript
File Access Level: Open
License: http://www.acm.org/publications/policies/copyright_policy#Background
Permalink: https://openresearch.lsbu.ac.uk/item/866z6
