Lip Reading Sentences Using Deep Learning with Only Visual Cues
Fenghour, S., Chen, D., Guo, K. and Xiao, P. (2020). Lip Reading Sentences Using Deep Learning with Only Visual Cues. IEEE Access. https://doi.org/10.1109/ACCESS.2020.3040906
|Fenghour, S., Chen, D., Guo, K. and Xiao, P.
In this paper, a neural network-based lip reading system is proposed. The system is lexicon-free and uses purely visual cues. With only a limited number of visemes as classes to recognise, the system is designed to lip read sentences covering a wide range of vocabulary and to recognise words that may not be included in system training. The system has been tested on the challenging BBC Lip Reading Sentences 2 (LRS2) benchmark dataset. Experiments with videos of varying illumination have shown that the proposed model is robust to varying levels of lighting. Compared with state-of-the-art work on lip reading sentences, the system achieves significantly improved performance, with a 15% lower word error rate. The main contributions of this paper are: 1) the classification of visemes in continuous speech using a specially designed transformer with a unique topology; 2) the use of visemes as a classification schema for lip reading sentences; and 3) the conversion of visemes to words using perplexity analysis. All of these contributions serve to enhance the accuracy of lip reading sentences. The paper also provides an essential survey of the research area.
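The abstract reports its headline result as a word error rate (WER). As a minimal sketch of how that metric is conventionally computed (word-level edit distance divided by the number of reference words), the following may help; the function name `word_error_rate` and the example sentences are illustrative assumptions and are not taken from the paper or from LRS2.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper only; not the authors' evaluation code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sentences: one substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Because the denominator is the reference length, insertions can push the value above 1.0, so WER is an error ratio rather than a bounded percentage.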
|Deep learning; Lip reading; Neural networks; Perplexity analysis; Speech recognition
|Institute of Electrical and Electronics Engineers (IEEE)
|Digital Object Identifier (DOI): https://doi.org/10.1109/ACCESS.2020.3040906
|26 Nov 2020
|Publication process dates
|19 Nov 2020
|21 Nov 2020
File Access Level
|Accepted author manuscript
Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.