Viseme-based Lip-Reading using Deep Learning

PhD Thesis


Fenghour, S. (2022). Viseme-based Lip-Reading using Deep Learning. PhD Thesis, London South Bank University, School of Engineering. https://doi.org/10.18744/lsbu.9280w
Authors: Fenghour, S.
Type: PhD Thesis
Abstract

Research in automated lip reading is a rich discipline with many facets that have been the subject of investigation, including audio-visual data, feature extraction, classification networks and classification schemas. The most advanced lip-reading systems can predict entire sentences drawn from vocabularies of thousands of words, and the majority of them use ASCII characters as the classification schema. The classification performance of such systems, however, has been insufficient, and covering an ever-expanding vocabulary with as few classes as possible remains a challenge.
The work in this thesis contributes to the area of classification schemas by proposing an automated lip-reading model that predicts sentences using visemes as the classification schema. This is an alternative to ASCII characters, the conventional class system used to predict sentences. The thesis provides a review of current trends in deep learning-based automated lip reading, identifies a gap in the research concerning classification schemas, and contributes towards closing it. A new line of research is opened up in which an alternative way of performing lip reading is explored, and in doing so, performance results for predicting sentences from a benchmark dataset are attained that improve upon the current state-of-the-art.
In this thesis, a neural network-based lip-reading system is proposed. The system is lexicon-free and uses purely visual cues. With only a limited number of visemes as classes to recognise, the system is designed to lip read sentences covering a wide range of vocabulary and to recognise words that may not be included in system training. The system predicts sentences in a two-stage procedure: visemes are recognised in the first stage and words are classified in the second stage. The second stage therefore has to overcome both the one-to-many mapping problem posed in lip reading, where one set of visemes can map to several words, and the problem of visemes being confused or misclassified in the first stage.
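To make the one-to-many mapping concrete, the following minimal Python sketch shows how a single viseme sequence can correspond to several candidate words. The viseme labels, phoneme-to-viseme table and toy lexicon are illustrative assumptions and do not reproduce the mapping used in the thesis.

# Minimal sketch of the one-to-many viseme-to-word mapping problem.
# The viseme labels and lexicon below are hypothetical examples, not the
# mapping used in the thesis: /p/, /b/ and /m/ look identical on the lips,
# so visually distinct words collapse onto one viseme string.
from collections import defaultdict

def to_visemes(phonemes):
    """Map a phoneme sequence to a viseme sequence (toy mapping)."""
    phoneme_to_viseme = {"p": "P", "b": "P", "m": "P",
                         "ae": "A", "t": "T", "d": "T"}
    return tuple(phoneme_to_viseme[p] for p in phonemes)

# Toy pronunciation lexicon: word -> phonemes.
lexicon = {"bat": ["b", "ae", "t"],
           "pat": ["p", "ae", "t"],
           "mat": ["m", "ae", "t"],
           "mad": ["m", "ae", "d"]}

# Invert it: one viseme sequence maps to several candidate words.
viseme_to_words = defaultdict(list)
for word, phonemes in lexicon.items():
    viseme_to_words[to_visemes(phonemes)].append(word)

print(viseme_to_words[("P", "A", "T")])  # ['bat', 'pat', 'mat', 'mad']

The second stage of the system has to resolve exactly this kind of ambiguity while also tolerating errors in the viseme sequence itself.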
To develop the proposed lip-reading system, a number of tasks have been performed in this thesis. These include the classification of continuous sequences of visemes, and the proposal of viseme-to-word conversion models that are both effective in their conversion performance when predicting words and robust to the possibility of viseme confusion or misclassification. The initial system reported has been evaluated on the challenging BBC Lip Reading Sentences 2 (LRS2) benchmark dataset, attaining a word accuracy rate of 64.6%. Compared with the state-of-the-art work in lip reading sentences reported at the time, the system achieved a significantly improved performance.
The lip-reading system is further improved by using a language model that has been demonstrated to be effective at discriminating between homopheme words and robust to incorrectly classified visemes. This yields an improved performance in predicting spoken sentences from the LRS2 dataset, with a word accuracy rate of 79.6%, exceeding the 77.4% attained by another lip-reading system trained and evaluated on the same dataset, which is, to the best of our knowledge, the next best result observed on LRS2.
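As a rough illustration of the role such a language model plays, the sketch below rescores sentence hypotheses built from homopheme candidates using a toy bigram count model. The corpus counts, smoothing and candidate lists are hypothetical placeholders, not the language model developed in the thesis.

# Toy illustration of using a language model to choose between homopheme
# candidates. The bigram counts and scoring are placeholders, not the
# language model developed in the thesis.
from itertools import product
import math

# Hypothetical bigram counts, as might be harvested from a text corpus.
bigram_counts = {("the", "bat"): 2, ("the", "mat"): 5, ("the", "pat"): 1,
                 ("on", "the"): 50, ("sat", "on"): 20,
                 ("cat", "sat"): 10, ("the", "cat"): 30}

def score(sentence):
    """Sum of log bigram counts, with add-one smoothing for unseen pairs."""
    return sum(math.log(bigram_counts.get(bg, 0) + 1)
               for bg in zip(sentence, sentence[1:]))

# Each position holds the candidate words returned by viseme-to-word lookup.
candidates = [["the"], ["cat", "cad"], ["sat"], ["on"], ["the"],
              ["mat", "bat", "pat", "mad"]]

best = max(product(*candidates), key=score)
print(" ".join(best))  # the cat sat on the mat

A model of this kind prefers the hypothesis that reads most plausibly as text, which is how homopheme candidates sharing the same viseme sequence can be told apart.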

Year: 2022
Publisher: London South Bank University
Digital Object Identifier (DOI): https://doi.org/10.18744/lsbu.9280w
Publication dates
Print: 06 Jul 2022
Publication process dates
Deposited: 10 Nov 2022
Permalink: https://openresearch.lsbu.ac.uk/item/9280w

File: SF_Revised_Thesis.pdf
License: CC BY 4.0



Related outputs

Developing Phoneme-based Lip-reading Sentences System for Silent Speech Recognition
El Bialy, R., Chen, D., Fenghour, S., Hussein, W., Xiao, P., Karam, O. H. and Li, B. (2022). Developing Phoneme-based Lip-reading Sentences System for Silent Speech Recognition. CAAI Transactions on Intelligence Technology. 8 (1), pp. 128-139. https://doi.org/10.1049/cit2.12131
Deep Learning-based Automated Lip-Reading: A Survey
Fenghour, S., Chen, D., Guo, K., Li, B. and Xiao, P. (2021). Deep Learning-based Automated Lip-Reading: A Survey. IEEE Access. https://doi.org/10.1109/ACCESS.2021.3107946
Recurrent Neural Networks for Decoding Lip Read Speech
Fenghour, S., Chen, D. and Xiao, P. (2019). Recurrent Neural Networks for Decoding Lip Read Speech. 2019 8th International Conference on Software and Information Engineering (ICSIE 2019). Cairo, 09 - 12 Apr 2019
Decoder-Encoder LSTM for Lip Reading
Fenghour, S., Chen, D. and Xiao, P. (2019). Decoder-Encoder LSTM for Lip Reading. Proceedings of the 2019 8th International Conference on Software and Information Engineering. https://doi.org/10.1145/3328833.3328845
Contour Mapping for Speaker-Independent Lip Reading System
Fenghour, S., Chen, D. and Xiao, P. (2018). Contour Mapping for Speaker-Independent Lip Reading System. The 11th International Conference on Machine Vision (ICMV 2018). Munich, Germany, 01 - 03 Nov 2018