Developing Phoneme-based Lip-reading Sentences System for Silent Speech Recognition
Journal article
El Bialy, R., Chen, D., Fenghour, S., Hussein, W., Xiao, P., Karam, O. H. and Li, B. (2022). Developing Phoneme-based Lip-reading Sentences System for Silent Speech Recognition. CAAI Transactions on Intelligence Technology. 8 (1), pp. 128-139. https://doi.org/10.1049/cit2.12131
Authors | El Bialy, R., Chen, D., Fenghour, S., Hussein, W., Xiao, P., Karam, O. H. and Li, B. |
---|---|
Abstract | Lip-reading is the process of interpreting speech by visually analyzing lip movements. Recent research in this area has shifted from simple word recognition to lip-reading sentences in the wild. This paper uses phonemes as a classification schema for lip-reading sentences, both to explore an alternative schema and to enhance system performance. Different classification schemas have been investigated, including character-based and viseme-based schemas. In the presented work, the visual front-end model of the system consists of a spatial-temporal (3D) convolution followed by a 2D ResNet. The phoneme recognition models are Transformers that use multi-headed attention. A Recurrent Neural Network is used as the language model. The performance of the proposed system has been tested on the BBC Lip Reading Sentences 2 (LRS2) benchmark dataset. Compared with state-of-the-art approaches to lip-reading sentences, the proposed system achieves a word error rate that is 10% lower on average under varying illumination ratios. |
Keywords | Lip-reading; Phoneme-based Lip-reading; Deep Learning; Deep Neural Networks; Transformers; Spatial-Temporal convolution |
Year | 2022 |
Journal | CAAI Transactions on Intelligence Technology |
Journal citation | 8 (1), pp. 128-139 |
Publisher | Wiley |
ISSN | 2468-2322 |
Digital Object Identifier (DOI) | https://doi.org/10.1049/cit2.12131 |
Web address (URL) | https://ietresearch.onlinelibrary.wiley.com/journal/24682322 |
Publication dates | |
17 Aug 2022 | |
Publication process dates | |
Accepted | 20 Jul 2022 |
Deposited | 28 Jul 2022 |
Publisher's version | File access level: Open |
Accepted author manuscript | File access level: Controlled |
https://openresearch.lsbu.ac.uk/item/91667
Download files
Publisher's version
CAAI Trans on Intel Tech - 2022 - El‐Bialy - Developing phoneme‐based lip‐reading sentences system for silent speech (2).pdf | ||
License: CC BY 4.0 | ||
File access level: Open |