Developing Phoneme-based Lip-reading Sentences System for Silent Speech Recognition
Journal article
El Bialy, R., Chen, D., Fenghour, S., Hussein, W., Xiao, P., Karam, O. H. and Li, B. (2022). Developing Phoneme-based Lip-reading Sentences System for Silent Speech Recognition. CAAI Transactions on Intelligence Technology. 8 (1), pp. 128-139. https://doi.org/10.1049/cit2.12131
| Authors | El Bialy, R., Chen, D., Fenghour, S., Hussein, W., Xiao, P., Karam, O. H. and Li, B. |
|---|---|
| Abstract | Lip-reading is the process of interpreting speech by visually analyzing lip movements. Recent research in this area has shifted from simple word recognition to lip-reading sentences in the wild. This paper attempts to use phonemes as a classification schema for lip-reading sentences, to explore an alternative schema and to enhance system performance. Different classification schemas have been investigated, including character-based and viseme-based schemas. In the presented work, the visual front-end model of the system consists of a Spatial-Temporal (3D) convolution followed by a 2D ResNet. The phoneme recognition models are Transformers, which use multi-headed attention. For the language model, a Recurrent Neural Network is used. The performance of the proposed system has been tested on the BBC Lip Reading Sentences 2 (LRS2) benchmark dataset. Compared with state-of-the-art approaches to lip-reading sentences, the proposed system achieves a 10% lower word error rate on average under varying illumination ratios. |
| Keywords | Lip-reading; Phoneme-based Lip-reading; Deep Learning; Deep Neural Networks; Transformers; Spatial-Temporal convolution |
| Year | 2022 |
| Journal | CAAI Transactions on Intelligence Technology |
| Journal citation | 8 (1), pp. 128-139 |
| Publisher | Wiley |
| ISSN | 2468-2322 |
| Digital Object Identifier (DOI) | https://doi.org/10.1049/cit2.12131 |
| Web address (URL) | https://ietresearch.onlinelibrary.wiley.com/journal/24682322 |
| Publication dates | 17 Aug 2022 |
| Publication process dates | |
| Accepted | 20 Jul 2022 |
| Deposited | 28 Jul 2022 |
| Publisher's version | License: CC BY 4.0; File access level: Open |
| Accepted author manuscript | File access level: Controlled |
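
The abstract reports its result as a word error rate (WER). As a minimal sketch of how that metric is conventionally computed, assuming the standard definition (substitutions + deletions + insertions, divided by the number of reference words), the following uses a word-level edit distance. The function names `edit_distance` and `word_error_rate` are illustrative and not taken from the paper, and the paper's own text normalization and scoring details are not given in this record.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (illustrative helper)."""
    # prev[j] holds the distance between the first i-1 reference tokens
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference tokens so far
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical transcripts, not data from the paper.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.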
https://openresearch.lsbu.ac.uk/item/91667
Download files
Publisher's version
| File | CAAI Trans on Intel Tech - 2022 - El‐Bialy - Developing phoneme‐based lip‐reading sentences system for silent speech (2).pdf |
|---|---|
| License | CC BY 4.0 |
| File access level | Open |