Design and Implementation of a Ubiquitous Framework for Pronunciation Learning

Date
2022-01-19
Publisher
Université Badji Mokhtar Annaba
Abstract
Handheld and wearable devices have exponentially increased the usage of speech-enabled interfaces and promoted the widespread use of ubiquitous learning applications that aim to be accessible from anywhere and at any time. In particular, computer-assisted language learning (CALL) applications have witnessed high growth. However, pronunciation learning is a challenging task in ubiquitous environments. Indeed, speech signals are prone to corruption from several sources, such as background noise, coding errors, or channel disturbances. The original speech must be recovered from the corrupted version before it can be assessed reliably; fortunately, many real-world speech samples are available for this purpose. On the other hand, the pronunciation assessment task is the core component of any computer-assisted pronunciation learning (CAPL) system, since it provides the reliable feedback students need to improve their training. Such applications require annotated and rated non-native speech data. However, such corpora are often unavailable, especially for low-resource languages such as Arabic. This thesis aims to develop an Arabic pronunciation learning system in a ubiquitous environment under this scarcity of dedicated corpora. The contribution of the thesis is therefore twofold. In the absence of a dedicated corpus, an unsupervised two-step approach is adopted to perform speech enhancement. First, an overcomplete deep autoencoder (OAE) is trained on noisy/noisy pairs to produce enhanced speech. Next, a denoising deep autoencoder is trained in a supervised way, leveraging the output of the previous stage. The results showed a word error rate (WER) improvement of about 4.48% on a mobile Arabic corpus, together with significant gains in speech quality and intelligibility of 0.835 and 0.06, respectively. The second contribution addresses the scarcity of non-native Arabic speech corpora dedicated to computer-assisted pronunciation training (CAPT). Inspired by the success of deep learning, we propose to detect abnormal pronunciation in an unsupervised manner using two deep learning algorithms trained solely on correct pronunciations. Experimental results on two Arabic corpora demonstrated the ability of the proposed approach to distinguish between good and bad pronunciations, and additional experiments using audio augmentation techniques to expand the training data confirmed the effectiveness of the method.
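
To make the two contributions above concrete, the following Python sketch (using PyTorch) illustrates one possible reading of the approach: an overcomplete autoencoder trained on noisy/noisy pairs, a denoising autoencoder trained with the first stage's outputs as pseudo-clean targets, and an anomaly-detection view of pronunciation assessment based on reconstruction error. The feature dimension, layer sizes, placeholder data, and decision threshold are illustrative assumptions and not details taken from the thesis.

import torch
import torch.nn as nn

FEAT_DIM = 257       # e.g. one magnitude-spectrum frame (assumed feature)
HIDDEN_OAE = 1024    # overcomplete: hidden layer wider than the input
HIDDEN_DAE = 128     # denoising autoencoder hidden size (assumed)

class AutoEncoder(nn.Module):
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Linear(hidden_dim, in_dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train(model, batches, epochs=5, lr=1e-3, target_fn=None):
    # Minimise the MSE between model(x) and a target derived from x.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x in batches:
            target = x if target_fn is None else target_fn(x)
            loss = loss_fn(model(x), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

# Placeholder random tensors stand in for batches of noisy-speech features.
noisy_batches = [torch.randn(32, FEAT_DIM) for _ in range(10)]

# Stage 1: overcomplete autoencoder trained on noisy/noisy pairs
# (input and target are the same noisy frame).
oae = train(AutoEncoder(FEAT_DIM, HIDDEN_OAE), noisy_batches)

# Stage 2: denoising autoencoder trained in a supervised way, here assuming
# the stage-1 outputs serve as pseudo-clean targets for the noisy inputs.
dae = train(AutoEncoder(FEAT_DIM, HIDDEN_DAE), noisy_batches,
            target_fn=lambda x: oae(x).detach())

# Anomaly-detection view of pronunciation assessment: an autoencoder trained
# only on correctly pronounced speech; a high reconstruction error on a test
# utterance suggests a mispronunciation.
correct_batches = [torch.randn(32, FEAT_DIM) for _ in range(10)]
ae_correct = train(AutoEncoder(FEAT_DIM, HIDDEN_DAE), correct_batches)

def mispronunciation_score(model, frames):
    # Mean reconstruction error over the frames of one utterance or phoneme.
    with torch.no_grad():
        return torch.mean((model(frames) - frames) ** 2).item()

test_frames = torch.randn(50, FEAT_DIM)   # placeholder utterance
THRESHOLD = 1.0                           # assumed value, tuned in practice
print(mispronunciation_score(ae_correct, test_frames) > THRESHOLD)

Training only on correct pronunciations means no annotated mispronunciations are required, which is what makes the anomaly-detection formulation attractive for low-resource languages such as Arabic.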
Keywords
CAPT; Pronunciation assessment; Arabic language; speech recognition; speech enhancement; unsupervised learning; anomaly detection approach; deep learning