Integration of machine learning and bigdata approaches for pronunciation assessment

Date
2025
Publisher
Université Badji Mokhtar Annaba
Abstract
Interest in multilingualism is growing, and pronunciation is the most challenging aspect of language mastery. Computer Assisted Pronunciation Training (CAPT), a specialized area of Computer Assisted Language Learning (CALL), aims to automate and enhance traditional learning methods by bringing digital devices into language teaching and learning through a range of pronunciation assessment software programs. These programs offer a means of detecting pronunciation errors, diagnosing them, and providing learners with instructive, individualized feedback. Mispronunciation detection can be framed as a binary classification problem and tackled with a supervised deep learning technique; however, this approach requires high-quality labeled audio recordings for both classes, mispronounced and well-pronounced utterances. The scarcity of data, in both quality and quantity, is one of the main obstacles in this field. This was the primary motivation for the three contributions presented in this thesis: addressing the data scarcity problem in order to carry out pronunciation error detection on a mislabeled and imbalanced dataset. In the first solution, we considered the strength of generative models in learning representations, particularly the Variational Autoencoder (VAE). The VAE was used to perform anomaly detection by learning the distribution of the “good” pronunciations in the latent space and then detecting the “bad” ones as outliers. Our second contribution uses a discriminative Convolutional Neural Network (CNN) and explores its power in extracting features from speech data to perform one-class classification. Finally, Data Augmentation (DA) techniques were proposed as a third solution to augment the waveforms of our training data. DA allows us to perform mispronunciation detection in a supervised manner with a Support Vector Machine (SVM) model.
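As an illustration of the third contribution, the following is a minimal sketch of waveform-level data augmentation. The abstract does not state which augmentations the thesis applies, so the three transformations shown here (additive noise, time shifting, gain scaling) are assumptions chosen because they are common for speech data, and all function names and parameter values are hypothetical. A synthetic tone stands in for a recorded utterance; feature extraction and SVM training are omitted.

```python
import math
import random


def add_noise(wave, snr_db, rng):
    """Add white Gaussian noise at a target signal-to-noise ratio (dB)."""
    power = sum(x * x for x in wave) / len(wave)
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, noise_std) for x in wave]


def time_shift(wave, shift):
    """Shift the waveform right by `shift` samples, padding with silence."""
    shift = max(0, min(shift, len(wave)))
    return [0.0] * shift + wave[:len(wave) - shift]


def scale_gain(wave, factor):
    """Multiply every sample by a constant gain factor."""
    return [x * factor for x in wave]


def augment(wave, rng):
    """Return three augmented variants of one waveform (assumed settings)."""
    return [
        add_noise(wave, snr_db=20.0, rng=rng),
        time_shift(wave, shift=rng.randint(1, len(wave) // 10)),
        scale_gain(wave, factor=rng.uniform(0.8, 1.2)),
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample_rate = 16000
# Synthetic 440 Hz tone (0.1 s) standing in for a recorded utterance.
wave = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(1600)]
variants = augment(wave, rng)
```

Each variant keeps the label of the utterance it was derived from, so a small labeled set can be expanded before features are extracted and passed to a supervised classifier such as an SVM.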
Keywords
pronunciation assessment; deep learning; machine learning; bigdata; data visualization