Integration of machine learning and bigdata approaches for pronunciation assessment

LOUNIS, Meriem

Files

Lounis Meriem.pdf (4.63 MB)

Date

2025

Authors

LOUNIS, Meriem

Publisher

Université Badji Mokhtar Annaba

Abstract

There is growing interest in multilingualism, and pronunciation is the most challenging aspect of language mastery. Computer-Assisted Pronunciation Training (CAPT), a specialized area of Computer-Assisted Language Learning (CALL), aims to automate and enhance traditional learning methods by bringing digital devices into language teaching and learning through a range of pronunciation assessment programs. These programs detect pronunciation errors, diagnose them, and give learners instructive, individualized feedback. Mispronunciation detection can be framed as a binary classification problem and tackled with supervised deep learning, but this approach requires high-quality labeled audio recordings for both classes: mispronounced and well-pronounced utterances. The scarcity of such data, in both quality and quantity, is one of the main obstacles in this field. This motivated the three contributions presented in this thesis, each of which addresses data scarcity in order to detect pronunciation errors on a mislabeled and imbalanced dataset. The first contribution draws on the strength of generative models in representation learning, in particular the Variational Autoencoder (VAE). The VAE performs anomaly detection by learning the latent-space distribution of the “good” pronunciations and then flagging the “bad” ones as outliers. The second contribution uses a discriminative Convolutional Neural Network (CNN), exploiting its ability to extract features from speech data, in a one-class classification approach. Finally, Data Augmentation (DA) techniques are proposed as a third solution to augment the waveforms of the training data. DA makes it possible to perform mispronunciation detection in a supervised manner with a Support Vector Machine (SVM) model.
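
The abstract does not say which augmentation techniques were applied to the waveforms. As a minimal sketch of what waveform-level data augmentation can look like, the Python example below applies three common transformations (additive noise, circular time shift, and gain change) to a synthetic tone. The function names, the parameter values, and the choice of these three transformations are illustrative assumptions, not the method used in the thesis.

```python
import math
import random


def add_noise(wave, snr_db, rng):
    """Add white Gaussian noise at roughly the requested signal-to-noise ratio (dB)."""
    signal_power = sum(x * x for x in wave) / len(wave)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in wave]


def time_shift(wave, shift):
    """Circularly shift the waveform by `shift` samples (positive = delay)."""
    if not wave:
        return []
    shift %= len(wave)
    if shift == 0:
        return list(wave)
    return wave[-shift:] + wave[:-shift]


def change_gain(wave, gain_db):
    """Scale the amplitude by a gain expressed in decibels."""
    factor = 10 ** (gain_db / 20)
    return [x * factor for x in wave]


def augment(wave, rng):
    """Return three augmented variants of one waveform (hypothetical settings)."""
    return [
        add_noise(wave, snr_db=20.0, rng=rng),
        time_shift(wave, shift=rng.randrange(1, len(wave))),
        change_gain(wave, gain_db=rng.uniform(-6.0, 6.0)),
    ]


# Toy input: 0.1 s of a 440 Hz tone sampled at 16 kHz (stands in for a real recording).
rng = random.Random(0)
wave = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
variants = augment(wave, rng)
```

Each variant keeps the label of the recording it was derived from, so a small or imbalanced class can be enlarged before a supervised classifier such as an SVM is trained on features extracted from the augmented set.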

Keywords

pronunciation assessment; deep learning; machine learning; bigdata; data visualization

URI

https://dspace.univ-annaba.dz//handle/123456789/4371

Collections

Thèses de doctorat
