The Development of a Machine Learning-Based Arabic Pronunciation Learning Classifier Using Teachable Machine
Abstract
Voice recognition technology has emerged as a valuable tool in language learning, offering the real-time pronunciation feedback often missing from traditional classrooms. With the global voice recognition market valued at over USD 12.6 billion in 2022 and expected to exceed USD 50 billion by 2030, its integration into education is becoming increasingly significant, particularly in language apps, over 70% of which now use speech input to support speaking skills. In Brunei, students learning Arabic face challenges in mastering pronunciation due to limited speaking practice and a lack of individualized corrective feedback. This study addresses that local need by developing a machine learning-based voice recognition model for evaluating Arabic pronunciation accuracy. The model was developed using Google's Teachable Machine and trained on a custom dataset of four Arabic words spoken by both adult and child voices, with approximately 30 audio samples per word class. After training, the model was exported and embedded into a Unity-based game environment that guides learners through pronunciation tasks, with real-time visual feedback triggered by the model's classifications. The system recognized pronunciation patterns with moderate accuracy and delivered feedback through an intuitive user interface. These results demonstrate the potential of accessible AI tools and small, diverse datasets for building interactive pronunciation aids, especially in regions where Arabic is taught as a second language.
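The feedback loop described above, in which the exported classifier's output drives the game's visual response, can be sketched as follows. This is a minimal, hypothetical Python sketch only: the word labels, the 0.8 confidence threshold, and the `feedback` function are illustrative assumptions, not values or names taken from the study.

```python
# Hypothetical sketch of the classification-to-feedback step: the
# Teachable Machine model emits one probability per trained word class,
# and the game shows positive feedback only when the predicted class
# matches the prompted word with sufficient confidence.

CLASSES = ["word_1", "word_2", "word_3", "word_4"]  # placeholder labels
THRESHOLD = 0.8  # assumed minimum confidence for positive feedback

def feedback(probabilities, target_word):
    """Map model output probabilities to a simple feedback signal."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    predicted = CLASSES[best_index]
    confident = probabilities[best_index] >= THRESHOLD
    if predicted == target_word and confident:
        return "correct"    # e.g. green highlight in the Unity UI
    return "try_again"      # e.g. prompt the learner to repeat the word

# Model is 92% confident the learner said the prompted word:
print(feedback([0.03, 0.92, 0.02, 0.03], "word_2"))  # → correct
# Right top class but low confidence still asks for a retry:
print(feedback([0.40, 0.35, 0.15, 0.10], "word_1"))  # → try_again
```

In the actual system this logic would run inside Unity (e.g. in C# against the embedded model's output tensor); the threshold check is a common guard against rewarding uncertain classifications of a learner's attempt.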