Articulation index-based acoustic signal processing for enhanced speech intelligibility
Abstract
This paper presents an approach to improving speech intelligibility that uses the wavelet packet transform for enhancement and the Articulation Index (AI) as an objective evaluation metric. Conventional measures such as the Modified Rhyme Test (MRT) and the Mean Opinion Score (MOS) rely on subjective assessment by listeners, which makes them time-consuming and difficult to standardize. In contrast, AI provides a consistent and repeatable measure of speech intelligibility across varying noise conditions. The proposed method applies the wavelet packet transform to the noisy speech signal, applies a thresholding function to the resulting coefficients to suppress noise, and reconstructs the enhanced speech with the inverse wavelet packet transform. Experiments use the NOIZEUS database, which contains speech signals corrupted by real-world noises such as streets, airports, cockpits, and industrial environments like mechanical factories, at signal-to-noise ratios (SNRs) from 0 dB to 15 dB. Three enhancement methods are implemented, and the proposed method achieves higher AI values than the alternatives. Plots and spectrograms support the experimental validation and illustrate the method's advantage over existing approaches. The method leverages the multi-resolution property of the wavelet transform to preserve temporal characteristics while reducing noise across multiple frequency bands. The results show a significant improvement in AI values, indicating enhanced speech intelligibility under diverse noise conditions. This work contributes to acoustic speech enhancement by providing a robust, objective framework suitable for noisy environments such as industrial communication systems. The technique aligns closely with noise mitigation approaches used in structural and industrial settings and can be extended to environmental noise control.
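The decompose, threshold, reconstruct pipeline described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Haar filter, the three-level full packet tree, the soft-thresholding rule with a universal threshold, the median-based noise estimate, and the choice to leave the lowest-frequency packet unthresholded. All function names are hypothetical.

```python
import math
import random
import statistics

SQRT2 = math.sqrt(2.0)

def haar_split(x):
    """One Haar analysis step: return (low-pass half, high-pass half)."""
    low = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return low, high

def haar_merge(low, high):
    """Inverse of haar_split: interleave the reconstructed sample pairs."""
    out = []
    for a, d in zip(low, high):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out

def wp_decompose(x, levels):
    """Full wavelet packet tree: every node is split at every level."""
    nodes = [list(x)]
    for _ in range(levels):
        nodes = [half for node in nodes for half in haar_split(node)]
    return nodes

def wp_reconstruct(nodes):
    """Merge sibling packets pairwise until one signal remains."""
    while len(nodes) > 1:
        nodes = [haar_merge(nodes[i], nodes[i + 1])
                 for i in range(0, len(nodes), 2)]
    return nodes[0]

def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; zero those below t."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

def denoise(noisy, levels=3):
    """Assumed scheme: universal threshold, lowest packet left untouched."""
    if len(noisy) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    # Noise level estimated from the finest detail coefficients (MAD / 0.6745).
    _, finest = haar_split(noisy)
    sigma = statistics.median(abs(c) for c in finest) / 0.6745
    threshold = sigma * math.sqrt(2.0 * math.log(len(noisy)))
    packets = wp_decompose(noisy, levels)
    kept = [packets[0]] + [soft_threshold(p, threshold) for p in packets[1:]]
    return wp_reconstruct(kept)

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for speech: a slow sinusoid plus white Gaussian noise.
    clean = [math.sin(2.0 * math.pi * n / 128.0) for n in range(1024)]
    noisy = [s + rng.gauss(0.0, 0.3) for s in clean]
    print("MSE before:", mse(noisy, clean))
    print("MSE after: ", mse(denoise(noisy), clean))
```

The sketch uses mean squared error on a synthetic tone only to show that the pipeline runs end to end; it does not compute the Articulation Index, and a real evaluation would use recorded speech and the paper's own thresholding function.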
Copyright (c) 2026 Mahesh Shankarrao Patil, Vijaykumar Varadarajan, Harsha Jitendra Sarode, Farook Sayyad, Rahul Krishna Sarawale, Shabnam Sayyad, Deshinta Arrova Dewi

This work is licensed under a Creative Commons Attribution 4.0 International License.