Articulation index-based acoustic signal processing for enhanced speech intelligibility

Mahesh Shankarrao Patil
School of Bioengineering Sciences & Research, MIT ADT University, Pune 412201, India
Vijaykumar Varadarajan
Department of Research, Swiss School of Business and Management, 1213 Geneva, Switzerland
Harsha Jitendra Sarode
Department of Electronics & Telecommunication Engineering, Nutan Maharashtra Institute of Engineering & Technology, Pune 410507, India
Farook Sayyad
Department of Mechanical Engineering, Ajeenkya DY Patil School of Engineering, Pune 411081, India
Rahul Krishna Sarawale
KPI Partners India Pvt. Ltd., Pune 411057, India
Shabnam Sayyad
Department of Artificial Intelligence and Machine Learning, AISSMS College of Engineering, Pune 411001, India
Deshinta Arrova Dewi
Faculty of Data Science, INTI International University, Nilai 71800, Malaysia

Article ID: 2034

DOI: https://doi.org/10.59400/sv2034

Keywords: speech intelligibility enhancement; NOIZEUS database; thresholding function; articulation index (AI); wavelet transform; multi-resolution analysis; process innovation; auditory masking

Abstract

This paper presents an approach to improving speech intelligibility that combines the wavelet packet transform with the Articulation Index (AI) as an objective evaluation metric. Conventional measures such as the Modified Rhyme Test (MRT) and Mean Opinion Score (MOS) rely on subjective listener assessment, which makes them time-consuming and difficult to standardize. In contrast, AI provides a consistent and repeatable measure of speech intelligibility across varying noise conditions. The proposed method applies the wavelet packet transform to noisy speech, applies a thresholding function to the resulting coefficients to enhance signal quality and intelligibility, and reconstructs the processed speech with the inverse transform. Experiments use the NOIZEUS database, which contains speech signals corrupted by real-world noises such as streets, airports, cockpits, and industrial environments like mechanical factories, at signal-to-noise ratios ranging from 0 dB to 15 dB. Three enhancement methods are implemented and compared, and the proposed method achieves the highest AI values. Plots and spectrograms support the experimental validation and highlight the proposed method's effectiveness over existing approaches. The method leverages the multi-resolution property of the wavelet transform to preserve temporal characteristics while reducing noise across multiple frequency bands. Results show a significant improvement in AI values, indicating enhanced speech intelligibility under diverse noise conditions. This work contributes to acoustic speech enhancement by providing a robust, objective framework suited to noisy environments such as industrial communication systems. The technique aligns closely with noise mitigation approaches used in structural and industrial settings and can be extended to industrial speech enhancement and environmental noise control.
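The pipeline outlined in the abstract (wavelet packet decomposition, coefficient thresholding, inverse transform, AI scoring) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption rather than the authors' implementation: the Haar wavelet, the three-level tree, Donoho-style soft thresholding with a universal threshold, and the helper names `wp_decompose`, `wp_reconstruct`, `enhance`, and `articulation_index`. The AI function is a simplified band-audibility approximation with equal-width, equally weighted bands and a 30 dB range, not a standards-compliant calculation.

```python
import numpy as np

def wp_decompose(x, levels):
    """Full Haar wavelet packet tree; returns the 2**levels leaf bands.
    len(x) must be divisible by 2**levels."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        children = []
        for n in nodes:
            even, odd = n[0::2], n[1::2]
            children.append((even + odd) / np.sqrt(2.0))  # low-pass half
            children.append((even - odd) / np.sqrt(2.0))  # high-pass half
        nodes = children
    return nodes

def wp_reconstruct(nodes):
    """Inverse of wp_decompose: merge sibling bands until one signal remains."""
    nodes = list(nodes)
    while len(nodes) > 1:
        parents = []
        for low, high in zip(nodes[0::2], nodes[1::2]):
            out = np.empty(2 * len(low))
            out[0::2] = (low + high) / np.sqrt(2.0)
            out[1::2] = (low - high) / np.sqrt(2.0)
            parents.append(out)
        nodes = parents
    return nodes[0]

def soft_threshold(c, thr):
    """Shrink coefficients toward zero by thr (soft thresholding)."""
    return np.sign(c) * np.maximum(np.abs(c) - thr, 0.0)

def enhance(noisy, levels=3):
    """Assumed scheme: threshold every leaf band except the lowest one."""
    noisy = np.asarray(noisy, dtype=float)
    # Noise level estimated from the finest-scale detail coefficients.
    finest = (noisy[0::2] - noisy[1::2]) / np.sqrt(2.0)
    sigma = np.median(np.abs(finest)) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(noisy)))  # universal threshold
    bands = wp_decompose(noisy, levels)
    kept = [bands[0]] + [soft_threshold(b, thr) for b in bands[1:]]
    return wp_reconstruct(kept)

def articulation_index(speech, noise, n_bands=8):
    """Simplified AI: mean over bands of clip((SNR_dB + 12) / 30, 0, 1)."""
    s_pow = np.abs(np.fft.rfft(speech)) ** 2
    n_pow = np.abs(np.fft.rfft(noise)) ** 2
    ai = 0.0
    for s, n in zip(np.array_split(s_pow, n_bands),
                    np.array_split(n_pow, n_bands)):
        snr_db = 10.0 * np.log10(s.sum() / (n.sum() + 1e-12) + 1e-12)
        ai += np.clip((snr_db + 12.0) / 30.0, 0.0, 1.0) / n_bands
    return ai
```

In use, `enhance(noisy)` returns a signal of the same length as its input, and `articulation_index(clean, residual)` can be evaluated with the residual taken as the enhanced signal minus the clean reference. A practical system would more likely use a smoother wavelet and perceptually motivated bands with importance weights.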

Published

2026-05-09

How to Cite

Patil, M. S., Varadarajan, V., Sarode, H. J., Sayyad, F., Sarawale, R. K., Sayyad, S., & Dewi, D. A. (2026). Articulation index-based acoustic signal processing for enhanced speech intelligibility. Sound & Vibration, 60(3). https://doi.org/10.59400/sv2034

Issue

Vol. 60 No. 3 (2026)

Section

Article

This work is licensed under a Creative Commons Attribution 4.0 International License.


