Pertanika Journal

Improved Speech Enhancement using Parallel MVDR Beamforming and Coherent-to-Diffuse Power Ratio Post-filtering

Sureshkumar Natarajan, Mohd Khair Hassan, Raja Kamil, Faisul Arif Ahmad, Syaril Azrad, June Francis Macleans, Hussna Elnoor M. Abdalla, Nurbek Saparkhojayev, and Syed Abdul Rahman Al-Haddad

Pertanika Journal of Science & Technology, Volume 34, Issue 2, April 2026

DOI: https://doi.org/10.47836/pjst.34.2.15

Keywords: Acoustics, CDR, dereverberation, MVDR beamformer, noise suppression, speech enhancement

Published on: 2026-04-30

Abstract

Speech communication involves the exchange of information between two or more individuals; however, background noise and reverberation often degrade speech clarity and intelligibility. The impact of these degradations depends on factors such as the number, intensity, and spatial characteristics of noise sources, as well as reflections from surrounding surfaces. These effects pose significant challenges for applications including teleconferencing, hearing-aid devices, and human-machine voice interfaces. This study addresses the combined effects of background noise and reverberation by proposing a speech enhancement framework that integrates Minimum Variance Distortionless Response (MVDR) beamforming with a Coherent-to-Diffuse Power Ratio (CDR) based post-filter. The methodology relies on parallel processing: MVDR beamforming performs spatial noise suppression, while CDR values are estimated from microphone-domain signals and used to compute post-filter gains that suppress residual diffuse noise and reverberation in the beamformer output. The proposed algorithm was evaluated at input signal-to-noise ratios from 0 dB to 40 dB and compared against the Integrated Sidelobe Cancellation Linear Prediction (ISCLP) baseline using four objective metrics: Perceptual Evaluation of Speech Quality (PESQ), Extended Short-Time Objective Intelligibility (ESTOI), Cepstral Distance (CD), and Weighted Spectral Slope Distance (WSS). The results demonstrate consistent improvements in PESQ and reductions in CD and WSS across all tested conditions, while gains in ESTOI remain modest. These findings indicate improved spectral fidelity and speech naturalness, highlighting the practical relevance of the proposed framework for real-time, low-complexity speech enhancement in noise- and reverberation-prone communication systems.
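
The parallel structure described in the abstract can be illustrated with a minimal per-frequency-bin sketch. It is not the authors' implementation. It assumes that a steering vector d, a noise covariance matrix R, and a linear-scale CDR estimate per frame are already available; the CDR estimator itself is not shown. The function names, the Wiener-like gain CDR / (CDR + 1), and the gain floor g_min are illustrative assumptions.

```python
import numpy as np

def mvdr_weights(R, d):
    """MVDR weights for one frequency bin: w = R^{-1} d / (d^H R^{-1} d)."""
    Rinv_d = np.linalg.solve(R, d)       # R^{-1} d without forming the inverse
    return Rinv_d / (d.conj() @ Rinv_d)  # normalise so that w^H d = 1

def cdr_gain(cdr, g_min=0.1):
    """Assumed Wiener-like post-filter gain from a linear-scale CDR.

    A high CDR (mostly coherent, direct sound) gives a gain near 1;
    a low CDR (mostly diffuse) is attenuated down to the floor g_min.
    """
    cdr = np.maximum(np.asarray(cdr, dtype=float), 0.0)
    return np.maximum(cdr / (cdr + 1.0), g_min)

def enhance_bin(x, R, d, cdr, g_min=0.1):
    """Apply both branches to one frequency bin.

    x   : (mics, frames) complex STFT coefficients at the microphones
    R   : (mics, mics) noise covariance matrix (assumed known)
    d   : (mics,) steering vector toward the talker (assumed known)
    cdr : (frames,) CDR estimated from the microphone signals
    """
    w = mvdr_weights(R, d)
    y = w.conj() @ x                     # beamformer output, shape (frames,)
    return cdr_gain(cdr, g_min) * y      # post-filter applied to that output
```

Because the gain is computed from the microphone signals rather than from the beamformer output, the two branches can run side by side and are only combined in the final multiplication.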

ISSN 0128-7680

e-ISSN 2231-8526

Article ID

JST-6129-2025