Analyzing the MFCC and GFCC to Identify Reverberation Effects on The Sound

Abdalem Rasheed

doi:10.33899/arej.2024.146975.1336

Section: Research Paper

Issue

Vol. 29 No. 2 (2024): Volume 29 Issue 2

Published

Sep 1, 2024

Pages

148-156

Abstract

Most acoustic signals contain additive reverberation noise, which distorts the signal, degrades the reliability of sound systems, and harms a variety of identification applications, including speaker recognition. This paper analyzes two feature-extraction techniques for mitigating the impact of reverberation on sound and compares their performance: Mel-Frequency Cepstral Coefficients (MFCC) and Gammatone Frequency Cepstral Coefficients (GFCC). GFCC differs from conventional MFCC in that it replaces the Mel filter bank with a Gammatone filter bank to increase robustness. To avoid the effects of environmental sounds and of variations in the speaker's voice due to conditions such as illness and emotion, a single 1 kHz tone was used to obtain a fair and impartial comparison between the GFCC and MFCC methods of sound signal recognition. The MFCC and GFCC features were compared using principal component analysis (PCA), and the comparison was corroborated by the normalized cross-correlation (NCC). The primary purpose of the PCA algorithm is to reduce dimensionality and remove correlation so that the features become orthogonalized. The PCA and NCC results show that, for both reverberant and non-reverberant recordings of the single tone, the detection rate increased by about 10% and the variance increased by 11% for GFCC compared with MFCC features. This work therefore shows that the method using GFCC features is more robust against reverberation noise than classic MFCC features. The GFCC thus mitigates the reverberation effect and is a good candidate for use in practical recognition systems. In addition, this work examines the potential of combining the MFCC and GFCC as feature components to obtain a more robust speaker recognition system. The results show that the variance obtained with the combined features is roughly 30% greater than the variance of the GFCC feature coefficients alone.
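The two comparison tools named in the abstract, PCA and NCC, can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and not the paper's implementation: the helper names (`log_spectrum_frames`, `normalized_cross_correlation`, `pca_explained_variance`), the 16 kHz sampling rate, the synthetic exponentially decaying room impulse response, and the framed log-magnitude spectrum used as a stand-in for MFCC or GFCC features.

```python
import numpy as np

def log_spectrum_frames(signal, frame_len=400, hop=160):
    """Framed log-magnitude spectrum; a stand-in feature, NOT MFCC or GFCC."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-10)

def normalized_cross_correlation(a, b):
    """Zero-lag NCC between two equally shaped feature arrays, in [-1, 1]."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def pca_explained_variance(features):
    """Fraction of total variance per principal component (rows are frames)."""
    x = np.asarray(features, dtype=float)
    x = x - x.mean(axis=0)                  # center each coefficient
    s = np.linalg.svd(x, compute_uv=False)  # singular values, descending
    var = s ** 2
    total = var.sum()
    return var / total if total > 0 else np.zeros_like(var)

# Single 1 kHz tone, one second long (the sampling rate is an assumption).
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)

# Hypothetical room impulse response: noise whose amplitude decays by
# roughly 60 dB over 0.3 s, with a unit direct-path sample.
rng = np.random.default_rng(0)
rir_len = int(0.3 * fs)
rir = rng.standard_normal(rir_len) * np.exp(-6.9 * np.arange(rir_len) / rir_len)
rir[0] = 1.0
reverberant = np.convolve(tone, rir)[: tone.size]

clean_feat = log_spectrum_frames(tone)
reverb_feat = log_spectrum_frames(reverberant)

ncc = normalized_cross_correlation(clean_feat, reverb_feat)
ratios = pca_explained_variance(np.vstack([clean_feat, reverb_feat]))

print("NCC between clean and reverberant features:", ncc)
print("Variance explained by the first 3 components:", ratios[:3])
```

To compare MFCC with GFCC in the way the abstract describes, `log_spectrum_frames` would be replaced by each feature extractor in turn; an NCC closer to 1 between the clean and reverberant features indicates that a feature set is less affected by the reverberation.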

Authors

Abdalem Rasheed

Department of Electrical Engineering, College of Engineering, University of Mosul, Mosul, Iraq

Identifiers

https://doi.org/10.33899/arej.2024.146975.1336

How to Cite

[1]

A. Rasheed, “Analyzing the MFCC and GFCC to Identify Reverberation Effects on The Sound”, AREJ, vol. 29, no. 2, pp. 148–156, Sep. 2024.