Methods and techniques for speaker recognition: A Review

Section: Review Paper
Published: Sep 1, 2024
Pages: 32–44

Abstract

Speaker recognition is the verification and identification of a person's identity based on their distinctive vocal characteristics. This paper sheds light on the evolution of speaker recognition systems from the earliest days of computers to the most recent innovations. Voice is a behavioral biometric that conveys information about a person, such as the speaker's age, gender, and ethnicity. The field of speaker recognition focuses on identifying individuals by their voices. Although speaker recognition has been studied for the past eight decades, applications such as the Internet of Things (IoT), smart homes, and smart gadgets have made it widespread in the modern era. This work briefly discusses the speaker recognition field and outlines its modeling methodology and various feature extraction strategies across multiple languages. The aim of this review is to advance academic knowledge of speaker recognition.


How to Cite

[1]
A. Rasheed, M. T. Yaseen, and M. A. Abdulhameed, “Methods and techniques for speaker recognition: A Review”, AREJ, vol. 29, no. 2, pp. 32–44, Sep. 2024.