Methods and techniques for speaker recognition: A Review
Abstract
Speaker recognition is the verification or identification of a person's identity from distinctive vocal characteristics. This paper sheds light on the evolution of speaker recognition systems from the earliest days of computers to the most recent innovations. Voice is a behavioral biometric that conveys details about a person, such as the speaker's age, gender, and ethnicity. The field of speaker recognition focuses on identifying individuals by their voices. Although speaker recognition has been the subject of research for the past eight decades, applications such as the Internet of Things (IoT), smart homes, and smart gadgets have made its use popular in the modern era. This work briefly discusses the speaker recognition field, outlining its modeling methodologies and various feature extraction strategies across multiple languages. The aim of this review is to advance academic knowledge of speaker recognition.