Citation: LI Ping, GAO Qingyuan, XIA Yu, ZHANG Xiaoyong, CAO Yi. Voiceprint recognition method based on SE-DR-Res2Block[J]. Chinese Journal of Engineering, 2023, 45(11): 1962-1969. doi: 10.13374/j.issn2095-9389.2022.09.19.001