SUPERVOICE: Text-Independent Speaker Verification Using Ultrasound Energy in Human Speech

Hanqing Guo; Qiben Yan; Nikolay Ivanov; Ying Zhu; Li Xiao; Eric J. Hunter

doi:10.1145/3488932.3517420

Conference proceeding

ASIA CCS'22: PROCEEDINGS OF THE 2022 ACM ASIA CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, pp.1019-1033

01/01/2022

Abstract

Voice-activated systems are integrated into a variety of desktop, mobile, and Internet-of-Things (IoT) devices. However, voice spoofing attacks, such as impersonation and replay attacks, in which malicious attackers synthesize the voice of a victim or simply replay it, have raised growing security concerns. Existing speaker verification techniques distinguish individual speakers via the spectrographic features extracted from the audible frequency range of voice commands. However, they often have high error rates and/or long delays. In this paper, we explore a new direction of human voice research by scrutinizing the unique characteristics of human speech in the ultrasound frequency band. Our research indicates that the high-frequency ultrasound components (e.g., speech fricatives) from 20 to 48 kHz can significantly enhance the security and accuracy of speaker verification. We propose a speaker verification system, SuperVoice, that uses a two-stream DNN architecture with a feature fusion mechanism to generate distinctive speaker models. To test the system, we create a speech dataset with 12 hours of audio (8,950 voice samples) from 127 participants. In addition, we create a second, spoofed-voice dataset to evaluate the system's security. To balance controlled recordings against real-world applications, the audio recordings are collected in two quiet rooms by 8 different recording devices, including 7 smartphones and an ultrasound microphone. Our evaluation shows that SuperVoice achieves a 0.58% equal error rate in the speaker verification task, which reduces the best equal error rate of the existing systems by 86.1%. SuperVoice takes only 120 ms to test an incoming utterance, outperforming all existing speaker verification systems. Moreover, within 91 ms of processing time, SuperVoice achieves a 0% equal error rate in detecting replay attacks launched by 5 different loudspeakers. Finally, we demonstrate that SuperVoice can be used in retail smartphones by integrating an off-the-shelf ultrasound microphone.
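
The equal error rate (EER) quoted in the abstract is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal Python sketch of how such a figure could be computed from verification scores; it is not the authors' evaluation code. The function names (`equal_error_rate`, `implied_baseline`) and the toy scores are hypothetical, and the sketch assumes that higher scores mean "same speaker". The second function reads the abstract's 86.1% as a relative reduction, which is an assumption; the resulting baseline is implied by the arithmetic and is not stated in the record.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER for similarity scores (higher = same speaker).

    Sweeps every observed score as a threshold and returns (eer, threshold)
    at the point where false acceptance and false rejection are closest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)      # genuine rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


def implied_baseline(eer_percent, relative_reduction):
    """Baseline EER implied by a relative reduction (assumed interpretation)."""
    return eer_percent / (1.0 - relative_reduction)


# Toy scores, made up purely for illustration.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {eer:.2%} at threshold {threshold}")

# 0.58% after an 86.1% relative reduction implies a baseline near 4.17%.
print(f"Implied baseline EER = {implied_baseline(0.58, 0.861):.2f}%")
```

With real data, the two score lists would hold the system's outputs for same-speaker and different-speaker trials; a finer threshold sweep or interpolation between adjacent thresholds gives a smoother estimate when the two error rates never coincide exactly.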

Computer Science

Mathematics

Physical Sciences

Technology

Telecommunications

Computer Science, Information Systems

Computer Science, Theory & Methods

Mathematics, Applied

Science & Technology

Details

Title: SUPERVOICE: Text-Independent Speaker Verification Using Ultrasound Energy in Human Speech
Creators: Hanqing Guo - Michigan State University
Qiben Yan - Michigan State University
Nikolay Ivanov - Michigan State University
Ying Zhu - Michigan State University
Li Xiao - Michigan State University
Eric J. Hunter - Michigan State University
Resource Type: Conference proceeding
Publication Details: ASIA CCS'22: PROCEEDINGS OF THE 2022 ACM ASIA CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, pp.1019-1033
Publisher: Association for Computing Machinery
DOI: 10.1145/3488932.3517420
Number of pages: 15
Grant note: CNS-1950171; CCF-2007159 / National Science Foundation (NSF); R01DC012315 / National Institute on Deafness and Other Communication Disorders (NIDCD), National Institutes of Health (NIH), United States Department of Health & Human Services
Language: English
Date published: 01/01/2022
Academic Unit: Communication Sciences and Disorders
Record Identifier: 9984446450302771

Metrics

4 Record Views

1 Times Cited - Web of Science