Structured Sparse Spectral Transforms and Structural Measures for Voice Conversion

Yunxin Zhao; Mili Kuruvilla-Dugdale; Minguang Song

doi:10.1109/TASLP.2018.2860682

Journal article

Peer reviewed

IEEE/ACM transactions on audio, speech, and language processing, Vol.26(12), pp.2267-2276

12/01/2018

DOI: 10.1109/TASLP.2018.2860682

PMCID: PMC6980218

PMID: 31984214

Files and links (1)

https://www.ncbi.nlm.nih.gov/pmc/articles/6980218

Open Access

Abstract

We investigate a structured sparse spectral transform method for voice conversion (VC) that performs frequency warping and spectral shaping simultaneously on high-dimensional (high-D) STRAIGHT spectra. Learning a large transform matrix for high-D data often results in an overfit matrix with low sparsity, which leads to muffled speech in VC. We address this problem by using the frequency-warping characteristic of a source-target speaker pair to define a region of support (ROS) in the transform matrix, and we further optimize the matrix by nonnegative matrix factorization (NMF) to obtain a structured sparse transform. We also investigate structural measures of spectral and temporal covariance and variance at different scales for assessing VC speech quality. Our experiments on the ARCTIC dataset with 12 speaker pairs show that embedding the ROS in spectral transforms offers flexibility in the tradeoff between spectral distortion and structure preservation, and that the structural measures provide quantitatively reasonable results on converted speech. Our subjective listening tests show that the proposed VC method achieves a mean opinion score of "very good" relative to natural speech and that, in comparison with three other VC methods, it is the most preferred in naturalness and in voice similarity to target speakers.
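The idea of confining a nonnegative transform to a region of support can be illustrated with a small sketch. This is not the authors' algorithm: it assumes time-aligned nonnegative source and target spectra `X` and `Y`, a hypothetical per-bin warping map `warp`, and a fixed band half-width, and it fits the transform with a standard Lee-Seung style multiplicative update applied to a single factor. The function names `band_ros` and `masked_nonneg_transform` are invented for illustration.

```python
import numpy as np

def band_ros(dim, warp, half_width):
    """Boolean region of support: target bin i may draw only on source
    bins within half_width of warp[i] (warp is a hypothetical warping map)."""
    cols = np.arange(dim)
    return np.abs(cols[None, :] - np.asarray(warp)[:, None]) <= half_width

def masked_nonneg_transform(X, Y, mask, n_iter=200, eps=1e-12, seed=0):
    """Fit a nonnegative W that reduces ||Y - W X||_F^2, with W forced to
    zero outside mask. X and Y are (dim, frames) nonnegative arrays."""
    rng = np.random.default_rng(seed)
    # Random nonnegative start, already zero outside the region of support.
    W = rng.random((Y.shape[0], X.shape[0])) * mask
    XXt = X @ X.T
    YXt = Y @ X.T
    for _ in range(n_iter):
        # Multiplicative update keeps W nonnegative and keeps zeros at zero.
        W *= YXt / (W @ XXt + eps)
        # Re-apply the mask only to make the sparsity constraint explicit.
        W *= mask
    return W
```

Because the update is multiplicative, entries outside the band never become nonzero, so the sparsity pattern is set entirely by the mask. A wider band gives the fit more freedom to reduce spectral distortion, while a narrower band keeps the transform closer to pure frequency warping.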

Acoustics

Engineering

Engineering, Electrical & Electronic

Science & Technology

Technology

Details

Title: Structured Sparse Spectral Transforms and Structural Measures for Voice Conversion
Creators: Yunxin Zhao - University of Missouri; Mili Kuruvilla-Dugdale - University of Missouri; Minguang Song - University of Missouri
Resource Type: Journal article
Publication Details: IEEE/ACM transactions on audio, speech, and language processing, Vol.26(12), pp.2267-2276
DOI: 10.1109/TASLP.2018.2860682
PMID: 31984214
PMCID: PMC6980218
NLM abbreviation: IEEE/ACM Trans Audio Speech Lang Process
ISSN: 2329-9290
eISSN: 2329-9304
Publisher: IEEE
Number of pages: 10
Grant note: R15 DC016383 / National Institute on Deafness and Other Communication Disorders (NIDCD); National Institutes of Health (NIH); United States Department of Health & Human Services
Language: English
Date published: 12/01/2018
Academic Unit: Communication Sciences and Disorders
Record Identifier: 9984446542202771

Metrics

53 Record Views

5 Times Cited - Web of Science