Novel learning-based multi-modal methods for human voice apparatus imaging

Rushdi Zahid Rusho

doi:10.25820/etd.007602

Dissertation

Open access

University of Iowa

Doctor of Philosophy (PhD), University of Iowa

Autumn 2024

Abstract

Human speech production is a complex process that involves generating vibrations at the vocal folds and modulating breath through vocal tract shaping to produce meaningful sounds, primarily for communication and social interaction. The vocal apparatus refers to the human vocal tract anatomy responsible for sound generation, and consists of several structures such as the larynx, glottis, pharynx, soft and hard palate, tongue, lips, and nose. Disorders in any of these components can compromise one’s ability to produce effective language and communication. Visualizing the complex movements of the vocal apparatus is crucial to advance our understanding of the speech production mechanism, diagnose speech disorders, optimize speech therapy, aid in planning surgical procedures on speech organs, and improve various technologies related to speech synthesis and recognition. However, current imaging modalities have several technical limitations. X-ray Computed Tomography (CT) can clearly detect air-tissue boundaries and bony structures, but it is limited by radiation exposure. Ultrasound can safely image at high temporal resolution, but cannot capture deeper structures (e.g., glottis) and has poor soft-tissue contrast. Electromagnetic articulography (EMA) and Electropalatography (EPG) work as kinematic tracking systems that measure the movement and position of articulators in real time, but they are invasive. Optical endoscopy uses a flexible optical fiber with a camera to visualize the nose and larynx, but it invasively deforms the anatomy. Magnetic Resonance Imaging (MRI) is emerging as a powerful modality for dynamic vocal apparatus imaging during speech due to its excellent soft-tissue contrast, absence of ionizing radiation, and capability to image along arbitrary plane orientations, but challenges remain for its widespread adoption.
For example, owing to the underlying physics, MR signals are sensitive to off-resonance at air-tissue boundaries, are susceptible to motion blurring, and lose spatial and temporal features when modeling fast, arbitrary speech motion. There is, therefore, a critical need for an imaging modality that can safely and comprehensively visualize vocal tract shaping during speech production with high spatial and temporal fidelity. This thesis develops novel vocal apparatus MRI techniques for speech and voice production tasks at 3 Tesla. Specifically, this thesis has four components related to imaging various aspects of the vocal apparatus: (1) Development and evaluation of a novel motion-robust, analysis manifold learning-based MRI reconstruction method for multi-slice 2D dynamic speech imaging with improved spatio-temporal resolution at 3 Tesla; (2) Development of a motion-robust, variational manifold learning-based MRI reconstruction method for time-aligned 2D multi-slice (aka pseudo-3D) dynamic speech imaging, which enables extraction of vocal tract functions that quantify 3D kinematics of vocal tract shaping; (3) Feasibility study and development of a novel motion-robust, constrained reconstruction-based dynamic laryngeal MRI method to visualize gross changes in glottic configuration during speech and breathing; (4) Development of a 3D framework to combine bony structures from ultra-low dose CT and soft-tissue structures from MRI to create a 3D high-resolution hybrid CT-MRI model of the vocal tract. The effectiveness of our methods was validated using several experimental datasets, blind image quality analysis by experts, and in-vivo experiments in a range of speech science applications.

Magnetic Resonance Imaging

Dynamic MRI

Image Reconstruction

Image Registration

Speech Imaging

Details

Title: Novel learning-based multi-modal methods for human voice apparatus imaging
Creators: Rushdi Zahid Rusho
Contributors: Sajan Goud Lingala (Advisor)
Sean B. Fain (Committee Member)
Mathews Jacob (Committee Member)
David Meyer (Committee Member)
Sarah C. Vigmostad (Committee Member)
Resource Type: Dissertation
Degree Awarded: Doctor of Philosophy (PhD), University of Iowa
Degree in: Biomedical Engineering
Date degree season: Autumn 2024
DOI: 10.25820/etd.007602
Publisher: University of Iowa
Number of pages: xx, 103 pages
Grant note: This research was conducted using an MRI instrument funded by the NIH under grant 1S10OD025025-01. Additionally, this research was partially supported by the National Institutes of Health Predoctoral Training Grant T32 HL 144461, with PIs Eric A. Hoffman and Joseph M. Reinhardt; the University of Iowa Radiology Pilot Grant; and the University of Iowa OVPR Jump Start Award.
Language: English
Date submitted: 09/12/2024
Description illustrations: illustrations, graphs
Description bibliographic: Includes bibliographical references (pages 86-103).
Public Abstract (ETD): Human speech production is a complex process that involves generating vibrations at the vocal folds and modulating breath through vocal tract shaping to produce meaningful sounds, primarily for communication and social interaction. The vocal apparatus refers to the human vocal tract anatomy responsible for sound generation, and consists of several structures such as the vocal cords, tongue, lips, and nose. Disorders in any of these components can compromise one’s ability to produce effective language and communication. Visualizing the complex movements of the vocal apparatus is crucial to advance our understanding of the speech production mechanism, diagnose speech disorders, optimize speech therapy, aid in planning surgical procedures on speech organs, and improve various technologies related to speech synthesis and recognition. However, current imaging modalities have several technical limitations. X-ray Computed Tomography (CT) can clearly detect air-tissue boundaries and bony structures, but it is limited by radiation exposure. Ultrasound can safely image rapidly moving organs, but cannot capture deeper structures and has poor image quality for soft tissues. Electromagnetic articulography (EMA) and Electropalatography (EPG) track the movement and position of articulators in real time, but they are invasive and do not provide anatomical views. Optical endoscopy uses a flexible optical fiber with a camera to visualize the nose and larynx, but it invasively deforms the anatomy. Magnetic Resonance Imaging (MRI) is emerging as a powerful modality for dynamic vocal apparatus imaging during speech due to its excellent soft-tissue image quality, lack of radiation exposure, and capability to image along arbitrary plane orientations, but challenges remain for its widespread adoption.
For example, due to the physical working principles of the device, MRI image quality deteriorates at air-tissue boundaries, and MRI struggles to faithfully capture vocal apparatus motion when reconstructing fast, arbitrary speech and breathing tasks. There is, therefore, a critical need for an imaging modality that can safely and comprehensively visualize vocal tract shaping during speech production with high spatial and temporal fidelity. This thesis develops novel vocal apparatus MRI techniques for speech and voice production tasks on scanners with 3 Tesla field strength. Specifically, this thesis has four components related to imaging various aspects of the vocal apparatus: (1) Development and evaluation of a novel MRI method for speech imaging with improved image quality and motion capture capabilities; (2) Development of a novel MRI technique that provides a three-dimensional view of vocal tract shaping during speech; (3) Feasibility study and development of novel MRI methods for imaging the larynx to visualize overall changes in vocal fold configuration during speech and breathing; (4) Development of a framework to combine bony structures from ultra-low dose (very low radiation exposure) CT and soft-tissue structures from MRI to create a three-dimensional, high image quality hybrid CT-MRI model of the vocal tract. The effectiveness of our methods was validated using several experimental datasets, blind image quality analysis by experts, and in-vivo experiments in a range of speech science applications.
Academic Unit: Roy J. Carver Department of Biomedical Engineering
Record Identifier: 9984774766902771
