CONTINUOUS AND DISCRETE DECODING OF OVERT SPEECH WITH ELECTROENCEPHALOGRAPHY

Date

2023-08

Abstract

Neurological disorders affecting speech production impair the quality of life for over 7 million individuals in the US. Traditional communication interfaces for these patients, such as eye-tracking devices and P300 spellers, are slow and unnatural. An alternative solution, speech Brain-Computer Interfaces (BCIs), decodes speech characteristics directly from neural activity, offering a more natural communication mechanism. This research explores the feasibility of decoding speech features using non-invasive EEG. Nine neurologically intact participants were equipped with a 63-channel EEG system, with additional sensors used to remove eye artifacts. Participants read aloud sentences, selected to be phonetically representative of English, that were displayed on a screen. Pre-processing techniques, including filtering, line-noise removal, and eye-artifact removal, were applied before assessing methods for removing facial electromyography (EMG) contamination. Four Blind Source Separation cleaning methods were evaluated: Canonical Correlation Analysis, Independent Component Analysis, and two two-stage EMG-removal methods employing Ensemble Empirical Mode Decomposition. Three implementations of these methods were selected for further decoding analysis based on signal-characteristic metrics and their correlation with decoding performance. Deep learning models, including Convolutional Neural Networks and Recurrent Neural Networks with and without attention modules, were optimized with a focus on minimizing trainable parameters and using small input window sizes. These models were employed for discrete and continuous speech decoding tasks, achieving above-chance, participant-independent decoding of discrete classes and of continuous characteristics of the produced audio signal. A frequency sub-band analysis highlighted the importance of certain frequency bands (delta, theta, and gamma) for decoding performance, and a perturbation analysis identified crucial channels.
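The dissertation's full cleaning pipeline (including the EMG-removal comparison) is specific to this work, but the initial filtering steps it mentions can be sketched in a generic form. In the sketch below, the function name, filter orders, cutoff band, and the 500 Hz sampling rate are illustrative assumptions, not parameters taken from the thesis:

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

def preprocess_eeg(data, fs=500.0, band=(1.0, 100.0), line_freq=60.0):
    """Minimal EEG cleaning sketch: zero-phase band-pass plus a line-noise notch.

    data: array of shape (channels, samples).
    """
    nyq = fs / 2.0
    # 4th-order Butterworth band-pass, applied forward-backward (zero phase)
    b, a = butter(4, [band[0] / nyq, band[1] / nyq], btype="band")
    data = filtfilt(b, a, data, axis=-1)
    # Narrow notch at the power-line frequency (60 Hz in the US)
    bn, an = iirnotch(line_freq, Q=30.0, fs=fs)
    return filtfilt(bn, an, data, axis=-1)
```

Eye-artifact removal (e.g., regression against dedicated ocular sensors or ICA) and the EMG-specific Blind Source Separation methods described in the abstract would follow these basic filtering steps.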
The channel selection methods assessed did not significantly improve performance, but the reduced channel sets still decoded above chance levels, suggesting that high-density EEG systems may not be necessary for speech BCIs. Transfer learning demonstrated that common speech neural correlates can be exploited across participants, reducing the amount of data that must be collected from each individual. The successful classification of continuously produced phonemes and regression of acoustic characteristics mark progress in non-invasive speech BCI development. This research presents promising steps towards a universal non-invasive speech BCI control signal, which could replace task-specific protocols in commercial applications. Improved speech BCIs hold the potential to enhance the overall quality of life of individuals living with neurological speech disorders.
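A perturbation analysis of the kind mentioned above — scoring channels by how much decoding degrades when each one is disrupted — can be sketched generically. The function name, the noise-substitution scheme, and the accuracy metric below are hypothetical illustrations, not the dissertation's exact procedure:

```python
import numpy as np

def channel_importance(decode_fn, data, labels, seed=0):
    """Score each channel by the accuracy drop when it is replaced with noise.

    decode_fn: callable mapping (trials, channels, samples) -> predicted labels.
    data:      array of shape (trials, channels, samples).
    labels:    array of shape (trials,) with true class labels.
    """
    rng = np.random.default_rng(seed)
    baseline = np.mean(decode_fn(data) == labels)
    importance = np.zeros(data.shape[1])
    for ch in range(data.shape[1]):
        perturbed = data.copy()
        # Replace one channel with white noise, keeping all others intact
        perturbed[:, ch, :] = rng.standard_normal(perturbed[:, ch, :].shape)
        importance[ch] = baseline - np.mean(decode_fn(perturbed) == labels)
    return importance
```

Channels whose perturbation causes the largest accuracy drop are the ones the decoder relies on most; ranking them this way is one route to the reduced channel sets discussed above.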

Keywords

neuroengineering, speech synthesis, EEG, EMG, deep learning
