ARTIC: An Adaptive Real-Time Imprecise Computation Pipeline for Audio Analysis
Abstract
One of the more complex problems in natural language processing (NLP) is handling overlapped speech, in which two or more speakers interfere with or talk over each other, along with the more general case of co-channel speech, in which two or more speakers are present in an audio stream regardless of interference. Frequently, one speaker is selected as the primary speaker for analysis, and the others are relegated to the category of interfering speakers. Despite the breadth of research into overlapped speech detection, few efforts have been made to preserve the speech of these so-called interfering speakers. A compelling case can be made for a more comprehensive analysis of co-channel speech in computational linguistics, accessibility automation, and entertainment, particularly under real-time constraints. Currently available open-source audio libraries, while technically capable of supporting such research, are cumbersome to work with. To address this, this work introduces the Adaptive Real-Time Imprecise Computation (ARTIC) pipeline for audio analysis, a simple but flexible approach to stream processing. The pipeline tracks computation times and deadlines for its stages and lets the user specify automatic precision reductions to avoid projected deadline misses, as well as automatic precision increases to combat underutilization. A proof-of-concept overlapped speech detector is tested as groundwork for a more comprehensive project whose eventual goal is speaker separation. Although the classifier’s accuracy needs improvement, preliminary results using quadratic band spacing and dominant formant estimation demonstrate resilience to the information loss caused by precision reduction under deadline pressure.
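The adaptive behavior described in the abstract (track computation time against a deadline, lower precision on a projected miss, raise it when the stage is underutilized) can be illustrated with a minimal sketch. This is not the ARTIC implementation: the class name `PrecisionController`, the integer precision levels, the exponential smoothing of measured times, and the `low_water` threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrecisionController:
    """Hypothetical per-stage controller; not taken from the ARTIC source."""
    deadline_s: float                    # assumed per-frame deadline for the stage
    max_level: int                       # highest precision level available
    level: int = 0                       # current level; 0 is the coarsest
    low_water: float = 0.5               # assumed underutilization threshold (fraction of deadline)
    alpha: float = 0.5                   # assumed smoothing factor for the time estimate
    estimate_s: Optional[float] = None   # smoothed computation time at the current level

    def observe(self, elapsed_s: float) -> int:
        """Record one measured computation time and return the next precision level."""
        if self.estimate_s is None:
            self.estimate_s = elapsed_s
        else:
            # Exponential moving average of the measured times.
            self.estimate_s = self.alpha * elapsed_s + (1 - self.alpha) * self.estimate_s

        if self.estimate_s > self.deadline_s and self.level > 0:
            self.level -= 1          # projected deadline miss: trade precision for time
            self.estimate_s = None   # earlier timings no longer describe this level
        elif self.estimate_s < self.low_water * self.deadline_s and self.level < self.max_level:
            self.level += 1          # underutilized: spend the slack on precision
            self.estimate_s = None
        return self.level
```

In use, a stage would time each frame with a monotonic clock such as `time.perf_counter()`, pass the elapsed time to `observe`, and process the next frame at the returned level. Resetting the estimate after every level change is a design choice of this sketch: it prevents timings gathered at one precision from triggering a second change before the new level has been measured.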