MorSR: Integrating morpho-phonology in speech recognition
In March 2019 we began work on MorSR, a new ERC Proof of Concept project that examines how morpho-phonological principles can be used to improve the performance of speech recognition systems.
Automatic Speech Recognition (ASR) is often considered the most natural form of human-computer interaction (HCI). Current commercial ASR systems rely on extensive training to learn the key properties of the acoustic signals that represent words in a given language. This dependence creates major challenges for deploying ASR in many areas where it could have substantial social impact. Our objective is to translate research results from our existing ERC-funded project MORPHON into a novel ASR system that removes such barriers.
We have previously demonstrated that using a universally applicable set of phonological features yields an isolated word recognition system (FlexSR) with enhanced phoneme recognition accuracy. FlexSR is also more robust to inputs containing non-standard speech, less common word sequences or dialect variation, and it can easily be adapted to new languages.
Such flexibility is difficult to achieve with current ASR systems, which rely on the probabilistic sequencing of whole words using a language model (LM) trained on large written text corpora. Obtaining enough training data to build a new LM for each new language can be prohibitively expensive. Instead, MorSR will incorporate linguistic information about word structure to recognise sequences of words accurately. This approach will significantly reduce the search space and increase the probability of correctly identifying words.
A major outcome of this project will be an innovative LM based on linguistic principles. Unlike existing approaches, this model will incorporate crucial regularities that are present in spoken language but invisible to LMs trained on text corpora. By combining this approach with FlexSR's key strength in identifying subtle phonological contrasts, MorSR will not only improve predictions of word sequences in running speech but also dramatically reduce the amount of training data required to adapt the system to a new language.
MorSR's strengths include:
▪ the prediction of likely word sequences based on grammatical principles;
▪ lower requirements for training data than existing approaches;
▪ fast adaptation to new languages;
▪ ASR that is fast, accurate and secure.
Watch this space for further updates as we develop our system demonstrator!