FlexSR introduces a Flexible new approach to automatic Speech Recognition. It is founded on a linguistic model of speech based on phonological features: the articulatory and acoustic properties of each sound that distinguish it from other sounds.
For example, the ‘voicing’ feature (whether the vocal cords are vibrating or not) forms a component of the contrast between the ‘p’ and ‘b’ consonant sounds in English.
Under this original ERC “Proof of Concept” grant, we developed a model that is trained to recognise a universal set of 19 such phonological features and can combine them to identify speech sounds, or phones. Importantly, the model aims to target the features that are essential to human understanding of speech, and to ignore or tolerate those that can vary across speakers or utterances.
This approach results in an Automatic Speech Recognition (ASR) system that requires relatively little training, can be adapted flexibly to new languages (even those where training data in the form of annotated speech recordings may be sparse), and employs theoretical insights from the study of how the human brain processes speech to emulate this process on a computer.
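The idea of identifying phones from combinations of features can be illustrated with a minimal sketch. It is not the FlexSR implementation: the three feature names (`voiced`, `labial`, `nasal`), the five-phone inventory and the function `matching_phones` are hypothetical, chosen only to show the principle. Each phone is specified for the features that matter to its contrasts, and any other detected feature is simply tolerated.

```python
# Illustrative sketch only: the feature names and phone inventory below are
# hypothetical and are NOT the 19 features used by FlexSR.

# Each phone is specified only for the features that are contrastive for it.
# Features left out of a specification are tolerated: any detected value is accepted.
PHONES = {
    "p": {"voiced": False, "labial": True, "nasal": False},
    "b": {"voiced": True, "labial": True, "nasal": False},
    "m": {"labial": True, "nasal": True},  # voicing deliberately left unspecified
    "t": {"voiced": False, "labial": False, "nasal": False},
    "d": {"voiced": True, "labial": False, "nasal": False},
}


def matching_phones(detected, inventory=PHONES):
    """Return the phones whose specified features all agree with the detected ones."""
    return sorted(
        phone
        for phone, spec in inventory.items()
        if all(detected.get(name) == value for name, value in spec.items())
    )


# A voiced, labial, non-nasal sound; the extra 'aspirated' value is ignored
# because no phone in this toy inventory is specified for it.
print(matching_phones(
    {"voiced": True, "labial": True, "nasal": False, "aspirated": False}
))
```

In this sketch, supporting another language would amount to supplying a different inventory of feature bundles while the feature detectors stay the same, which is one way to picture why a feature-based system might need comparatively little language-specific training data.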
Based on these principles, a mobile phone application (app) for language learning was created as a demonstrator enabling second language learners to improve their pronunciation. Words and sentences spoken into the app are analysed, and detailed feedback is given, much as a personal tutor would, to help learners correct their mistakes.
Our research team have made the following informal video to demonstrate how one such practical implementation of FlexSR technology might work:
This novel approach to speech recognition has enabled significant progress to be made rapidly, using comparatively limited resources, in terms of both the development effort and computing power required.
Two patents have been filed covering key aspects of this technology and protecting any commercial implementation of the idea.