Prof Aditi Lahiri and her team have won a 2026 ERC Proof of Concept Award for their FLEX-CODESWITCH project
Most successful automatic speech recognition systems today are designed primarily for English and trained on many thousands of hours of transcribed speech recordings that are almost invariably monolingual. However, speakers in multilingual communities often switch between two or more languages (code-switching), and recognition rates for code-switched speech are seriously impaired. Switching between languages within an utterance can easily violate the assumptions of standard large language models, which constitutes a serious setback for recognition. Recently, end-to-end systems have been used to model multilingual speech, including code-switching. However, their performance is impeded by insufficient amounts of good-quality code-switched speech and text training data. Errors in the resulting transcriptions can have dangerous impacts when used in AI analytics pathways.
Professor Aditi Lahiri and her team at Oxford have built an innovative single-word recognition system called FlexSR that is based on linguistic principles (phonological features). It can be adapted across Germanic languages and different accents by altering the lexical representation of words, without the need for additional training. It is an accurate, fast, and computationally lightweight system for word recognition in Dutch, English, and German. The FlexSR system, including its entire word corpora, is smaller than 10 MB, which makes it very versatile and allows it to run locally on a device without internet access.
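The idea of adapting to an accent by editing lexical entries rather than retraining can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the feature labels, the three-word lexicon, the Jaccard-style scoring, and the names `FEATURES`, `similarity`, and `recognise` are hypothetical and do not describe FlexSR's actual representations or algorithm.

```python
# Toy sketch only: feature labels, lexicon entries, and the scoring rule are
# illustrative assumptions, not FlexSR's actual representations or algorithm.

# Each segment is described by a set of (hypothetical) phonological features.
FEATURES = {
    "b": frozenset({"consonant", "labial", "plosive", "voiced"}),
    "p": frozenset({"consonant", "labial", "plosive"}),
    "t": frozenset({"consonant", "coronal", "plosive"}),
    "a": frozenset({"vowel", "low"}),
    "e": frozenset({"vowel", "mid", "front"}),
    "i": frozenset({"vowel", "high", "front"}),
}

# A lexicon maps each word to its sequence of segments.
LEXICON = {
    "bat": ["b", "a", "t"],
    "pat": ["p", "a", "t"],
    "bit": ["b", "i", "t"],
}

# Accent adaptation in this sketch: only the lexical entry for "bat" changes.
ACCENT_LEXICON = {**LEXICON, "bat": ["b", "e", "t"]}


def to_features(segments):
    """Convert a list of segment symbols into a list of feature sets."""
    return [FEATURES[s] for s in segments]


def similarity(observed, entry):
    """Mean per-segment feature overlap (Jaccard index); 0.0 if lengths differ."""
    if not entry or len(observed) != len(entry):
        return 0.0
    total = sum(len(o & e) / len(o | e) for o, e in zip(observed, entry))
    return total / len(entry)


def recognise(observed, lexicon):
    """Return the lexicon word whose feature sequence best matches the input."""
    return max(lexicon, key=lambda w: similarity(observed, to_features(lexicon[w])))


# An input whose first segment is missing one feature still matches "bat".
noisy = [frozenset({"consonant", "labial", "voiced"}), FEATURES["a"], FEATURES["t"]]
print(recognise(noisy, LEXICON))

# An accented pronunciation of "bat" with a different vowel.
accented = to_features(["b", "e", "t"])
print(recognise(accented, LEXICON))         # base lexicon
print(recognise(accented, ACCENT_LEXICON))  # adapted lexicon, no retraining
```

In this toy setup the only thing that differs between the two lexicons is the stored entry for one word; the matching code and feature inventory stay the same, which is the sense in which no additional training is involved.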
The extraction of phonological features makes FlexSR a powerful tool for identifying words across languages, and the Oxford team will use additional linguistic information from their current ERC Synergy project PAAL to develop FlexSR further. They will extend their mispronunciation detection-based transcription verification system and build a short-phrase code-switching command/query system. The system is initially aimed at the banking and multimedia sectors but has great potential to be applied more widely. During this 12-month project, the focus will be on Bengali, Hindi, and English, with code-switching and accent variation across the three.