Development of automated speech recognition and language learning tools

Researchers at the University of Oxford, led by Professor Aditi Lahiri, Director of the Language and Brain Laboratory, have developed an automated speech recognition system based on modelling of how the brain processes sounds. The research has been awarded four European Research Council grants: two Advanced and, for innovation building on their results, two Proof of Concept awards.

How does the brain process sounds?

Credit: Shutterstock

FlexSR (Flexible Speech Recognition System), built from scratch by Professor Lahiri and her team, does not require the extensive training that speech recognition systems usually need, and can be easily adapted to different speakers, dialects, and languages.

Professor Lahiri has held two ERC Advanced Grants and two ERC Proof of Concept grants for work in this area, supporting fundamental investigation of variations in speech, for example at the boundaries between words. The first led to a linguistic model of speech based on phonological features: the articulatory and acoustic properties of each sound that distinguish it from others. For example, the ‘voicing’ feature (whether the vocal cords are vibrating or not) forms a component of the contrast between the ‘p’ and ‘b’ consonant sounds in English.

Building on this work with Proof of Concept funding, the team developed the speech recognition system, which is trained to recognise a universal set of 19 such features and combines them to identify speech sounds, or phones. Importantly, it aims to target the features that are essential to human understanding of speech, and to ignore or tolerate those that can vary across speakers or utterances.
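The idea of combining features to identify phones can be illustrated with a toy sketch. This is not the FlexSR implementation: the feature names, the five-phone inventory, the `TOLERATED` set, and the functions `identify_phone` and `contrast` are all assumptions made up for this example, and the real system's 19 features are not listed here.

```python
# Toy illustration only. The feature names and phone definitions below are
# assumptions for this example, not the 19 features used by FlexSR.
PHONES = {
    "p": frozenset({"labial", "plosive"}),
    "b": frozenset({"labial", "plosive", "voiced"}),
    "t": frozenset({"coronal", "plosive"}),
    "d": frozenset({"coronal", "plosive", "voiced"}),
    "m": frozenset({"labial", "nasal", "voiced"}),
}

# Hypothetical features treated as non-essential in this toy inventory,
# so they are dropped before matching.
TOLERATED = frozenset({"aspirated", "breathy"})


def identify_phone(detected):
    """Return the phone whose features equal the detected set once
    tolerated features are removed, or None if nothing matches."""
    essential = frozenset(detected) - TOLERATED
    for phone, features in PHONES.items():
        if features == essential:
            return phone
    return None


def contrast(a, b):
    """Return the features that distinguish two phones in this toy inventory."""
    return PHONES[a] ^ PHONES[b]
```

In this simplified inventory, `contrast("p", "b")` yields only the voicing feature, and `identify_phone({"labial", "plosive", "aspirated"})` still resolves to `"p"` because aspiration is treated as tolerated variation.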

The research team has also used this model to develop a prototype language learning app. Using the FlexSR technology, the app analyses words and sentences spoken by the user and then provides detailed feedback. In this way, language learners can receive personalised responses to help improve their pronunciation.

The FlexSR technology could also be used in many other ways: for example, in healthcare settings to help patients who are re-learning how to speak, or in other environments that rely on voice-activated technology. Two patents have been filed covering key aspects of this technology.

Read more: Innovating in the humanities: Building a groundbreaking speech recognition system

Funder: European Research Council