Removing bias from healthcare AI tools

22 February 2024

Rapid advances in artificial intelligence (AI) have opened the way for a huge range of new healthcare tools. To ensure that these tools do not exacerbate pre-existing health inequities, researchers urge the use of more representative data in their development.

Researchers from Oxford University’s Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences (NDORMS), University College London and the Centre for Ethnic Health Research, supported by Health Data Research UK, have for the first time studied the full detail of ethnicity data in the NHS. They outline the importance of using representative data in healthcare provision and have compiled this information into a research-ready database.

The new study, published in Nature Scientific Data, is the first part of a three-phase project that aims to reduce bias in AI health prediction models which are trained on real-world patient data. The project, which addresses ethnicity disparities that were highlighted during the pandemic, is part of the UK Government’s COVID-19 Data and Connectivity National Core Study led by Health Data Research UK.

The researchers used de-identified data on ethnicity and other characteristics from general practice and hospital health records, accessed safely within NHS England’s Secure Data Environment (SDE) service, via the British Heart Foundation Data Science Centre’s CVD-COVID-UK/COVID-IMPACT Consortium. This is the first time that patient ethnicity data has been studied at this depth and breadth for the whole population of England. The researchers were able to combine records to analyse patient self-identified ethnicity recorded through over 489 potential codes.

Researchers analysed how more than 61 million people in England identified their ethnicity across more than 250 different groups. They also looked at the characteristics of those with no record of their ethnicity, and how conflicts in patient ethnicity data can arise. The data, now available for other researchers to use, shows that one in 10 patients lack an ethnicity record, and around 12% of patients have conflicting ethnicity codes in their records.
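
The two data-quality checks mentioned here, missing records and conflicting codes, can be sketched in Python. This is a minimal illustration on made-up records, not the researchers' method; the patient identifiers, the labels and the function name `summarise_ethnicity_records` are all assumptions introduced for the example.

```python
def summarise_ethnicity_records(records):
    """Classify each patient as 'missing', 'conflicting' or 'consistent'.

    `records` maps a patient identifier to the list of ethnicity labels
    found across that patient's records (None marks an empty entry).
    """
    summary = {}
    for patient_id, labels in records.items():
        # Ignore empty entries and keep only the distinct labels.
        distinct = {label for label in labels if label is not None}
        if not distinct:
            summary[patient_id] = "missing"
        elif len(distinct) > 1:
            summary[patient_id] = "conflicting"
        else:
            summary[patient_id] = "consistent"
    return summary


# Made-up example data, for illustration only.
example = {
    "p1": ["Indian", "Indian"],
    "p2": ["Indian", "Pakistani"],
    "p3": [None],
    "p4": [],
}
result = summarise_ethnicity_records(example)
print(result)
```

In this sketch a patient counts as conflicting as soon as two different labels appear anywhere in their records. A real analysis would need rules for which source or which date takes precedence.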

Sara Khalid, Associate Professor of Health Informatics and Biomedical Data Science at NDORMS, explained: ‘Health inequity was highlighted during the COVID-19 pandemic, where individuals from ethnically diverse backgrounds were disproportionately affected, but the issue is long-standing and multi-faceted.

‘Because AI-based healthcare technology depends on the data that is fed into it, a lack of representative data can lead to biased models that ultimately produce incorrect health assessments. Better data from real-world settings, such as the data we have collected, can lead to better technology and ultimately better health for all.’

Professor Cathie Sudlow, Chief Scientist at Health Data Research UK and Director of its BHF Data Science Centre, said: ‘We are delighted to be supporting hundreds of researchers to harness the power of the UK’s rich health data. This study on ethnicity recording highlights how different sources of health data from the whole English population can be accessed and analysed in a safe and secure way, providing insights that are relevant to everyone. The findings will empower health professionals, patients, carers and policy makers to make better decisions that will benefit people of all ages, ethnic groups, and social backgrounds across the country.’

The study assessed the available detail of ethnicity data in NHS England, including across different types of ethnicity codes. For example, NHS hospitals record patient data via 19 ethnicity codes, while GPs use the globally recognised SNOMED CT codes, of which there are 489. However, health researchers lose the finer detail from these recording systems because they typically collapse these groups into just five or six, potentially leading to less accurate research.
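
As a rough illustration of the loss of detail described above, the following Python sketch collapses fine-grained ethnicity labels into broad groups. The labels, the mapping and the function name `collapse_to_broad_groups` are hypothetical; they are not taken from the study or from the NHS or SNOMED CT code lists.

```python
from collections import Counter

# Hypothetical mapping from detailed labels to broad groups.
# These labels are illustrative only, not actual NHS or SNOMED CT codes.
DETAILED_TO_BROAD = {
    "Bangladeshi": "Asian",
    "Indian": "Asian",
    "Pakistani": "Asian",
    "Caribbean": "Black",
    "African": "Black",
}


def collapse_to_broad_groups(detailed_labels):
    """Count records per broad group; unmapped labels are counted as 'Unknown'."""
    return Counter(
        DETAILED_TO_BROAD.get(label, "Unknown") for label in detailed_labels
    )


# Made-up records, for illustration only.
records = ["Bangladeshi", "Indian", "Indian", "Pakistani", "Caribbean"]
detailed_counts = Counter(records)
broad_counts = collapse_to_broad_groups(records)

print(len(detailed_counts), "detailed groups ->", len(broad_counts), "broad groups")
```

After collapsing, the distinct detailed labels can no longer be told apart within a broad group, so any difference between them is invisible to a model trained on the broad groups alone.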

The researchers plan to demonstrate the value of these findings in the subsequent phases of the project, which will first focus on using these detailed results on ethnicity data to better describe how different ethnicities were impacted by the COVID-19 pandemic, and then feed into more equitable artificial intelligence and machine learning tools suitable for use by diverse patient groups.

Notes to editors

The full paper "Towards mitigating health inequity via machine learning: a nationwide cohort study to develop and validate ethnicity-specific models for prediction of cardiovascular disease risk in COVID-19 patients" is published in Nature Scientific Data.

This research was enabled, supported and funded by the British Heart Foundation Data Science Centre, Health Data Research UK, the Alan Turing Institute and UKRI.

For further information or interviews with the article author, please contact:
Chris McIntyre, Media Relations Manager, University of Oxford
tel (direct): 01865 270 046
tel (News Office): 01865 280528
[email protected]

About the University of Oxford
Oxford University has been placed number 1 in the Times Higher Education World University Rankings for the eighth year running, and number 3 in the QS World Rankings 2024. At the heart of this success are the twin pillars of our ground-breaking research and innovation and our distinctive educational offer.
Oxford is world-famous for research and teaching excellence and home to some of the most talented people from across the globe. Our work helps the lives of millions, solving real-world problems through a huge network of partnerships and collaborations. The breadth and interdisciplinary nature of our research alongside our personalised approach to teaching sparks imaginative and inventive insights and solutions.
Through its research commercialisation arm, Oxford University Innovation, Oxford is the highest university patent filer in the UK and is ranked first in the UK for university spinouts, having created more than 300 new companies since 1988. Over a third of these companies have been created in the last five years. The university is a catalyst for prosperity in Oxfordshire and the United Kingdom, contributing £15.7 billion to the UK economy in 2018/19, and supports more than 28,000 full-time jobs.

About the British Heart Foundation Data Science Centre

  • The British Heart Foundation Data Science Centre is a partnership between Health Data Research UK (HDR UK) and the British Heart Foundation (BHF). We work closely with patients, the public, NHS organisations, researchers, and clinicians to promote the safe and ethical use of data for research into the causes, prevention and treatment of all diseases of the heart and circulation.
  • Our vision is to improve the public’s cardiovascular health through the power of large-scale data and advanced analytics across the UK. Funded by the BHF and embedded within HDR UK, the centre provides the leadership, co-ordination and engagement needed to deliver this vision, through building capability, capacity and infrastructure to drive excellence in data-enabled cardiovascular research.
  • To find out more about the BHF Data Science Centre, visit

About Health Data Research UK
Health Data Research UK is the national institute for health data with a mission to unite the UK’s health data to enable discoveries that improve people’s lives. It is a charity funded by UK Research and Innovation, the Department of Health and Social Care in England and equivalents in Northern Ireland, Wales and Scotland, and leading medical research charities.
E: [email protected]