University of Oxford researchers create largest ever human family tree

24 February 2022

Researchers from the University of Oxford’s Big Data Institute have taken a major step towards mapping the entirety of genetic relationships among humans: a single genealogy that traces the ancestry of all of us. The study has been published today in Science.

  • New genealogical network of human genetic diversity reveals how individuals across the world are related to each other, in unprecedented detail
  • The research predicts common ancestors, including approximately when and where they lived
  • The analysis recovers key events in human evolutionary history, including the migration out of Africa
  • The underlying method could have widespread applications in medical research, for instance identifying genetic predictors of disease risk

The past two decades have seen extraordinary advancements in human genetic research, generating genomic data for hundreds of thousands of individuals, including from thousands of prehistoric people. This raises the exciting possibility of tracing the origins of human genetic diversity to produce a complete map of how individuals across the world are related to each other.

Until now, the main challenges to this vision were working out a way to combine genome sequences from many different databases and developing algorithms to handle data of this size. However, a new method published today by researchers from the University of Oxford’s Big Data Institute can easily combine data from multiple sources and scale to accommodate millions of genome sequences.

Dr Yan Wong, an evolutionary geneticist at the Big Data Institute, and one of the principal authors, explained: ‘We have basically built a huge family tree, a genealogy for all of humanity that models as exactly as we can the history that generated all the genetic variation we find in humans today. This genealogy allows us to see how every person’s genetic sequence relates to every other, along all the points of the genome.’

Because each genomic region is inherited from only one parent, either the mother or the father, the ancestry of each point on the genome can be thought of as a tree. The set of trees, known as a “tree sequence” or “ancestral recombination graph”, links genetic regions back through time to the ancestors in whom the genetic variation first appeared.
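The idea of a tree sequence can be illustrated with a toy Python sketch. It is loosely modelled on the edge tables used by tree-sequence software such as tskit: each edge records that a parent node is the ancestor of a child node over one interval of the genome. The node IDs, positions and the helpers `tree_at`, `lineage` and `mrca` are hypothetical and for illustration only. They are not the study's code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    left: float   # start of the genomic interval (inclusive)
    right: float  # end of the genomic interval (exclusive)
    parent: int   # ancestor node ID
    child: int    # descendant node ID


def tree_at(edges, position):
    """Return the local tree covering `position` as a {child: parent} map."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}


def lineage(tree, node):
    """List a node followed by all of its ancestors, walking upwards."""
    path = [node]
    while path[-1] in tree:
        path.append(tree[path[-1]])
    return path


def mrca(tree, a, b):
    """Most recent common ancestor of two nodes in one local tree, or None."""
    ancestors_of_a = set(lineage(tree, a))
    for node in lineage(tree, b):
        if node in ancestors_of_a:
            return node
    return None


# Hypothetical toy genealogy: samples 0, 1, 2 and ancestors 3, 4, 5.
# A recombination breakpoint at position 50 changes the local tree.
edges = [
    Edge(0, 50, 3, 0), Edge(0, 50, 3, 1),      # left half: 0 and 1 share ancestor 3
    Edge(0, 50, 5, 3), Edge(0, 50, 5, 2),
    Edge(50, 100, 4, 1), Edge(50, 100, 4, 2),  # right half: 1 and 2 share ancestor 4
    Edge(50, 100, 5, 4), Edge(50, 100, 5, 0),
]

print(mrca(tree_at(edges, 10), 0, 1))  # 3
print(mrca(tree_at(edges, 75), 0, 1))  # 5
```

The same two samples have different most recent common ancestors on either side of the breakpoint, which is why the genealogy is a sequence of trees along the genome rather than a single tree.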

Lead author Dr Anthony Wilder Wohns, who undertook the research as part of his PhD at the Big Data Institute and is now a postdoctoral researcher at the Broad Institute of MIT and Harvard, said: ‘Essentially, we are reconstructing the genomes of our ancestors and using them to form a vast network of relationships. We can then estimate when and where these ancestors lived. The power of our approach is that it makes very few assumptions about the underlying data and can also include both modern and ancient DNA samples.’

The study integrated data on modern and ancient human genomes from eight different databases and included a total of 3,609 individual genome sequences from 215 populations. The ancient genomes included samples found across the world, with ages ranging from a few thousand to over 100,000 years. The algorithms predicted where common ancestors must be present in the evolutionary trees to explain the observed patterns of genetic variation. The resulting network contained almost 27 million ancestors.

After adding the locations where the sampled genomes were collected, the authors used the network to estimate where the predicted common ancestors had lived. The results successfully recovered key events in human evolutionary history, including the migration out of Africa.
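The press release does not spell out how ancestral locations were estimated. One simple scheme, assumed here purely for illustration, places each ancestor at the geographic midpoint of its children, working upwards from the sampled genomes. Averaging is done on 3-D unit vectors so that longitudes wrap correctly around the globe. The function names, the `children` input format and the coordinates below are all hypothetical.

```python
import math


def to_xyz(lat, lon):
    """Convert latitude/longitude in degrees to a 3-D unit vector."""
    la, lo = math.radians(lat), math.radians(lon)
    return (math.cos(la) * math.cos(lo), math.cos(la) * math.sin(lo), math.sin(la))


def to_latlon(x, y, z):
    """Convert a (not necessarily unit) 3-D vector back to degrees lat/lon."""
    return (math.degrees(math.atan2(z, math.hypot(x, y))),
            math.degrees(math.atan2(y, x)))


def estimate_locations(children, sample_locations):
    """Place every ancestor at the spherical midpoint of its children.

    children: {parent: [child, ...]} for one local tree (hypothetical input).
    sample_locations: {sample: (lat, lon)} for the sampled genomes.
    """
    locations = dict(sample_locations)

    def locate(node):
        # Children are located first (recursively), then averaged.
        if node not in locations:
            vectors = [to_xyz(*locate(c)) for c in children[node]]
            summed = [sum(v[i] for v in vectors) for i in range(3)]
            locations[node] = to_latlon(*summed)
        return locations[node]

    for parent in children:
        locate(parent)
    return locations


# Hypothetical samples on the equator at longitudes 0 and 90 degrees.
locs = estimate_locations({2: [0, 1]}, {0: (0.0, 0.0), 1: (0.0, 90.0)})
print(locs[2])  # approximately (0.0, 45.0)
```

A real analysis would also need to weight ancestors by how much of the genome they cover and handle the many local trees together. This sketch shows only the core averaging step.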

Although the genealogical map is already an extremely rich resource, the research team plans to make it even more comprehensive by continuing to incorporate genetic data as it becomes available. Because tree sequences store data in a highly efficient way, the dataset could easily accommodate millions of additional genomes.
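Part of the reason tree sequences are compact is that neighbouring trees along the genome are usually almost identical. A parent-child link that persists across many adjacent trees can then be stored once, as a single edge spanning the whole interval. The hypothetical `compress` function below sketches that idea on a toy example. It is an illustration of the principle, not the storage format of any particular library.

```python
def compress(breakpoints, trees):
    """Merge consecutive local trees into interval edges.

    breakpoints: [b0, b1, ..., bn]; trees[i] covers [breakpoints[i], breakpoints[i+1]).
    trees: list of {child: parent} dicts, one per interval.
    Returns (left, right, parent, child) edges in which a link shared by
    adjacent trees is stored once rather than once per tree.
    """
    edges = []
    open_edges = {}  # (parent, child) -> left coordinate where the run began
    for i, tree in enumerate(trees):
        current = {(p, c) for c, p in tree.items()}
        # Close any run of a link that does not continue into this tree.
        for key in list(open_edges):
            if key not in current:
                edges.append((open_edges.pop(key), breakpoints[i], *key))
        # Start runs for links that are new in this tree.
        for key in current:
            open_edges.setdefault(key, breakpoints[i])
    # Close the runs still open at the end of the genome.
    for key, left in open_edges.items():
        edges.append((left, breakpoints[-1], *key))
    return edges


# Hypothetical example: three local trees, the first two identical.
trees = [
    {0: 3, 1: 3, 3: 5, 2: 5},
    {0: 3, 1: 3, 3: 5, 2: 5},
    {1: 4, 2: 4, 4: 5, 0: 5},
]
edges = compress([0, 30, 50, 100], trees)
print(sum(len(t) for t in trees), len(edges))  # 12 8
```

With only three trees the saving is modest, but along a real chromosome with very many trees, each differing from its neighbour by only a few links, the gap between storing every tree and storing shared edges grows large.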

Dr Wong said: ‘This study is laying the groundwork for the next generation of DNA sequencing. As the quality of genome sequences from modern and ancient DNA samples improves, the trees will become even more accurate and we will eventually be able to generate a single, unified map that explains the descent of all the human genetic variation we see today.’

Dr Wohns added: ‘While humans are the focus of this study, the method is valid for most living things, from orangutans to bacteria. It could be particularly beneficial in medical genetics, in separating out true associations between genetic regions and diseases from spurious connections arising from our shared ancestral history.’

Notes for editors:

For further information or for interview requests, please contact Dr Caroline Wood, Oxford Population Health, University of Oxford: [email protected]

The study is published in Science: www.science.org/doi/10.1126/science.abi8264. The Science press package team can be contacted at 202-326-6777 or [email protected]. The paper can also be downloaded directly from EurekAlert for those registered with this service.

A video simulation showing the human migration out of Africa based on the new genealogical network can be viewed here: https://vimeo.com/678821780. This link is password-protected until the embargo lifts. The password is “unifiedgenealogy”. This material may be freely used by reporters as part of news coverage, with proper attribution, but it must not be modified or altered. Please contact Dr Caroline Wood ([email protected]) if you wish to modify it. Please cite the owner of the material when publishing: Wohns et al. Science (2022).

The study was a collaboration between the Big Data Institute, University of Oxford; the Broad Institute of MIT and Harvard, USA; Harvard University, USA; and the University of Vienna, Austria.

About the Big Data Institute
The Big Data Institute is located in the Li Ka Shing Centre for Health Informatics and Discovery at the University of Oxford. It is an interdisciplinary research centre that focuses on the analysis of large, complex data sets for research into the causes, consequences, prevention and treatment of disease. Research is conducted in areas such as genomics, population health, infectious disease surveillance and the development of new analytic methods. The Big Data Institute is supported by funding from the Medical Research Council, the Engineering and Physical Sciences Research Council, the UK Research Partnership Investment Fund, the National Institute for Health Research Oxford Biomedical Research Centre, Wellcome and philanthropic donations from the Li Ka Shing and Robertson Foundations. Further details are available at www.bdi.ox.ac.uk.

About Oxford University
Oxford University has been placed number one in the Times Higher Education World University Rankings for the sixth year running, and number two in the QS World Rankings 2022. At the heart of this success are the twin pillars of our ground-breaking research and innovation and our distinctive educational offer.
Oxford is world-famous for research and teaching excellence and home to some of the most talented people from across the globe. Our work helps the lives of millions, solving real-world problems through a huge network of partnerships and collaborations. The breadth and interdisciplinary nature of our research, alongside our personalised approach to teaching, spark imaginative and inventive insights and solutions.
Through its research commercialisation arm, Oxford University Innovation, Oxford is the highest university patent filer in the UK and is ranked first in the UK for university spinouts, having created more than 200 new companies since 1988. Over a third of these companies have been created in the past three years. The University is a catalyst for prosperity in Oxfordshire and the United Kingdom, contributing £15.7 billion to the UK economy in 2018/19 and supporting more than 28,000 full-time jobs.