Unravelling the genome

The secrets of the genome

The genome of an organism is all of its DNA – for eukaryotic cells, that means the nuclear DNA and the DNA of any mitochondria or chloroplasts. Understanding the human genome can be seen as the key to understanding the mystery of human life itself. The technology needed to analyse a genome was only developed in 1977 by the British scientist Frederick Sanger. From then onwards, our ability to read the DNA which defines an organism has got better, faster and cheaper all the time.

Timeline of genome sequencing

This is a timeline of genome sequencing.

1977 The genome of a single-stranded RNA bacteriophage, MS2 (a virus which attacks bacteria), is sequenced by Walter Fiers and his team in Belgium, and the genome of a single-stranded DNA bacteriophage, Phi X 174, is sequenced by Frederick Sanger and his team in Cambridge, UK.
1995 The bacterium Haemophilus influenzae becomes the first free-living organism to have its entire genome sequenced. The work was carried out by Craig Venter and his team at The Institute for Genomic Research (TIGR) in the USA.
1996 The first fungal genome is sequenced - that of Saccharomyces cerevisiae, or baker’s yeast. It was sequenced by an international collaboration for yeast genome sequencing.
1998 The first animal genome is sequenced – that of a soil-living, 1 mm long nematode worm, Caenorhabditis elegans, which is a widely used laboratory organism. The work was a collaboration between The Genome Institute at Washington University, USA, and the Wellcome Trust Sanger Institute, Cambridge, UK.
2000 The first plant genome, Arabidopsis thaliana, is sequenced by teams from the USA, France and Japan.

The first insect genome is sequenced by teams in the USA and Europe. The insect used is the fruit fly Drosophila melanogaster, widely used in genetics experiments.

The first rough draft of the human genome is announced.
2001 The first full draft of the human genome sequence is published, produced by the many teams of scientists around the world working in the Human Genome Project, and by Celera Genomics.
2002 Genomes are sequenced thick and fast – in this year alone scientists published the genomes of the mouse Mus musculus (widely used as a model organism in research), the mosquito Anopheles gambiae, which carries malaria, the Japanese pufferfish (Takifugu rubripes), and Oryza sativa or rice, the staple food of millions of people around the world.
2003 The Human Genome Project is officially completed, with the final version of the sequence published.
2004 The genome of red jungle fowl, Gallus gallus, is sequenced. It is a close relative of domestic hens and the first bird to be sequenced.
2005 The genome of the chimpanzee, Pan troglodytes, is sequenced – the first non-human primate genome to be revealed.

The first draft of the International HapMap, comparing key regions of DNA between populations, was produced.
2008 The 1000 Genomes Project was set up to find most genetic variants with frequencies of at least 1% in the populations studied. It set out to sequence the genomes of at least 1000 people. The success of the project depended on new, faster, cheaper technologies for DNA sequencing.
2010 The genome of Xenopus tropicalis, the Western clawed frog, is sequenced. These frogs are often used in the study of early development.

The Wellcome Trust launched the UK10K project, which aimed to analyse and compare the genomes of 4000 healthy people in the UK with 6000 people living with diseases which appear to have a genetic cause or link.
2011 Next-generation DNA sequencing makes it possible to sequence a human genome in less than a week for under $2000.
2012 Results from ENCODE (the Encyclopedia of DNA Elements) were published. The project aims to identify all of the functional parts of the genome.

The 100,000 Genomes Project was set up, aiming to analyse the complete genomes of 100,000 NHS patients and their families by the end of 2017. The project will focus on patients with rare genetic diseases, their families, and patients with cancer.
2013 The genome of the zebrafish, Danio rerio, was sequenced. These fish are widely used as model organisms in research.
2015 The 1000 Genomes Project was completed.
2017 Results are emerging from the 100,000 Genomes Project which continues to recruit.

The first diagnostic test for an infectious disease based on genome sequencing is developed – for tuberculosis.

A major research initiative developed between the UK Biobank, GlaxoSmithKline (GSK) in the UK and the Regeneron Genetics Centre (RGC) in the US is set up to generate genetic sequence data on the 500,000 volunteers who have provided anonymous health and well-being data, as well as genetic material, to the UK Biobank over the last 10 years. Ultimately this data will be made widely available for the development of new drugs, diagnostics and treatments.

Human Genome Project

In 1990 the Human Genome Project set out to identify (map) all of the genes in the human chromosomes and to sequence the 3 billion base pairs which make up human DNA. The project suggested that around 25,000 of the genes coded for specific proteins. It involved scientists working in 18 different countries. This international involvement is very important – knowledge of the human genome should belong to everyone.

Who provided the DNA? No-one really knows - a group of volunteers gave samples and some of these were then chosen anonymously and at random for all the scientists to work with.

The original Human Genome Project also had specific aims about the storage and analysis of all the data involved. Its brief also included consideration of the ethical, legal and social issues which are inevitably raised when such personal genetic information is unravelled. The project cost around 2.7 billion US dollars, and showed that every individual has at least 99.9% of their DNA in common! One surprising discovery was that less than 2% of the genome is actually involved in coding for specific proteins. The rest of the DNA – known as the non-coding DNA – has a huge range of functions, from regulating which proteins are made and when they are made to how they are packaged in the cell. However, there is still a great deal to find out about the non-coding DNA in the genome and how it affects us.
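To get a feel for the scale of these figures, the arithmetic can be sketched in a few lines of Python. This is only a rough illustration using the rounded numbers quoted above (3 billion base pairs, 99.9% shared, 2% coding). The variable names are made up for the example, and the results are upper limits rather than measured values.

```python
# Rounded figures quoted in the text; exact values differ between sources.
GENOME_BASE_PAIRS = 3_000_000_000  # about 3 billion base pairs

# At least 99.9% is shared, so at most 0.1% (1 in 1000) can differ.
differing_base_pairs = GENOME_BASE_PAIRS // 1000

# Less than 2% codes for proteins, so 2% is an upper limit.
coding_base_pairs = GENOME_BASE_PAIRS * 2 // 100

print(f"Base pairs that may differ between two people: up to {differing_base_pairs:,}")
print(f"Base pairs coding for proteins: fewer than {coding_base_pairs:,}")
```

So even a difference of 0.1% still leaves room for around 3 million base pairs to vary between two people.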

When the Human Genome Project was first launched, scientists worked on mapping the individual genes, and then as technology developed they moved on to sequencing the DNA in detail. It was thought it would take 15 years to complete the project, but technology moved on so fast that the genome was sequenced in 2003, two years ahead of schedule. Two automated processes which were developed had a major impact on the success of the project. These were the polymerase chain reaction and DNA sequencing.

Since the Human Genome Project was completed, there have been a number of other projects aiming to find out more about the human genome across the globe, investigating both healthy individuals and the part played by the genome in a wide range of diseases.

The 1000 Genomes Project

The 1000 Genomes Project, launched in 2008 and completed in 2015, was a similar international effort to the original Human Genome Project. It involved scientists analysing whole genomes and gene regions from many different samples (around 2,500 people in total). The project has resulted in a catalogue of human genetic variants, and it identifies the regions of the genome most commonly associated with different types of disease. The results of the study are freely available to scientists all over the world.
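The timeline notes that the project looked for variants with frequencies of at least 1%. What that threshold means can be shown with a small sketch. Everything here is invented for illustration: the variant names, the sample data and the function `allele_frequency` are assumptions, not the project's actual data or software.

```python
def allele_frequency(genotypes):
    """Return the fraction of chromosome copies that carry the variant.

    Each genotype is 0, 1 or 2: the number of copies of the variant one
    person carries (each person has two copies of each chromosome).
    """
    total_copies = 2 * len(genotypes)
    return sum(genotypes) / total_copies


# Invented data for 100 people per variant.
samples = {
    "variant_A": [0] * 95 + [1] * 4 + [2] * 1,  # 6 copies out of 200
    "variant_B": [0] * 99 + [1] * 1,            # 1 copy out of 200
}

THRESHOLD = 0.01  # the "at least 1%" cut-off

# Keep only the variants that are common enough to pass the cut-off.
common = [name for name, g in samples.items() if allele_frequency(g) >= THRESHOLD]
print(common)
```

In this toy data, variant_A is found in 3% of chromosome copies and passes the cut-off, while variant_B, at 0.5%, does not.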

The UK10K Project

This UK-based project was established in 2010 by the Wellcome Trust and published its initial findings in 2015. The aim was to investigate the genomes of 10,000 people in the UK – that’s about one person in every 6,000 of the population. The study looked at the genomes of 4,000 healthy people, and compared these to the exomes of 6,000 people with diseases suspected of having a genetic link, such as obesity, autism, certain forms of heart disease and schizophrenia. An exome is made up of all the exons in the genome. Exons are the regions of DNA which actually code for proteins. The project produced some valuable results – and showed that for a real understanding of particular conditions, an even bigger sample was needed.
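The idea of an exome as "all the exons joined together" can be shown with a toy example. This is a minimal sketch, not how real exome sequencing is done: the DNA string, the exon positions and the function `extract_exome` are all invented, and lower-case letters are used only to mark the non-coding stretches.

```python
def extract_exome(dna, exons):
    """Join the exon regions of a DNA string.

    `exons` is a list of (start, end) positions using Python slicing:
    the start position is included and the end position is excluded.
    """
    return "".join(dna[start:end] for start, end in exons)


# Invented sequence: upper case = exons, lower case = non-coding DNA.
toy_dna = "ATGGCCttaacgGATTACAggcctTGA"
toy_exons = [(0, 6), (12, 19), (24, 27)]

toy_exome = extract_exome(toy_dna, toy_exons)
print(toy_exome)  # ATGGCCGATTACATGA
```

Because the exome leaves out the non-coding DNA, it is far smaller than the whole genome.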

The 100,000 Genomes Project

In 2012, the UK Government set up Genomics England, aiming to provide a genomics service for the NHS. The first target is to analyse 100,000 genomes from two target groups. One is patients with rare genetic diseases and their families. The aim is to identify rare genetic conditions, enabling families to understand the problem and to plan for future additions to their families, and to help scientists understand the role of particular genes in normal development and metabolism. The other group is patients with a range of different cancers, where the genomes of both the patients and their tumours will be studied. Scientists hope this will lead to the development of more accurate tests, able to pick up potentially life-threatening cancers earlier, when treatment is more successful. They also hope that by monitoring changes in the tumours, they can track the progress of the diseases and use the most effective treatments at each stage. In 2016 the project was able to offer diagnoses to families of some children affected by very rare genetic diseases – and the work is still in progress.

These are not the only studies into the human genome that have taken place, or are happening now. Scientists have discovered that the number of genes in the human genome actively involved in coding directly for proteins is considerably lower than the 25-30,000 initially identified by the Human Genome Project. In 2014 a metadata analysis published in Human Molecular Genetics suggested we may have as few as 19,000 coding genes!

The UK Biobank project

Between 2006 and 2010, the UK Biobank project collected data from 500,000 volunteers aged 40 to 69. Much of that data is traditional medical information – body mass, blood pressure measurements, blood, urine and saliva samples and lifestyle information. In 2017 GSK in the UK and RGC in the US started working together to analyse the genomes of the volunteers whose details are in the Biobank. This combination of medical data collected over a long period of time with the genomes of the people concerned has the potential to be one of our most powerful tools yet for understanding disease and developing new medicines.

There is still so much to learn...

Topic last updated: 24 Nov 2021
