Understanding the genome

To understand the relationship between the genome and disease, we first need to understand the structure of the genome and how it varies between individuals (the 'genotype'). The individual genome then needs to be related to the physical attributes of the individual (the 'phenotype'). A phenotype may be a personal trait such as height or eye colour, or a disease manifestation such as cystic fibrosis or a high risk of Alzheimer's disease.

Genetic variations between humans are the key to differences in phenotype, and the structural variation of the genome is the subject of much genomic research. Specific types of variation include


In recent years, dramatic advances have been made in the understanding of the genome and its relationship with human health. An outline of the major projects that have marked this journey is shown below.

The Human Genome Project was a 13-year project that resulted in the first complete map of a human genome. It involved leading research institutions and funders throughout the world, including the UK's Wellcome Trust and the US National Institutes of Health (NIH).

In 2001, the draft genome sequence was published in Nature, and in 2004 the completed analysis of the sequence was published. The finished sequence contained almost all known genes (99.74%) and defined 22,287 'gene loci'. Previously it had been believed that there were as many as 100,000 genes. The finished genome now serves as a reference for researchers conducting analyses of the genome.


The International HapMap Project was started in 2002. Its goal was to identify and catalogue genetic similarities and differences in human beings, by comparing the genetic sequences of different individuals to identify chromosomal regions where genetic variants are shared.

Using the publicly available information in the HapMap, researchers are able to find genes that affect health, disease, and individual responses to medications and environmental factors. The Project is a collaboration among scientists and funding agencies from Japan, the United Kingdom, Canada, China, Nigeria, and the United States.

In October 2007, phase 2 of the HapMap project was reported in Nature. Three million more single nucleotide polymorphisms (SNPs) were identified, representing between 25% and 33% of all human SNPs with a frequency of more than 5%.

In 2007, the journal Science named Human Genetic Variation as Breakthrough of the Year. The journal commented that improvements in DNA sequencing technology will enable even deeper analysis of genetic variation:

"New technologies that are slashing the costs of sequencing and genome analyses will make possible the simultaneous genome-wide search for SNPs and other DNA alterations in individuals. Already, the unexpected variation within one individual's published genome has revealed that we have yet to fully comprehend the degree to which our DNA differs from one person to the next."


The Encyclopedia of DNA Elements (ENCODE) study, published in June 2007, represented a major advance in the understanding of the human genome. It found that the areas of the genome that are not genes, often referred to as 'junk DNA', are critical to the regulation and control of DNA processes. This confirmed the idea that the workings of DNA are more complex than originally thought, and highlighted the need for more detailed research into the genome.


Cancer Genome Atlas Project
The Cancer Genome Atlas (TCGA) is a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing. TCGA is a joint effort of the US National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), which are both part of the National Institutes of Health, U.S. Department of Health and Human Services.


Large-scale Genome Sequencing programme: Medical Sequencing projects
The US National Human Genome Research Institute (NHGRI), part of the National Institutes of Health (NIH), runs several sequencing projects. Key projects include:

Tumor Sequencing Project
The Tumor Sequencing Project (TSP) Consortium is a collaboration among participants at the Baylor College of Medicine Human Genome Sequencing Center, the Broad Institute Genome Sequencing Platform, the Dana Farber Cancer Institute, the Memorial Sloan-Kettering Cancer Center, the Genome Sequencing Center and Siteman Cancer Center at Washington University, the M.D. Anderson Cancer Center and the University of Michigan Medical Center. The TSP will pilot approaches to large-scale identification of genomic changes in tumors. It aims to sequence the exonic regions of 1,000 genes in almost 200 specimens of adenocarcinoma of the lung, and to use high-density SNP genotyping arrays for high-resolution identification of changes in chromosomal copy number.

Assessment of Sequencing Technologies for Analyzing Tumor DNA
A collaboration among investigators at the J. Craig Venter Institute (JCVI) and The Johns Hopkins University has been established to assess different technologies for sequencing tumor DNA. This project will analyze the DNA sequence of 37 genes in a collection of 20 glioblastoma tumors.


The Personal Genome Project was announced in 2007, spearheaded by George Church, Professor of Genetics at Harvard Medical School. The project's stated goal is "to encourage the development of personal genomics technology and practices that are effective, informative, and responsible, yield identifiable and improvable benefits at manageable levels of risk, and are broadly available at modest cost". The project aims ultimately to sequence the genomes of 100,000 people.


In 2008, the "1000 Genomes Project" was announced. The international research consortium includes the UK's Wellcome Trust Sanger Institute, the Beijing Genomics Institute, and the National Human Genome Research Institute (NHGRI), part of the National Institutes of Health (NIH). This project aims to sequence the genomes of at least a thousand people from around the world, to "create the most detailed and medically useful picture to date of human genetic variation."


Wellcome Trust Case Control Consortium
Phase 1 of the Wellcome Trust Case Control Consortium (WTCCC) analysed DNA samples from 17,000 people and reviewed eight major diseases. In April 2008, phase 2 of the WTCCC project was announced. This will analyse the DNA of 120,000 people, and has been hailed as "the largest ever study of the genetics behind common diseases." Sixty institutions worldwide will be involved in the project.


International Cancer Genome Consortium
In April 2008, a new consortium was announced that aims to generate high-quality data on the genomes of at least 50 different cancers. The consortium includes researchers from ten countries, including the UK's Wellcome Trust Sanger Institute. Each of the projects will use specimens from roughly 500 patients and is expected to cost around $20 million.


Two major online resources for information about the genome and genomic research are:
The Wellcome Trust, a leading UK charity
The US National Human Genome Research Institute, part of the National Institutes of Health.