Understanding the genome 101

To understand the relationship between the genome and disease, three things must be understood: the structure of the genome, the variations between the genomes of individuals, and the features of an individual's genome ('genotype'). The individual genome then needs to be related to the physical attributes of the individual ('phenotype'). For the human genome, a phenotype may be a personal trait such as height or eye colour, or a disease manifestation such as cystic fibrosis or a raised risk of Alzheimer's disease. In other organisms, a phenotype might be a pathogenic bacterium's toxicity or a plant's drought resistance.

Genetic variation between individuals is key to differences in phenotype, and is the subject of substantial genomic research. Specific types of variation include:

  • Single Nucleotide Polymorphisms (SNPs): sites at which the sample genome differs from the reference genome(s) by a single nucleotide. Across a 3 Gb human genome, an individual typically carries SNPs at millions of such sites.
  • Inversions: where a section of DNA breaks away from the chromosome and is reincorporated 'back to front', reversing that section.
  • Insertions: the addition of one or more bases.
  • Deletions: where some genetic material is missing, whether a single base or more. Insertions and deletions may be co-located.
  • Translocations: where a section of DNA appears in a different position from that expected from the relevant reference genome(s).
  • Copy Number Variations (CNVs): where a length of DNA is repeated a different number of times compared to the reference genome(s).
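
As a rough illustration of the variation types listed above, the following Python sketch applies each one to a short, made-up reference sequence. The function names and the eight-base sequence are hypothetical and chosen only for this example; real variants are detected from aligned sequencing reads rather than by editing strings. The inversion is modelled here as a reverse complement, which is how a 'back to front' segment reads on the original strand.

```python
# Toy model only: hypothetical helper names and a made-up 8-base sequence.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def snp(seq, pos, base):
    """Substitute a single nucleotide at 0-based position pos."""
    return seq[:pos] + base + seq[pos + 1:]


def insertion(seq, pos, bases):
    """Insert one or more bases before 0-based position pos."""
    return seq[:pos] + bases + seq[pos:]


def deletion(seq, start, length):
    """Remove `length` bases starting at 0-based position start."""
    return seq[:start] + seq[start + length:]


def inversion(seq, start, end):
    """Reverse-complement the half-open segment [start, end) in place."""
    flipped = "".join(COMPLEMENT[b] for b in reversed(seq[start:end]))
    return seq[:start] + flipped + seq[end:]


def translocation(seq, start, end, new_pos):
    """Cut out [start, end) and re-insert it at new_pos in what remains."""
    segment = seq[start:end]
    rest = seq[:start] + seq[end:]
    return rest[:new_pos] + segment + rest[new_pos:]


def copy_number_change(seq, start, end, copies):
    """Make the segment [start, end) appear `copies` times in a row."""
    return seq[:start] + seq[start:end] * copies + seq[end:]


def snp_sites(reference, sample):
    """List 0-based positions where two equal-length sequences differ."""
    return [i for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


reference = "ACGTTAGC"
examples = {
    "SNP": snp(reference, 2, "A"),
    "insertion": insertion(reference, 4, "GG"),
    "deletion": deletion(reference, 1, 3),
    "inversion": inversion(reference, 2, 6),
    "translocation": translocation(reference, 0, 3, 3),
    "copy number variation": copy_number_change(reference, 5, 7, 3),
}

for name, sample in examples.items():
    print(f"{name:>22}: {reference} -> {sample}")
```

Only the SNP leaves every other base aligned with the reference, so a position-by-position comparison such as `snp_sites` is enough to find it. The other variation types shift or rearrange the sequence, which is one reason they are generally harder to detect.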


Since the publication of the draft human genome, substantial advances have been made in the understanding of the structure and function of genomes.  Many research projects are conducted across multiple institutions in many different countries. Examples of early and current large projects include:

 

Human Genome Project

The Human Genome Project was a 13-year project that resulted in the first complete map of a human genome. The project involved leading research organisations throughout the world, including the UK's Wellcome Trust and the US National Institutes of Health (NIH).

In 2001, the draft genome sequence was published in Nature, and in 2004, the completed analysis of the sequence was published. The finished sequence contained almost all known genes (99.74%) and defined 22,287 'gene loci'. Previously it had been believed that there were as many as 100,000 genes. The finished genome now serves as a reference for researchers conducting analyses of the genome.


International HapMap Project
The International HapMap Project was started in 2002. Its goal was to identify and catalogue genetic similarities and differences in human beings, by comparing the genetic sequences of different individuals to identify chromosomal regions where genetic variants are shared.

Using the publicly available information in the HapMap, researchers are able to find genes that affect health, disease, and individual responses to medications and environmental factors. The Project is a collaboration among scientists and funding agencies from Japan, the United Kingdom, Canada, China, Nigeria, and the United States.

In June 2007, phase 2 of the HapMap project was reported in Nature. Three million more single nucleotide polymorphisms (SNPs) were identified, representing between 25% and 33% of all human SNPs with a frequency of more than 5%.

In 2007, the journal Science named Human Genetic Variation as Breakthrough of the Year. The journal commented that improvements in DNA sequencing technology will enable even deeper analysis of genetic variation:

"New technologies that are slashing the costs of sequencing and genome analyses will make possible the simultaneous genome-wide search for SNPs and other DNA alterations in individuals. Already, the unexpected variation within one individual's published genome has revealed that we have yet to fully comprehend the degree to which our DNA differs from one person to the next."


Encyclopedia of DNA Elements (ENCODE) study
The Encyclopedia of DNA Elements (ENCODE) study, published in June 2007, represented a major advance in the understanding of the human genome. It found that the areas of the genome that are not genes, previously sometimes referred to as 'junk DNA', are critical to the regulation and control of DNA processes. This confirmed the idea that the workings of DNA are more complex than originally thought, and highlighted the need for more detailed research into the genome.


Large-scale Genome Sequencing programme: Medical Sequencing projects
The US National Human Genome Research Institute (NHGRI), part of the National Institutes of Health (NIH), funds several sequencing projects. Key projects include:


Wellcome Trust Case Control Consortium
Phase 1 of the Wellcome Trust Case Control Consortium (WTCCC) analysed DNA samples from 17,000 people and reviewed 8 major diseases. In April 2008, phase 2 of the WTCCC project was announced to analyse the DNA of 120,000 people, and in January 2009 the WTCCC phase 3 was funded for 4 diseases and 30,000 samples.  These studies use microarray technologies rather than sequencing technologies to identify genetic variations associated with diseases. These data are a foundation for further 'deep sequencing' studies.


International Cancer Genome Consortium
In April 2008, a new consortium was announced that aims to generate high-quality data on the genomes of at least 50 different cancers. The consortium includes researchers from ten countries, including the UK's Wellcome Trust Sanger Institute. Each of the projects will use specimens from roughly 500 patients. The project comprises the Cancer Genome Project and The Cancer Genome Atlas.
 
The Cancer Genome Atlas (TCGA) is a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing. TCGA is a joint effort of the US National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), which are both part of the National Institutes of Health, U.S. Department of Health and Human Services. TCGA recently completed a three-disease pilot project.

1000 Genomes Project
In 2008, the 1000 Genomes Project was announced. The international research consortium includes the UK's Sanger Institute, the Beijing Genomics Institute, and the National Human Genome Research Institute (NHGRI), part of the National Institutes of Health (NIH). This project aims to sequence the genomes of at least a thousand people from around the world, to "create the most detailed and medically useful picture to date of human genetic variation."

As of 2011, the complete sequence of a whole genome no longer guarantees publication in a leading journal such as Nature or Science, because the number of projects underway has expanded so dramatically. Many research projects that examine multiple whole genomes are underway, more complex genomes (such as certain polyploid plant genomes) are being tackled by international consortia, and whole human genome data are starting to be introduced into clinical practice in some specialist centres.

Two major online resources for information about the genome and genomic research are:

  • The Wellcome Trust, the leading UK charity
  • The US National Human Genome Research Institute, part of the National Institutes of Health