Genomics - California Lutheran University



Genomics

Biology 122 Genes and Development

Genomics milestones
• First genome: Haemophilus influenzae, 1995, by Craig Venter and TIGR
• Human genome draft sequences, 2001: two groups (Francis Collins of the public consortium; Craig Venter and Celera)
• Now: thousands of bacteria have been sequenced, and hundreds of human genomes have been sequenced!

NCBI, Nov. 2010

From Genome.gov Human genome conference 6/7/2010

Restriction analysis

FISH

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Fig. 18.2 (a) A reciprocal translocation between one chromosome 9 and one chromosome 22 produces an extra-long chromosome 9 (“der 9”) and the Philadelphia chromosome (Ph1) containing the fused bcr-abl gene. This schematic represents metaphase chromosomes. (b) A normal interphase nucleus (bcr on normal 22, abl on normal 9) compared with the interphase nucleus of a leukemic cell containing the Philadelphia chromosome (Ph1) and the fused gene.

b: Reprinted by permission from Macmillan Publishers Ltd: Bone Marrow Transplantation 33, 247-249, “Secondary Philadelphia chromosome after non-myeloablative peripheral blood stem cell transplantation for a myelodysplastic syndrome in transformation,” T Prebet, A-S Michallet, C Charrin, S Hayette, J P Magaud, A Thiébaut, M Michallet, F E Nicolini © 2004

Sequence-tagged sites (STS)

Comparison of genetic and physical maps

Manual sequencing

Automated DNA sequencing

Estimated genes in sequenced genomes

Transposable elements

Alternative splicing

Genome variation


Figure: (a) SNPs: the same nine-base stretch reads AACACGCCA on chromosomes 1, 2, and 4, while chromosome 3 carries T in place of C at one position (AACATGCCA). (b) Haplotypes 1-4: combinations of SNP alleles found together on a chromosome. (c) Diagnostic SNPs (e.g., A/G, T/C, C/G) that distinguish haplotypes 1-4.
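Finding the SNPs in panel (a) amounts to scanning aligned sequences column by column for positions where not every chromosome carries the same base. A minimal Python sketch, assuming the sequences are already aligned to equal length (real SNP calling must also handle sequencing errors, read depth, and diploid genotypes, which this ignores); the helper name `snp_positions` is hypothetical:

```python
def snp_positions(aligned):
    """Return 1-based positions where aligned sequences do not all share the same base."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    positions = []
    for i in range(length):
        bases = {seq[i] for seq in aligned}  # distinct bases seen in this column
        if len(bases) > 1:
            positions.append(i + 1)
    return positions


# The four sequences from panel (a).
chromosomes = {
    "Chromosome 1": "AACACGCCA",
    "Chromosome 2": "AACACGCCA",
    "Chromosome 3": "AACATGCCA",
    "Chromosome 4": "AACACGCCA",
}

for pos in snp_positions(list(chromosomes.values())):
    alleles = {name: seq[pos - 1] for name, seq in chromosomes.items()}
    print(pos, alleles)
```

Run on the four chromosomes from panel (a), it reports a single SNP at position 5, where chromosome 3 carries T and the others carry C.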

Comparison of plant genomes (Comparative genomics)

Fig. 18.9 Rice genome compared with sugarcane, corn, and wheat chromosome segments.


SCIENTIFIC THINKING Hypothesis: Flowers and leaves will express some of the same genes.

Prediction: When mRNAs isolated from Arabidopsis flowers and from leaves are used as probes on an Arabidopsis genome microarray, the two different probe sets will hybridize to both common and unique sequences.
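The prediction can be framed as set operations: the genes detected with each probe set form two sets, and their intersection and differences are the common and unique sequences. A minimal Python sketch; the gene labels, signal values, and fixed detection cutoff are all hypothetical, and real microarray analysis would normalize signals and test for significance rather than apply a single threshold:

```python
# Hypothetical normalized hybridization signals for four spots on the array.
flower_signal = {"gene_01": 850.0, "gene_02": 40.0, "gene_03": 1200.0, "gene_04": 610.0}
leaf_signal = {"gene_01": 900.0, "gene_02": 700.0, "gene_03": 35.0, "gene_04": 580.0}

THRESHOLD = 100.0  # assumed detection cutoff (hypothetical)


def detected(signal, threshold=THRESHOLD):
    """Genes whose signal reaches the cutoff, i.e. genes the probe set hybridized to."""
    return {gene for gene, value in signal.items() if value >= threshold}


flowers = detected(flower_signal)
leaves = detected(leaf_signal)

common = flowers & leaves       # expressed in both tissues
flower_only = flowers - leaves  # unique to flowers
leaf_only = leaves - flowers    # unique to leaves

print(sorted(common), sorted(flower_only), sorted(leaf_only))
```

With these made-up values, gene_01 and gene_04 are common to both tissues, gene_03 is flower-specific, and gene_02 is leaf-specific.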

Genomes deposited at NCBI

Genome sequencing projects statistics (revised Nov 18, 2010)

Organism              Complete  Draft assembly  In progress  Total
Prokaryotes                850             585          534   1969
  Archaea                   78               5           32    115
  Bacteria                 773             580          502   1855
Eukaryotes                  39             249          320    608
  Animals                    6             110          159    275
    Mammals                  3              37           81    121
    Birds                    -               3           12     15
    Fishes                   -              13           12     25
    Insects                  2              26           20     48
    Flatworms                -               2            3      5
    Roundworms               1              13           12     26
    Amphibians               -               1            -      1
    Reptiles                 -               1            -      1
    Other animals            -              16           22     38
  Plants                     7              23           78    108
    Land plants              4              19           73     96
    Green algae              3               4            4     11
  Fungi                     16              83           39    138
    Ascomycetes             13              63           28    104
    Basidiomycetes           1              12            8     21
    Other fungi              2               8            3     13
  Protists                  10              31           40     81
    Apicomplexans            5              10            4     19
    Kinetoplasts             4               1            3      8
    Other protists           1              19           33     53
Total                      889             834          854   2577

GOLD (Genomes Online Database)

              Bacterial  Archaeal  Eukaryotic
Complete           2666       149         166
Incomplete         5493       182        2037
Targeted: 1960, 1021

Status categories: Finished; Permanent draft; Complete, not published; Draft; In progress; DNA received; Awaiting DNA; Targeted (funded, not started)

Date NCBI, Genomes 11/23/2011

26 1529 3426 266 510 Viroids Viruses Bacterial Archaeal Eukaryotes Organelles

Date Species

41 2721

Reference sequences

41 3933 1681 121 1815 2974

11/23/2011

424 1 13 438

In progress

5140 90 Metagenome studies Metagenome samples 340 1930 [Metagenomes are environmental samples]

Human disease genes (from Genome.gov, 11-2010)

Animals

Amphipod Crustacean; Aphid, Pea; Beetle, Red Flour; Bug (Chagas' Vector); Centipede, Geophilomorph; Chelicerate (Horseshoe Crab); Drug-Resistant Parasitic Nematode; Freshwater Polyp; Fruit Fly; Honey Bee; Louse, Body; Mosquito; Placozoan; Planarian; Roundworm; Sand Fly; Sea Slug; Sea Squirt; Sea Star; Sea Urchin; Snail, Freshwater; Strongylid Nematode; Tardigrade; Wasp, Parasitoid; Worm, Acorn; Worm, Priapulid

Vertebrates

Chicken; Coelacanth; Gar, Spotted; Hagfish; Lamprey, Sea; Lizard, Anole; Pufferfish; Shark, Elephant; Skate; Spotted African Lungfish; Stickleback, Threespine; Turtle, Painted; Zebra finch

Genome.gov

11/22/2011

Animal genomes in progress, November 2011 (genome.gov)

Mammals

Aardvark; Alpaca; Armadillo, Nine-banded; Baboon; Bat, Little Brown (Microbat); Bat, Big brown; Bonobo; Bushbaby; Bushbaby/Galago; California leaf-nosed bat; Cape golden mole; Cat; Chimpanzee; Chinchilla; Chinese hamster; Cow; Crested porcupine; Degu; Dog; Dolphin; Eastern grey kangaroo; Elephant, African Savannah; Ferret; Flying Fox (Megabat); Giant anteater; Gibbon; Golden-mantled howling monkey; Greater horseshoe bat; Guinea Pig; Hedgehog, European; Hippopotamus; Honey Possum (Noolbenger); Horse; Human; Hyrax; Koala; Lemur, Flying; Lemur, Mouse; Lesser Egyptian jerboa; Lizard, Anole; Llama; Long-haired (Rufous) elephant shrew; Macaque, Cynomolgus; Macaque, Pigtail; Macaque, Rhesus; Macaque, Rhesus (Chinese population); Malayan tapir; Mangabey, Sooty; Marmoset; Mexican free-tailed bat; Mole; Monkey, Squirrel; Mouse; Mouse, Deer; Mouse, White-Footed; Naked mole rat; North American porcupine; Opossum, Gray Short-Tailed; Opossum, Laboratory; Orangutan; Pangolin; Pika; Platypus, Duck-Billed; Rabbit; Rat; Rat, Kangaroo; Rhesus Macaque; Ring-tailed lemur; Shrew, Elephant; Shrew, European Common; Shrew, Tree; Sloth; Springhare; Squirrel; Star-nosed mole; Stickleback, Threespine; Syrian/Golden Hamster; Tarsier; Tenrec (Lesser Hedgehog); Vervet; Vole, Prairie; Wallaby, Tammar; Water Chevrotain; Weddell Seal; West Indian manatee; White rhinoceros

Mammal genomes in progress, November 2011 (genome.gov)

Science Nov 17, 2006

Neanderthals


• 99.5% identical to humans when comparing the same sequences
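Percent identity is the fraction of aligned positions at which two sequences agree, so 99.5% identity means roughly one difference per 200 aligned bases. A minimal Python sketch, assuming the two sequences are already aligned and gaps are written as '-'; the function name and the toy 200-bp sequences are hypothetical, not Neanderthal or human data:

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned, non-gap positions at which two sequences carry the same base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps; only shared positions are compared
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


toy_a = "ACGT" * 50                      # hypothetical 200-bp sequence
toy_b = toy_a[:99] + "G" + toy_a[100:]   # one substitution at position 100 (T -> G)
print(percent_identity(toy_a, toy_b))    # 99.5
```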

Neanderthals

Draft sequence published May 7, 2010.

• Neanderthal bones from four sites were analyzed (see map).
• 21 bones from Vindija were analyzed for this study, and 3 bones (from three individuals) were selected for detailed sequencing.
• Bones from three other sites were also sequenced (see map).
• The Neanderthal sequence was compared to five human genomes.
• Conclusion: non-African humans contain some Neanderthal-derived sequences (1 to 4%). The gene flow is estimated to have gone from Neanderthals to humans and to have occurred more than 45,000 years ago.
• Notes: Humans and Neanderthals lived in the same area for more than 10,000 years. Neanderthals died out about 30,000 years ago.

Neanderthals

• Four models of how the gene transfer could have occurred (option 2 is least likely, option 3 most likely)
• Transfer most likely occurred in the Middle East/Western Asia
• PNG = Papua New Guinea

Denisovans

• The third type of human genome to be sequenced
• A finger bone was found in Denisova Cave in Altai Krai, Russia, in 2008.
• The Denisova bone had a genome distinct from those of modern humans and Neanderthals.
• The bone was dated to 41,000 years ago.
• Because only bone fragments are known, it is not known what Denisovans looked like.
• They are thought to have been distributed throughout Asia and Melanesia.
• Analysis of the genome, and comparison with human and Neanderthal genomes, suggests that 4% of non-African DNA is related to Neanderthals and that 4 to 6% of Melanesian genomes is related to Denisovans. This suggests some interbreeding between the first modern humans, Neanderthals, and Denisovans.

Analysis of HLA types (immune proteins) suggests that over half of Eurasian HLA types came from Neanderthals or Denisovans, which implies that these variants were favored by selection in Eurasians.

Watson’s genome

• Sequenced using shotgun sequencing
• About 3.5 percent of Watson’s genome could not be matched to the reference genome, probably due to differences in the cloning step

Venter’s genome compared to the reference genome
• 32 million reads resulted in 2.8 billion base pairs of assembled sequence (7.5-fold coverage)
• 4.1 million differences from the previously published reference genome (12.3 million bases in total)
• 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2-206 bp), 292,102 heterozygous insertion/deletion events (indels; 1-571 bp), 559,473 homozygous indels (1-82,711 bp), 90 inversions, and numerous segmental duplications and copy number variation regions
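The coverage figure relates the number of reads N, their mean length L, and the genome size G: C = N × L / G. Under the Lander-Waterman (Poisson) model, the expected fraction of bases covered by no read is e^(-C). A minimal Python sketch; treating the 2.8 billion assembled base pairs as the genome size is an assumption, and the mean read length of about 656 bp is only what the slide's numbers imply, not a reported value:

```python
import math


def fold_coverage(num_reads, mean_read_length, genome_size):
    """Average number of reads covering each base: C = N * L / G."""
    return num_reads * mean_read_length / genome_size


def expected_uncovered_fraction(coverage):
    """Lander-Waterman (Poisson) estimate of the fraction of bases hit by no read: e^(-C)."""
    return math.exp(-coverage)


# Working backwards from the slide: 7.5-fold coverage from 32 million reads over an
# assumed 2.8 billion bp genome implies a mean read length of about 656 bp.
implied_read_length = 7.5 * 2.8e9 / 32e6
print(round(implied_read_length))                         # 656
print(fold_coverage(32e6, implied_read_length, 2.8e9))    # 7.5
print(f"{expected_uncovered_fraction(7.5):.5f}")          # 0.00055
```

At 7.5-fold coverage the model predicts that only about 0.055% of bases would be missed entirely, which is why shotgun projects aim for several-fold redundancy.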

How different are individuals?

• 44% of genes were heterozygous for one or more variants (both copies of each gene could be determined)
• A conservative estimate is that two haploid genomes differ by at least 0.5% (counting all heterozygous bases)
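To put the 0.5% figure in bases, assume a haploid human genome of about 3 × 10^9 bp:

```latex
\[
0.005 \times 3 \times 10^{9}\ \text{bp} \approx 1.5 \times 10^{7}\ \text{bp}
\]
```

That is at least about 15 million differing bases, the same order of magnitude as the 12.3 million bases by which Venter's genome differed from the reference.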

How different are individuals?

• The genome of a Yoruba individual from Ibadan, Nigeria, was sequenced.

• About 4 million SNPs were found, 74% had already been found by others.

• About 24% more polymorphism (heterozygosity) than Caucasian genomes.

• There were 5,704 indels ranging from 50 to over 35,000 bp long. Many involved SINEs and LINEs.

Bentley et al., Nature, November 6, 2008

How different are individuals?

• The genome of a Han Chinese individual was sequenced.

• About 3 million SNPs were found, 86% had already been found by others.

• About 24% more polymorphism (heterozygosity) than Caucasian genomes.

• There were 2,682 structural variations, including insertions, deletions, and inversions. Many variations in SINEs and LINEs were found.

Wang et al., Nature, November 6, 2008
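The number of previously unreported SNPs in each genome follows from the percentages above, where N_SNP is the number of SNPs found and f_known is the fraction already found by others; the results are rough because the input counts are approximate:

```latex
\[
\begin{aligned}
N_{\text{novel}} &= N_{\text{SNP}}\,(1 - f_{\text{known}}) \\
\text{Yoruba: } N_{\text{novel}} &\approx 4 \times 10^{6} \times (1 - 0.74) \approx 1.0 \times 10^{6} \\
\text{Han Chinese: } N_{\text{novel}} &\approx 3 \times 10^{6} \times (1 - 0.86) \approx 4.2 \times 10^{5}
\end{aligned}
\]
```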

How different are cancer cells?

• The genomes of skin cells and acute myeloid leukemia cells from the same Caucasian woman were sequenced.

• About 2.9 million SNPs were found in the skin cells, and 3.8 million in the leukemia cells.

• Almost all of the SNP differences were either common in other sequenced genomes or located outside genes.

• Ten genes were found to have acquired mutations in the leukemia cells. Of these, two were known to be involved in tumour progression. The functions of the other eight mutant genes are unknown.

Ley et al., Nature, November 6, 2008

Metabolomics

• A study of 284 males compared the levels of 383 metabolic indicators with their SNPs (genetic variants).

• Up to 12% of the variation in the level of a metabolic molecule could be explained by which version (allele) of a particular SNP an individual carried.

• Four of the genes were already known to act in metabolic pathways related to the metabolic molecule whose level was high or low.

Geiger et al., PLOS Genetics. November, 2008
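The "up to 12%" figure is a proportion of variance explained. A minimal Python sketch of one common way to compute it, assuming an additive model in which each man's genotype is coded as 0, 1, or 2 copies of one allele and R^2 is taken from a simple least-squares line; the study's exact statistical model is not described here, and the function name and data are hypothetical:

```python
def variance_explained(genotypes, levels):
    """Fraction of variance in `levels` explained by a least-squares line on allele count (R^2)."""
    n = len(genotypes)
    if n != len(levels) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(genotypes) / n
    mean_y = sum(levels) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    syy = sum((y - mean_y) ** 2 for y in levels)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, levels))
    if sxx == 0 or syy == 0:
        return 0.0  # no genotype variation, or no metabolite variation to explain
    return sxy * sxy / (sxx * syy)  # for simple linear regression, R^2 equals Pearson r squared


genotypes = [0, 0, 1, 1, 2, 2]              # hypothetical allele counts
levels = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0]     # hypothetical metabolite levels
print(round(variance_explained(genotypes, levels), 2))  # 0.4
```

In this toy example the genotype explains 40% of the variation; in the study, the strongest associations explained up to 12%.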

Woolly mammoth

• Over 4 billion bp in the genome
• Mammoths and African elephants differ by about 1 amino acid per protein
• Mammoths and African elephants are estimated to have diverged 1.5 to 2.0 million years ago

Nature, November 20, 2008


Recent genome news

Nov 19, 2011 Malaysian Genomics Resource Centre Berhad (MGRC) today announced that it has successfully completed its 100th human genome, from a diverse mix of Malaysian, European, and Australian individuals. The data generated from these genomes have helped in efforts to identify and compare highly represented patterns of common and clinically relevant genetic variation within Malaysian and other populations, and to establish robust bioinformatics protocols for the reference-based analysis of genomic information.

Recent genome news

Nov 23, 2011 A study of 11,000 children and adults found that very short people (the lowest 2.5% of the population) are missing more genes or parts of genes than taller people.

Recent genome news

November, 2011 The mythical "$1,000 genome" is almost upon us (in 2012), said Jonathan Rothberg, CEO of sequencing technology company Ion Torrent, at MIT's Emerging Technology conference.

November 2, 2011 Duke University said last week that it will sequence 4,000 individuals as part of a collaborative, $25 million effort to identify as many genes as possible implicated in epilepsy.

Maize (corn) genome

Maize has 10 chromosomes and 2.3 billion base pairs. The genome was sequenced using the clone-by-clone method, with 16,848 BACs sequenced, assembled, and analyzed.

There are an estimated 32,500 protein-coding genes and 150 microRNA (miRNA) genes.

Approximately 75% of the genome is repeated DNA.

It has over 400 families of LTR retrotransposons with over 31,000 different sequences.

Fig. 1 The maize B73 reference genome (B73 RefGen_v1): Concentric circles show aspects of the genome P. S. Schnable et al., Science 326, 1112-1115 (2009)

1000 Genomes project

The 1000 Genomes Project is an international collaboration to produce an extensive public catalog of human genetic variation, including SNPs and structural variants, and their haplotype contexts. This resource will support genome-wide association studies and other medical research studies.

The genomes of about 2500 anonymous people from about 27 populations around the world will be sequenced using next-generation sequencing technologies.

Highlights

• Over 4.9 trillion nucleotides sequenced
• Over 800 individuals (179 people had their whole genomes sequenced and 697 had only their protein-coding regions sequenced)
• Each child had around 60 mutations in its genome that did not exist in either parent
• Over 15 million SNPs discovered
• Each individual carries a significant number of deleterious mutations: perhaps 250 to 300 genes have defective copies
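The de novo mutation count implies a rough per-base mutation rate. Assuming a diploid child inherits two haploid genomes of about 3 × 10^9 bp each:

```latex
\[
\mu \approx \frac{60\ \text{new mutations}}{2 \times 3 \times 10^{9}\ \text{bp}} = 1 \times 10^{-8}\ \text{per base per generation}
\]
```

That is roughly one new mutation per 100 million bases per generation.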

1000 Genomes project

http://www.1000genomes.org/home

3 billion

Number of DNA letters in the human genome (200 volumes the size of a Manhattan telephone book, which has around 1,000 pages)
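The telephone-book comparison implies how densely the letters would be packed:

```latex
\[
\frac{3 \times 10^{9}\ \text{letters}}{200\ \text{volumes} \times 1000\ \text{pages per volume}} = 1.5 \times 10^{4}\ \text{letters per page}
\]
```

That is about 15,000 letters on every page.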

20,000-25,000

Number of genes in the genome (though not all scientists agree)

2000

Year the first draft of the human genome was announced, to much fanfare at the Clinton White House

2003

Year the final draft was completed, to 99.99% accuracy

2500

Number of people whose genomes the 1,000 Genomes Project hopes to sequence, from 25 populations

15 million

Number of single-letter changes identified in the pilot phase

1 million

Number of small insertions and deletions identified in the pilot phase

4.9 trillion

Number of letters of data sequenced by the 1,000 Genomes Project so far

1094

Genomes completed for 1094 individuals, 6/23/11

Human microbiome

Adults harbor ten times more microbial cells than they have human cells. Examination of how these microbes impact human health through their association with the body, for example by influencing metabolism, disease susceptibility and drug response is key for improving human health.

Through the Comparative Genome Evolution (CGE) program, NHGRI approved a limited project, "Sequencing of Cultivable Microbes from Human Gut," in September 2005 to obtain reference genome sequence data from up to 300 cultured bacteria and archaea sampled from the human digestive and urogenital tracts. The objective is threefold: to start generating reference data for future large-scale metagenomics studies; to understand the diversity of bacterial pangenomes; and to start addressing the technical and bioinformatic challenges that human metagenomics research will encounter.

From Genome.gov, 11-2010

Scientists propose a "genome zoo" of 10,000 vertebrate species
November 3, 2009, by Branwyn Wagman, Guest Writer
Scientists involved in the Genome 10K Project are assembling specimens of thousands of animals spanning a broad range of evolutionary diversity. (Photos courtesy of San Diego Zoo.)

From http://news.ucsc.edu/2009/11/3333.html

10,000 vertebrate genomes

In the most comprehensive study of animal evolution ever attempted, an international consortium of scientists plans to assemble a genomic zoo: a collection of DNA sequences for 10,000 vertebrate species, approximately one for every vertebrate genus.

Known as the Genome 10K Project, it involves gathering specimens of thousands of animals from zoos, museums, and university collections throughout the world, and then sequencing the genome of each species to reveal its complete genetic heritage.

Launched in April 2009 at a three-day meeting at the University of California, Santa Cruz, the project now involves more than 68 scientists. Calling themselves the Genome 10K Community of Scientists (G10KCOS), the group outlined its proposal to create a collection of tissue and DNA specimens for the project in a paper to be published online November 5 in the Journal of Heredity.
