Introduction
The complete description of human genes is providing deep insights into human biology. Similarly, the description of human genomic variation will play a central role in our attempts to understand the genetic basis of variation in disease risk among individuals. Publicly available human and model organism reference genomic sequences provide a foundation for revolutionary advances in population genomics - the genomic-scale detection of DNA sequence variation and the investigation of the mechanisms through which these genomic polymorphisms affect disease risk. However, even as the first population genomic data emerge, solid paradigms of description and analysis have not yet appeared. Fundamental questions concerning patterns of linkage disequilibrium and their implications for disease gene mapping remain unanswered, and experimental approaches are still fluid. The value and application of outgroup reference sequences (chimp, gibbon and/or macaque) are just beginning to be explored. The integration of population genomic polymorphism and associated phenotypic variation into the present and “next generation” genomic annotations is an essential, yet daunting, task.
To address the basic challenge of population genomics directly, the Drosophila Population Genomics Project (DPGP) is an effort to obtain the complete sequence of 50 Drosophila melanogaster genomes. These publicly available Drosophila strains and their associated genomic polymorphisms will become objects of intense and diverse functional analyses and annotation. This unique resource will support the development of new methods and concepts in population genomics. The goals of this proposal are (1) to develop and validate an appropriate resequencing technology, (2) to establish a sustainable, high-quality resequencing capacity, and (3) to provide preliminary results and analyses that demonstrate the great value of genomic coverage of population polymorphism.
Participants
- University of California - Davis (UCD)
  David Begun (coPI) - djbegun_at_ucdavis.edu
  John H. Gillespie (coPI) - jhgillespie_at_ucdavis.edu
  Charles H. Langley (PI) - chlangley_at_ucdavis.edu
  Cathy Laurie (coPI) - cathylaurie_at_pqgen.com
  Kristian Stevens (Bioinformatics) - kristian_stevens_at_sbcglobal.net
- Emory University (EU)
  Michael E. Zwick (coPI) - mzwick_at_mac.com
- Johns Hopkins University (JHU)
  David Cutler (coPI) - dcutler_at_jhmi.edu
- Children's Hospital Oakland Research Institute (CHORI)
  Pieter de Jong (coPI) - pdejong_at_mail.chori.org
  Kazutoyo Osoegawa (Research Associate) - kosoegawa_at_chori.org
- Affymetrix, Inc.
  Janet Warrington (coPI)
Specific Aims
I. Identify all the common polymorphisms in two large regions among 50 Drosophila melanogaster genomes.
A. Two well-annotated genomic segments (4.3 Mb of zeste–white on the X, and 3 Mb of the Adh region on an autosome) will be resequenced by hybridization of cloned and/or amplified genomic “target DNAs” to custom-designed oligonucleotide microarrays over a three-year period (see Figure 10).
B. The capacities to generate the target DNAs for increasingly larger regions (300 kb, 1 Mb, and 6 Mb) by both Long PCR (LPCR) and fosmid cloning will be developed, demonstrated, compared and evaluated.
C. Independent sequence determination (via chips and gels) in a significant portion of genomic segments will drive increased accuracy, allow more confident assessment of data quality, and provide a dataset for the further development of analytic algorithms (e.g. ABACUS).
II. Develop target DNA preparation, hybridization and scanning tools, and a data production pipeline to support the throughput needed to resequence 50 entire genomes.
A. The most effective approach for large-scale, automated production of target DNA (e.g., LPCR or fosmids) will be determined.
B. Protocols for the fragmentation, labeling, hybridization and scanning of target DNAs will be finalized, along with instrumentation and software appropriate for a resequencing production scale of 50 x 110 Mb, as envisioned for the out years.
C. The capacity to collect, analyze and disseminate 50 x 110 Mb of resequence data will be established.
III. Develop methodological resources for the analysis of genomic variation in natural populations and its functional role in the inheritance of complex traits.
A. Analytic and statistical tools that incorporate various aspects of data quality for population genomic data will be developed.
B. Low-level software tools will be developed to facilitate research community access to the resequencing data.
C. Tools will be developed to leverage the enormous power of outgroup data (the soon-to-be-available complete D. simulans and D. yakuba reference sequences), combined with genomic polymorphism data, allowing genomic-scale extensions to numerous “classical” population genetics tools.
D. The incorporation of a broad variety of functional annotation into population genomic analysis will be explored.
E. The value of complete genomic polymorphism data in the analysis of complex traits will be explored and documented.
F. The integration of such large amounts of polymorphism data into public annotation will be explored in collaboration with Drosophila genome annotators and FlyBase.
Data from every stage of this project will be released publicly on a daily basis.
All software produced by the participants as part of DPGP will be open-source.
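For a sense of scale, the resequencing volumes named in the Specific Aims can be tallied with a few lines of arithmetic. The sketch below uses only the figures quoted above (50 genomes, 4.3 Mb and 3 Mb pilot segments, 110 Mb per genome); the variable and function names are illustrative and are not part of any DPGP software.

```python
# Figures are taken from the Specific Aims; all names here are hypothetical.
N_GENOMES = 50          # genomes to be resequenced
ZESTE_WHITE_MB = 4.3    # X-linked zeste-white segment (Aim I.A)
ADH_MB = 3.0            # autosomal Adh region (Aim I.A)
GENOME_MB = 110         # per-genome scale quoted in Aim II


def total_mb(target_mb, n_genomes=N_GENOMES):
    """Total megabases resequenced when one target is covered in every genome."""
    return target_mb * n_genomes


# Aim I: two pilot segments across all 50 genomes.
pilot_mb = total_mb(ZESTE_WHITE_MB + ADH_MB)

# Aim II: the "50 x 110 Mb" production scale.
full_mb = total_mb(GENOME_MB)

print(f"Pilot regions: {pilot_mb:.0f} Mb in total")
print(f"Full scale:    {full_mb:.0f} Mb in total ({full_mb / 1000:.1f} Gb)")
print(f"Pilot share of full scale: {pilot_mb / full_mb:.1%}")
```

The comparison shows why Aim II treats throughput as a separate problem: the pilot regions account for only a small fraction of the full production target.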
Timeline
Stocks
From Trudy Mackay (NCSU) - Trudy_Mackay_at_ncsu.edu and Richard Lyman (NCSU):
40 inbred lines derived by sib mating (12 generations) from D. melanogaster females collected in 2003 in Raleigh, NC.
From Shu Fang (Academia Sinica) - zofang_at_gate.sinica.edu.tw:
Isochromosomal lines derived by balancer extraction from 10 isofemale lines collected by J.W.O. Ballard (Iowa U.) in Malawi, Africa in 2002.
News
- The Scientific Advisory Council of NHGRI/NIH recommended funding of this project at its February 2004 meeting. Francis Collins (NHGRI) announced at this year's Drosophila Meeting that funding will begin in this fiscal year (2004).
