50 D. melanogaster Genomes Project Objectives
The specific goals of the 50 Genomes Project (HG 02942-10A1) were twofold:
In early 2007 it became clear that using the developing short-read sequencing technologies, specifically Illumina's Solexa Genome Analyzers, was the most economical, efficient, and robust route to completing our short-term goals. We are now well on our way to completing the initial long-term goal (50 entire genomes) of the Drosophila Population Genomics Project (DPGP) within this last year of the original funding cycle.
Total coverage as of Feb 2008 = 320X
This success reflects the technological advances in ultra-high-throughput resequencing. More importantly, our recent experience and the growing potential of new sequencing platforms have inspired another ambitious proposal: resequencing of 1000 Drosophila genomes. By providing the research community with deep population sampling based on the high-throughput platforms, our recently proposed project will foster the development of new theoretical ideas, talent, and tools. These can be leveraged against the talent and creativity of the Drosophila research community to advance the application of these new technologies to human population genomics.
An isogenic (or inbred) Drosophila melanogaster genome sequenced to 10X with the Illumina GA has a low rate of missing data. With a single run of the instrument, we are routinely achieving 98% or higher coverage of the non-repetitive genome at Q40 or higher for such isogenic genomes (see figure).
Data Availability
One major goal is to use public database resources as much as possible to disseminate our sequences in a timely manner. We are currently arranging with NCBI (Martin Shumway) to transmit our present production to the NCBI's short read trace archive:
Along with the large human genomic resequencing community, we are working on creating solid and serviceable genomic assemblies and associated publications. As these assemblies emerge, we will post them on this website and submit them to public databases.
Many interesting and critical developments remain to be made at this early stage, and we invite collaborations. If you are interested in using these sequence reads in this early phase, please contact us. We are also prepared to provide copies of our original data (including the GA images).
50 Genomes - Release 0.5
The README and access to the data are HERE
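For readers working with the released data, the coverage statistic quoted above (98% or higher of the non-repetitive genome at Q40 or higher) can in principle be recomputed from per-site consensus qualities. The sketch below assumes that Q is a Phred-scaled quality, so that Q40 corresponds to an error probability of 1 in 10,000. The function names and the toy input are hypothetical and are not part of the DPGP release.

```python
def phred_to_error_prob(q):
    """Convert a Phred-scaled quality Q to an error probability: 10^(-Q/10)."""
    return 10 ** (-q / 10)


def fraction_at_or_above(site_quals, threshold=40):
    """Fraction of sites whose quality is at least `threshold`.

    `site_quals` is assumed to hold one Phred-scaled consensus quality per
    non-repetitive site; sites with no data can be encoded as quality 0.
    """
    if not site_quals:
        raise ValueError("no sites supplied")
    passing = sum(1 for q in site_quals if q >= threshold)
    return passing / len(site_quals)


if __name__ == "__main__":
    toy_quals = [40, 50, 39, 60]  # hypothetical per-site qualities
    print(phred_to_error_prob(40))
    print(fraction_at_or_above(toy_quals))
```

Encoding missing sites as quality 0 keeps uncovered positions in the denominator, which is what a statement about coverage of the genome (rather than of the covered sites) requires.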
The Next 1000 Genomes
The primary broad resource goal (deliverable to the community) in the next phase of DPGP is a proposed set of 600 high-quality resequenced Drosophila melanogaster genomes and associated stocks. Such deep sampling from populations in Africa (species origin) and Eurasia (the ancient diaspora) will give a complete view of genomic variation in this species, including vast numbers of rare variants of many different kinds (e.g. structural variants) throughout the genome. This resource will, of course, be available for the creative and powerful functional studies possible in this model organism. Like rare variants, less frequent (perhaps larger-scale) haplotypes will become discernible in these larger samples and can be important in the investigation of natural selection and historical breeding structure. More generally, the large data set will drive the development of ideas, tools, and talent that can augment the value and impact of such approaches in other species, especially in the ongoing large-scale human population genomic project.
The second aim of the proposed resource will be the systematic description of genomic differentiation between temperate and tropical populations of D. melanogaster on several continents (effectively 300 high-quality genome sequences). The goal here is to identify genomic regions that are candidates for selective adaptation on environmental gradients associated with latitude. The main impact of this resource will be detailed annotation of the D. melanogaster genome and the fostering of experimental population genomic approaches to the systematic identification of adaptations.
The third aim of the proposed resource would enable population genomic research in seven non-model Drosophila species pairs: pseudoobscura-persimilis, yakuba-santomea, virilis-americana, pseudoobscura-miranda, ananassae-bipectinata, and simulans-sechellia/mauritiana. In each case there is a reference sequence upon which a syntenic assembly can be founded. Five to 30 resequenced genomes (effectively 100X D. melanogaster genomes) are proposed for each of these smaller research communities to address their well-focused questions, which include speciation, neo-sex chromosome evolution, recent evolution to a specialized (e.g. toxic) host, and sexual selection.
The specific genomes to be sequenced in all these projects will be vetted with an advisory committee of interested colleagues, which will be assembled once the project is initiated.
Illumina Genome Analyzer Infrastructure
The following figure shows the DPGP IT infrastructure created for sequencing the 50 genomes with Illumina Genome Analyzers:
The following figure shows the current process for sequencing a genome with the Illumina Genome Analyzer and assembling the resulting short reads with MAQ:
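As a rough illustration of a reference-guided MAQ workflow of this kind, the basic steps documented for MAQ (convert the reference and reads to binary form, map, assemble a consensus, export it) can be scripted as below. This is a minimal sketch and not the DPGP's actual pipeline: the file names, the single-end layout, the default MAQ options, and the assumption that the reads are already in Sanger-scaled FASTQ are all hypothetical.

```python
from pathlib import Path
import subprocess


def maq_commands(ref_fasta, reads_fastq, workdir="maq_out"):
    """Build argv lists for a basic single-end MAQ run (hypothetical paths)."""
    out = Path(workdir)
    ref_bfa = str(out / "ref.bfa")          # binary reference
    reads_bfq = str(out / "reads.bfq")      # binary reads
    aln_map = str(out / "aln.map")          # read alignments
    cns = str(out / "consensus.cns")        # consensus with qualities
    return [
        ["maq", "fasta2bfa", str(ref_fasta), ref_bfa],
        # Assumes Sanger-scaled FASTQ; Solexa-scaled reads would first
        # need conversion (MAQ provides `maq sol2sanger` for this).
        ["maq", "fastq2bfq", str(reads_fastq), reads_bfq],
        ["maq", "map", aln_map, ref_bfa, reads_bfq],
        ["maq", "assemble", cns, ref_bfa, aln_map],
        # cns2fq writes the consensus FASTQ to standard output.
        ["maq", "cns2fq", cns],
    ]


def run_pipeline(ref_fasta, reads_fastq, workdir="maq_out"):
    """Run the steps in order; requires the `maq` binary on PATH."""
    out = Path(workdir)
    out.mkdir(parents=True, exist_ok=True)
    *steps, cns2fq = maq_commands(ref_fasta, reads_fastq, workdir)
    for argv in steps:
        subprocess.run(argv, check=True)
    # Redirect the final step's stdout into the consensus FASTQ file.
    with open(out / "consensus.fq", "w") as fh:
        subprocess.run(cns2fq, check=True, stdout=fh)


if __name__ == "__main__":
    for argv in maq_commands("dmel_ref.fasta", "lane1.fastq"):
        print(" ".join(argv))
```

Keeping command construction separate from execution makes it possible to inspect or log the exact commands before committing instrument output to a long mapping run.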