A large set of independently sampled and sequenced genomes from a population near the center of the distribution of D. melanogaster, and as free as possible from the theoretical encumbrance of admixture and other demographic variation such as bottlenecks and large-scale migrations, will have great value because it corresponds closely to the assumptions of the most elaborated and predictive population genomics theory. Based on analyses of the extensive sampling of African D. melanogaster genomic variation (Pool et al., 2012), the large sample (>290) from Siavonga, Zambia was chosen as the target population. Multiple haploid embryos were collected from each independent isofemale line. For at least one haploid embryo from each isofemale line, the genome was amplified, quality-checked, and converted into a standard Illumina short-insert library. 194 of these have been sequenced to >30X coverage (Illumina GA IIx paired-end sequencing) and became available on the SRA at ncbi.nih.gov in September of 2013 (SRA identifier: SRP006733). 90 libraries (genomes) remain to be sequenced.
In a collaborative effort, data from the 197 sequenced Siavonga genomes and other African strains from DPGP2 (SRA identifier: SRP005599) and DPGP1 (SRA identifier: PRJNA30091) serve as a source for the Drosophila Genome Nexus Project, which is creating a more comprehensive set of assemblies of D. melanogaster genomes from natural populations using a single assembly approach and pipeline (see the Drosophila Genome Nexus website for some details). A core idea of the Nexus pipeline is to implement multiple rounds of read alignment, variant determination, and reference sequence modification. We are making available here both the DPGP2 and DPGP3 consensus sequences obtained from the Nexus pipeline. Researchers should carefully evaluate the suitability of this data set for their analyses.
Citation: Users of the DPGP genomes should always cite the original publications that described them. Until a publication is available for the DPGP3 genomes, this web page (www.dpgp.org/dpgp3) can be cited. Large-scale analyses may be coordinated by contacting John Pool and Chuck Langley.
The Nexus V1.0 data packages:
* Consensus sequence files for DPGP3 (all Siavonga strains) and DPGP2 (remaining DPGP2 and DPGP1 African strains). Note: these are essentially FastA files that follow the standard FlyBase release 5 reference base numbering but lack a header line.
* Some users may wish to consult the indel sequences called in round1 and round2. These are also given in the same standard reference coordinates.
* Also provided are annotated regions of identity-by-descent (IBD), in ibd_filter_tracts.txt, and of ancestry/admixture, in admixture_filter_tracts.txt. Filtering these regions from analyses will often be important. Scripts written specifically to filter this sequence format are provided: admixutre_mask_seq.pl and ibd_mask_seq.pl.
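
For readers who want to see what the filtering step amounts to, the following is a minimal Python sketch, not the provided Perl scripts. It reads a consensus file that has no FastA header line and replaces a set of tracts with N. The function names `read_headerless_sequence` and `mask_tracts` are hypothetical, and the sketch assumes tracts are already parsed into 1-based, inclusive (start, end) pairs in reference coordinates. The layout of the tract files is not described here, so parsing them is left out. The provided scripts remain the authoritative way to apply the filters.

```python
from typing import Iterable, Tuple


def read_headerless_sequence(path: str) -> str:
    """Read a sequence file that has no '>' header line.

    All lines are stripped of whitespace and concatenated, so the
    result is a single string indexed by reference position.
    """
    with open(path) as handle:
        return "".join(line.strip() for line in handle)


def mask_tracts(
    sequence: str,
    tracts: Iterable[Tuple[int, int]],
    mask_char: str = "N",
) -> str:
    """Replace every base inside the given tracts with mask_char.

    ASSUMPTION: each tract is a 1-based, inclusive (start, end) pair.
    Tracts that run past either end of the sequence are clipped.
    """
    bases = list(sequence)
    for start, end in tracts:
        first = max(start, 1) - 1       # convert to 0-based index
        last = min(end, len(bases))     # exclusive upper bound
        for i in range(first, last):
            bases[i] = mask_char
    return "".join(bases)
```

A typical use would be to call `read_headerless_sequence` on one chromosome arm of one genome, pass the result to `mask_tracts` together with the tracts listed for that genome and arm, and then analyze the masked string.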