1. Basic processing
36 bp single-end Solexa reads
ran the Solexa pipeline using default parameters
MAQ processing using default parameters
2. Data information
samples
2L: African - MW28, North American - Ral-303A, Ral-304A (MW - Malawi; Ral - Raleigh, NC)
X: African - MW6, North American - Ral-303A, Ral-304A
range of bases (reference assembly 4)
2L: 3Mbp - 4Mbp
X: 4Mbp - 5Mbp
conversion of coordinates from assembly 4 -> 5
These regions were chosen partly because they map easily between assemblies 4 & 5
For 2L, from 3Mbp - 4Mbp the coordinates are exactly the same in v4 and v5
For X, at 4Mbp add 48,796 to v4 coordinates to get v5 coordinates (v4 4,000,000 = v5 4,048,796)
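The conversion described above can be sketched as a small lookup. This is only an illustration: the function and table names are hypothetical, and it assumes the X offset of 48,796 given at 4Mbp holds unchanged across the whole 4Mbp - 5Mbp region, which these notes do not state explicitly.

```python
# Hypothetical helper for v4 -> v5 coordinate conversion in the two regions above.
# ASSUMPTION: the X offset quoted at 4Mbp (48,796) is constant across 4Mbp - 5Mbp.
REGIONS = {
    # chromosome: (v4 start, v4 end, offset to add to get v5)
    "2L": (3_000_000, 4_000_000, 0),
    "X": (4_000_000, 5_000_000, 48_796),
}


def v4_to_v5(chrom, pos_v4):
    """Convert a v4 position to v5, only inside the regions listed above."""
    if chrom not in REGIONS:
        raise ValueError(f"no conversion known for chromosome {chrom!r}")
    start, end, offset = REGIONS[chrom]
    if not start <= pos_v4 <= end:
        raise ValueError(f"{chrom}:{pos_v4} is outside the {start}-{end} region")
    return pos_v4 + offset


print(v4_to_v5("X", 4_000_000))   # v4 4,000,000 on X
print(v4_to_v5("2L", 3_500_000))  # 2L coordinates are unchanged
```

Positions outside the two regions raise an error rather than being converted, since nothing is known here about the offsets elsewhere.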
coverage
These genomes were sequenced to 10x coverage
3. Synopsis of cns.view file - from
http://maq.sourceforge.net/maq-manpage.shtml
The MAQ command cns2view was used to create the cns.view files. These files give information for every base in the alignment.
Column1 - chromosome
Column2 - position (with respect to the D. melanogaster reference sequence - v4)
Column3 - reference base
Column4 - consensus base
Column5 - Phred-like consensus quality
Column6 - read depth
Column7 - the average number of hits of reads covering this position (this gives a measure of the uniqueness of a region - i.e. did the reads that were aligned here also map well to other locations)
Column8 - the highest mapping quality of the reads covering the position
The following columns are probably not useful and may be incorrect anyway...
Column9 - the minimum consensus quality in the 3bp flanking regions at each side of the site
Column10 - the second best call
Column11 - log likelihood ratio of the second best and the third best call
Column12 - the third best call
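
As a rough sketch of how the column layout above might be read, the snippet below parses the first eight columns of each cns.view line and keeps the sites where the consensus differs from the reference. The names `CnsSite`, `parse_cns_view_line` and `candidate_differences` are hypothetical, the depth and quality thresholds are arbitrary placeholders rather than recommended values, and the code assumes whitespace-separated fields.

```python
from collections import namedtuple

# Field names are illustrative labels for Column1 - Column8 described above.
CnsSite = namedtuple(
    "CnsSite",
    "chrom pos ref cons cons_qual depth avg_hits max_map_qual",
)


def parse_cns_view_line(line):
    """Parse the first eight columns of one cns.view line."""
    fields = line.split()  # ASSUMPTION: fields are whitespace-separated
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 columns, got {len(fields)}")
    return CnsSite(
        chrom=fields[0],
        pos=int(fields[1]),           # v4 coordinate
        ref=fields[2],
        cons=fields[3],               # may be an IUPAC ambiguity code
        cons_qual=int(fields[4]),     # Phred-like consensus quality
        depth=int(fields[5]),
        avg_hits=float(fields[6]),
        max_map_qual=int(fields[7]),
    )


def candidate_differences(lines, min_depth=3, min_qual=20):
    """Yield sites where consensus != reference and thresholds are met.

    min_depth and min_qual are placeholder values for illustration only.
    """
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        site = parse_cns_view_line(line)
        if (
            site.cons.upper() != site.ref.upper()
            and site.depth >= min_depth
            and site.cons_qual >= min_qual
        ):
            yield site
```

The function accepts any iterable of lines, so an open file handle can be passed directly, e.g. `candidate_differences(open("sample.cns.view"))`, where the file name is a placeholder. Columns 9 to 12 are ignored, in line with the caveat above.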
