Research
EVOLUTIONARY FORCES SHAPING GENOMES
Evolutionary geneticists have long sought to understand how natural selection, drift, and mutation shape genome variation. Historically, these efforts have been data-limited. In Drosophila and vertebrates, sequencing the genomes of multiple species makes comparative genomics studies possible over long evolutionary time scales. For Drosophila and yeast, we now have genome sequences from multiple lines or individuals within species; such data will soon be available for humans and other species. These data allow us to use molecular population genetics on a genome-wide scale to understand the relative roles of neutral and adaptive processes over shorter time scales. In this post-genomic age, the development of computational skills and statistical methods to analyze these massive datasets has become paramount.

I use both computational and experimental approaches to investigate the patterns and processes of genome evolution within and among species. I address fundamental questions in evolutionary biology, including:
* What are the primary evolutionary processes acting on genomic features - e.g. exons, UTRs, protein domains, and gene duplicates?
* Are recent bursts of evolution in otherwise highly conserved DNA due to a shift in or loss of function?
* What role does natural selection play in the evolution of gene expression, and which regulatory features are the major determinants of gene expression variation?
* Are the targets of adaptive evolution between closely related species non-overlapping because of their unique biology and ecology, or more similar and related to particular biological processes?

Broad-scale genomic investigations dramatically improve our understanding of evolutionary processes acting on genomic variation, but they also spotlight particular genes or pathways for genetic analyses. In-depth investigations of specific genes pinpoint the genetic changes responsible for phenotypic variation.
I will use this two-pronged approach in my research program - first explaining the evolutionary processes underlying genomic patterns and then following up with more focused investigations of smaller sets of genes. Although the direction of my research stems from my recent computational and empirical work in Drosophila population genomics, my dissertation research exploring the evolution of gene duplicates gave me a strong background in evolutionary genetics. In that work, I developed the laboratory skills necessary to ask questions using the genetic tools available in any model system. However, I plan to form collaborations with experts in genetics and molecular biology, which will allow me to focus on the computational side of my research.

POPULATION GENOMICS
Drosophila simulans Genome Project: Many of my postdoctoral projects have focused on analysis of population genomic data from six inbred lines of the fruit fly, Drosophila simulans, a close relative of D. melanogaster. Working with a group at UC-Davis, I was a primary contributor to the assembly and evolutionary genomic analyses (Begun, Holloway et al. 2007). We assembled and annotated the D. simulans genomic lines using the well-curated genome of D. melanogaster. Together, these data provide us with an unbiased (i.e. whole genome) view of the extant variation within D. simulans, divergence between species, and insight into mechanisms by which this variation has been maintained. The D. simulans project produced the first population genomic dataset of a higher eukaryote and remains a relevant source of information for much of my current and future work.

Exciting Findings: First, the X chromosome, which is present in a single copy in male flies, evolves faster than the autosomes in both D. melanogaster and D. simulans. We present several hypotheses for the mechanism driving faster X evolution - e.g. recessive beneficial alleles, underdominance, more frequent directional selection on males, higher mutation rates in females, higher mutation rates on the X. Second, we show that amino acid sites and regulatory regions are subject to pervasive adaptive evolution. While this question has been addressed in smaller datasets, our data confirm and extend these analyses to the scale of a whole genome. Third, we provide an analysis of the biological processes and molecular functions that are most frequently targets of adaptive evolution. Researchers have long known that genes involved in reproduction and immunity are subject to frequent bouts of directional selection. The novelty of our analysis lies not only in its genomic scale but also in being unbiased with respect to the class of genes investigated. We discovered that genes involved in trafficking through the nuclear membrane, chromatin regulation, and transcriptional regulation were among those with the most intense directional selection. These are but a few of the interesting results that emerged from these analyses; below, I describe a few of my own independent projects that have made use of these data. Our hope that we and other biologists would use this rich genomic resource for years to come is being realized.

Large-scale resequencing of D. melanogaster populations: One new and exciting collaborative project is the large-scale resequencing of D. melanogaster populations (www.dpgp.org/1K). Working with the group at UC-Davis, we have resequenced 44+ D. melanogaster genomes and have received favorable reviews on an NIH grant to sequence almost 1000 genomes worldwide. These genomes are being sequenced to 10x depth using short-read Solexa/Illumina sequencing and will be aligned to the high-quality D. melanogaster reference genome. These data will provide a rich resource for population geneticists and Drosophila biologists in general and will also serve as a test case for large-scale resequencing of human genomes. As in the D. simulans project, I am a primary contributor to the initial analysis of these genomes. The aims of the proposal are to:
1. Generate and analyze genomic data from African, European, and Asian fly populations,
2. Develop hardware and software to overcome the inherent assembly and analysis bottlenecks created by the increasing capacity of emerging sequencing platforms,
3. Identify genomic features exhibiting extreme divergence between high and low latitude populations, and
4. Enable research in other Drosophila species.
Beyond collaboration on this large project, I have several solo projects planned that will make use of these data.

Comparative Population Genomics: One particular direction that I will take is the comparative population genomics of D. melanogaster and D. simulans. One view of natural selection is that the unique history, biology, and ecology of even closely related species cause the set of genes targeted by selection to be idiosyncratic and thus largely non-overlapping. Alternatively, perhaps at least certain aspects of adaptive evolution are more predictable, with natural selection driving evolution of a subset of proteins in both the D. simulans and D. melanogaster lineages. This question is fundamental because it speaks directly to the nature of selection and how organisms and genomes respond to it. Comparative population genomic data allow us to begin approaching these questions. We now have population genomic data from both D. melanogaster and D. simulans. I have begun to compare levels of polymorphism over broad scales, as well as the evolutionary forces acting on genes with particular biological functions, in both of these species.

COMPARATIVE GENOMICS
Lineage-specific divergence: Recent sequencing of several additional Drosophila genomes allows for a phylogenetic approach to comparative genomics.
One interesting avenue that I have begun to explore involves regions of the genome that have been highly conserved over many millions of years, but have recently experienced acceleration in their rates of evolution along the D. melanogaster branch (Holloway et al., In review at Genome Research). These Melanogaster-Accelerated Regions (MARs) could be targets of natural selection. Alternatively, accelerated rates of evolution could simply be due to reduced constraint. Interestingly, I found that coding regions comprise a significantly larger proportion of accelerated regions than would be expected by chance. Several MARs show evidence of adaptive protein evolution driving this recent divergence; one such gene is the Drosophila homolog of the fragile X mental retardation gene in humans. Surprisingly, for the vast majority of MARs in protein coding regions, significant acceleration in the rate of evolution is due solely to silent substitution. Further analysis indicates that this is not due to higher mutation rates, but to selection on silent sites, possibly for differential codon usage or secondary structure of mRNAs. For MARs in coding regions, adaptive evolution appears to have played a large role in driving these lineage-specific accelerations.

Functional divergence: Several MARs are promising targets for functional investigation of regulatory elements. These recent accelerations could change the pattern of expression, timing of expression, or stability of the mRNA. For example, one MAR is located in the core promoter and 5'UTR of the gooseberry-neuro gene. This gene encodes a transcription factor involved in early development. In a comparative study of D. melanogaster and D. simulans, I plan to use antibody stains to investigate spatial pattern divergence and qPCR to examine expression level and stability of mRNA. In a reverse genetics approach such as this, I can directly connect phenotypic divergence with evolution at the sequence level.
EVOLUTION OF GENE EXPRESSION
Phenotypic differences between individuals and species result, in part, from variation in gene expression caused by underlying sequence variation. Thus, a deeper understanding of this relationship is a crucial component of connecting genotype to phenotype and of elucidating the mechanisms of phenotypic evolution. I have been investigating several questions regarding the relationship between mRNA expression variation and sequence polymorphism.

Adaptive Evolution of Gene Expression: In the first genome-wide study that united gene expression data with sequence variation within and between species, I combined genomic expression data - analyzed in a phylogenetic context - with whole genome light-shotgun data from six D. simulans lines and reference sequences from D. melanogaster and D. yakuba (Holloway et al., 2007). Because each of these species has a sequenced genome, it is possible to account for sequence divergence in the array analysis, the critical advantage being that expression and sequence evolution are not confounded. DNA polymorphism and divergence data allowed me to test for directional selection on genes and noncoding regions associated with rapid changes in expression. I found that genes that evolved higher levels of expression in D. simulans have experienced adaptive evolution of the associated 3' regulatory regions and amino acid sequence. Concomitantly, genes that evolved higher expression levels are decelerating in their rates of protein evolution, which is in agreement with the finding that highly expressed genes evolve slowly. These results provide a whole-genome view of the intimate link between selection acting on a phenotype and associated genic evolution.
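As an illustration of how polymorphism and divergence data can be contrasted to test for directional selection, here is a minimal sketch of a McDonald-Kreitman-style 2x2 test. The text above does not name the test that was used, so the choice of the McDonald-Kreitman test is an assumption, as are the function name mk_test and the counts in the usage line, which are illustrative and not taken from the D. simulans data.

```python
from math import comb


def mk_test(dn, ds, pn, ps):
    """McDonald-Kreitman-style 2x2 test (illustrative sketch, not the author's pipeline).

    dn, ds: fixed nonsynonymous / synonymous differences between species
    pn, ps: nonsynonymous / synonymous polymorphisms within a species
    Returns (p_value, neutrality_index, alpha).
    """
    n = dn + ds + pn + ps   # all changes in the table
    fixed = dn + ds         # row total: fixed differences
    nonsyn = dn + pn        # column total: nonsynonymous changes

    def prob(x):
        # Hypergeometric probability that x of the fixed differences are nonsynonymous
        return comb(nonsyn, x) * comb(n - nonsyn, fixed - x) / comb(n, fixed)

    lo = max(0, fixed - (n - nonsyn))
    hi = min(fixed, nonsyn)
    p_obs = prob(dn)
    # Two-sided Fisher's exact test: sum every table no more probable than the observed one
    p_value = min(1.0, sum(p for p in map(prob, range(lo, hi + 1))
                           if p <= p_obs * (1 + 1e-9)))

    # Neutrality index (Pn/Ps)/(Dn/Ds); undefined when Ps or Dn is zero
    if ps == 0 or dn == 0:
        return p_value, None, None
    ni = (pn * ds) / (ps * dn)
    alpha = 1.0 - ni        # rough estimate of the adaptive fraction of substitutions
    return p_value, ni, alpha


# Hypothetical counts for a single gene
p, ni, alpha = mk_test(dn=7, ds=17, pn=2, ps=42)
print(f"p = {p:.4f}, NI = {ni:.3f}, alpha = {alpha:.3f}")
```

An excess of fixed nonsynonymous differences relative to polymorphism (a neutrality index below 1, alpha above 0) is the pattern usually read as evidence of adaptive protein evolution. A genome-wide analysis would apply a test of this kind per gene or per class of sites and then correct for multiple testing.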
Expression Variation & Sequence Polymorphism: A second avenue that I've been exploring in collaboration with Mara Lawniczak (Imperial College, London) is to link genotypic and phenotypic differences among individuals in order to understand how DNA sequence polymorphism relates to variation in gene expression (Lawniczak, Holloway et al., In review at Genome Biology). Using whole genome expression data and sequence data from D. simulans, we assessed the relationship between expression variation in males and females and nucleotide polymorphism across thousands of loci. By examining sequence polymorphism in gene features, such as UTRs and introns, we find that genes showing greater variation in gene expression between genotypes also have higher levels of sequence polymorphism in coding regions and 3'UTRs. Further research examining the role of the 3'UTR in Drosophila gene expression will determine whether the positive association detected here indicates functional differences that may be acted upon by natural selection. In this vein, we will identify cases of expression variation that map directly onto population subdivision at nearby sequence. We can then investigate the specific underlying genetic polymorphisms that cause expression variation.

SUMMARY
Population genomics is a major direction of evolutionary and biomedical research. Primates, Drosophila, yeast, and Arabidopsis will have or already have population genomic data available. This escalation of population genomic sequencing will allow us to understand empirically how variation in factors such as population size and mating system determines the role of adaptive and neutral processes acting on genomes. Enormous resources will be devoted to understanding the population genetics and functional variation in humans. These efforts will broadly stimulate the integration of population genomics and functional biology in the context of basic questions in evolution and ecology of all model organisms.
My background leaves me in the enviable position of being on the front line in this revolution.
University of California-Davis
Center for Population Biology
Section of Evolution & Ecology
3347 Storer Hall, Davis, CA 95616
530-754-9551
akholloway at ucdavis.edu