
Address

Department of Biological Sciences
915 W. State Street
West Lafayette, IN 47907
ph. (765) 494-4408
Fax (765) 494-0876





PETER WADDELL

Assistant Professor
LILY B-228
494-0455


Below is a description of my research in each of the following areas and of where current research interests are heading. An important aspect of my research is to use experimental results combined with theory to address key questions in evolutionary biology, bioinformatics and gene expression.

Molecular Evolution

The genetic basis of evolution is now accessible via whole-genome sequencing, which massively parallel sequencers such as the 454, ABI SOLiD and Solexa machines have put within reach of individual labs. Not only is it feasible to sequence a mammalian genome, it is also possible to sequence in depth the transcripts from living material: either whole organisms or cell lines. We work with both.
To understand such data, phylogenetic (evolutionary-tree-based) methods are a natural model. I am particularly interested in developing and understanding such methods. Earlier work included the development of the first likelihood methods to model unequal rates across sites (e.g., Steel et al. 1993, Waddell and Penny 1996).
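To make "unequal rates across sites" concrete, the sketch below compares the Jukes-Cantor (JC69) distance, which assumes every site evolves at the same rate, with its gamma-corrected form, in which site rates follow a gamma distribution with shape parameter alpha. This is the standard distance analogue of rate heterogeneity, not the likelihood methods cited above; the function names and the example values p = 0.3 and alpha = 0.5 are illustrative assumptions.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def jc69_distance(p: float) -> float:
    """JC69 distance assuming all sites evolve at the same rate."""
    if p >= 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc69_gamma_distance(p: float, alpha: float) -> float:
    """JC69 distance with gamma-distributed rates across sites (shape alpha).

    Small alpha means strong rate variation; as alpha grows large this
    converges to the equal-rates JC69 distance.
    """
    if p >= 0.75:
        raise ValueError("distance is undefined for p >= 0.75")
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)


p = 0.3
print(jc69_distance(p))             # equal rates across sites
print(jc69_gamma_distance(p, 0.5))  # strong rate variation (hypothetical alpha)
```

With p = 0.3 the equal-rates distance is about 0.38 substitutions per site, while alpha = 0.5 gives about 0.67, so ignoring strong rate variation can badly underestimate divergence.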
An ongoing interest is in robust ways to assess fit of data to model and to take into account both model and sampling uncertainty (e.g. Waddell et al. 2002, Waddell 2006). Another related theme is to explore patterns of coevolution amongst genes (Waddell et al. 2007).
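One generic way to assess fit is a likelihood-ratio (G) statistic comparing observed counts with the counts a model predicts, calibrated by a parametric bootstrap (simulating data from the model itself) rather than by an asymptotic chi-square table. The minimal sketch below uses base composition against an equal-frequency model purely as a stand-in for richer site-pattern data; the data, function names and replicate count are hypothetical, and this is not presented as the specific procedure of the papers cited above.

```python
import math
import random
from collections import Counter


def g_statistic(observed, expected_probs):
    """Likelihood-ratio (G) statistic, 2 * sum O * ln(O / E), for counts vs model probabilities."""
    n = sum(observed.values())
    g = 0.0
    for category, count in observed.items():
        if count > 0:  # empty categories contribute zero
            g += count * math.log(count / (n * expected_probs[category]))
    return 2.0 * g


def parametric_bootstrap_pvalue(observed, expected_probs, replicates=1000, seed=1):
    """Estimate P(G >= G_obs) by simulating data sets from the fitted model itself."""
    rng = random.Random(seed)
    n = sum(observed.values())
    categories = list(expected_probs)
    weights = [expected_probs[c] for c in categories]
    g_obs = g_statistic(observed, expected_probs)
    exceed = 0
    for _ in range(replicates):
        simulated = Counter(rng.choices(categories, weights=weights, k=n))
        if g_statistic(simulated, expected_probs) >= g_obs:
            exceed += 1
    # add-one correction keeps the estimate away from an impossible p = 0
    return (exceed + 1) / (replicates + 1)


# Hypothetical data: strongly AT-rich base counts tested against equal base frequencies
observed = {"A": 40, "C": 10, "G": 10, "T": 40}
model = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
print(g_statistic(observed, model))
print(parametric_bootstrap_pvalue(observed, model, replicates=200))
```

Simulating from the model, rather than trusting the chi-square approximation, matters most when many categories (such as site patterns) have small expected counts.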

Mammalian Comparative Genomics

There remain unresolved issues concerning the phylogeny, divergence times and ancestral population genetics of the deepest parts of the placental mammal tree. My work with colleagues in Japan erected the first predominantly correct classification and phylogeny of the placental mammals. This included the first identification of the four main groups of placental mammals, Laurasiatheria, Euarchontoglires (or Supraprimates), Afrotheria and Xenarthra (Waddell et al. 1999, 2001), and of the relationships between them. Mammalian sequence data typically do not fit phylogenetic models well, so the inferred trees are often wrong, even when millions of base pairs are used. To test our hypotheses of mammalian evolution, we therefore developed a number of tests based on more conservative characters such as SINE/LINE insertions (Waddell et al. 2001).
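As an illustration of how insertion characters can be tested, suppose each informative SINE/LINE insertion supports one of the three possible resolutions of a three-lineage split. A simple null hypothesis, as might be expected under a hard polytomy, is that insertions are equally likely to support each resolution. The sketch below is a standard exact multinomial test of that null; it is an assumption chosen for illustration, not necessarily the tests of Waddell et al. (2001), and the counts are hypothetical.

```python
from math import comb


def multinomial_prob(counts, probs):
    """Probability of an exact vector of counts under a multinomial model."""
    coef = 1
    remaining = sum(counts)
    for k in counts:
        coef *= comb(remaining, k)
        remaining -= k
    prob = float(coef)
    for k, q in zip(counts, probs):
        prob *= q ** k
    return prob


def equal_support_test(counts):
    """Exact test of H0: insertions are equally likely to support each of three resolutions.

    The p-value sums the probabilities of all outcomes no more likely than the
    observed one (a standard exact multinomial test).
    """
    if len(counts) != 3:
        raise ValueError("expected counts for exactly three resolutions")
    n = sum(counts)
    probs = (1 / 3, 1 / 3, 1 / 3)
    p_obs = multinomial_prob(counts, probs)
    p_value = 0.0
    for a in range(n + 1):
        for b in range(n - a + 1):
            p = multinomial_prob((a, b, n - a - b), probs)
            if p <= p_obs * (1 + 1e-9):  # tolerance for floating-point ties
                p_value += p
    return p_value


# Hypothetical counts of insertions supporting ((X,Y),Z), ((X,Z),Y) and ((Y,Z),X)
print(equal_support_test((9, 1, 0)))
```

Because insertions are rarely homoplastic, even a handful of congruent events can reject equal support, which is what makes them attractive for testing deep relationships.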
There remains considerable uncertainty as to which characters are most reliable for inferring or testing deep relationships (e.g., Swofford et al. 1996, Waddell and Shelly 2003), and current research includes examining and comparing different types of slowly evolving data. Equally uncertain are the exact divergence times, how best to model the process of rate change, and how to integrate fossil evidence (e.g., Waddell and Penny 1996, Kitazoe et al. 2007).
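The simplest way fossil evidence enters divergence-time estimation is a strict molecular clock: a fossil-dated split fixes the substitution rate, which then converts other distances into times. The sketch below uses hypothetical numbers (a 100 Myr calibration with distance 0.2) to show why rate change matters; models of rate change, such as those referenced above, exist precisely to relax this constant-rate assumption.

```python
def clock_rate(calibration_distance, calibration_age):
    """Substitutions per site per unit time under a strict molecular clock.

    A pairwise distance accumulates along both lineages since their split,
    hence the factor of 2.
    """
    return calibration_distance / (2.0 * calibration_age)


def divergence_time(distance, rate):
    """Time since two lineages split, given their pairwise distance and a clock rate."""
    return distance / (2.0 * rate)


# Hypothetical calibration: a fossil-dated split at 100 Myr with pairwise distance 0.2
rate = clock_rate(0.2, 100.0)
print(divergence_time(0.1, rate))   # about 50 Myr
# If the pair being dated actually evolved 20% faster than the calibration pair,
# its distance is inflated to 0.12 and the strict clock overestimates its age:
print(divergence_time(0.12, rate))  # about 60 Myr instead of the true 50
```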
Another aspect of our work in mammalian genomics is guiding a number of genome projects. One aim has been to achieve a far higher standard of biological materials, to offset the problems with the genomes NIH/NHGRI have recently been producing. This includes collecting RNA from all tissues of each animal, establishing diverse cell lines and depositing voucher specimens.
We bring cell lines from diverse mammalian orders into culture to test predictions from bioinformatics. Presently, we focus on fibroblasts, aiming to keep them as close to wild type (unmutated) as possible. We have a strong interest in deriving pluripotent cells from fibroblasts, as these will open up many further ways of testing hypotheses derived from studying genome sequences.

Gene Expression

To understand a genome from a dynamic perspective requires understanding both where its parts came from (their evolution) and how they all work together. The genome exerts its influence via gene expression. This occurs not only through the classically appreciated protein-coding genes, but also through a potentially vast array of large and small RNA molecules. The small molecules include microRNAs (miRNAs), which are short (17-25 nt) pieces of RNA cut out of larger precursors. We are presently mass sequencing these using Solexa and SOLiD technology.
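A first step in analyzing small-RNA sequencing reads is usually to remove the 3' adapter, keep inserts in the mature-miRNA length window, and collapse identical sequences into counts. The minimal sketch below assumes exact matching of an adapter prefix; the adapter string, the 8-base match length and the function names are hypothetical choices, and production pipelines also tolerate sequencing errors in the adapter and handle quality scores.

```python
from collections import Counter

# Hypothetical 3' adapter prefix; substitute the adapter actually used in the library prep.
ADAPTER = "TGGAATTCTCGG"


def trim_adapter(read, adapter=ADAPTER, min_match=8):
    """Return the insert before the first match of the adapter's first min_match bases, or None."""
    idx = read.find(adapter[:min_match])
    return read[:idx] if idx != -1 else None


def collapse_mirna_candidates(reads, min_len=17, max_len=25):
    """Count distinct trimmed inserts whose length falls in the mature-miRNA window."""
    counts = Counter()
    for read in reads:
        insert = trim_adapter(read)
        # reads without a detectable adapter, or with inserts outside the window, are dropped
        if insert is not None and min_len <= len(insert) <= max_len:
            counts[insert] += 1
    return counts


reads = [
    "TAGCTTATCAGACTGATGTTGA" + ADAPTER,  # 22-nt insert (the DNA form of human miR-21)
    "TAGCTTATCAGACTGATGTTGA" + ADAPTER,
    "ACGT" + ADAPTER,                    # very short insert, outside the window
    "A" * 30,                            # no adapter found
]
print(collapse_mirna_candidates(reads))
```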
We are very interested in the evolution of the expression of cancer genes, in particular TP53, which is probably the single most important cancer regulatory protein. A testament to its importance is that it is mutated or modified in over 75% of all cancers. We want to identify the main regulatory elements of TP53 and how they have changed over the course of mammalian evolution.
A major question we hope to address is how population genetics, lifestyle and ecological factors such as longevity and diet interact with factors such as propensity towards cancer, and how the molecular machinery evolves to accommodate such dramatic lifestyle changes. Mammals are an excellent system in which to study this, not least because many lineages have independently evolved from small (<1 kg), short-lived (1-2 years) animals into large (>10 kg) animals with long life spans (10-100 years).

Bioinformatics/Computational Biology

Bioinformatics means different things to different people, but phylogenetics is one core area. It sometimes involves elements of computer science, such as the number of operations required to solve a problem. An example is least-squares fitting of distances to trees, an interesting area with close parallels to regression analysis. One result with David Bryant was to describe algorithms for this problem that are time optimal (Bryant and Waddell 1998).
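The regression parallel can be made explicit: for a fixed topology, each pairwise distance is modeled as the sum of the branch lengths on the path between two leaves, so branch lengths are ordinary least-squares coefficients of a 0/1 design matrix. The sketch below solves that regression with a general dense solver, which is simple but is not one of the time-optimal algorithms referred to above; the function name, the split-based tree encoding and the example distances are illustrative assumptions.

```python
import itertools

import numpy as np


def ols_branch_lengths(taxa, splits, dist):
    """Ordinary least-squares branch lengths for a fixed unrooted topology.

    taxa:   list of leaf names
    splits: one set per branch, giving the leaves on one side of that branch
    dist:   dict mapping frozenset({x, y}) to the observed distance
    """
    pairs = list(itertools.combinations(taxa, 2))
    # Design matrix: entry (pair, branch) is 1 when the branch lies on the path
    # between the two leaves, i.e. when it separates them.
    X = np.array([[1.0 if ((a in s) != (b in s)) else 0.0 for s in splits]
                  for a, b in pairs])
    y = np.array([dist[frozenset(pair)] for pair in pairs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))  # residual sum of squares (lack of fit)
    return beta, rss


taxa = ["A", "B", "C", "D"]
# Tree ((A,B),(C,D)): four terminal branches plus the internal branch {A, B} | {C, D}
splits = [{"A"}, {"B"}, {"C"}, {"D"}, {"A", "B"}]
observed = [(("A", "B"), 0.30), (("A", "C"), 0.45), (("A", "D"), 0.55),
            (("B", "C"), 0.55), (("B", "D"), 0.65), (("C", "D"), 0.70)]
dist = {frozenset(pair): d for pair, d in observed}
lengths, rss = ols_branch_lengths(taxa, splits, dist)
print(lengths, rss)
```

These hypothetical distances are exactly additive on the tree, so the fit recovers branch lengths 0.1, 0.2, 0.3, 0.4 and 0.05 with zero residual; real data leave a nonzero residual sum of squares that can be compared across topologies.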
A separate aspect of bioinformatics I have researched is gene expression analysis, including work for private industry. Joint research with Hirohisa Kishino developed a number of important methods for the analysis of microarray data. One of these is correspondence analysis, which was first used for expression data by Kishino and Waddell (2000). This paper also showed how partial correlations, the basis of graphical modeling, could be robustly estimated when the number of observations was less than the number of variables.
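Correspondence analysis treats an expression table (genes by samples) like a contingency table: it scales the departures from row-column independence and decomposes them by singular value decomposition, so genes and samples can be plotted on the same axes. The sketch below is the textbook computation, offered as a minimal illustration rather than the exact procedure of Kishino and Waddell (2000); the function name and the small count table are hypothetical.

```python
import numpy as np


def correspondence_analysis(table):
    """Correspondence analysis of a non-negative table (e.g. genes x samples).

    Returns row and column principal coordinates and the principal inertias.
    Every row and column must have a positive total.
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    return row_coords, col_coords, sv ** 2


# Hypothetical expression counts: 4 genes (rows) x 3 samples (columns)
table = [[10, 2, 3], [4, 12, 5], [1, 3, 15], [6, 6, 6]]
rows, cols, inertia = correspondence_analysis(table)
print(inertia / inertia.sum())  # share of total inertia carried by each axis
```

The total inertia equals the table's chi-square statistic divided by its grand total, so the axes partition the association between genes and samples; the last axis is always essentially zero because the residual matrix has rank at most min(rows, columns) - 1.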
Another important technique in expression analysis is graphical modeling, which was first used on this type of data by Waddell and Kishino (2000). A surprising and cautionary result of these analyses was that, although graphical modeling uses fit statistics, biological data sets that clearly could not fit the model were not rejected by the likelihood ratio statistic. Another part of the paper looks at the meta-analysis of gene expression clustering and the importance of considering a wide variety of distance and clustering methods (in contrast to the approach, popular at the time, of basing everything on a single UPGMA analysis). Another example of our development of novel methods was the application of correspondence analysis to gene expression data, which was recognized by the Japanese Society of Bioinformatics with the best paper award at the major conference GIW 2001.
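Gaussian graphical models are built from partial correlations, which can be read off the inverse of the covariance (or correlation) matrix: a zero partial correlation means two genes are conditionally independent given all the others. When there are fewer observations than genes the sample matrix is singular, so some regularization is needed. The ridge-style shrinkage toward the identity used below is a hypothetical stand-in chosen for illustration, not the estimator of Kishino and Waddell (2000), and the shrinkage weight lam is an assumption.

```python
import numpy as np


def partial_correlations(cov):
    """Partial correlation of each pair given all other variables.

    Uses the precision matrix omega = inv(cov): pcor_ij = -omega_ij / sqrt(omega_ii * omega_jj).
    """
    omega = np.linalg.inv(np.asarray(cov, dtype=float))
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def ridge_partial_correlations(data, lam=0.1):
    """Partial correlations from an n x p data matrix, usable when n < p.

    Shrinks the sample correlation matrix toward the identity so it is
    invertible (an illustrative assumption; lam = 0 fails when n < p).
    """
    R = np.corrcoef(data, rowvar=False)
    p = R.shape[0]
    shrunk = (1.0 - lam) * R + lam * np.eye(p)
    return partial_correlations(shrunk)


# A chain X1 - X2 - X3: X1 and X3 are conditionally independent given X2
omega_true = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
print(partial_correlations(np.linalg.inv(omega_true)))

# Hypothetical expression data with fewer samples (5) than genes (10)
rng = np.random.default_rng(0)
print(ridge_partial_correlations(rng.normal(size=(5, 10)), lam=0.2).shape)
```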

Statistical Genetics

Statistics is essential to understanding biological data, including DNA sequences and gene expression. I am interested in developing tests that more reliably show the expected errors of inferences from data. One of the major problems in analyzing genomic data is that our models are too simple and often do not fit the data very well. In a Bayesian analysis, for example, this results in credibility intervals that are typically far too narrow (Waddell et al. 2001). It is in situations like this that resampling techniques such as the bootstrap have utility (Waddell et al. 2002). However, as typically applied to phylogenetic data, these too give confidence intervals that shrink rapidly towards zero as more data are added. They record one type of error, stochastic (sampling) error, but ignore potentially huge systematic errors. The field needs techniques that monitor both types of error and give at least a semi-realistic estimate of the scale of the problem. Mammalian phylogenetics is littered with phylogeny estimates that not only contain errors, but also give extremely inaccurate and misleading estimates of their own accuracy.
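The shrinking of bootstrap intervals can be seen directly. The sketch below resamples alignment columns with replacement, as the usual phylogenetic bootstrap does, and estimates the standard error of a simple statistic (the proportion of differing sites) for two hypothetical alignments with the same 20% divergence but different lengths. The function name, replicate count and data are illustrative assumptions.

```python
import random
import statistics


def bootstrap_se(site_values, replicates=500, seed=0):
    """Nonparametric bootstrap standard error of a per-site average.

    Alignment columns are resampled with replacement; here each column is
    scored 1 if the two sequences differ there and 0 otherwise, so the
    statistic being bootstrapped is the p-distance.
    """
    rng = random.Random(seed)
    n = len(site_values)
    estimates = []
    for _ in range(replicates):
        resample = rng.choices(site_values, k=n)
        estimates.append(sum(resample) / n)
    return statistics.stdev(estimates)


# Hypothetical alignments with the same 20% divergence but different lengths
short_aln = [1] * 20 + [0] * 80     # 100 sites
long_aln = [1] * 200 + [0] * 800    # 1,000 sites
print(bootstrap_se(short_aln))  # expected to be close to sqrt(0.2 * 0.8 / 100) = 0.04
print(bootstrap_se(long_aln))   # expected to be close to sqrt(0.2 * 0.8 / 1000), about 0.013
```

The sampling error falls roughly as one over the square root of the number of sites, but a systematic bias, such as the underestimate caused by ignoring rate variation, stays the same however many sites are added, so the interval ends up tightly centered on the wrong value.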

The lab is actively recruiting at the graduate and postdoctoral level, in both computational and molecular biology.

Education

B.S./M.S. University of Auckland 1990
Ph.D. Massey University 1996

Professional Faculty Research

I develop methods of analysis/algorithms, work on their implementation, apply them to data and gather data relevant to particular problems. My lab is both wet lab, focusing on RNA/DNA extraction, cell culture plus mass sequencing, and dry lab, focusing on computer analysis of biological data, especially genomic data.