In the first post in this series, I introduced the background topic of my research focus, namely the evolution and impacts of genome size diversity in animals. Before moving on to the specific projects that I would most like to do in the near term if I had the funds, I want to discuss the basic philosophical approach that much of my lab’s work follows.
As I noted recently, there is a strong tendency among many biologists to assume that only “hypothesis-driven” science is valid and informative. I disagree with this position very strongly, as I think it causes people to focus on narrow questions and runs a real risk of making most science little more than an exercise in confirming and refining what we already know. Moreover, it is only feasible to structure one’s research in the simple, falsificationist hypothesis-testing format if there is extensive background knowledge available. When working in a new area where little is known, this is not possible.
Does this mean we should be allowed to just stumble around without really testing any ideas? Of course it doesn’t. The alternative is to step back from individual hypotheses and to carry out what I call “targeted exploration”. This means that we do not feel it necessary to formulate our research in the simplistic “H0, H1” format with a “yes/no” result structure. Instead, we take what information is available and try to identify patterns. If no information is available at all for some area, then we might explore it with the specific purpose of looking for patterns. Once a possible pattern is identified, we determine ways of testing how broadly it holds and what might be causing it. This involves more exploration, but specifically in areas that are intended to provide the necessary data to test the broad pattern. If the pattern holds, then we can formulate even more specific ideas about causation, leading eventually to the testing of particular hypotheses.
Some important points should be noted. First, targeted exploration does not conflict with focused hypothesis testing. Rather, it ultimately feeds into hypothesis-driven research, but is particularly important because it takes us into new territory rather than working within existing areas. Second, it is not done blind. There is a specific reason to target particular areas. Third, as it does not have a simple refuted/supported result but rather can be set up to reveal many different things, the results can be very informative either way. Finally, because it is based on large-scale sampling, exploration of this type has the beneficial side effect of closing some major gaps in our basic knowledge.
Let me give you an example of how this works.
Insects are by far the most diverse group of animals, at least in terms of described species. However, they have traditionally been poorly covered in animal genome size studies. When I was a graduate student, I compiled the Animal Genome Size Database, which made it possible to look across all the data that were available and see what patterns emerged. Based on work in amphibians, it was apparent that species with complex developmental programs including metamorphosis had smaller genomes than species without metamorphosis. I wondered if something similar might apply to insects, given that there are orders with complete metamorphosis (holometabolous development) and orders with incomplete metamorphosis (hemimetabolous development).
That is step 1: ask a question and look for a pattern. The data for insects were very limited, but it did seem as though insects with complete metamorphosis possess smaller genomes than those lacking complete metamorphosis, making this similar to the case in amphibians. However, there were not really enough data to say much about this, so as part of my graduate work I set out to get more insect data. I added a few hundred species, mostly just whatever I could get locally, and did my best to include species from several orders with and without metamorphosis. That is step 2: assemble a dataset that can at least be used to identify a possible pattern. At this stage, the sampling is somewhat unconstrained: just get whatever you can, with the question still in mind. Why do it like this? Because a) you don’t have enough information to be very specific in what data you need, b) you’re working in a new area, so any data you get will be informative, and c) you don’t know if the pattern you are looking for is really the main pattern, so it is best to sample more widely in case some other pattern shows up.
Here is what I found:
With the exception of one beetle species out of more than 150 (and I still want to check this myself), no insects with complete metamorphosis appear to have genome sizes larger than 2 pg (about 2 billion base pairs). On the other hand, orders without complete metamorphosis often include species with enormous genomes.
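For readers who want to check the unit conversion behind that threshold, here is a minimal sketch in Python. It assumes the commonly used factor of roughly 978 million base pairs per picogram of DNA; the constant and function names are illustrative only and do not come from any particular tool.

```python
# Assumed conversion factor: 1 pg of DNA is about 978 million base pairs.
BP_PER_PG = 0.978e9


def pg_to_bp(picograms: float) -> float:
    """Convert a DNA mass in picograms to an approximate base-pair count."""
    return picograms * BP_PER_PG


def bp_to_pg(base_pairs: float) -> float:
    """Convert a base-pair count back to an approximate mass in picograms."""
    return base_pairs / BP_PER_PG


if __name__ == "__main__":
    threshold_bp = pg_to_bp(2.0)
    # Express the 2 pg threshold in billions of base pairs.
    print(f"2 pg is roughly {threshold_bp / 1e9:.2f} billion bp")
```

So "2 billion base pairs" is a rounded figure; under this factor the threshold sits slightly below it.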
So, step 3 is then to see whether this holds with a broader sampling. Now we are getting into the targeted exploration. What we need is a) more data from holometabolous orders (do they exceed this threshold and we just haven’t found them?) and b) more from hemimetabolous orders (do most of them have examples that are larger than the threshold?). Since this possible pattern was identified, we have added hundreds of species from both kinds of insects, including about 400 butterflies and moths (holometabolous, none larger than 2 pg), 90 wasps, ants, and bees (holometabolous, none larger than 2 pg), 75 flies (holometabolous, none larger than 2 pg), and 100 dragonflies (about 1/5 of known diversity in North America; hemimetabolous, a few larger than 2 pg). So far, so good, and this work continues with current projects on wasps, flies, caddisflies, and stoneflies. But questions remain: Does this hold in additional orders? Is there really a link between development and genome size in insects? Why 2 pg? Are there other explanations (e.g., other constraints, phylogenetic effects, differences at the level of mutational mechanisms)?
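The bookkeeping behind step 3 can be sketched in a few lines of Python. This is only an illustration of the idea, not the analysis actually used: the records below are invented placeholder values (not measurements), and the function names `summarize` and `exceptions` are hypothetical.

```python
from collections import defaultdict

THRESHOLD_PG = 2.0  # apparent ceiling for holometabolous insects discussed above

# Placeholder records: (order, development type, genome size in pg).
# These values are INVENTED to show the bookkeeping; they are not real data.
records = [
    ("Lepidoptera", "holometabolous", 0.5),
    ("Lepidoptera", "holometabolous", 1.1),
    ("Diptera", "holometabolous", 0.3),
    ("Hymenoptera", "holometabolous", 0.4),
    ("Odonata", "hemimetabolous", 1.2),
    ("Odonata", "hemimetabolous", 2.4),
    ("Orthoptera", "hemimetabolous", 9.0),
]


def summarize(records, threshold=THRESHOLD_PG):
    """Per order: sample size, largest genome, and count above the threshold."""
    summary = defaultdict(
        lambda: {"development": None, "n": 0, "max_pg": 0.0, "n_above": 0}
    )
    for order, development, size_pg in records:
        entry = summary[order]
        entry["development"] = development
        entry["n"] += 1
        entry["max_pg"] = max(entry["max_pg"], size_pg)
        if size_pg > threshold:
            entry["n_above"] += 1
    return dict(summary)


def exceptions(records, threshold=THRESHOLD_PG):
    """Holometabolous records above the threshold: the ones worth re-checking."""
    return [r for r in records if r[1] == "holometabolous" and r[2] > threshold]


if __name__ == "__main__":
    for order, info in sorted(summarize(records).items()):
        print(order, info)
    print("Holometabolous exceptions:", exceptions(records))
```

Keeping the per-order sample size next to the maximum matters here, because "none larger than 2 pg" means much more for an order with 400 species sampled than for one with a handful.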
For step 4, we started to test this idea that development constrains genome size in insects. First, we looked at the rate of development (egg to adult) within a single genus (Drosophila), and found a significant correlation with genome size. We have also started looking at “curious” orders that may be exceptions that prove the rule: for example, mayflies have an additional nymphal moult that other hemimetabolous orders don’t, so this may impose an additional constraint and keep their genomes small — I have only looked at one so far (yes, small), but I will let you know how it turns out once we do a large sample. We are also looking at specific comparisons within orders based on a combination of their traits (developmental rate, parasitic vs free living, body size, flight) and phylogenetic relationships. In this case, shifts in lifestyle are especially informative because they may illustrate an evolutionary association between genome size and the characteristics of interest.
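As a rough illustration of the within-genus comparison in step 4, the sketch below computes a Pearson correlation between genome size and development time. It is not the analysis described above: the species values are invented placeholders, the variable names are hypothetical, and a real test would also need a significance estimate and some correction for phylogenetic relatedness among species.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# INVENTED placeholder values for five species in one genus (not real data).
genome_size_pg = [0.18, 0.20, 0.24, 0.30, 0.36]
egg_to_adult_days = [9.0, 10.0, 11.5, 13.0, 16.0]

if __name__ == "__main__":
    r = pearson_r(genome_size_pg, egg_to_adult_days)
    print(f"Pearson r between genome size and development time: {r:.3f}")
```

A positive coefficient would mean that species with larger genomes take longer to develop, which is the direction expected if development constrains genome size.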
Assuming these patterns hold up and we are convinced that development is linked with genome size, we will want to know how — thus, step 5. The most likely mechanistic bridge between genome size and organism development is cell division. However, no one has looked at cell division rate across insects with different genome sizes. This would be much more difficult than doing large-scale surveys, but it could be focused on a few representative species with different DNA amounts. If we really want to know if DNA content affects cell division, we would need to examine this experimentally in step 6 — for example, by actively adding or removing different amounts of DNA and observing the effects on cell cycle parameters. I have been trying for a few years to get funding to do this (in yeast initially), but no success.
I think it is obvious that this kind of approach falls outside the typical hypothesis-driven focus. However, it does get us from knowing almost nothing in step 1 to formulating and testing specific hypotheses in step 6. Along the way, we have greatly expanded the available dataset, and have revealed several additional patterns worth exploring within some orders. If I had to express each step in the form of hypotheses, I probably could, but because we are exploring so many questions at once in each step, it makes more sense to just think about questions and make sure the sampling will allow us to generate answers. Without the existing knowledge base, focusing on one hypothesis only is premature and very limiting in what it will accomplish.
Obviously, we are not just interested in insects. Over the rest of the series, I will talk about other groups that we are eager to explore, and will discuss in more detail some of the focused work on mechanisms that I am interested in. Some of these projects begin at step 1, others at step 6, and some somewhere in between.