Brand new DAP gives Mattick a run for his money.

Check out this brand new dog’s ass plot (DAP). Note that the only labels on the x-axis are the genomes of the different species, yet the points are connected by a line as though there were an actual transition between these modern species.

Wow. Just… wow.

How does this sort of thing get past peer review? How do figures like these get drawn in the first place?

Human males are bigger than females because…

In humans, males tend on average to be larger, to mature later, and to age and die sooner than females. It’s easy to assume, as many people do, that this difference between males and females (what biologists call sexual dimorphism) is the result of sexual selection. That is, males are larger because they fight each other over access to females and/or females prefer to mate with larger males. That is a valid hypothesis, of course, but too often it is simply accepted as fact and then used as a starting point for discussing other male and female traits. As a recent example, consider the debate that is happening on the blogosphere about sexual dimorphism in humans, and whether objections to the standard explanation of fightin’ males are merely political.

It started with PZ Myers’s post “Keep your biological reductionism off us men, too”, to which Jerry Coyne responded with “The ideological opposition to biological truth”. In response to Coyne, Holly Dunsworth provided a critique via blog and Twitter, the latter of which is summarized here. UPDATE: Coyne responds here and here.

I will avoid the political aspect of this discussion and focus on the science involved in the debate, because I think it highlights an important issue: namely, the need for evolutionary biologists to consider and test alternative hypotheses, even if they are not as intuitively plausible as the main hypothesis. This is one of the reasons that evolutionary biologists often take issue with claims from evolutionary psychology: evo psych often tends to present a plausible hypothesis but does little to critically evaluate its underlying assumptions and even less to present and rule out alternatives. In particular, evolutionary biologists should know better than to restrict the list of hypotheses to ones based on selection, because there are usually viable non-adaptive hypotheses as well. Natural selection is not the only mechanism of evolution.

So then, how should an evolutionary biologist approach the example of sexual dimorphism in humans? What alternative hypotheses could there be to the standard explanation? Well, here are some comments that I once wrote up for a student who made the typical “males are bigger/mature more slowly/die sooner because they fight over females” assumption.

Three male characters are listed: 1) larger size, 2) slower development, 3) faster senescence. One hypothesis, direct intrasexual selection (i.e., male-male combat), lumps them all together and explains all three with natural (sexual) selection acting on that trait. That is one possibility. It is also four testable hypotheses rolled into one (i.e., that the three traits evolved under selection related to mating success, plus that they are all part of the same adaptation).

Alternative view #1 is that these are adaptations but they’re decoupled and evolved separately. For example, maybe large size evolved due to sexual selection for male-male combat, but slower development evolved to provide time to learn about hunting. Alternative view #2 is that they evolved together but one or more is a byproduct of another. For example, slow development is simply necessary in order to reach larger size with larger muscle mass and is not in itself adaptive (i.e., if it were possible to become large quickly, this would evolve – indeed, there are good theoretical reasons to expect fast time to first reproduction to be strongly favoured, all else being equal). Alternative #3 is that they are decoupled and not all of them are adaptations. The point is that although these traits are all consistent with the hypothesis of male-male combat, it is not a given that they are coupled in their evolutionary history nor that all three are the product of selection.

Next, we would need to consider as many plausible explanations as possible for each of the three traits.

1) Larger size in males:

Adaptive hypotheses type 1: Males selected to become larger
– Male-male combat (direct intrasexual selection)
– Mate guarding (indirect intrasexual selection) and/or dominating reproductively active females (inter-sexual conflict)
– Female choice (intersexual selection), either direct (good genes if a male can manage to become large) or indirect (large males can provision offspring with food – maybe demonstrated by providing food and other items to females, which would certainly be consistent with modern human behaviour)
– Males hunt large/strong/fast prey and females don’t, so males need to be larger/stronger/faster (ecological selection)
– Males but not females go to war frequently with other tribes over access to food and other resources (group selection or kin selection, as you like)

Adaptive hypotheses type 2: Females selected to become smaller
– Smaller body size means reduced requirements for food, more can be invested in offspring
– Smaller body size can be achieved more quickly during development, allowing reproduction to get underway sooner
– Smaller body size may have been preferred by males as a secondary sexual characteristic

Non-adaptive hypotheses:
– Smaller body size may be a byproduct of other changes in developmental timing in females

2) Slower development in males:

Adaptive hypotheses type 1: Males selected to develop slowly
– Adaptive because it provides time to develop the size and skills needed to face competition with other males
– Adaptive because it provides time to develop the size and skills needed to hunt large prey
– Adaptive because it provides time to develop the size and skills needed to fight rival tribes

Adaptive hypotheses type 2: Females selected to develop more quickly
– Faster development is adaptive because it means earlier age at first reproduction (leading to higher potential lifetime reproductive output)

Non-adaptive hypothesis:
– Getting larger simply takes longer / being smaller takes less time (if growth rates are similar in males and females)

One might argue that the non-adaptive explanation is the simplest, so we would need evidence for why we should reject it and add a more complex explanation in its place. Then you would need to test the primary adaptive hypothesis and the alternative adaptive explanations.

3) Faster senescence in males:

Adaptive hypotheses:
– Adaptive, in the sense that male mortality due to combat is high so there is investment in early reproduction rather than anti-senescence, whereas the opposite occurs for females
– Adaptive, in that males consume an inordinate amount of food such that earlier death of males is favoured under inclusive fitness
– Adaptive, in that males are selected to reproduce as much as possible while young regardless of whether most offspring survive whereas females are selected to invest in a few offspring over the long term and to see them through to adulthood (r- vs. K-selection)

Non-adaptive hypotheses:
– Non-adaptive – males have higher metabolic rates than females, which means more oxidative damage
– Non-adaptive – testosterone, which contributes to large size, aggression, and sperm production, has the long-term effect of hastening senescence but this is not selected against because it increases reproductive success earlier in life (antagonistic pleiotropy)

Again, I think the non-adaptive explanation is the simplest and indeed there is evidence that testosterone has this effect (castrated males live longer, for example). So again, there would need to be evidence against this as well as evidence for a particular adaptive explanation as well as evidence against alternative adaptive explanations.

Add onto all of this the question as to why sexual dimorphism is not more pronounced in humans than it is, especially if it is thought to have evolved after we became non-arboreal. Are there developmental or genetic limitations (constraints)? Or stabilizing selection acting on another trait?

Finally, there may have been a reduction in overall sexual dimorphism during hominin evolution from Australopithecus to Homo. This would mean that stronger sexual selection (and/or other) pressures existed long before modern humans evolved. Fossil data are limited on this, but it’s clearly a relevant issue.

DNA: The Code for Making Life (BBC World Service — The Forum)


Bridget Kendall and guests explore the current understanding of how DNA works, why it needs constant repair in every living organism and how new DNA-altering techniques can help cure some medical conditions. Joining Bridget are Swedish Nobel Laureate and Francis Crick Institute Emeritus Group Leader Tomas Lindahl who pioneered DNA repair studies, medical researcher Niels Geijsen from the Hubrecht Institute who works on curing diseases caused by faulty inherited genes, evolutionary biologist T Ryan Gregory from Guelph University who asks why an onion has 5 times as much DNA as a human, and Oxford University’s bio-archaeologist Greger Larson whose research suggests that dogs were independently domesticated twice, on different continents.

The curious case of the tardigrade genome.

There has been a lot of interest in tardigrades (aka “water bears”) recently. Not just because they’re very cool, but because they seem to have some very curious genomes. Maybe.


See, in a paper published in PNAS on November 23rd, Boothby et al. (2015) reported evidence of “extensive horizontal gene transfer” in the genome sequence of the tardigrade Hypsibius dujardini. As was widely reported in the science press, this included a lot of foreign DNA in the tardigrade genome, including from very distantly related taxa — plants, fungi, bacteria, you name it. (You may recall that there were initially some claims to this effect about the human genome as well, which did not stand up to subsequent scrutiny).

Boothby et al. (2015) summarized their findings as follows:

Horizontal gene transfer (HGT), or the transfer of genes between species, has been recognized recently as more pervasive than previously suspected. Here, we report evidence for an unprecedented degree of HGT into an animal genome, based on a draft genome of a tardigrade, Hypsibius dujardini. Tardigrades are microscopic eight-legged animals that are famous for their ability to survive extreme conditions. Genome sequencing, direct confirmation of physical linkage, and phylogenetic analysis revealed that a large fraction of the H. dujardini genome is derived from diverse bacteria as well as plants, fungi, and Archaea. We estimate that approximately one-sixth of tardigrade genes entered by HGT, nearly double the fraction found in the most extreme cases of HGT into animals known to date. Foreign genes have supplemented, expanded, and even replaced some metazoan gene families within the tardigrade genome. Our results demonstrate that an unexpectedly large fraction of an animal genome can be derived from foreign sources. We speculate that animals that can survive extremes may be particularly prone to acquiring foreign genes.

But just today, a preprint posted on bioRxiv by Koutsovoulos et al. (2015) presented a very different analysis of the H. dujardini genome:

Tardigrades are meiofaunal ecdysozoans and are key to understanding the origins of Arthropoda. We present the genome of the tardigrade Hypsibius dujardini, assembled from Illumina paired and mate-pair data. While the raw data indicated extensive contamination with bacteria, presumably from the gut or surface of the animals, careful cleaning generated a clean tardigrade dataset for assembly. We also generated an expressed sequence tag dataset, a Sanger genome survey dataset and used these and Illumina RNA-Seq data for assembly validation and gene prediction. The genome assembly is ~130 Mb in span, has an N50 length of over 50 kb, and an N90 length of 6 kb. We predict 23,031 protein-coding genes in the genome, which is available in a dedicated genome browser at We compare our assembly to a recently published one for the same species and do not find support for massive horizontal gene transfer. Additional analyses of the genome are ongoing.

So, which report is correct? Is the tardigrade genome packed with foreign DNA, or is this likely the result of contamination? I don’t have an answer, but here’s another little curiosity to add to the mix. My lab had previously provided a genome size estimate for this species using both flow cytometry and Feulgen image analysis densitometry, which came out to 1C = 75Mbp (Gabriel et al. 2007).

Here’s the output from the flow cytometry estimate:

[Figure: flow cytometry output for H. dujardini]

And here you can literally see that there is less DNA in the somatic nuclei of the tardigrade (HD) than in haemocyte nuclei from Drosophila melanogaster (DM) (1C = 175Mbp):

[Figure: nuclei of H. dujardini (HD) and D. melanogaster (DM)]

This estimate differed substantially from what the sequence was indicating, namely a genome more than 200Mbp in length. Given this discrepancy, we were asked to run some more samples of H. dujardini. We used both methods again (albeit with different equipment, as this was a number of years after the initial estimates were done), and got a similar though higher estimate of about 1C = 100Mbp.


In other words, the sequence seemed to contain twice as much DNA as what we were estimating to be in the nucleus using flow cytometry and image analysis densitometry. We figured that it’s possible that this species undergoes chromatin diminution (deletion of a substantial quantity of DNA during somatic differentiation, such that there is more DNA in the germline than in somatic cells), although this hadn’t previously been documented in tardigrades. Beyond that, it remained a mystery.

It is notable, though, that Koutsovoulos et al. (2015) also obtained an estimate of genome size using flow cytometry (with a different fluorochrome and different standards), and their value was very close to our revised estimate, at 1C = 110Mbp. Their sequence length was also much shorter than in the Boothby et al. (2015) paper, at about 135Mbp.

Boothby et al. (2015) also noted that it is difficult to obtain uncontaminated material from these tiny organisms, and that DNA from their food (bacteria and algae) can end up in even carefully-prepared samples. It therefore seems possible that the large discrepancy between sequence and flow cytometric/densitometric genome size estimates reflects this issue.
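To make the size of the mismatch concrete, here is a rough sketch that sets the figures quoted in this post side by side. The numbers are the ones given above; the Boothby et al. assembly is described here only as "more than 200Mbp", so 200 is used as a lower bound rather than a measured span. The dictionary and function names are made up for illustration.

```python
# Figures quoted in this post, in Mbp.
# Measured 1C genome sizes (flow cytometry and/or Feulgen densitometry).
estimates_mbp = {
    "Gabriel et al. 2007": 75,
    "revised estimate": 100,
    "Koutsovoulos et al. 2015 (flow)": 110,
}

# Assembly spans. The Boothby et al. value is a lower bound ("more than 200Mbp").
assemblies_mbp = {
    "Boothby et al. 2015 (lower bound)": 200,
    "Koutsovoulos et al. 2015": 135,
}


def assembly_to_estimate_ratios(assemblies, estimates):
    """Return {(assembly, estimate): assembly span / measured 1C size}."""
    return {
        (a_name, e_name): a_size / e_size
        for a_name, a_size in assemblies.items()
        for e_name, e_size in estimates.items()
    }


ratios = assembly_to_estimate_ratios(assemblies_mbp, estimates_mbp)

for (a_name, e_name), ratio in ratios.items():
    print(f"{a_name} vs {e_name}: {ratio:.2f}x")
```

Under these figures, the larger assembly is at least twice the revised measured genome size, whereas the shorter assembly exceeds the measurement made by the same authors by less than a quarter.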

Regardless of whether their genomes turn out to be very weird or not, tardigrades are still cool.

New podcast on scientific writing coming soon!

I am currently developing a new podcast that will focus on scientific writing, including tips, strategies, interviews, discussions, Q&A, and other good stuff. If you are a student, postdoc, or even a seasoned writer, there should be something for you! If you would like to be notified when the podcast is set to launch, you can sign up here:

On the Write Path


Quotes of Interest: Crick (1959).

As you all know, Francis Crick was a co-author of the Nobel Prize-winning work on the structure of the DNA molecule, which was first published in 1953. He also played a major role in the subsequent deciphering of the genetic code (with a key study published in 1961), among other important contributions made throughout his career. Notably, he co-authored one of the highly influential “selfish DNA” papers in 1980, which is so often cited as an example of non-genic DNA being dismissed as useless.  As I have noted elsewhere, Crick did not dismiss non-coding DNA as useless junk in that paper, but that hasn’t stopped people from (mis-)citing it as such.

In June 1959, two years before publishing the classic Crick, Brenner, et al. experiment (1961) and three years before he was awarded the Nobel Prize (1962), Crick took part in a symposium entitled “Structure and Function of Genetic Elements” at the Brookhaven National Laboratory. His contribution was entitled “The present position of the coding problem”.  One of the unexplained observations that he outlined at the time involved a disparity in the base composition (the relative abundance of A, C, G, and T) among species. As he wrote:

“This large variation of DNA composition is very unexpected. The abundance of the various amino acids does not, as far as we know, vary much from organism to organism; leucine is always common, methionine usually rather rare. The small variation of RNA composition is exactly what might be expected; the large variation reported for DNA needs some special explanation”.

In other words, although the amino acids and their relative abundances are quite similar across different species, the DNA in their genomes as a whole differs markedly in base composition. If the genetic code is universal — i.e., the same codons specify the same amino acids across species — then why would base composition be so divergent among taxa?

Crick went on to list six possible explanations, though as he put it, “in my view they all, at the moment, appear unattractive”. The first one is particularly informative with regard to the myth that geneticists remained stubbornly closed-minded about the possibility of function in non-coding DNA until rather recently.

Here is what Crick (1959) said about the possibility that “only part of the DNA codes protein” (emphasis added):

“[Under this possible explanation] it is postulated that the sequences of bases in a DNA molecule are of two types: one makes ‘sense,’ that is, codes an amino acid sequence; the other makes ‘nonsense,’ that is, has some other function. The difficulty of this idea is that the nonsense must make up a rather large fraction of the DNA. If, for example, it is assumed that the base composition of the sense is reflected in that of the total RNA of the organism, then organisms showing extreme base ratios must have a minimum of 35% nonsense in their DNA.

If nonsense exists it can be asked how, in one molecule of DNA, the sense and nonsense are interdispersed. Are they coarsely or finely dispersed? As an example of the former, consider what might happen if dud genes could not be eliminated by genetic deletion. The base composition of such genes might well drift to extreme values because of mutagenic bias within the cell. This explanation is not very likely, and in addition demands that dud genes be reasonably uniformly distributed among DNA molecules.

A possible reason for the fine dispersion of nonsense might be the provision of ‘commas’. For example, these might take the form of segments that could pair by twisting back on themselves when the two chains of the DNA were separated. The base pairs of these regions could vary without altering their function. Alternatively a short sequence of bases, different from species to species but always the same in any one species, might act as a comma.”


Part of the Quotes of interest series.


Molecular Plant publishes very confusing paper about junk DNA.

UPDATE: The authors of this paper were rather upset by my initial description of it, so I will just say that I found it very confusing and leave it at that. See the abstract below and check out the paper if you are interested.

Freeling, M.W., Xu, J., Woodhouse, M., and Lisch, D.R. (2015). A solution to the C-value paradox and the function of junk DNA: the Genome Balance Hypothesis. Molecular Plant, in press.

The Genome Balance Hypothesis originated from a recent study that provided a mechanism for the phenomenon of genome dominance in ancient polyploids: unique 24nt RNA coverage near genes is greater in genes on the recessive subgenome irrespective of differences in gene expression. 24nt RNAs target transposons. Transposon position effects are now hypothesized to balance the expression of networked genes and provide spring-like tension between pericentromeric heterochromatin and microtubules. The balance (coordination) of gene expression and centromere movement are under selection. Our hypothesis states that this balance can be maintained by many or few transposons about equally well. We explain known, balanced distributions of junk DNA within genomes, and between subgenomes in allopolyploids (and our hypothesis passes “the onion test” for any so-called solution to the C-value paradox). Importantly, when the allotetraploid maize chromosomes delete redundant genes, their nearby transposons are also lost; this result is explained if transposons near genes function. The Genome Balance Hypothesis is hypothetical because the position effect mechanisms implicated are not proved to apply to all junk DNA, and the continuous nature of the centromeric and gene position effects have not yet been studied as a single phenomenon.

This one has it all!

This recent genome paper has it all: “reveal”, “insights”, and the platypus fallacy.

Quotes of interest — Brenner (1990) and discussion.

Sydney Brenner is a well-known figure in genetics, having made major contributions to our understanding of gene function and having established Caenorhabditis elegans as the enormously popular model organism that it is today. He shared the 2002 Nobel Prize for “discoveries concerning genetic regulation of organ development and programmed cell death”. He was also outspoken about various topics relating to genes and genomes, and had various things to say on the topic of junk DNA. For example, he was fond of distinguishing between “junk DNA” and “garbage DNA”, the former of which accumulates while the latter is thrown away.

In June of 1989, Brenner participated in a conference held in Bern, Switzerland entitled “Human Genetic Information: Science, Law and Ethics”.  His contribution, “The human genome: the nature of the enterprise” was subsequently published along with transcripts of the discussion that followed in Brenner (1990). Here are some relevant excerpts, which highlight some of the discussions that were going on around the time of the early development of the Human Genome Project with regard to junk DNA and whether to sequence it (Brenner thought it should be ignored in favour of focusing on the genes). His discussion includes classic arguments about the limits of how much of the genome can have an essential, sequence-specific function, and thoughts on the possible functions of non-genic DNA.

“Grasping the magnitude of the technical tasks involved [in genome sequencing and analysis] requires some understanding of the sizes of genomes. First, the range of variation is about a millionfold, from around 4 kb (kilobases, 10³) in small viruses to 4 Gb (gigabase, 10⁹) in the human genome, and with bacterial genomes at the geometrical average of about 4 Mb (megabases, 10⁶). The nematode has about 100 megabases and the Drosophila genome is twice that size. Evidence suggests that, up to the level of bacteria, genetic information is densely packed on the genome. This is because these unicellular organisms do not have a separate germline and the time it takes to replicate the genome could become rate limiting for cell multiplication. There is therefore a selective advantage for streamlined genomes which have eliminated all useless DNA. Streamlining is also found in some unicellular lower eukaryotes, such as yeast, which have genome sizes a few times larger than those of bacteria. We can make a reasonable estimate of the number of genes in these organisms by assuming that all the genes code for polypeptide sequences, and that the average size of a gene is, very roughly, one kilobase. Thus viruses would have of the order of ten to a few hundred genes, depending on their genome sizes, while bacteria might have as many as a few thousand genes. These estimates are well supported by what we know about these organisms, either from complete DNA sequences in the cases of some viruses, or from genetic and biochemical studies of bacteria such as Escherichia coli. If we applied this calculation to the genomes of higher organisms we would conclude that Drosophila has more than one hundred thousand genes and man, four million. Genetic considerations suggest that these estimates are far too large. For example, if all the supposed four million human genes performed indispensable functions, then, at an average forward mutation rate of 10⁻⁵ per gene, some 40 lethal mutations would have been accumulated in each preceding generation and therefore by now we should all be dead.”

This argument about an upper limit on the sustainable number of truly essential regions of the genome with sequence-specific functions was made by others, including Susumu Ohno, in the 1970s. It’s one of several classic arguments that modern opponents of the concept of junk DNA seem to be unaware of.
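Brenner's back-of-the-envelope reasoning is easy to reproduce. The following is a minimal sketch using only the figures from the quote (genome sizes, roughly one kilobase per gene, and a forward mutation rate of 10⁻⁵ per gene per generation). The function and variable names are mine, chosen for illustration.

```python
import math

KB, MB, GB = 10**3, 10**6, 10**9

# Genome sizes as quoted by Brenner (1990), in base pairs.
SMALLEST_VIRUS_BP = 4 * KB
GENOME_SIZES_BP = {
    "bacterium": 4 * MB,
    "nematode": 100 * MB,
    "Drosophila": 200 * MB,  # "twice that size"
    "human": 4 * GB,
}

AVG_GENE_BP = 1 * KB   # "very roughly, one kilobase"
MUTATION_RATE = 1e-5   # forward mutations per gene per generation


def naive_gene_count(genome_bp, gene_bp=AVG_GENE_BP):
    """Gene count if the whole genome were densely packed with genes."""
    return genome_bp // gene_bp


def mutations_per_generation(n_genes, rate=MUTATION_RATE):
    """Expected new mutations per generation if every gene were essential."""
    return n_genes * rate


# Geometric midpoint of the 4 kb to 4 Gb range (Brenner's "geometrical average").
midpoint_bp = math.sqrt(SMALLEST_VIRUS_BP * GENOME_SIZES_BP["human"])
print(f"geometric midpoint: {midpoint_bp / MB:.1f} Mb")

for name, size_bp in GENOME_SIZES_BP.items():
    genes = naive_gene_count(size_bp)
    load = mutations_per_generation(genes)
    print(f"{name}: {genes:,} naive genes, ~{load:g} mutations per generation")
```

The naive count gives four million human genes and about 40 mutations per generation, which is the figure Brenner uses to argue that the estimate must be far too large.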

“A consideration of the mutational load suggests that one hundred thousand genes is a more likely upper estimate for the human genome and ten thousand for that of Drosophila. We also know that the genomes of higher organisms contain large amounts of DNA that carry no coding information, and indeed may carry no sequence information at all. Thus we have the surprising result that most of the human genome is junk; junk, and not garbage, because there is a difference that everybody knows: junk is kept, while garbage is thrown away. In higher organisms, with their separate germlines, there is little selective pressure for reducing the amount of DNA, and this is especially true for those with long developmental cycles. Processes leading to an increase of DNA, such as transpositions and certain errors of replication and recombination, are not deleterious and will go unchecked, while those that lead to a reduction in the amount of DNA will not be especially selected for. Such organisms accumulate and retain junk, while in the streamlined unicellular microbes, the junk has become garbage and has been eliminated.”

Here, Brenner is arguing that much of the genome is junk that has accumulated because it is not deleterious. Given that there are many well-known mechanisms that can add DNA to a genome (transposable elements, gene duplication and pseudogenization, replication slippage, illegitimate recombination, etc.), this is a sensible default position to take. Note, however, that Brenner does not dismiss the possibility that some of this DNA may become essential (see below) — neither did Ohno, or the authors of the selfish DNA papers in 1980. Note, also, that there was opposition to this notion each time it was suggested in the 1970s, 1980s, and 1990s.

“It is these considerations that lead me to question the conclusion that there can be no case now to aim to get the entire sequence of the human genome. If something like 98% of the genome is junk, then the best strategy would be to find the important 2%, and sequence it first. Some have argued that since we do not know what there is in all of the junk DNA, we cannot dismiss it, and the only way to find out is to sequence all of the genome. The counter to this is to look on the sequencing of junk like income tax; it cannot be evaded but there are ways of avoiding it, and, anyway, it is one of the problems that we can and should leave for our successors. If cheap and easy methods of DNA sequencing were available, it would obviously pay to sequence everything and pick out the significant parts by computer. Ultimately it may be done this way, but the technology does not yet exist.”

Brenner argues that it would be better to focus on sequencing libraries constructed using cDNA, so that effort would go to elements in the genome that are very likely to have functions. This reflects what was considered technologically feasible in 1990 and is not a dismissal of possible functional elements apart from protein-coding exons.

As with Ohno (1973), the discussion that took place after the presentation by Brenner is particularly interesting as an indicator of the state of thinking about junk DNA at the time:

Kimura: “I was very impressed with the statement that 98% of the human genome is junk, rather than garbage. Our daily experience suggests that sometimes ‘junk’ is valuable. Is it possible that some of the so-called junk genes might be found to be valuable…? If the junk gene could become valuable genes, we would have to change our criteria for what is junk.”

Brenner: “Unfortunately the organism cannot plan that. But, in one sense, organisms are very much like us! You get a wooden box and decide to keep it to make a bookcase, but you never do because it’s much cheaper to buy a bookcase, and so the wooden box remains as junk. Organisms cannot plan; there is nothing in the genome that can say that a piece of junk might come in handy in some future era, so let’s hold on to it!”

Some weird distinctions between “organisms” and “us” aside, this highlights a nonsensical but all-too-common idea regarding junk DNA: that it is there because it may someday become useful. This has been addressed many times over the decades, but it persists. Even Francis Collins, former head of the Human Genome Project and now director of the NIH, has invoked it.

Davis: “Is it possible that some, or much, of the junk does not have quite as definable function as, say, making an enzyme, but has regulatory roles that will turn out to be more than junk?”

Brenner: “I would be a fool if I denied that; it is possible, but that is another question I am going to leave for our successors. I am certainly not going to try to prove or disprove it for every piece of junk, and I shall avoid it.”

But perhaps the most telling quote about whether non-genic DNA had been dismissed as useless junk in the late 1980s is the concluding remark made by Gustav Nossal:

Nossal: “You are of course not alluding to certain shortish stretches of DNA that are becoming of very great interest, and clearly do have regulatory promoting or silencing functions, and are now a major object of study. What percentage, on top of the 2%, they represent, we don’t know yet, but it is not a negligible proportion.”


Part of the Quotes of interest series.

Genome Sequence Paper Title Generator

So, you and several dozen totally essential collaborators have been hard at work sequencing the genome of a super important species, and you’re ready to write up your results. Between doing the same analyses as every other genome sequencing study and overselling the novelty and significance of your results, you probably don’t have time to waste on something as trivial as the title of your paper.  Not having any particular hypothesis or well-reasoned research question in mind can also make it difficult to know how to package your paper for publication.

Luckily, the new “Genome Sequence Paper Title Generator” is here to help!  Simply fill in the details below, and you’re good to go!

Genome Sequence Paper Title Generator v.1.0

1) Type of genome sequence (optional): i) Draft, ii) Whole/Complete, iii) Comparative, iv) Do not specify

2) Species name: ____________________

3) Major contribution: i) Reveals, ii) Provides insights into

4) Novelty modifier (optional): i) Novel, ii) New, iii) Unique, iv) None

5) Ad-hoc focus of research: i) Notable physical, physiological, or behavioural trait of species, ii) Adaptation to environment of species, iii) Medical or industrial significance of species, iv) Features of genome of species
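For anyone who would rather automate the form, here is a tongue-in-cheek sketch of how the five fields above might be strung together. The option lists come straight from the form, but the word order of the assembled title is my assumption, and `generate_title` is a made-up helper rather than part of any real tool.

```python
# Options taken from the form above; None stands for "do not specify" / "none".
GENOME_TYPES = ("draft", "whole", "complete", "comparative")
CONTRIBUTIONS = ("reveals", "provides insights into")
NOVELTY_MODIFIERS = ("novel", "new", "unique")


def generate_title(species, focus, contribution="reveals",
                   genome_type=None, novelty=None):
    """Assemble a genome paper title from the five form fields.

    The ordering (type, species, contribution, novelty, focus) is assumed.
    """
    if contribution not in CONTRIBUTIONS:
        raise ValueError(f"contribution must be one of {CONTRIBUTIONS}")
    if genome_type is not None and genome_type not in GENOME_TYPES:
        raise ValueError(f"genome_type must be one of {GENOME_TYPES} or None")
    if novelty is not None and novelty not in NOVELTY_MODIFIERS:
        raise ValueError(f"novelty must be one of {NOVELTY_MODIFIERS} or None")

    parts = []
    if genome_type:
        parts.append(genome_type)
    parts += ["genome sequence of", species, contribution]
    if novelty:
        parts.append(novelty)
    parts.append(focus)

    title = " ".join(parts)
    # Capitalize only the first character so species names keep their case.
    return title[0].upper() + title[1:]


print(generate_title("the onion", "features of its genome",
                     genome_type="draft", novelty="novel"))
```

Leaving out the optional fields still yields a serviceable title, for example `generate_title("Hypsibius dujardini", "adaptation to extreme environments", contribution="provides insights into")`.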