Dear Genome: Say what?

Here’s the first sentence from a paper published recently in Genome by Vibhu Ranjan Prasad and Karin Isler:

Gene content, the number of genes coding for proteins, is correlated with genome size in both noneukaryotes and eukaryotes (Lynch and Conery 2003; Konstantinidis and Tiedje 2004; Gregory 2002, 2005).

Say what?

The whole C-value enigma is based on the well-known discrepancy between genome size and gene number, which should be very clear from my papers (including the two that they cite).

From the first page of Gregory (2002):

Genome size bears no relationship to organismal complexity. If C-values are constant because DNA is the stuff of genes, then how could they be unrelated to gene number?

From the first page of Gregory (2005):

The discordance between genome size and organism complexity or gene number did not remain a paradox — that is, a pair of mutually exclusive truths — for very long. The discovery of non-coding DNA in the early 1970s explained the failure of DNA content to reflect the number of genes, and in so doing resolved the paradox. However, as with most significant advances in genetic knowledge, this finding raised more questions than it answered.

Not to mention this figure from Gregory (2005):

As is often the case, these authors focused only on sequenced genomes. For prokaryotes this isn’t such a problem, because the range in genome size is much narrower. The main bias in prokaryote genome sequencing has more to do with which species can be cultured in the lab. In eukaryotes, only species with small to medium-sized genomes have been sequenced, owing to the difficulty and expense of working with large genomes.

The authors make the valid point that we should incorporate phylogenetic information where possible when comparing across species, such as when assessing potential relationships between genome size and gene number. However, they miss the much more important bias in the data: the dataset includes only eukaryotes with smallish genomes.

And for crying out loud, how did such a blatant miscitation make it into the journal?

Prasad, V.R. and K. Isler. (2012). Assessment of phylogenetic structure in genome size – gene content correlations. Genome 55: 391-395.

Gregory, T.R. (2005). Synergy between sequence and size in large-scale genomics. Nature Reviews Genetics 6: 699-708.

Gregory, T.R. (2002). A bird’s-eye view of the C-value enigma: genome size, cell size, and metabolic rate in the class Aves. Evolution 56: 121-130.

2 comments to Dear Genome: Say what?

  • I gotta love the use of two whole plant species in the tree. What we know about plant genomes has changed a bit from the days of the long genome drought. The correlation between genome size and TEs in plants has an r^2 of 98% (see Fig. 1 in Tenaillon et al. 2010).

    Tenaillon MI, Hollister JD, Gaut BS (2010) A triptych of the evolution of plant transposable elements. Trends Plant Sci 15: 471-478.


  • Not only did they not read your paper (carefully), they seem to be using a strange definition of gene by implying that all genes encode proteins. As you know, there are some people who think the correlation between gene number and genome size is pretty good, but that’s because they count thousands of (imaginary) genes that produce small regulatory RNAs.

