Genome size, code bloat, and proof-by-analogy.

I recently did an interview with New Scientist for what, I am happy to say, was one of the most reasonable popular reviews of “junk DNA” that has appeared in recent times (Pearson 2007). My small section appeared in a box entitled “Survival of the fattest”, in which most of the discussion related to diversity in genome size and its causes and consequences. It even included mention of “the onion test”, which I proposed as a tonic for anyone who thinks they have discovered “the” functional explanation for the existence of vast amounts of non-coding DNA within eukaryotic genomes. Also thrown in, though not because I said anything about it, was a brief analogy to computer code: “Computer scientists who use a technique called genetic programming to ‘evolve’ software also find their pieces of code grow ever larger — a phenomenon called code bloat or ‘survival of the fattest’”.

I do not follow the computer science literature, though I am aware that the “genetic algorithm” approach (i.e., program evolution by mutation and selection) is useful for solving complex problems. When I read the line about code bloat, my impression was that it probably gave other readers an interesting, though obviously tangential, analogy for understanding the fact that streamlined efficiency of any coding system, genetic or computational, is not a given when it is the product of a messy process like evolution.

More recently, I have been made aware of an electronic article published in the (non-peer-reviewed) online repository known as arXiv (pr. “archive”; the “X” is really “chi”) that takes this analogy to an entirely different level. Indeed, the authors of the paper (Feverati and Musso 2007) claim to use a computer model to provide insights into how some eukaryotic genomes become so bloated. That is, instead of applying biological observations (i.e., naturally evolving genomes can become large) to a computational phenomenon (i.e., programs evolved in silico can become large, too), the authors flipped the situation around and decided that a computer model could provide substantive information about how genomes evolve in nature.

I will state up front that I am rarely (read: never) convinced by proof-by-analogy studies. Yes, modeling can be helpful if it provides a simplified way to test the influence of individual parameters in complex systems, but only insofar as the conclusions are then compared against reality. When it comes to something like genome size evolution, which applies to millions of species (billions if you consider that every species that has ever lived, about 99% of which are extinct, had a genome) and billions of years, one should be very skeptical of a model that involves only a handful of simplified parameters. This is especially true if no effort is made to test the model in the one way that counts: by asking if it conforms to known facts about the real world.

The abstract of the Feverati and Musso (2007) article says the following:

The development of a large non-coding fraction in eukaryotic DNA and the phenomenon of the code-bloat in the field of evolutionary computations show a striking similarity. This seems to suggest that (in the presence of mechanisms of code growth) the evolution of a complex code can’t be attained without maintaining a large inactive fraction. To test this hypothesis we performed computer simulations of an evolutionary toy model for Turing machines, studying the relations among fitness and coding/non-coding ratio while varying mutation and code growth rates. The results suggest that, in our model, having a large reservoir of non-coding states constitutes a great (long term) evolutionary advantage.

I will not embarrass myself by trying to address the validity of the computer model itself — I am but a layman in this area, and I am happy to assume for the sake of argument that it is the single greatest evolutionary toy model for Turing machines ever developed. It does not follow, however, that the authors are correct in their assertion that they “have developed an abstract model mimicking biological evolution”.

As I understand it, the simulation is based on devising a pre-defined “goal” sequence, similarity to which forms the basis of selecting among randomly varying algorithms. As algorithms undergo evolution by selection, they tend to accumulate more non-coding elements, and the ones that reach the goal most effectively turn out to be those with an “optimal coding/non-coding ratio” which, in this case, was less than 2%. The implication, not surprisingly, is that genomes evolve to become larger because this improves long-term evolvability by providing fodder for the emergence of new genes.
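To make the setup concrete, here is a minimal sketch of this kind of goal-directed simulation. This is my own toy reconstruction in Python, not the authors’ Turing-machine model; the target string, rates, and population parameters are all illustrative placeholders. Note how the model’s assumptions (fixed target, constant rates, constant population, asexual reproduction, no deletion) are baked in:

```python
import random

random.seed(42)

TARGET = "1011001110001011"  # hypothetical pre-defined "goal" tape
P_MUT = 0.02    # point-mutation rate (the model's p_m); value illustrative
P_GROW = 0.05   # state-increase rate (the model's p_i); value illustrative
POP_SIZE = 50   # constant population size
GENERATIONS = 300

def express(genome):
    # Only coding states contribute to the output tape.
    return "".join(sym for sym, coding in genome if coding)

def fitness(genome):
    # Similarity to the fixed target, penalized for length mismatch.
    out = express(genome)
    matches = sum(a == b for a, b in zip(out, TARGET))
    return matches - abs(len(out) - len(TARGET))

def mutate(genome):
    child = []
    for sym, coding in genome:
        if random.random() < P_MUT:
            # A "point mutation" either changes the symbol or toggles the
            # coding flag, so non-coding states act as a reservoir.
            if random.random() < 0.5:
                sym = "1" if sym == "0" else "0"
            else:
                coding = not coding
        child.append((sym, coding))
    if random.random() < P_GROW:
        # New states arrive non-coding and are never deleted.
        child.append((random.choice("01"), False))
    return child

def next_generation(pop):
    # Truncation selection toward the single fixed target; asexual copying.
    elite = sorted(pop, key=fitness, reverse=True)[: POP_SIZE // 5]
    return [mutate(random.choice(elite)) for _ in range(POP_SIZE)]

population = [[(random.choice("01"), True) for _ in range(4)]
              for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    population = next_generation(population)

best = max(population, key=fitness)
non_coding = sum(1 for _, coding in best if not coding)
print(len(best), non_coding, fitness(best))
```

Even this crude version tends to show the qualitative behaviour described: non-coding states accumulate steadily, and because a “point mutation” can reactivate them, they serve as raw material for matching the target.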

Before discussing this conclusion, it is worth considering the assumptions that were built into the model. The authors note that:

For the sake of simplicity, we imposed various restrictions on our model that can be relinquished to make the model more realistic from a biological point of view. In particular we decided that:

  1. non-coding states accumulate at a constant rate (determined by the state-increase rate pi) without any deletion mechanism [this is actually two distinct claims rolled into one],
  2. there is no selective disadvantage associated with the accumulation of both coding and non-coding states,
  3. the only mutation mechanism is given by point mutation and it also occurs at a constant rate (determined by the mutation rate pm),
  4. there is a unique ecological niche (defined by the target tape),
  5. population is constant,
  6. reproduction is asexual.

As noted, I am fine with considering this a fantastic computer simulation — it just isn’t a simulation that has any resemblance to the biological systems that it purports to mimic. Consider the following:

  • Although some authors have suggested that non-coding DNA accumulates at a constant rate (e.g., Martin and Gordon 1995), this is clearly not generally true. All extant lineages can trace their ancestries back to a single common ancestor, and thus all living lineages (though not necessarily all taxonomic groups) have existed for exactly the same amount of time. And yet the amount of non-coding DNA varies dramatically among lineages, even among closely related ones. Ergo, the rate of accumulation of non-coding DNA differs among lineages. Premise 1 is rejected.
  • The insertion of non-coding elements can be selectively relevant not only in terms of effects on protein-coding genes (many transposable elements are, after all, disease-causing mutagens), but also in terms of bulk effects on cell division, cell size, and associated organism-level traits (Gregory 2005). Premise 2 is rejected.
  • The accumulation of non-coding DNA in eukaryotes does not occur by point mutation, except in the sense that genes that are duplicated may become pseudogenized by this mechanism. Indeed, the model seems only to involve a switch between coding and non-coding elements without the addition of new “nucleotides”, which makes it even more distant from true genomes. Moreover, the primary mechanisms of DNA insertion, including gene duplication and inactivation, transposable element insertion, and replication and recombination errors, do not occur at a constant rate. In fact, the presence of some non-coding DNA can have a feedback effect in which the likelihood of additional change is increased, be it by insertions (e.g., into non-coding regions, such that mutational consequences are minimized) or deletions (e.g., illegitimate recombination among LTR elements) or both (e.g., unequal crossing over or replication slippage enhanced by the presence of repetitive sequences). Premise 3 is rejected.
  • Evolution does not have a pre-defined goal. Evolutionary change occurs along trajectories that are channeled by constraints and history, but not by foresight. As long as a given combination of features allows an organism to fill some niche better than alternatives, it will persist. Not only this, but models like the one being discussed are inherently limited in that they include only one evolutionary process: adaptation. Evolution in the biological world also occurs by non-adaptive processes, and this is perhaps particularly true of the evolution of non-coding DNA. It is on these points that the analogy between evolutionary computation and biological evolution fundamentally breaks down. Premise 4 is rejected in the strongest possible terms.
  • Real populations of organisms are not constant in size, though one could argue that in some cases they are held close to the carrying capacity of an available niche. However, this assumes the existence of only one conceivable niche. Real populations can evolve to exploit different niches. Premise 5 is rejected.
  • With a few exceptions (e.g., DNA transposons), transposable elements are sexually transmitted parasites of the genome, and these elements make up the single largest portion of eukaryotic genomes (roughly half of the human genome, for example). Ignoring this fact makes the model inapplicable to the very question it seeks to address. Premise 6 is rejected.

The main problem with proofs-by-analogy such as this is that they disregard most of the characteristics that make biological questions complex in the first place. Non-coding DNA evolves not as part of a simple, goal-directed, constant-rate process, but one typified by the influence of non-adaptive processes (e.g., gene duplication and pseudogenization), selection at multiple levels (e.g., both intragenomic and organismal), and open-ended trajectories. An “evolutionary” simulation this may be, but a model of biological evolution it is not.

Finally, it is essential to note that “non-coding elements make future evolution possible” explanations, though invoked by an alarming number of genome biologists, contradict basic evolutionary principles. Natural selection cannot favour a feature, especially a potentially costly one such as the presence of large amounts of non-coding DNA, because it may be useful down the line. Selection occurs in the here and now, and is based on reproductive success relative to competing alternatives. Long-term consequences are not part of the equation except in artificial situations where there is a pre-determined finish line to which variants are made to race.

That said, there can be long-term consequences in which inter-lineage sorting plays a role. In terms of processes such as alternative splicing and exon shuffling, which rely on the existence of non-coding introns, an effect on evolvability is plausible and may help to explain why lineages of eukaryotes with introns are so common (Doolittle 1987; Patthy 1999; Carroll 2002). However, this is not necessarily linked to total non-coding DNA amount. For a process of inter-lineage sorting to affect genome size more generally, large amounts of non-coding DNA would have to be insufficiently detrimental in the short term to be removed by organism-level selection, and would have to improve lineage survival and/or enhance speciation rates, such that over time one would observe a world dominated by lineages with huge genomes. In principle, this would be compatible with the conclusions of the model under discussion, at least in broad outline. In practice, however, this is undone by evidence that lineages with exorbitant genomes are restricted to narrower habitats (e.g., Knight et al. 2005), are less speciose (e.g., Olmo 2006), and may be more prone to extinction (e.g., Vinogradov 2003) than those with smaller genomes.

Non-coding DNA does not accumulate “so that” it will result in longer-term evolutionary advantage. And even if this explanation made sense from an evolutionary standpoint, it is not the effect that is observed in any case. No computer simulation changes this.

__________

References

Carroll, R.L. 2002. Evolution of the capacity to evolve. Journal of Evolutionary Biology 15: 911-921.

Doolittle, W.F. 1987. What introns have to tell us: hierarchy in genome evolution. Cold Spring Harbor Symposia on Quantitative Biology 52: 907-913.

Feverati, G. and F. Musso. 2007. An evolutionary model with Turing machines. arXiv:0711.3580v1.

Gregory, T.R. 2005. Genome size evolution in animals. In: The Evolution of the Genome (edited by T.R. Gregory). Elsevier, San Diego, pp. 3-87.

Knight, C.A., N.A. Molinari, and D.A. Petrov. 2005. The large genome constraint hypothesis: evolution, ecology and phenotype. Annals of Botany 95: 177-190.

Martin, C.C. and R. Gordon. 1995. Differentiation trees, a junk DNA molecular clock, and the evolution of neoteny in salamanders. Journal of Evolutionary Biology 8: 339-354.

Olmo, E. 2006. Genome size and evolutionary diversification in vertebrates. Italian Journal of Zoology 73: 167-171.

Patthy, L. 1999. Genome evolution and the evolution of exon shuffling — a review. Gene 238: 103-114.

Pearson, A. 2007. Junking the genome. New Scientist 14 July: 42-45.

Vinogradov, A.E. 2003. Selfish DNA is maladaptive: evidence from the plant Red List. Trends in Genetics 19: 609-614.

___________

Update: The authors’ responses are posted and addressed here.


Evolution as fact, theory, and path.

As noted in my previous post, the new journal Evolution: Education and Outreach is now available online and free to download. My contribution to the first issue is “Evolution as fact, theory, and path”. Feel free to distribute this and any other papers from the journal as widely as you like, but please link to the journal website rather than re-posting papers.

There are now several available articles that discuss this important subject:


Evolution: Education and Outreach

I am very pleased to announce that the new journal Evolution: Education and Outreach will launch officially today at the National Association of Biology Teachers conference in Atlanta, Georgia. The online version is now operational as well.

You can read everything in Volume 1, Issue 1 here:

http://www.springerlink.com/content/phj263762420/

I’d say this first issue turned out quite well, and it sets up the types of articles we will explore further down the line. We’re working on some exciting ideas for future issues. So stay tuned.

Download. Read. Enjoy. Share.


Bacterial genomes and evolution.

The seminar that I give most often when I am invited to speak at other universities begins with a brief introduction to genomes, sets up some comparisons between bacteria and eukaryotes, and then moves into a short overview of bacterial genome size evolution before spending the remainder of the time on genome size diversity and its importance among animals.

The main things that I have to say about bacterial genomes are:

1) Unlike in eukaryotes, bacterial genome size shows a strong positive relationship with gene number (in other words, bacterial genomes contain little non-coding DNA).

Genome size and gene number in bacteria and archaea. From Gregory and DeSalle (2005).

2) Bacterial genome sizes do not vary anywhere near as much as those of animals do (on the order of 20-fold versus 7,000-fold).

The diversity of archaeal, bacterial, and eukaryotic genome sizes as currently known from more than 10,000 species. From Gregory (2005).

3) The major pattern in bacteria is that, on average, free-living species have larger genomes than parasitic species which in turn have larger genomes than obligate endosymbionts (Mira et al. 2001; Gregory and DeSalle 2005; Ochman and Davalos 2006).

Genome sizes among bacteria with differing lifestyles. Because genome size is primarily determined by the number of genes in bacteria, the question to be addressed is why symbionts have fewer genes in their genomes. From Gregory and DeSalle (2005).

In order to explain these patterns, it was sometimes argued that some bacteria have small genomes because there is selection for rapid cell division, with larger DNA contents taking longer to replicate and thereby slowing down the cell cycle. However, when Mira et al. (2001) compared doubling time and genome size in bacteria that could be cultured in the lab, they found no significant relationship between them. In other words, selection for small genome size is probably not responsible for the highly compact genomes of some bacteria, even though it seems plausible that, more generally, selection does prevent the accumulation of non-coding DNA to eukaryote levels in bacterial cells.

Mira et al. (2001) suggested a different interpretation that is based on two other major processes in evolution — mutation and genetic drift. In terms of mutation, they pointed out that on the level of individual changes that add or subtract relatively small quantities of DNA — i.e., insertions or deletions, or “indels” — deletions tend to be somewhat larger than insertions. The insertions in this case are separate from the addition of whole genes, which happens often in bacteria through sharing of genes among individuals or even across species (“horizontal gene transfer” or “lateral gene transfer“) or gene duplication.

In bacteria (and eukaryotes), small-scale deletions tend to involve more base pairs than insertions, creating a “deletion bias”. Of course, larger insertions such as those of transposable elements or gene duplicates are not part of this calculation, as they add much more DNA at once. From Mira et al. (2001).

So, on the one hand, there are processes that can add genes (duplication and lateral gene transfer); on the other, in the absence of these processes, and if there are no adverse consequences to losing DNA (i.e., no selective constraint), genomes should tend to get smaller as a result of this deletion bias. In free-living bacteria, there are many opportunities for gene exchange, with lateral gene transfer adding DNA at an appreciable frequency. Moreover, free-living bacteria tend to occur in astronomical numbers, and elementary population genetics shows that selection is highly efficient relative to drift in such large populations (so that even a mildly deleterious mutation, such as a deletion or disruptive insertion, will probably be lost from the population over time). Finally, free-living bacteria must produce their own protein products, and therefore tend to make use of all their genes, which places selective constraints on changes (including indels) in those sequences.

Endosymbiotic bacteria, especially those that live within the cells of eukaryote hosts, are different in multiple relevant respects. First, they do not regularly encounter other bacteria from whom they can receive genes. Second, they occur in drastically smaller numbers — indeed, they experience a population bottleneck severe enough to shift the balance from selection to drift. Third, they come to rely on some metabolites provided by the host and no longer make use of all their own genes. These factors in combination mean that the selective constraints on many endosymbiont genes are relaxed, and the dominant processes become deletion bias and random drift. Over many generations, endosymbiotic bacteria lose the genes they are not using (and some that are only mildly constrained by selection, such is the strength of drift under such conditions) due to deletion bias, and the end result is highly compact genomes.
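The reasoning in the last two paragraphs can be caricatured with a few lines of code. This is a back-of-the-envelope sketch of deletion bias under differing selective constraint, not any published model: the sizes, rates, and the hard “floor” standing in for purifying selection are all invented for illustration, and gene gain by lateral transfer and duplication is ignored:

```python
import random

random.seed(1)

def evolve(start_kb, generations, constrained_kb,
           indel_prob=0.5, mean_del_kb=3.0, mean_ins_kb=2.0):
    """Track genome size (kb) under a deletion bias: deletions average
    larger than insertions. Deletions that would cut into the constrained
    (selectively maintained) portion are assumed to be purged by
    selection, so size never drops below that floor."""
    size = start_kb
    for _ in range(generations):
        if random.random() < indel_prob:
            if random.random() < 0.5:
                # Deletion in dispensable DNA sticks; selection blocks
                # anything below the constrained floor.
                size = max(constrained_kb,
                           size - random.expovariate(1 / mean_del_kb))
            else:
                # Insertions are, on average, smaller.
                size += random.expovariate(1 / mean_ins_kb)
    return size

# Free-living: large populations, most genes under selective constraint.
free_living = evolve(4000, 20000, constrained_kb=3600)
# Endosymbiont: constraint relaxed on most genes (host provides
# metabolites), so deletion bias grinds the genome down.
endosymbiont = evolve(4000, 20000, constrained_kb=600)
print(round(free_living), round(endosymbiont))
```

With deletions averaging larger than insertions, both genomes drift downward, but only the endosymbiont, with its small constrained fraction, approaches a highly compact state.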

The compaction of genomes in endosymbionts can be extreme. The smallest genome known in any cellular organism (except, perhaps, one in Craig Venter’s lab) is found in the bacterial genus Carsonella, a symbiont that lives within the cells of psyllid insects. It contains only 159,662 base pairs of DNA and 182 genes, some of which overlap (Nakabachi et al. 2006).

Carsonella (dark blue) living within the cells and around the nucleus (light blue) of a psyllid insect. From Nakabachi et al. (2006).

In some other bacteria, genes that are not used (including non-functional duplicates) may not be lost for some time and may persist as pseudogenes, just like those observed in large numbers in eukaryote genomes. These tend to undergo additional mutations and to degrade over time, but can still be recognized as copies of existing genes. In Mycobacterium leprae, the pathogen that causes leprosy, for example, there are more than 1,100 pseudogenes alongside roughly 1,600 functional genes (Cole et al. 2001). Its genome is about 1 million base pairs smaller than that of its relative M. tuberculosis, but clearly many of the inactive genes have not (yet) been deleted.

The two major influences on bacterial genomes: insertion of genes by duplication and lateral gene transfer, and the loss of non-functional sequences by deletion. From Mira et al. (2001).

It would be nice if this post could end there, having delivered a brief overview of an interesting issue in comparative genomics. Sadly, there is more to say, because some anti-evolutionists have begun using the topic in a confused attempt to challenge evolutionary science. In particular, though I have become aware of this only second-hand, some creationists apparently have suggested that all bacterial genomes are degrading, and therefore that bacteria today are simpler than they were in the past, such that complex structures like flagella could not have evolved from less complicated antecedents.

It should be obvious that not all genomes are necessarily “degrading” just because there is a net deletion bias. For starters, selective constraints prevent essential genes from being lost by this mechanism in most bacteria. Furthermore, there exist well established mechanisms that can add new genes to bacterial genomes, including lateral gene transfer and gene duplication. In fact, the rate of gene duplication seems to be related to genome size in bacteria (Gevers et al. 2004). Also, as Nancy Moran noted in an email, “The most primitive bacteria were certainly simple, but they are not around or at least are not easily identified. Many modern bacteria have large genomes and are very complex.” Finally, the compact genomes of endosymbionts, such as in the aphid symbiont Buchnera aphidicola, tend to be more stable than the genomes of free-living bacteria in terms of larger-scale perturbations such as chromosomal rearrangements (Silva et al. 2003).

Some bacteria, in particular those that have shifted to a parasitic or endosymbiotic dependence on a eukaryote host, have undergone genome reductions (green, red) as compared to inferred ancestral conditions. Nevertheless, many other species continue to display large genomes (blue). However, the very earliest bacteria probably began with small genomes and simple cellular features. From Ochman (2006).

As with eukaryotes, the genomes of bacteria provide exceptional confirmation of the fact of common descent. Not only do comparative gene sequence analyses shed light on the relatedness of different bacterial lineages and the evolution of features like flagella, but the presence — and loss to varying degrees — of non-functional DNA highlights a strong historical signal.

Given that it is her work that is being misused by anti-evolutionists, it is fitting that Dr. Moran be given the last word:

“It seems to me that the widespread occurrence of degrading genes, which are present in most genomes including those of animals, plants, and bacteria, argues pretty strongly in favor of evolution. They are the molecular equivalent of vestigial organs.”

Quite right.

_____________

References

Cole, S.T., K. Eiglmeier, J. Parkhill, K.D. James, N.R. Thomson, P.R. Wheeler, and et al. 2001. Massive gene decay in the leprosy bacillus. Nature 409: 1007-1011.

Gevers, D., K. Vandepoele, C. Simillion, and Y. Van de Peer. 2004. Gene duplication and biased functional retention of paralogs in bacterial genomes. Trends in Microbiology 12: 148-154.

Gregory, T.R. 2005. Synergy between sequence and size in large-scale genomics. Nature Reviews Genetics 6: 699-708.

Gregory, T.R. and R. DeSalle. 2005. Comparative genomics in prokaryotes. In The Evolution of the Genome, ed. T.R. Gregory. Elsevier, San Diego, pp. 585-675.

Mira, A., H. Ochman, and N.A. Moran. 2001. Deletional bias and the evolution of bacterial genomes. Trends in Genetics 17: 589-596.

Nakabachi, A., A. Yamashita, H. Toh, H. Ishikawa, H.E. Dunbar, N.A. Moran, and M. Hattori. 2006. The 160-kilobase genome of the bacterial endosymbiont Carsonella. Science 314: 267.

Ochman, H. 2006. Genomes on the shrink. Proceedings of the National Academy of Sciences of the USA 102: 11959-11960.

Ochman, H. and L.M. Davalos. 2006. The nature and dynamics of bacterial genomes. Science 311: 1730-1733.

Silva, F.J., A. Latorre, and A. Moya. 2003. Why are the genomes of endosymbiotic bacteria so stable? Trends in Genetics 19: 176-180.

Please vote for my lab website.

The online edition of The Scientist is presently running their Laboratory Web Site and Video Awards. Sixty lab websites were nominated, and the judges have chosen 10 finalists. The winner will be chosen by vote, which is open to everyone.

My lab website is among the 10 finalists, so I would very much appreciate it if you throw a vote our way. 🙂

You don’t have to choose just one favourite, as it is a ranked vote.


Help requested: Who said non-coding DNA was all non-functional?

I have a request that I hope some readers can help me with. I am looking for examples from the literature (rather than any “general sense”) of people who claimed that “junk DNA” or “selfish DNA” was totally non-functional. I am particularly interested in peer-reviewed primary articles, but media reports and textbooks are of interest too. Anything from the 1970s to the present would be useful, especially pre-2000 publications. I suspect that the assumption that junk = totally functionless arose sometime in the 1990s, and hardly qualifies as a long-held view. I would also be interested to see references in which people suggest that the term “junk” was meant only to reflect our ignorance about what non-coding DNA is doing in the genome. Post your examples (with a quote and full reference info) in the comments. (Please don’t list Ohno 1972).


Quotes of interest — junk DNA and selfish DNA.

There has been a lot of discussion regarding discoveries in genomics, in terms of both genes (especially their number) and non-coding DNA (in particular, whether any of it is functional and how much of it is transcribed). All of this supposedly contradicts long-held assumptions about genomes, especially those attributed to the early proponents of “junk DNA” or “selfish DNA”, such as the notion that all non-coding elements must be totally non-functional.

I thought I would share some quotes about this topic that I found interesting.

The observation that up to 25% of the genome of fetal mice is transcribed into rapidly labeled RNA, despite the fact that probably less than half this much of the genome serves a useful function, indicates that much of the junk DNA must be transcribed. It is thus not too surprising that much of this is rapidly broken down within the nucleus. There are several possible reasons why it is transcribed: (1) it may serve some unknown, obscure purpose; (2) it may play a role in gene regulation; or (3) the promoters which allow its transcription may remain sufficiently intact to allow RNA transcription long after the structural genes have become degenerate. [1]

These considerations suggest that up to 20% of the genome is actively used and the remaining 80+% is junk. But being junk doesn’t mean it is entirely useless. Common sense suggests that anything that is completely useless would be discarded. There are several possible functions for junk DNA. [1]

The observations on a number of structural gene loci of man, mice and other organisms revealed that each locus has a 10⁻⁵ per generation probability of sustaining a deleterious mutation. It then follows that the moment we acquire 10⁵ gene loci, the overall deleterious mutation rate per generation becomes 1.0 which appears to represent an unbearably heavy genetic load. Taking into consideration the fact that deleterious mutations can be dominant or recessive, the total number of gene loci of man has been estimated to be about 3×10⁴. [2]

The creation of every new gene must have been accompanied by many other redundant copies joining the ranks of silent DNA base sequences, and these silent DNA base sequences may now be serving the useful but negative function of spacing those which have succeeded. [2]

It would be surprising if the host genome did not occasionally find some use for particular selfish DNA sequences, especially if there were many different sequences widely distributed over the chromosomes. One obvious use … would be for control purposes at one level or another. [3]
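As an aside, the genetic load argument in Ohno’s quote [2] above is simple arithmetic and can be restated directly. This sketch uses his assumed per-locus rate, not measured values:

```python
# Ohno's mutational-load arithmetic, restated. mu is his assumed
# per-locus deleterious mutation rate, not a measured quantity.
mu = 1e-5  # deleterious mutations per locus per generation

def genetic_load(n_loci):
    # Expected deleterious mutations per genome per generation.
    return n_loci * mu

# At 10^5 loci the expected load reaches 1.0 per generation, Ohno's
# "unbearably heavy" threshold; at his estimate of ~3 x 10^4 loci
# it is a more tolerable ~0.3.
print(genetic_load(1e5), genetic_load(3e4))
```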

It seems that what the new publications do, rather than overturning any previous claims, is indicate that some authors don’t read the literature that they cite.

___________

Part of the Quotes of interest series.
___________

References

1. Comings, D.E. 1972. The structure and function of chromatin. Advances in Human Genetics 3: 237-431.

2. Ohno, S. 1972. So much “junk” DNA in our genome. In Evolution of Genetic Systems (ed. H.H. Smith), pp. 366-370. Gordon and Breach, New York.

3. Orgel, L.E. and F.H.C. Crick. 1980. Selfish DNA: the ultimate parasite. Nature 284: 604-607.


Endogenous retroviruses and human transcriptional networks.

The human genome, like that of most eukaryotes, is dominated by non-coding DNA sequences. In humans, protein-coding exons constitute only about 1.5% of the total DNA sequence. The rest is made up of non-coding elements of various types, including pseudogenes (both classical and processed), introns, simple sequence repeats (microsatellites), and especially transposable elements — sequences capable of autonomous or semi-autonomous movement around, and in most cases duplication within, the genome. Endogenous retroviruses (ERVs), which are very similar to or indeed are classified as long terminal repeat (LTR) retrotransposons, represent one type of transposable element within Class I (elements that use an RNA intermediate during transposition; Class II elements transpose directly from DNA to DNA by cut-and-paste mechanisms). Roughly 8% of the human genome is represented by ERVs, which are descendants of former exogenous retroviruses that became incorporated into the germline genome.

It seems that no discussion about non-coding DNA is complete without stating that until recently it was all dismissed as useless junk. This claim is demonstrably false, but that does not render it uncommon. Some scientists did indeed characterize non-coding DNA as mostly useless, but finding references to this effect that do not also make explicit allowances for potential functions in some non-coding regions is challenging. Even authors such as Ohno and Comings, who first used the term “junk DNA”, noted that this did not imply a total lack of function. In fact, for much of the early period following the discovery of non-coding DNA, there was plentiful speculation about what this non-coding DNA must be doing — and it must be doing something, many authors argued, or else it would have been eliminated by natural selection. (Hence the fallacy involved in claiming that “Darwinism” prevented people from considering functions for non-coding regions within the genome).

Some authors rejected this automatic assumption of function, and argued instead that mechanisms of non-coding DNA accumulation — such as the accretion of pseudogenes following duplication (“junk DNA” sensu stricto) or insertions of transposable elements (“selfish DNA”) — could account for the presence of so much non-coding material without appeals to organism-level functions. However, the originators of such ideas often were careful to note that this did not preclude some portions of non-coding DNA from taking on functions, especially in gene regulation [Function, non-function, some function: a brief history of junk DNA].

There are lots of examples of particular transposable elements, which probably began as parasitic sequences, becoming co-opted into integral roles within the host genome. This process has played an important role in several major transitions during the macroevolutionary history of lineages such as our own. There is a large and growing literature on this topic, but reviewing it is beyond the scope of this post (see chapter 11 in The Evolution of the Genome for some examples). The present post deals with just one recent case, published this month in the Proceedings of the National Academy of Sciences of the USA by Ting Wang, David Haussler, and colleagues, concerning the role of ERVs in the evolution of a key human gene regulatory system.

Here is the abstract from their paper (which is open access and is available here):

Species-specific endogenous retroviruses shape the transcriptional network of the human tumor suppressor protein p53

The evolutionary forces that establish and hone target gene networks of transcription factors are largely unknown. Transposition of retroelements may play a role, but its global importance, beyond a few well described examples for isolated genes, is not clear. We report that LTR class I endogenous retrovirus (ERV) retroelements impact considerably the transcriptional network of human tumor suppressor protein p53. A total of 1,509 of ~319,000 human ERV LTR regions have a near-perfect p53 DNA binding site. The LTR10 and MER61 families are particularly enriched for copies with a p53 site. These ERV families are primate-specific and transposed actively near the time when the New World and Old World monkey lineages split. Other mammalian species lack these p53 response elements. Analysis of published genomewide ChIP data for p53 indicates that more than one-third of identified p53 binding sites are accounted for by ERV copies with a p53 site. ChIP and expression studies for individual genes indicate that human ERV p53 sites are likely part of the p53 transcriptional program and direct regulation of p53 target genes. These results demonstrate how retroelements can significantly shape the regulatory network of a transcription factor in a species-specific manner.

The TP53 gene is a “master control gene” — a sequence whose product (“protein 53”, or “p53“) is a transcription factor that binds to DNA and regulates the expression of other genes, including ones involved in DNA repair, cell cycle regulation, and programmed cell death (apoptosis). It is so important that it has been dubbed “the guardian of the genome”. Mutations in this gene can be highly detrimental: the “T” in TP53 stands for tumor, and mutations in this gene are often associated with cancers. This includes many smoking-related cancers.

The authors of this study report that particular ERVs contain sites to which the p53 protein binds. As a result of past retrotransposition, these ERVs tend to be distributed in various locations in the genome. This makes it possible for the p53 protein to bind not just at one site, but at sites dispersed in different regions, and therefore in proximity to a variety of other genes. It is this distributed network of binding sites that allows p53 to regulate so many other genes in its role as genome guardian. And this is only possible because an ERV with a site to which the p53 protein is capable of binding inserted into the genome of an early primate ancestor some 40 million years ago, made copies of itself throughout the genome, and then became useful as a source of binding sites. This is classic co-option (exaptation) at the genomic level, and represents the very same kind of explanation that Darwin himself offered for the evolution of complex structures at the organismal scale.

While this is a truly interesting discovery that sheds even more light on the complex history of the genome, it also highlights some important points that I have tried to make on this blog. First, this applies to only a fraction of non-coding DNA. Only about 8% of the genome is made up of ERVs, and, of these, only 1,509 of 319,000 copies (0.5%) include the relevant binding site. About 90% of the ERVs are represented only by “solo LTRs”, the long terminal repeats that remain after the rest of the element has been deleted. Moreover, several ERVs have been implicated in autoimmune diseases. Thus, not only is a small fraction of ERVs likely to be involved in gene regulatory networks such as that of TP53, but some are clearly maladaptive from the perspective of the host genome.
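To put those numbers together, here is a trivial back-of-the-envelope check (the counts are taken from the Wang et al. abstract; the arithmetic is mine):

```python
# Counts from the Wang et al. abstract: 1,509 of ~319,000 human ERV LTR
# regions carry a near-perfect p53 binding site.
erv_with_p53_site = 1_509
total_erv_ltrs = 319_000

fraction = erv_with_p53_site / total_erv_ltrs
print(f"{fraction:.2%} of ERV LTR copies carry the site")  # roughly 0.5%

# ERVs make up about 8% of the genome, so the site-bearing copies are a
# tiny sliver of the genome as a whole:
genome_share = 0.08 * fraction
print(f"about {genome_share:.3%} of the genome")
```

The point of the exercise is simply that even this striking example of co-option accounts for a vanishingly small portion of total non-coding DNA.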

The evolution of the genome is a complex process involving multiple types of elements and interactions at several levels of organization. While very few authors ever claimed that all non-coding DNA was totally without function, it is certainly the case that non-coding sequences are worthy of the new-found attention that they have received from the genomics community. Let us hope that this will include more integration with evolutionary biology than has been evident in the past, as it clearly requires an appreciation of both complexity and history.

_________

ps: The press release from UC Santa Cruz by Karen Schmidt is quite good (notwithstanding the mandatory “it was dismissed as junk” line).



Males answer the call of selection because they’re simpler… I see…

Ok, first let me get the extremely sloppy wording in this story on EurekAlert out of the way [Simple reason helps males evolve more quickly]:

“No matter the species, males apparently ramp up flashier features and more melodious warbles in an eternal competition to win the best mates, a concept known as sexual selection.”

“Researchers believe this relatively uncomplicated genetic pathway helps males respond to the pressures of sexual selection, ultimately enabling them to win females and produce greater numbers of offspring.”

“It turns out that the extra X in females may make answering the call of selection more complicated.”

The story suggests that traits like elaborate ornamentation evolve more readily in males than in females because males have “simpler” genetic systems, not having that second X chromosome and all. This is based on a study of Drosophila melanogaster, which the story notes has an XY (male) / XX (female) chromosomal sex determination system, similar in broad outline to the situation in mammals.

To my mind, we don’t need any additional genetic explanation. Where sexual selection leads to elaborate characteristics in one sex, it is usually males because they make the least investment (at least at the gamete level, and often in terms of parental care), have more variable reproductive success among individuals, and have their reproductive output determined in large part by the number of females with whom they mate. For females, this is most often not the case. So males compete for females and females are choosy in many species, leading to traits in males that are used in combat with other males or are favoured by females. This goes back to Darwin in 1871, with important contributions from the likes of R.A. Fisher in the ’30s.

Here’s the problem with the “it’s the extra chromosome, stupid” hypothesis. In some groups (e.g., seahorses, some birds) the males raise the young and are choosy and the females are the ornamented ones. As far as I know, they have the same sex determination system as related species that have the more typical sexual selection processes.

If that weren’t enough, just consider that birds — which provide some of the best known outcomes of sexual selection like peacocks’ tails — have a ZZ (male) / ZW (female) chromosomal sex determination system. That’s right — female birds have the so-called “simpler” genetic arrangement.


Orgel’s Second Rule and “unbeatable” predation tactics.

Leslie Orgel, who passed away a few weeks ago, was an accomplished thinker who explored some of the biggest questions in biology, including the origin of life itself. He was also a co-author, with Francis Crick, of one of the two key “selfish DNA” papers that critiqued the tendency among many authors to assume without evidence that all non-coding DNA is functional at the organismal level. But he is perhaps best remembered for Orgel’s Rules.

Orgel’s Second Rule, in particular, is well known among biologists. It states, quite succinctly, that “Evolution is cleverer than you are”. This is not to imply that evolution has conscious motives or methods, but that most people who say that this or that could not evolve are simply exhibiting a lack of imagination.

In a previous post, I discussed a rather silly statement in a science news story about venomous snakes and their toxic frog prey. I may have been unduly harsh in shooting the messenger, as it seems that the authors of the paper themselves made the statement in question in a different interview.

To borrow from my previous post, the story can be summarized as follows. Depending on the species, the frogs are either toxic or covered by a sticky glue-like substance, but the snakes manage to consume them nonetheless by killing the frog and then waiting for the objectionable substances to dry out or degrade before eating the prey. The frog usually travels some distance before succumbing to the snake’s venom, at which time the predator tracks it down and devours it. By assessing the non-lethal doses of toxin left in their mouths after the initial bite, the snakes can discriminate between species of toxic prey, waiting, say, 30 minutes post mortem before eating one species of frog but 40 minutes for a different species whose toxins persist slightly longer before breaking down.

While discussing their recent publication, the authors are quoted thus:

In evolutionary terms, the snake’s strategy of ‘bite, release, and wait’ is unbeatable by the frogs. Although prey often evolve ways of overcoming predator tactics, the frogs can’t do so in this case – because the snake’s strategy only becomes effective after the frog has died. Natural selection ceases to operate on an individual after that individual’s death, so frogs will probably never evolve toxins that last longer in response to the snake’s tactic. Thus, this waiting strategy is likely to be stable and unbeatable over evolutionary time. [Emphasis added].

So, the argument is being made that because the frogs that are attacked by snakes are dead, there can be no selection on frogs that would lead to counteracting features evolving. The obvious objection to this is that the direct individual fitness of a given frog may not be increased if the toxins last longer, but that its inclusive fitness (i.e., its own reproduction plus the reproductive success of genetic relatives) would be. The authors take pains to dismiss this possibility, and note that there is no parental care or geographical concentration of kin beyond the earliest life stages. Fair enough, although I don’t think one should dismiss the possibility of more subtly biased localizations of kin without supporting data.
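For readers who want the inclusive fitness condition in symbols, the standard formulation is Hamilton’s rule (the authors do not invoke it explicitly; I add it here only for clarity):

```latex
% Hamilton's rule: an allele (here, one for longer-lasting toxin) can be
% favoured by selection whenever
\[
  rB > C
\]
% where r is the genetic relatedness between the dead frog and the frogs its
% toxin protects, B is the fitness benefit to those relatives (snakes deterred
% or delayed), and C is the fitness cost to the bearer (e.g., of producing a
% more durable toxin).
```

The authors’ argument, in these terms, is that without kin structure the effective r among neighbouring frogs is essentially zero, so no benefit B to neighbours can offset even a trivial cost C.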

But assuming for the sake of argument that indeed kin selection is absent in these frogs, is there any imaginable circumstance in which the snake’s strategy could be undermined? Or is it truly “unbeatable over evolutionary time”? I suggest that there are several possibilities — and I freely accept that evolution is cleverer than I am.

Scenario #1: The frogs become even more toxic than they already are, such that even the first bite by the snake is fatal or at least detrimental. This one almost goes without saying. It seems especially obvious since the toxin appears to be there in the first place only as a defensive adaptation.

Scenario #2: The frogs become more resistant to snake venom such that the snake has to handle them for longer and be exposed to more toxin. Again, this is a simple one. It could also operate in addition to the first case.

Scenario #3: There is an increase in the longevity of the toxin such that snakes have to wait longer before eating the frogs. The snakes apparently eat several species of frogs, some of which are non-toxic. They may not even envenomate those species, and can swallow them right away. The toxic frogs, by contrast, must be struck, and then the snake must wait and perhaps track them down before it can consume them. There will therefore be a threshold at which it is no longer worth waiting for a toxic frog to become palatable, and better instead to spend the time looking for non-toxic prey. Where that threshold falls depends largely on two parameters: 1) how often toxic and non-toxic frogs are encountered relative to each other, and 2) how long a snake must wait before eating a toxic frog and how costly this is in terms of envenomating, tracking, and taste-testing. Change either of these and the snake population may evolve to simply avoid toxic frogs. Both could be affected by changes in the frog population, such as if individual frogs behave in ways that make them less frequently encountered by snakes (which would be favoured for obvious reasons), or if the toxin of many individuals takes longer to break down (which increases the average cost per frog as experienced by the snakes).
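One way to see the threshold in Scenario #3 is with a bare-bones foraging model. To be clear, both the model and every number in it are my own illustration, not anything from the paper: suppose the snake takes S time units to find a frog, a fraction p of frogs are toxic, and a toxic frog costs an extra W time units (envenomating, tracking, waiting), with each frog worth one unit of energy either way. Eating everything beats skipping toxic frogs only while W stays below S/(1 − p):

```python
# A bare-bones foraging sketch of Scenario #3 (all parameters hypothetical).
# S = search time per frog encountered; p = fraction of frogs that are toxic;
# W = extra handling time for a toxic frog (envenomate, track, wait, taste-test).

def rate_eat_all(S, p, W):
    """Long-run energy intake per unit time if the snake eats every frog."""
    return 1.0 / (S + p * W)

def rate_skip_toxic(S, p):
    """Intake rate if the snake ignores toxic frogs and keeps searching."""
    return (1.0 - p) / S  # only a fraction 1-p of encounters yield a meal

def snake_should_wait(S, p, W):
    """True while the bite-and-wait tactic still beats avoiding toxic frogs.
    Algebraically this reduces to W < S / (1 - p)."""
    return rate_eat_all(S, p, W) > rate_skip_toxic(S, p)

# With S = 10 and half the frogs toxic, the break-even wait is S/(1-p) = 20:
print(snake_should_wait(10, 0.5, 15))  # True  -> keep the bite-and-wait tactic
print(snake_should_wait(10, 0.5, 30))  # False -> snakes do better avoiding toxic frogs
```

So a population-wide increase in toxin longevity (larger W), or frogs becoming harder to find (larger effective S for non-toxic alternatives, or changed p), could each push the snakes past the break-even point — which is all the scenario requires.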

Scenario #4: A mutation appears that makes the toxin longer lasting but no more costly to produce, and is associated with something (e.g., taste or some other indicator) that the snakes can detect before biting the frogs. In this case, the snakes could learn to avoid the longer lasting toxic frogs (or there could be selection on innate tendencies to avoid certain cues in frogs). The snakes would then focus on non-toxic frogs or toxic frogs that are edible more quickly following death. This would impose selection on the frog population for longer lasting toxin.

I welcome more suggestions, and I repeat Orgel’s caution against making universal claims about what evolution can or can’t do.

______________

More possibilities (see comments thread for credits)…

Scenario #5: Frogs exhibit philopatry such that relatives are concentrated into reasonably localized distributions — not kin groups per se, but a non-random distribution such that kin are more likely to be close to one another than to non-relatives in general. This would not be kin selection in the sense that the authors dismiss, but it could result in inclusive fitness effects on selection in favour of long-lasting toxin alleles. In other words, long-lasting toxin genes could be favoured if the toxin deters snakes from attacking frogs in a particular region and that region is occupied primarily by relatives of the dead frog that share the same toxin genotype.

Scenario #6: Frogs normally live within closed geographic areas, and each subpopulation varies in its overall frequency of long lasting toxin alleles. Groups with lots of these long-lasting toxic frogs would be preyed upon less by snakes (assuming the snakes can tell the difference and focus instead on hunting in areas where there are fewer long lasting toxic frogs). This would favour the long lasting toxin even if not all members of the group have it. If there is infrequent admixture of subpopulations, and the groups with more long lasting toxic frogs contribute more offspring to the metapopulation, then this could be favoured locally and spread throughout.

Scenario #7: The frogs may be preyed on by another species of snake that does not exhibit the bite-release-eat tactic, and they may evolve longer-lasting toxins in response to that other interaction, which would indirectly affect the snakes that do display the behaviour.