The junk DNA quotes of interest series.

One of the things I am happiest about having back online with the restored blog is the Quotes of Interest series. I had gone through the contemporary primary literature describing the discovery of each new category of non-genic DNA and shown that in every single case, functions were considered for these sequences (if not outright assumed to exist). Here’s the list of posts from this series:

Science by press release, but still interesting…

No paper out yet, and not even any details made available, but this looks interesting:

Reduced genome works fine with 2000 chunks missing

To put a figure on how much of our DNA is non-essential, Vrijenhoek and his colleagues screened the genomes of 600 healthy students, searching for chunks of DNA at least 10,000 base pairs in length that were missing in some individuals. Across all the genomes, about 2000 such chunks were missing – amounting to about 0.12 per cent of the total genome.

Some people will over-interpret this as strong evidence for a majority of “junk DNA”. Because the missing chunks amount to only 0.12% of the genome, it isn’t. However, as these are natural deletions >10kb, the study gets around the objections to the deletion studies (i.e., that the conditions in the lab weren’t the same as the challenges faced in the wild). Then again, it may be that you can have one or two deletions and be OK because there is some redundancy, but if you were missing all of these bits you’d be in trouble. Others will dismiss it as an artifact or as somehow not really testing the claim (read: dogmatic assumption) that all DNA is functional, but what else is new?

The Junk DNA myth strikes again (next up: media hype).

Here’s the abstract of a paper set to be published in Molecular Biology and Evolution. Now, I think this kind of study is interesting and important. But it’s predictable that they start out with the standard (and historically false) claim that “non-coding DNA was long dismissed as junk” (seriously, do reviewers require authors to insert this line or something?). It’s also predictable that the amount of non-coding DNA that they report as showing signs of constraints (about 5% of the genome) will be reported in science news as “junk DNA functional after all!”.

Distributions of selectively constrained sites and deleterious mutation rates in the hominid and murid genomes.
Eory L, Halligan DL, Keightley PD

Protein-coding sequences make up only about 1% of the mammalian genome. Much of the remaining 99% has been long assumed to be junk DNA, with little or no functional significance. Here we show that in hominids, a group with historically low effective population sizes, all classes of non-coding DNA evolve more slowly than ancestral transposable elements, and so appear to be subject to significant evolutionary constraints. Under the nearly neutral theory, we expected to see lower levels of selective constraints on most sequence types in hominids than murids, a group that is thought to have a higher effective population size. We found that this is the case for many sequence types examined, the most extreme example being 5′ UTRs, for which constraint in hominids is only about one-third that of murids. Surprisingly, however, we observed higher constraints for some sequence types in hominids, notably four-fold sites, where constraint is more than twice as high as in murids. This implies that more than about one-fifth of mutations at four-fold sites are effectively selected against in hominids. The higher constraint at four-fold sites in hominids suggests a more complex protein-coding gene structure than murids, and indicates that methods for detecting selection on protein coding sequences (e.g., using the d(N)/d(S) ratio), with four-fold sites as a neutral standard, may lead to biased estimates, particularly in hominids. Our constraint estimates imply that 5.4% of nucleotide sites in the human genome are subject to effective negative selection, and that there are three times as many constrained sites within non-coding sequences as within protein-coding sequences. Including coding and non-coding sites, we estimate that the genomic deleterious mutation rate U = 4.2. The mutational load predicted under a multiplicative model is therefore about 99% in hominids.
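
The last step of that abstract, from U = 4.2 to a load of "about 99%", can be checked with a short calculation. This is only a minimal sketch: it assumes the textbook result that, under multiplicative fitness effects, the mutational load is L = 1 - exp(-U). The abstract does not spell out the formula the authors used, and the function name below is illustrative.

```python
import math


def mutational_load(u: float) -> float:
    """Mutational load under multiplicative fitness, assuming L = 1 - exp(-U)."""
    return 1.0 - math.exp(-u)


U = 4.2  # genomic deleterious mutation rate reported in the abstract
load = mutational_load(U)
print(f"U = {U}: load = {load:.3f}")
```

Under that assumption the load comes out at roughly 0.985, which is in line with the figure quoted in the abstract.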

Update: See BIOpinionated for a silly critique and Sandwalk for a fine reply.

From non-coding to coding genes.

I sometimes get asked if non-coding elements (usually “junk DNA” is what they say) can ever evolve into genes. I usually say that transposable elements, at least, can be coopted into functional roles, and that it wouldn’t be so odd if a pseudogene took on a novel function sometime through mutations. Kind of a lame answer, I know, but there haven’t been too many unambiguous examples yet, so cut me some slack.

Anyway, here’s a story in New Scientist that describes a report of three genes unique to humans that appear to have arisen from non-coding DNA. I don’t know about other researchers, but I certainly didn’t consider this “virtually impossible” (as New Scientist states), just rare.

Three human genes evolved from junk

I have to give New Scientist credit on this story for not going with the easy, lazy, and incorrect “everyone thought it was junk but now it’s all turning out to have a function!” template. As the author, Michael LePage, writes:

The researchers conclude that three of these non-coding sequences must have mutated in humans and become capable of coding for the short proteins at some point since we diverged from chimps six million years ago. While at least half the non-coding DNA in humans is junk with no function, it is not clear whether the non-coding DNA from which the genes evolved had any function.

Such “de novo” gene evolution was once thought impossible because random mutations are highly unlikely to produce a DNA sequence that encodes a protein of any length, let alone a protein that will be transcribed by cells and do anything useful. But in 2006, several de novo genes were discovered in fruit flies. Since then, it’s become clear that genes do continually evolve in this way.

Part of the explanation might be that biological systems are very noisy: even though most of our DNA is junk, most of it still gets transcribed into RNA at times, and some of that RNA probably reaches cells’ protein-making machinery. This means that when mutations do throw up sequences capable of encoding proteins, some may get “tested” and useful ones selected for. As more primate genome data becomes available, McLysaght estimates a further 15 human genes will turn out to have evolved de novo.

LePage, by the way, also wrote the excellent piece Evolution: 24 Myths and Misconceptions.

The abstract of the forthcoming paper by Knowles and McLysaght (2009):

The origin of new genes is extremely important to evolutionary innovation. Most new genes arise from existing genes through duplication or recombination. The origin of new genes from noncoding DNA is extremely rare, and very few eukaryotic examples are known. We present evidence for the de novo origin of at least three human protein-coding genes since the divergence with chimp. Each of these genes has no protein-coding homologs in any other genome, but is supported by evidence from expression and, importantly, proteomics data. The absence of these genes in chimp and macaque cannot be explained by sequencing gaps or annotation error. High-quality sequence data indicate that these loci are noncoding DNA in other primates. Furthermore, chimp, gorilla, gibbon, and macaque share the same disabling sequence difference, supporting the inference that the ancestral sequence was noncoding over the alternative possibility of parallel gene inactivation in multiple primate lineages. The genes are not well characterized, but interestingly, one of them was first identified as an up-regulated gene in chronic lymphocytic leukemia. This is the first evidence for entirely novel human-specific protein-coding genes originating from ancestrally noncoding sequences. We estimate that 0.075% of human genes may have originated through this mechanism leading to a total expectation of 18 such cases in a genome of 24,000 protein-coding genes.
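
The closing estimate in the abstract is simple arithmetic, sketched here for anyone who wants to check it. The variable names are illustrative, and the subtraction at the end is my own reading of how the "further 15" in the New Scientist piece relates to the abstract's total, not something either source states.

```python
from fractions import Fraction

# 0.075% of genes, written as an exact fraction to avoid float rounding
de_novo_fraction = Fraction(75, 100_000)
gene_count = 24_000  # protein-coding genes assumed in the abstract

expected_total = de_novo_fraction * gene_count
already_reported = 3  # the three genes described in the paper
remaining = expected_total - already_reported

print(expected_total, remaining)
```

The total matches the 18 cases given in the abstract, and removing the three genes already reported leaves 15.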

Random quote about non-coding DNA.

I’m not making this officially part of the Quotes of Interest series, but I came across it while reading some papers yesterday and thought it worthy of note.

“Since the sequence composition of satellite DNA is remarkably heterogeneous in most organisms, and since its phenotypic or evolutionary function is not yet clear, satellite DNA is often called “selfish DNA” or “parasitic DNA” (Doolittle and Sapienza, 1980; Orgel and Crick, 1980). There is, however, accumulating evidence that such DNA sometimes contains functional DNA…”

This is a fairly standard introduction for a paper, and can be found in many recent articles announcing the end of a long period of neglect of “junk DNA” or “selfish DNA”.

Just one thing. This was published in 1991.


Imai, H.T. 1991. Mutability of constitutive heterochromatin (C-bands) during eukaryotic chromosomal evolution and their cytological meaning. Japanese Journal of Genetics 66: 635-661.

Shaking up the theory of evolution.

I was just sent a link to this press release. Is this a parody or something?

Shaking up the theory of evolution

In a year that celebrates the 200th anniversary of the birth of Darwin and the 150th anniversary of the publication of “On the Origin of the Species”, Murdoch scientists have made an exciting discovery. Their hypothesis, which argues that DNA junk is essential for evolution, may represent one of the biggest advances in evolutionary theory, since the 1930s.

Murdoch University scientists have developed an improved theory of evolution – a groundbreaking hypothesis which finally reconciles evolutionary theory with the fossil record.

Developed by PhD student Keith Oliver and Program Chair of Biomedical Sciences Dr Wayne Greene, the “Genomic Drive” hypothesis, potentially represents one of the biggest advances in evolutionary theory since the 1930s.

DNA “junk”

In a co-authored report, due to be published in the prestigious BioEssays journal, the researchers argue that transposable elements (TEs) – or what is colloquially termed jumping genes, selfish or junk DNA, have a critical role in ensuring the survival of biological lineages.

Without this DNA junk, a species is effectively frozen and faces eventual extinction.

On the other hand, species with genomes with high TE activity or strong presence of identical TEs possess a greater ability to evolve, diversify and survive.

Take for example humans, rodents and bats.

As primates some 46 per cent of the human genome is comprised of TEs while other mammals such as rodents and bats are known to possess around 40 per cent.

These TE’s are generally suppressed in the ordinary body cells of most species but are allowed to reactivate in reproductive cells for the potential benefit of the next generation.

Their activity can also be triggered when they suddenly hop between species or by stress.

TEs do their survival work by reformatting and rearranging DNA genomes to sometimes create significant adaptive mutations that undergo natural selection.

Current theory doesn’t tally with fossil evidence

Dr Greene, a Senior Lecturer in Molecular Genetics, said current evolutionary theory, which assumed biological lineages evolved by the slow accumulation of adaptive mutations, did not tally with the fossil record.

However, the “Genomic Drive” theory provided a significant explanation for the way new species arose abruptly and periodically.

The theory also fitted with fossil records which showed intermittent and long periods of stasis – where many species stood still or remained the same.

Mr Oliver said the hypothesis argued that significant evolution could not take place without the activity of TEs.

“Although we are standing on the shoulders of others that have worked on TEs, we believe this is the strongest and most comprehensive case ever put forward on the role of TEs in evolution,” Mr Oliver said.

“If our theory proves correct it would be one of the biggest advances in evolution since the 1930s when Darwinism and Mendelism were reconciled in NeoDarwinism.”

Species without junk DNA risked extinction

Dr Greene said species that were devoid of TEs were more at risk of extinction because they simply lacked the capacity to adapt, change and diversify.

“If you don’t have this junk in your genome then you can’t evolve and are stuck, thereby remaining in what is termed evolutionary stasis,” Dr Greene said.

“This would explain why almost all species control their TEs rather than eliminate them.

“And of course having these TEs in a genome doesn’t mean a lineage will necessarily diversify. What it does mean is that it has a much greater potential to do so.”

Mr Oliver said an example of evolutionary stasis occurring in species without TE activity could be seen in the living fossil, the coelacanth, once thought to be extinct for 63 million years.

The coelacanth, which had been found off the coast of South Africa and Indonesia, had inactive or low levels of TEs and had been in stasis for 400 million years.

In another example he referred to the tuatara, where just two species had been found off the coast of New Zealand.

Like the coelacanth, the tuatara was characterised by very few jumping genes and has been unchanged for 220 million years.

An explanation for many unanswered questions

Dr Greene said Genomic Drive theory provided an explanation for many unanswered questions such as why species suddenly appeared in the fossil record, why some groups of organisms were species rich and others species poor and why some species changed little over millions of years.

Successive waves of TE activity in a lineage potentially explained alternations of rapid evolution and stasis.

He said some species – such as bats which “came out of nowhere” in the Eocene Period – suddenly appeared in the fossil record.

This was in keeping with evidence that TE or jumping gene activity occurred in sudden episodic bursts.

Improving the ability to diversify, adapt and survive

Dr Greene said an example of how TE activity affected the richness of a lineage was seen in rodents and bats.

These were species-rich orders of mammals and, unusually for modern mammals, both harboured highly active TEs.

Although there wasn’t enough data yet, the presence of TEs could also help to explain why one order of birds, commonly known as the Songbirds, (the Passeriformes) accounted for over half of all bird species and why the Perciformes accounted for 40 per cent of fish species.

While jumping gene activity in the 235 species of primates had quietened down a lot since its peak about 40 million years ago, the high presence of identical TEs in the primate genome pointed to an improved ability to diversify, adapt and survive.

By comparison a cousin of the primate, the Flying Lemur, lacked a key TE that primates had in abundance and only two species of it remained.

Quotes of interest – ERVs.

It has been quite some time since the last update to the Quotes of interest series on junk DNA. Most of the posts have sought to demonstrate that the exhausting cliché that scientists dismissed possible functions for non-coding DNA until recently is false. To that end, I have provided many quotes indicating that many (if not most) biologists continued to consider possible functions for various non-coding elements throughout the mythical period of neglect. This time, I want to discuss an example in which a particular kind of non-coding sequence was considered probably non-functional, but because of knowledge about its biology, not because no function could be imagined.

The elements under discussion are endogenous retroviruses (ERVs) which, as the name suggests, are virus-like sequences that exist within the genome. Depending on who you ask, they are either very similar to or are interchangeable with long terminal repeat (LTR) retrotransposons. ERVs make up approximately 8% of the human genome, while LTR elements account for 50-80% of the maize genome.

Endogenous retroviruses were discovered in the 1960s and 1970s (see Weiss 2006), but were first dubbed “endogenous viruses” by David Baltimore in 1974 (published in Baltimore 1975).

Here is how Baltimore (1975) explained their origin:

Evidence has accumulated that viruses have entered the germ line at various times during the ancestry of different species. For convenience, two different cases can be considered: acquisition of viral genomes during inbreeding or domestication and acquisition of viral genomes during the evolution of a species. In principle, viruses could have become part of an animal genome at any stage of evolution and still be detectable now.

Baltimore (1975) discussed the fact that these “endogenous viruses” generally do not grow well in the species in which they had been identified, and that they often show signs of degrading by mutation. Moreover, because they are clearly similar to viruses and sometimes cause disease, it seemed very unlikely that they were maintained because they conferred some functional advantage on their hosts. As he concluded,

It is my guess that these viruses have no positive function to play in the life of the animals in which they are resident. Rather, there is an evolutionary equilibrium balancing their acquisition and loss. The viruses are being inserted into the germ line at very low frequency, after which they require many thousands or millions of years to be mutated away because they have little or no detrimental effect on the animal in which they are resident. Viruses that did have a detrimental effect would be lost rapidly and might never come to our attention.

It is worth noting that Baltimore (1975) does not cite Ohno (1972), makes no reference to “junk DNA”, and reaches a tentative conclusion about lack of function from his consideration of the origin and properties of the elements.

Today, some examples are known of ERVs with beneficial effects, such as in placental development (Mi et al. 2000) and p53 binding sites (Wang et al. 2007). (Note, however, that only 0.5% of identified ERVs are associated with binding sites.) As Weiss (2006) summarized the present situation:

As Mendelian elements, retroviruses must be subject to host selection. However, with the exception of enrolling env genes in placental differentiation, ERV appear to be parasitic DNA sequences for which the host has little use, other than to protect against further retrovirus infection. Potentially, ERV can damage the host by mutational insertion and by homologous recombination. But despite a tendency to implicate ERV in many ‘non-infectious’ diseases in humans, there is scant evidence that they play a significant role. There are only rare examples where a recessive single gene disorder in a family lineage is caused by an endogenous retroviral insertion disrupting gene function.

Seems like Baltimore’s (1975) assessment was largely correct with regard to mammalian ERV sequences.


Baltimore, D. (1975). Tumor viruses: 1974. Cold Spring Harbor Symposia on Quantitative Biology 39: 1187-1200.

Mi, S. et al. (2000). Syncytin is a captive retroviral envelope protein involved in human placental morphogenesis. Nature 403: 785-789.

Wang, T. et al. (2007). Species-specific endogenous retroviruses shape the transcriptional network of the human tumor suppressor protein p53. Proceedings of the National Academy of Sciences USA 104: 18613-18618.

Weiss, R.A. (2006). The discovery of endogenous retroviruses. Retrovirology 3: 67.


Part of the Quotes of interest series.

Scitable again.

I noted previously that Scitable, a resource from Nature Education, was interesting but had some problems with the content [Scitable (and a weird piece on DNA barcoding)].

Well, more troubles.

Excerpt from Transposons, or Jumping Genes: Not Junk DNA?

Transposable elements (TEs), also known as “jumping genes” or transposons, are sequences of DNA that move (or jump) from one location in the genome to another. Maize geneticist Barbara McClintock discovered TEs in the 1940s, and for decades thereafter, most scientists dismissed transposons as useless or “junk” DNA. McClintock, however, was among the first researchers to suggest that these mysterious mobile elements of the genome might play some kind of regulatory role, determining which genes are turned on and when this activation takes place (McClintock, 1965).

At about the same time that McClintock performed her groundbreaking research, scientists Roy Britten and Eric Davidson further speculated that TEs not only play a role in regulating gene expression, but also in generating different cell types and different biological structures, based on where in the genome they insert themselves (Britten & Davidson, 1969). Britten and Davidson hypothesized that this might partially explain why a multicellular organism has many different types of cells, tissues, and organs, even though all of its cells share the same genome. Consider your own body as an example: You have dozens of different cell types, even though the majority of cells in your body have exactly the same DNA. If every single gene was expressed in every single one of your cells all the time, you would be one huge undifferentiated blob of matter!

The early speculations of both McClintock and Britten and Davidson were largely dismissed by the scientific community. Only recently have biologists begun to entertain the possibility that this so-called “junk” DNA might not be junk after all. In fact, scientists now believe that TEs make up more than 40% of the human genome (Smit, 1999). It is also widely believed that TEs might carry out some biological function, most likely a regulatory one—just as McClintock and Britten and Davidson speculated. Like all scientific hypotheses, however, data from multiple experiments were required to convince the scientific community of this possibility.

Total nonsense.

Noisy interacting proteins?

Here is an abstract from a recent paper in Science Signaling. I haven’t read it in detail yet, but it is refreshing to see someone discussing the possibility that not everything that happens in the cell is optimized, given that we know of various processes that generate nonfunctional parts.

Abstract: Any engineered device should certainly not contain nonfunctional components, for this would be a waste of energy and money. In contrast, evolutionary theory tells us that biological systems need not be optimized and may very well accumulate nonfunctional elements. Mutational and demographic processes contribute to the cluttering of eukaryotic genomes and transcriptional networks with “junk” DNA and spurious DNA binding sites. Here, we question whether such a notion should be applied to protein interactomes—that is, whether these protein interactomes are expected to contain a fraction of nonselected, nonfunctional protein-protein interactions (PPIs), which we term “noisy.” We propose a simple relationship between the fraction of noisy interactions expected in a given organism and three parameters: (i) the number of mutations needed to create and destroy interactions, (ii) the size of the proteome, and (iii) the fitness cost of noisy interactions. All three parameters suggest that noisy PPIs are expected to exist. Their existence could help to explain why PPIs determined from large-scale studies often lack functional relationships between interacting proteins, why PPIs are poorly conserved across organisms, and why the PPI space appears to be immensely large. Finally, we propose experimental strategies to estimate the fraction of evolutionary noise in PPI networks.

E. D. Levy, C. R. Landry, S. W. Michnick, How Perfect Can Protein Interactomes Be? Sci. Signal. 2, pe11 (2009).