The junk DNA quotes of interest series.

One of the things I am happiest about having back online with the restored blog is the Quotes of Interest series. I had gone through the contemporary primary literature describing the discovery of each new category of non-genic DNA and shown that in every single case, functions were considered for these sequences (if not outright assumed to exist). Here’s the list of posts from this series:

From non-coding to coding genes.

I sometimes get asked if non-coding elements (usually “junk DNA” is what they say) can ever evolve into genes. I usually say that transposable elements, at least, can be coopted into functional roles, and that it wouldn’t be so odd if a pseudogene took on a novel function sometime through mutations. Kind of a lame answer, I know, but there haven’t been too many unambiguous examples yet, so cut me some slack.

Anyway, here’s a story in New Scientist that describes a report of three genes unique to humans that appear to have arisen from non-coding DNA. I don’t know about other researchers, but I certainly didn’t consider this “virtually impossible” (as New Scientist states), just rare.

Three human genes evolved from junk

I have to give New Scientist credit on this story for not going with the easy, lazy, and incorrect “everyone thought it was junk but now it’s all turning out to have a function!” template. As the author, Michael LePage, writes:

The researchers conclude that three of these non-coding sequences must have mutated in humans and become capable of coding for the short proteins at some point since we diverged from chimps six million years ago. While at least half the non-coding DNA in humans is junk with no function, it is not clear whether the non-coding DNA from which the genes evolved had any function.

Such “de novo” gene evolution was once thought impossible because random mutations are highly unlikely to produce a DNA sequence that encodes a protein of any length, let alone a protein that will be transcribed by cells and do anything useful. But in 2006, several de novo genes were discovered in fruit flies. Since then, it’s become clear that genes do continually evolve in this way.

Part of the explanation might be that biological systems are very noisy: even though most of our DNA is junk, most of it still gets transcribed into RNA at times, and some of that RNA probably reaches cells’ protein-making machinery. This means that when mutations do throw up sequences capable of encoding proteins, some may get “tested” and useful ones selected for. As more primate genome data becomes available, McLysaght estimates a further 15 human genes will turn out to have evolved de novo.

LePage, by the way, also wrote the excellent piece Evolution: 24 Myths and Misconceptions.

The abstract of the forthcoming paper by Knowles and McLysaght (2009):

The origin of new genes is extremely important to evolutionary innovation. Most new genes arise from existing genes through duplication or recombination. The origin of new genes from noncoding DNA is extremely rare, and very few eukaryotic examples are known. We present evidence for the de novo origin of at least three human protein-coding genes since the divergence with chimp. Each of these genes has no protein-coding homologs in any other genome, but is supported by evidence from expression and, importantly, proteomics data. The absence of these genes in chimp and macaque cannot be explained by sequencing gaps or annotation error. High-quality sequence data indicate that these loci are noncoding DNA in other primates. Furthermore, chimp, gorilla, gibbon, and macaque share the same disabling sequence difference, supporting the inference that the ancestral sequence was noncoding over the alternative possibility of parallel gene inactivation in multiple primate lineages. The genes are not well characterized, but interestingly, one of them was first identified as an up-regulated gene in chronic lymphocytic leukemia. This is the first evidence for entirely novel human-specific protein-coding genes originating from ancestrally noncoding sequences. We estimate that 0.075% of human genes may have originated through this mechanism leading to a total expectation of 18 such cases in a genome of 24,000 protein-coding genes.
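As a quick sanity check on the numbers quoted above, the abstract's 0.075% estimate and its 24,000-gene total can be multiplied out directly. The sketch below uses only figures from the abstract and the New Scientist story; the function and variable names are mine, chosen for illustration.

```python
# Figures quoted in the Knowles and McLysaght (2009) abstract.
DE_NOVO_PERCENT = 0.075          # percent of human genes estimated to be de novo
PROTEIN_CODING_GENES = 24_000    # genome-wide gene count used in the abstract
ALREADY_REPORTED = 3             # de novo genes described in the paper


def expected_de_novo_genes(percent: float, gene_count: int) -> float:
    """Return the number of genes corresponding to a percentage of the total."""
    return percent / 100 * gene_count


expected = expected_de_novo_genes(DE_NOVO_PERCENT, PROTEIN_CODING_GENES)
remaining = expected - ALREADY_REPORTED

print(f"Expected de novo genes in total: {expected:.0f}")
print(f"Still to be found: {remaining:.0f}")
```

The total comes to 18, as the abstract states, and subtracting the three genes already reported leaves the "further 15" mentioned in the New Scientist piece.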

Non-coding DNA and night vision.

Ok, check this out!

Seemingly misplaced DNA acts as lenses

Reporting on Solovei et al. (2009):

We show that the nuclear architecture of rod photoreceptor cells differs fundamentally in nocturnal and diurnal mammals. The rods of diurnal retinas possess the conventional architecture found in nearly all eukaryotic cells, with most heterochromatin situated at the nuclear periphery and euchromatin residing toward the nuclear interior. The rods of nocturnal retinas have a unique inverted pattern, where heterochromatin localizes in the nuclear center, whereas euchromatin, as well as nascent transcripts and splicing machinery, line the nuclear border. The inverted pattern forms by remodeling of the conventional one during terminal differentiation of rods. The inverted rod nuclei act as collecting lenses, and computer simulations indicate that columns of such nuclei channel light efficiently toward the light-sensing rod outer segments. Comparison of the two patterns suggests that the conventional architecture prevails in eukaryotic nuclei because it results in more flexible chromosome arrangements, facilitating positional regulation of nuclear functions.

Misc media.

Busy preparing for the start of the semester, so to tide you over here are some links of things to check out.

1) In our genes, old fossils take on new roles
by David Brown, Washington Post

It turns out that about 8 percent of the human genome is made up of viruses that once attacked our ancestors. The viruses lost. What remains are the molecular equivalents of mounted trophies, insects preserved in genomic amber, DNA fossils.

2) Gaming evolves
by Carl Zimmer, New York Times

Evolutionary biologists like Dr. Near and Dr. Prum, who have had a chance to try the game, like it a great deal. But they also have some serious reservations. The step-by-step process by which Spore’s creatures change does not have much to do with real evolution. “The mechanism is severely messed up,” Dr. Prum said.

Nevertheless, Dr. Prum admires the way Spore touches on some of the big questions that evolutionary biologists ask. What is the origin of complexity? How contingent is evolution on flukes and quirks? “If it compels people to ask these questions, that would be great,” he said.

I may have to check out this game.

3) Research raises questions about DNA barcoding methodology
by Andrea Anderson, GenomeWeb Daily News

This one is about the PNAS article by Song et al. that at first seemed like it was going to get a lot of hype (it did from NSF, but other venues decided it wasn’t worth a story). A lot of silliness going on with this one that I can’t really talk about, but suffice it to say I am not impressed with this paper or the conduct of the authors. I’ll just quote from the linked story.

“Sadly, the authors of this paper do not understand barcoding protocols,” Paul Hebert, director of the Biodiversity Institute of Ontario at the University of Guelph, told GenomeWeb Daily News. Calling the title of the paper misleading, he said barcoders have been aware of nuclear pseudogenes for years and have already designed some strategies for dealing with the problems described in the paper.

“Given that pseudogenes were reported 25 years ago, it’s not new news to us,” Hebert said. He said the team focused on species in which numts are particularly common and drew conclusions based on these eight species. Barcoding projects such as iBOL, he said, include data from thousands of species and are carried out using methods that differ from those described in the paper.

Hebert emphasized that the Barcoding of Life Data Systems, or BOLD, database scours sequences for indels, stop codons, and other tell-tale pseudogene signs. Barcoding sequences are also screened against a pool of sequences representing known contaminants, he said. Sequences that raise red flags are then set aside for further assessment, including longer sequence analysis or RT-PCR.

And, he noted, large barcoding studies typically amalgamate DNA barcode data with information provided by taxonomy, morphology, ecology, and other biological measures. “We’ve never advocated that sequence information alone is declarative for species boundaries,” he said.

For his part, Crandall conceded that large barcoding projects such as iBOL “have excellent strategies for quality control of data” and are already applying many of the steps he and his colleagues recommended. Still, he said, even though some people are already worrying about numts does not mean everyone in the field is addressing the problems appropriately.

And the junk DNA train rolls on…

This appeared in my weekly automated journal search. I have ordered the paper as I can’t find an online copy, but the abstract pretty much covers what the argument will be. Same old pre-1980s adaptationist idea presented as radically novel.

Mallik, M. and Lakhotia, S.C. 2008. Noncoding DNA is not “junk” but a necessity for origin and evolution of biological complexity. Proceedings of the Indian National Science Academy Section B – Biological Sciences 77 (Sp. Iss.): 43-50.

All eukaryotic genomes contain, besides the coding information for amino acids in different proteins, a significant amount of noncoding sequences, which may or may not be transcribed. In general, the more evolved or biologically complex the organisms are, greater is the proportion of the noncoding component in their genomes. The popularity and success of “central dogma of molecular biology” during the last quarter of the 20th century relegated the noncoding DNA sequences to a mortifying status of “junk” or “selfish”, even though during the pre-“molecular biology” days there were good indications that such regions of the genome may function in as yet unknown ways. A resurgence of studies on the noncoding sequences in various genomes during the past several years makes it clear that the complex biological organization demands much more than a rich proteome. Although the more popularly known noncoding RNAs are the small microRNAs and other similar species, other types of larger noncoding RNAs with critical functions in regulating gene activity at various levels are being increasingly identified and characterized. Many noncoding RNAs are involved in epigenetic modifications, including imprinting of genes. A comprehensive understanding of the significance of noncoding DNA sequences in eukaryotic genomes is essential for understanding the origin and sustenance of complex biological organization of multicellular organisms.

See also: Junk DNA and the Onion Test.

A few more quotes about non-coding DNA.

Just for fun, here are some quotes I came across while reading a few sources for a paper I am writing.

Remember, a significant number of creationists, science writers, and molecular biologists want us to believe that non-coding DNA was totally ignored after the term “junk DNA” was published in 1972, that the authors of the “junk DNA” and “selfish DNA” papers denied any possible functions for non-coding elements, and, in the case of creationists, that “Darwinism” is to blame for this oversight. The latter of these is nonsensical as the very ideas of “junk DNA” and “selfish DNA” were postulated as antidotes to excessive adaptationist expectations based on too strong a focus on Darwinian natural selection at the organism level.

For those of you who didn’t read the earlier series, see if you can guess when these statements were made.


There is a strong and widely held belief that all organisms are perfect and that everything within them is there for a function. Believers ascribe to the Darwinian natural selection process a fastidious prescience that it cannot possibly have and some go so far as to think that patently useless features of existing organisms are there as investments for the future.

I have especially encountered this belief in the context of the much larger quantity of DNA in the genomes of humans and other mammals than in the genomes of other species.

Even today, long after the discovery of repetitive sequences and introns, pointing out that 25% of our genome consists of millions of copies of one boring sequence, fails to move audiences. They are all convinced by the argument that if this DNA were totally useless, natural selection would already have removed it. Consequently, it must have a function that still remains to be discovered. Some think that it could even be there for evolution in the future — that is, to allow the creation of new genes. As this was done in the past, they argue, why not in the future?


A survey of previous literature reveals two emerging traditions of argument, both based on the selectionist assumption that repetitive DNA must be good for something if so much of it exists. One tradition … holds that repeated copies are conventional adaptations, selected for an immediate role in regulation (by bringing previously isolated parts of the genome into new and favorable combinations, for example, when repeated copies disperse among several chromosomes). We do not doubt that conventional adaptation explains the preservation of much repeated DNA in this manner.

But many molecular evolutionists now strongly suspect that direct adaptation cannot explain the existence of all repetitive DNA: there is simply too much of it. The second tradition therefore holds that repetitive DNA must exist because evolution needs it so badly for a flexible future–as in the favored argument that “unemployed,” redundant copies are free to alter because their necessary product is still being generated by the original copy.


These considerations suggest that up to 20% of the genome is actively used and the remaining 80+% is junk. But being junk doesn’t mean it is entirely useless. Common sense suggests that anything that is completely useless would be discarded. There are several possible functions for junk DNA.


There is a hierarchy of types of explanations we use in efforts to rationalize, in neo-darwinian terms, DNA sequences which do not code for protein. Untranslated messenger RNA sequences which precede, follow or interrupt protein-coding sequences are often assigned a phenotypic role in regulating messenger RNA maturation, transport or translation. Portions of transcripts discarded in processing are considered to be required for processing. Non-transcribed DNA, and in particular repetitive sequences, are thought of as regulatory or somehow essential to chromosome structure or pairing. When all attempts to assign a given sequence or class of DNA functions of immediate phenotypic benefit to the organism fail, we resort to evolutionary explanations. The DNA is there because it facilitates genetic rearrangements which increase evolutionary versatility (and hence long-term phenotypic benefit), or because it is a repository from which new functional sequences can be recruited or, at worst, because it is the yet-to-be eliminated by-product of past chromosomal rearrangements of evolutionary significance.


This is what I emphasized earlier, that this DNA must have a functional value since nothing is known so widespread and universal in nature that has proven useless.


I’ve stopped using the term [‘junk’] …Think about it the way you think about stuff you keep in your basement. Stuff you might need some time. Go down, rummage around, pull it out if you might need it.

Answers to be provided in the comments.

Quotes of interest — science news stories.

We have been told in science news stories since the early 1990s that biologists long neglected the potential significance of noncoding DNA. (Sadly, this is in line with the claims made by creationists, who claim that “Darwinism” is to blame despite the obvious fact that Darwinian adaptationism would expect functions. Some biologists likewise play up the notion that we have ignored noncoding sequences and just now are coming to appreciate them, thanks, no doubt, to their own revolutionary insights, but again, this ignores a diverse literature on the topic spanning the rise of the tools necessary for such work up to the present.) But what about the science stories that were actually written during the supposed period during which noncoding DNA was dismissed as uninteresting (i.e. 1980 to the early 1990s)?

If you had a subscription to Science in the 1980s, you would have read stories like these by their science writer Roger Lewin:

Lewin, R. 1981. Evolutionary history written in globin genes. Science 214: 426-427.

Even though the human β-globin complex contains a relatively large number of active genes, 95 percent of the locus is made up of DNA that does not code for proteins. What is the role of this extra DNA, if any? The pseudogenes constitute just a small proportion of the region, although more pseudogenes might exist. Some of the DNA is made up of representatives of well-known families of repetitive sequences. And the remainder is DNA of no known function or comparable sequence.

“We wanted to test the hypothesis that this extra DNA is ‘junk DNA,’” says Jeffreys, “so we compared the β loci in humans, gorillas, and baboons.” Jeffreys and his colleagues reasoned that if it were junk DNA, then over the 20 to 40 million years of evolution represented by humans, apes, and Old World monkeys both the sequence and the overall quantity of intergenic DNA could be expected to vary. “It turned out that the cluster is remarkably stable,” reports Jeffreys. “The overall pattern and size of the cluster is the same, and the rate of nucleotide substitutions is one-quarter to one-fifth of what would be expected in functionless DNA”. The noncoding DNA therefore appears not to be junk, but what function it might perform is still a mystery.

Lewin, R. 1982. Repeated DNA still in search of a function. Science 217: 621-623.
[Reporting about an NIH International Workshop in Highly Repeated DNA July, 1982]

Interest in repetitive DNA sequences goes back many years but, as with many aspects of molecular biology, the advent of recombinant DNA technology and DNA sequencing now permits previously unmatched scrutiny of the structures of interest.

If mobility is a reality, and most agree that it probably is, then it seems likely that at least some members of repeat families will have important effects in the genome, even if they have no formal function. Enhancing recombination and altering rates of gene expression are obvious possibilities, while the initiation of new species is a more recondite proposal.

The truth is, however, that the functions of the large and motley collection of repeated DNA families are proving particularly resistant to elucidation. Putative functions are many, including, variously, involvement in chromosome pairing, control of gene expression, processing of messenger RNA precursors, and participation in DNA replication. So far none has been established, save for the single exception of a small family that gives rise to 7S RNA, a molecule that recently was serendipitously discovered to be an essential component of a particle that mediates the secretion of proteins from cells.

Some repetitive DNA will undoubtedly be shown to have a function, in the formal sense; some will likely be shown to exert important effects; and the remainder may well have no function or effect at all and can therefore be called selfish DNA. Repetitive DNA constitutes a substantial proportion of the genome (up to 90 percent in some cases), and there is considerable speculation on how it will eventually be divided between these three groups. Current bets would put a small fraction in the function category, with distribution of the rest rising steeply through the effect and selfish categories.

Satellite DNA unquestionably is a puzzle. What determines the number of copies in a repeat family? And how does the genome tolerate so much of it? Perhaps, as Singer has recently promulgated, just a small fraction of the satellite sequences is essential to some genomic function while the remainder is harmless surplus. This, she indicates, is a comfortable middle ground between the extreme selfish DNA position, which sees no function in all this “junk DNA,” and the adaptationist position, which looks for functions in every structure. The same questions and speculations can be applied to dispersed repetitive DNA.

One observation that might be taken as evidence of function in repeated sequences is the frequency of transcription into RNA. A significant proportion of nuclear RNA contains transcripts of repeated sequences, although 90 percent of this is lost in RNA processing and exit to the cytoplasm. Davidson and his colleagues have shown that in sea urchin the spectrum of repeat families that are transcribed changes during development, an appealing argument for some regulatory function. Most intriguing, however, is the discovery that only a small proportion of any repeat family is ever transcribed. “Most members appear to be quiescent, which must make you cautious when isolating samples in search of their function.”

It is clear that, from their abundance, their unusual structure, and their frequent transcription, dispersed repetitive DNA families cannot be ignored. But it is equally clear that for the most part they, like their tandemly repeated relatives, remain a phenomenon in search of a function.

Lewin, R. 1982. Adaptation can be a problem for evolutionists. Science 216: 1212-1213.

Molecular biology of recent years has revealed many new and intriguing categories of DNA, some of which appear to have no role. One explanation of this has been that the nonaptive sequences provide raw material for future evolution. But the logic of natural selection does not allow for selection for future use. More likely is that the accumulation of nonaptive DNA is a consequence of the innate property of repeated sequences of nucleic acid to replicate and move around the genome. Later it may be recruited to perform some role, in which case it becomes an exaptation.

Lewin, R. 1983. A naturalist of the genome. Science 222: 402-405.

Some mobile elements are large and complex, measuring as much as 10,000 nucleotides in length and carrying many genes, while others are simple sections of repeated DNA just a few hundred nucleotides long. Some people would classify all such elements as “junk” or “parasitic” DNA. Others strongly demur and insist that, for instance, although there is yet to be found any convincing evidence for the involvement of a limited class of elements in development in organisms other than maize, the possibility should by no means be dismissed. In any case it is clear that the mobility of certain genetic elements is essential in the generation of the huge diversity of antibodies in vertebrates and in the production of different antigenic coats in certain parasites. Jumping genes clearly represent a potentially rich source of mutation. In addition, an evolutionary link between mobile elements and retroviruses now seems incontrovertible, as does a causal relationship with certain cancers.

Lewin, R. 1985. More progress in messenger RNA splicing. Science 228: 977.

This summer marks 8 years since eukaryotic genes were first discovered to be interrupted by noncoding sequences, known variously as intervening sequences or introns. The discovery raised two sets of questions. The first concerns the origin and function, if any, of introns, which, by its very nature, is a very difficult question to test and therefore remains somewhat in the realms of speculation, although significant insights are being made. The second focuses on the mechanics of removal of these sequences in the production of mature RNA molecules, and in principle should be experimentally more tractable. The immense effort directed at this second question has produced during the past 8 years some conventional biochemistry, some novel and surprising nucleic acid chemistry, and a great deal of frustration.

Lewin, R. 1986. “Computer genome” is full of junk DNA. Science 232: 577-578.

Many biologists were unhappy with the idea that much of the DNA might have no function, says Loomis. “There is a very strong feeling that if a molecule, or any kind of biological structure, exists, then it must be serving some kind of selectively advantageous purpose. I disagree with this viewpoint very strongly.” Loomis prefers to turn the question around. “We should ask, ‘what is the selective advantage of getting rid of a particular structure?’ This is not common thinking.”

It is of course very difficult to prove that a structure or a sequence of DNA has no function. “People will always say, ah, but you haven’t looked under the right conditions,” says Loomis. In the case of multigene families, the best data come from mutation experiments.

Lewin, R. 1988. Chance and repetition. Science 240: 603.

With some kind of concerted effort to map and sequence the entire human genome now appearing to be inevitable, there will be much excitement at the prospect of discovering what is encoded in the 3-billion-base “message”. There are certain to be some surprises, perhaps even equivalent in magnitude to the discovery a decade ago of long, noncoding sequences that interrupt the great majority of eukaryotic genes. But there are many biologists who expect large parts of the genome to be devoid of any function at all: “We face the prospect of trudging through huge tracts of junk DNA,” remarked British molecular biologist Sydney Brenner during one of the many recent panel discussions on the project.

At least some proportion of the DNA in the genomes of most organisms is in the form of these so-called middle repetitive sequences, ranging from 3% to as much as 70%: typically, the bigger the genome, the more repetitive DNA. There is a long tradition in biology that, seeing structures as extensive as these, argues that there must be a functional explanation for them.

Biologists have long speculated about the function of middle repetitive sequences, with regulation of gene expression being one popular notion. Loomis and Gilpin’s perspective, however, is that, although some middle repetitive sequences may have acquired a function once they have formed, there is no need to invoke function as a selective pressure for their origin.


Part of the Quotes of interest series.

Non-functional DNA: non-functional vs. inconsequential.

Each copy of the human genome consists of about 3,200,000,000 base pairs, and includes about 500,000 repeats of the LINE-1 transposable element (a LINE) and twice as many copies of Alu (a SINE), as compared to around 20,000 protein-coding genes. Whereas protein-coding regions represent about 1.5% of the genome, about half is made up of LINE-1, Alu, and other transposable element sequences. These begin as parasites, and some continue to behave as detrimental mutagens implicated in disease. However, most of those in the human genome are no longer mobile, and it is possible that many of these persist as commensal freeloaders. Finally, it has long been expected that a significant subset of non-coding elements would be co-opted by the host and take on functional roles at the organism level, and there is increasing evidence to support this.
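To put the figures in the paragraph above side by side, here is a small back-of-the-envelope calculation. It is only a sketch: every input is one of the approximate ("about") values quoted in that paragraph, the variable names are mine, and the outputs are rough ratios rather than measurements.

```python
# Approximate values taken from the paragraph above.
GENOME_BP = 3_200_000_000        # base pairs per copy of the human genome
LINE1_COPIES = 500_000           # LINE-1 repeats
ALU_COPIES = 2 * LINE1_COPIES    # "twice as many copies of Alu"
PROTEIN_CODING_GENES = 20_000
CODING_FRACTION = 0.015          # about 1.5% of the genome
TE_FRACTION = 0.5                # "about half" is transposable element sequence

# Base pairs in each category.
coding_bp = GENOME_BP * CODING_FRACTION
te_bp = GENOME_BP * TE_FRACTION

# LINE-1 and Alu copies for every protein-coding gene.
copies_per_gene = (LINE1_COPIES + ALU_COPIES) / PROTEIN_CODING_GENES

# How much more sequence is transposable element than protein-coding.
te_to_coding_ratio = te_bp / coding_bp

print(f"Protein-coding sequence: ~{coding_bp / 1e6:.0f} million bp")
print(f"Transposable element sequence: ~{te_bp / 1e9:.1f} billion bp")
print(f"LINE-1 + Alu copies per protein-coding gene: {copies_per_gene:.0f}")
print(f"Transposable element to coding ratio: ~{te_to_coding_ratio:.0f}x")
```

On these rounded inputs, LINE-1 and Alu copies alone outnumber protein-coding genes by about 75 to 1, and transposable element sequence outweighs protein-coding sequence by roughly 33 to 1.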

A notable fraction of the non-genic portion of human DNA is undoubtedly involved in regulation, chromosomal function, and other important processes, but based on what we know about non-coding DNA sequences, it remains a reasonable default assumption — though one that should continue to be tested empirically — that much or perhaps most of it is not functional at the organism level. This does not mean that a search for the functional segments is futile or irrelevant — far from it, as many non-genic regions are critical for normal genomic operation and some have played an important role in many evolutionary transitions. It simply means that one must not extrapolate without warrant from discoveries involving a small fraction of sequences to the genome as a whole.

More generally, it has been known for more than 50 years that the total quantity of DNA in the genome is linked to nucleus size, cell size, cell division rate, and a wide range of organism-level characteristics that derive from these cytological features. Thus, large amounts of DNA tend to be found in large, slowly dividing cells, which in turn typically make up the bodies of organisms with low metabolisms, slow development, or other such traits. On this basis alone, one would expect to see consequences for the organism if a large quantity of non-coding DNA were eliminated from or added to the genome, even if most of the particular elements in question were neutral or detrimental under normal circumstances. Non-functional is not equivalent to inconsequential. This is especially true when there are factors operating at different levels, for example when an abundant and diverse collective of entities includes components that are variously neutral, beneficial, and detrimental to a host.

Though they cannot prove an argument, analogies are often useful for understanding an issue. In this capacity, consider the following:

  • There are roughly 10^13 to 10^14 individual microorganisms living in your digestive tract (Gill et al. 2006), which is on par with, or perhaps even 10x larger than, the number of cells making up your own body. It is also two or three orders of magnitude larger than the number of humans who have ever lived, and of the number of stars in the Milky Way galaxy.
  • The assemblage of microorganisms in your intestines comprises some 500 species, most of which have never been cultured in the lab or studied in detail (Gilmore and Ferretti 2003). To put this diversity in perspective, there are only about 5,000 species of mammals on Earth today.
  • The combined “metagenome” of the microorganisms in your gut contains at least 100 times as many genes as your own genome (Gill et al. 2006).

We do not know the specific characteristics of many of the microorganisms in the gut. However, we do know that at least some of them are essential, or at least highly beneficial, for human health. Several of the species found in the gut are important mutualists, assisting with digestion and in return drawing nutrients from the food that we consume. In this sense, it is hard not to agree with Gill et al. (2006), who argue that “humans are superorganisms whose metabolism represents an amalgamation of microbial and human attributes”.

The question is, are all 10,000,000,000,000+ microbial cells that we carry with us functional for our well-being? Some certainly are. But many, maybe even most, are probably commensal freeloaders who neither harm nor benefit us, though of course their total abundance is limited to what can be carried by the host without deleterious consequences. By contrast, some gut bacteria are implicated in gastrointestinal disorders. A few are actively parasitic, but their numbers may be kept in check by our own immune system or through competition with non-pathogenic species, or because they kill the host or are killed by antibiotics. Some, such as the well known Escherichia coli, can be harmless or deadly depending on the presence of particular genes. Thus, the total number of microorganisms, and the relative diversity of species that this encompasses, is influenced by a complex interaction of factors internal to the gut (e.g., who invades, which microorganisms are already present, how efficiently they reproduce) and higher-level conditions (e.g., human immune response, dietary effects on which nutrients are present, positive or negative effects on the host).

What we know about bacteria and other microorganisms makes for a reasonable default assumption that much or even most of what is found in the gut is not there because it provides a direct benefit to humans. On the flipside, we have good reason to expect that some, perhaps even a large fraction, of these organisms are beneficial. Therefore, we require evidence to show that any particular species is functional from the human point of view, and that its abundance is determined on this basis. The search for such evidence is important, but it occurs against a backdrop of realizing that bacteria could be there for their own benefit only, whether or not that has any adverse effects on our well-being as hosts. Establishing that a specific strain of bacteria in the digestive tract is beneficial does not justify the conclusion that all bacteria in the gut are mutualistic. It does not even imply that all individuals of the helpful strain are essential, because the optimal abundance for the host and the pressures for reproduction of the microorganisms may not converge on the same quantity.

If one were to remove the microorganisms from the gut, or to significantly alter their species composition or abundance, one would expect to see consequences for host health. This would be true even if most of the particular organisms in question were neutral or detrimental in normal circumstances. As with non-genic elements in the genome, this means that even if many organisms in the gut are non-functional from the host’s perspective, their presence is not inconsequential for the biology of an animal carrying them.

The junk DNA collection.

In this post, I will maintain an up-to-date list of substantive posts dealing with the topic of “junk DNA” on this blog and various others.



See also

Quintessence of Dust


Non-functional DNA: quantity.

In my previous post, I noted that because of what we understand about the nature, origins, and cross-taxon quantitative diversity of the various sorts of non-genic DNA in large eukaryote genomes, the default assumption is that much or even most of it is not functional at the cell and organism levels. Thus, the burden of proof rests with authors who claim that a large fraction, or indeed most or all, of this DNA is functional for the organisms in which it occurs.

This should not be construed as claiming that all non-genic DNA is assumed to be non-functional. I have pointed out in various preceding posts that even those who postulated non-adaptive explanations for its existence did not rule out — and indeed, explicitly favoured — the notion that a significant portion would turn out to serve a function. You need not take my word for this, as it is not difficult to find unambiguous statements from the original authors themselves.

For example, here are Orgel and Crick (1980) who, along with Doolittle and Sapienza (1980), first proposed the concept of “selfish DNA” in detail:

It would be surprising if the host genome did not occasionally find some use for particular selfish DNA sequences, especially if there were many different sequences widely distributed over the chromosomes. One obvious use … would be for control purposes at one level or another.

Here, too, is Comings (1972), the first person to use the term “junk DNA” in print and the first to provide a substantive discussion of the topic. (The term was coined by Ohno in 1972, but Comings’s paper appeared in print first, citing Ohno as ‘in press’, and Ohno used the term only in the title.)

These considerations suggest that up to 20% of the genome is actively used and the remaining 80+% is junk. But being junk doesn’t mean it is entirely useless. Common sense suggests that anything that is completely useless would be discarded. There are several possible functions for junk DNA.

The use of the terms “selfish DNA” and “junk DNA” has changed over time, and both are now often applied to all non-genic DNA, rather than to the sequences to which they originally referred (i.e., transposable elements and pseudogenes, respectively). Moreover, it seems that many authors, at least those whose studies focus primarily on protein-coding genes and DNA sequencing, believe that the assumption has been that all non-genic DNA is “junk” in the sense of totally non-functional. However, amidst any such assumptions there has always been a diversity of views on the subject, ranging from assuming that most non-genic DNA is non-functional (as in the quotes above) to expecting it all to be functional. The latter position was held by strict adaptationists, and it was a large part of the motivation for proposing the alternative view of selfish DNA in the first place.

As with many issues in evolution, this is a matter of relative quantity, not an exclusive dichotomy. We may reasonably expect a significant fraction of non-genic DNA to show evidence of function, and the pursuit of such evidence is a valid and important endeavour. It does not follow, however, that the pendulum must be perceived to swing from entirely functional to entirely non-functional and back again. We will undoubtedly refine our estimates of the amount of non-genic DNA that is mutualistic at the organism level, how much is commensal, and how much is best characterized as parasitic in nature.

As it stands, the evidence suggests that about 5% of the human genome is functional at the organism level. The total may be higher; as noted, Comings suggested 20% is actively utilized. It is conceivable that 50% or more of the genome is functional, perhaps in structural roles or some other higher-order capacity. It would require evidence to support this contention, however, and the question would remain as to why an onion requires 5x more of this structural or otherwise essential DNA than a human does, and why some of its close relatives can get by with half as much while others have twice as much as the onion. There is nothing remarkable about onions in this sense, by the way: animal genome sizes alone cover a more than 7,000-fold range, and even among vertebrates there is a 350-fold difference. The range among single-celled protozoa is at least 30,000-fold, though even higher estimates have been presented.

The take-home message is simply this. What we know about eukaryote genomes suggests that many mechanisms can add non-coding DNA without requiring it to be functional. This does not in any way preclude the possibility of, or invalidate the search for, function in some, many, or possibly even most of those non-coding components. How much proves to be functional is an empirical question, and at present the indication seems to be that most non-genic DNA is non-functional. That said, non-functional is not the same as inconsequential.


Comings, D.E. 1972. The structure and function of chromatin. Advances in Human Genetics 3: 237-431.

Doolittle, W.F. and C. Sapienza. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601-603.

Ohno, S. 1972. So much “junk” DNA in our genome. In Evolution of Genetic Systems (ed. H.H. Smith), pp. 366-370. Gordon and Breach, New York.

Orgel, L.E. and F.H.C. Crick. 1980. Selfish DNA: the ultimate parasite. Nature 284: 604-607.