Junk DNA gets Wired.

There is a new article on the Wired website about junk DNA [One Scientist’s Junk Is a Creationist’s Treasure]. I make a very brief appearance in it, and I just want to clarify what I meant by the statement cited (I’m still learning that even an hour-long interview might result in only a short blurb).

My quote is “Function at the organism level is something that requires evidence”. I make this statement because there are several different sorts of DNA sequences in the genome whose presence can be explained even if they do not benefit (and indeed, even if they slightly harm) the organism carrying them. Pseudogenes, satellite DNA, transposable elements (45% of our genome), and other non-coding sequences may or may not be functional — that requires evidence — and some may exist as a result of accidental duplication or even due to selection at the level of the elements themselves (by “intragenomic selection”). The old assumption that all non-coding DNA must be beneficial to the organism or it would have been deleted by now ignores genome-specific processes by which non-coding DNA evolves.

As I have discussed previously, both hardcore adaptationists (if any exist anymore) and creationists have a vested interest in having all non-coding DNA be functional. I believe that real-world variability in genome size argues strongly against such a prospect, but of course it is possible, and this is the point that people like Ohno, Doolittle, Orgel, and Crick made in the 1980s. The important point is that yes, some non-coding DNA is functional at the organism level (as opposed to existing for its own sake or because there is no strong selection against it). And certainly, non-coding DNA has effects at the organism level. But current evidence suggests that about 5% of the human genome is functional, and even the least conservative ENCODE participants (whose primary, and important, objective is to identify the functional elements and their features) are betting that 20% is functional.

In the end, it is obvious that non-coding DNA is the product of evolution whether it all turns out to be functional or not. The cases in which former parasites (transposons) have taken on function at the organism level are a perfect illustration of cooption, which is the same basic process invoked to explain the evolution of complex structures like eyes or flagella. The research into the function of non-coding DNA, which the creationists are eager to cite, can be carried out only under an evolutionary framework; it is meaningless to talk about “conserved non-coding DNA sequences” otherwise.

Finally, let me say one thing about Francis Collins’s quote: “Think about it the way you think about stuff you keep in your basement. Stuff you might need some time. Go down, rummage around, pull it out if you might need it.” With all due respect (which is considerable, given his contribution to the Human Genome Project), it makes no sense to explain the existence of non-coding DNA because it might someday prove useful. Evolution does not work that way. Elements might be coopted, but maintaining this option explains neither the origin nor the persistence of non-coding sequences.

As to what the creationists have to say, well, I leave that to others with more (or less?) patience to attend to.



Gene Genie and the DNA Network.

Here are some of the positive developments among blogs that I am happy to discuss.

The latest edition of Gene Genie is now up on Gene Sherpas, in which I have two contributions. This is my first blog carnival, and I want to thank our host and everyone else involved. The next round will be located at Eye on DNA on June 3, so remember to submit your links that you would like to have included. See here for earlier entries.

The second piece of news, as many people have already noted, is the launch of a new network of genetics blogs called The DNA Network, to which you are welcome to subscribe through Feedburner. This is the outcome of efforts by Rick Vidal of My Biotech Life and Hsien-Hsien Lei of Eye on DNA.

Effect versus function.

There has been quite a bit of discussion in the media recently about discoveries of [indirect evidence for] functions in [small portions of] non-coding DNA. Unfortunately, the parts in square brackets are often omitted. It is also the case that many reports overlook the important distinction between effect and function, leaving readers with the impression that non-coding DNA can only be either totally insignificant or vitally important.

Here is the relevant part of the Merriam-Webster Dictionary entry on function:

“The action for which a person or thing is specially fitted, used, or responsible or for which a thing exists.”

And on effect:

“Something that is produced by an agent or cause; something that follows immediately from an antecedent; a resultant condition.”

In other words, a function fulfills a specific role to produce a positive result, with a close fit between cause and outcome shaped by either design (in human technology) or natural selection (in biological systems). Effects are also the outcome of identifiable causes, but they can be positive, neutral, or negative and may be generated directly or indirectly by the causal mechanism. Thus, it is not possible to have a function without any effects, but something can exert an effect — perhaps a very important one — without this constituting a function.

Consider an example. The immune system of the body has a clear function: to defend against pathogens. Viruses likewise have functions, but this only makes sense if one considers the issue from the perspective of the viruses themselves and not of their hosts. Specifically, parts of a virus function in allowing it to circumvent the host’s immunity and to usurp the host’s replication machinery. Viruses do, however, have effects on hosts: usually negative, but apparently sometimes indirectly positive.

The genomes of eukaryotes consist of many types of DNA sequences. The exons that encode proteins make up a small percentage (less than 2% in humans), and the rest is non-coding DNA of various sorts: introns, pseudogenes, satellite DNA, and especially transposable elements (also called TEs, transposons, or mobile elements). The latter represent a diverse set of sequences that are capable of moving about and duplicating in the genome independently of the normal replication process. In this sense, they are often considered “parasites” of the “host” genome. Overall, TEs also make up the largest portion of non-coding DNA in the genomes analyzed so far (at least 45% in humans), although the particular types, abundances, and levels of activity of TEs vary among species.

Some TEs have evidently been co-opted (exapted) to perform functions at the host level, meaning that they have moved from being parasites to integrated participants in the functioning of the genome. This includes regulating genes, involvement in the genetic cutting-and-splicing mechanism of the vertebrate immune system, and perhaps cellular stress response. On the other hand, many diseases can result from mutations caused by the insertion of a TE into an existing gene. From the perspective of the host, TEs can have different effects depending on the context: some TEs are functional but some are detrimental. The large majority, however, have not been shown to fall into either category.

Nevertheless, a lack of evidence for either function or harm does not mean that TEs are without effects. It is well known that the total amount of DNA (genome size) is linked to cell size, cell division rate, metabolic rate, and developmental rate. In other words, a large genome is typically found in large, slowly dividing cells within an organism displaying a low metabolic rate and sluggish development. Conversely, organisms with high metabolic rate or rapid development tend to have small genomes. To the extent that total DNA content directly affects cell size and division, these can be considered effects — by their presence in the aggregate — of non-coding DNA elements.

Is slowing down metabolism or delaying development a function? Some authors think so, but most would argue that these are effects that are tolerated by the organism because they are not overly detrimental. That is, parasites spread within the genome and individually may have little or no effect (and no function), but in sum may have substantial effects on the cell and organism. The amount of accumulation would depend on the tolerance of the organism based on its biology. For example, it is unlikely that a mammal with a high metabolic rate could have a genome the size of a salamander’s.

The point of this discussion is to note that seeking functions for non-coding DNA is an interesting area of research, but that even if most sequences are not functional, they can still be important from a biological perspective. Similarly, one would not invoke function for hosts to explain the existence of viruses, nor would one dismiss viruses as unimportant if functions were never found at the host level. One would, however, focus considerable attention on explaining how viruses spread, why some are more virulent than others, and how they exert their effects.

Non-coding DNA and the opossum genome.

The genome sequence of the gray short-tailed opossum, Monodelphis domestica, was published in today’s issue of Nature (Mikkelsen et al. 2007). It is interesting for many reasons, including its status as the first marsupial genome to be sequenced, its relatively large genome size, and its low chromosome number (2n = 18). It is also interesting because it contains a number of genes (18,000 to 20,000) similar to that of humans, the vast majority of which exhibit close associations with the genes of placental mammals. Also, in keeping with the hypothesis that transposable elements are the dominant type of DNA in most eukaryotic genomes, the comparatively large opossum genome consists of 52% transposable elements, the most for any amniote sequenced so far.

One of the most intriguing discoveries about the opossum genome is that changes to protein-coding genes seem not to have been the driving force behind mammalian diversification. Instead, non-coding elements with regulatory functions (mostly derived from formerly parasitic transposable elements) appear to underlie much of the difference.

Now, I would prefer to just talk about the science here, noting that this is yet another great example of the complex nature of genome evolution, the key role played by “non-standard” genetic processes (Gregory 2005), and the ever-increasing relevance of non-coding DNA in genomics. But, inevitably, I must comment on how this discovery has been reported. Here is what ScienceDaily (which I otherwise like a great deal) said about it:

Opossum Genome Shows ‘Junk’ DNA Source Of Genetic Innovation


The research, released Wednesday (May 9) also illustrated a mechanism for those regulatory changes. It showed that an important source of genetic innovation comes from bits of DNA, called transposons, that make up roughly half of our genome and that were previously thought to be genetic “junk.”

The research shows that this so-called junk DNA is anything but, and that it instead can help drive evolution by moving between chromosomes, turning genes on and off in new ways.


It had been initially thought that most of a creature’s DNA was made up of protein-coding genes and that a relatively small part of the DNA was made up of regulatory portions that tell the rest when to turn on and off.

As studies of mammalian genomes advanced, however, it became apparent that that view was incorrect. The regulatory part of the genome was two to three times larger than the portion that actually held the instructions for individual proteins.

I will just reiterate two brief points, as I have already dealt with some of these topics in earlier posts (and will undoubtedly have to do so again in the future). One, very few people have actually argued that all non-coding DNA is 100% functionless “junk”, and no one is surprised anymore when a regulatory or other function is observed for some non-coding DNA sequences. Moreover, transposable elements are more commonly labeled as “selfish DNA”, and it has been noted in countless articles that they can and do take on functions at the organism level even if they begin as parasites at the genome level. Two, yet again we are talking about a small portion of the genome, such that this should not be considered a demonstration that all non-coding DNA is functional. In particular, the authors identified about 104 million base pairs of DNA that are conserved (i.e., shared and mostly invariant) among mammals, about 29% of which overlapped with protein-coding genes. In other words, about 74 million base pairs of non-coding DNA, much of it derived from former transposable elements, are found to be conserved among mammals and show signs of being functional in regulation. The genome size of the opossum is probably around 3,500 million bases, which means that this functional non-coding DNA makes up about 2% of the genome.
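The arithmetic behind that figure of roughly 2% is easy to check. The following is a minimal Python sketch that uses only the numbers quoted above (104 million conserved base pairs, 29% of them overlapping protein-coding genes, and a genome of about 3,500 million bases). The function and variable names are illustrative, and the genome size is itself only an approximate estimate.

```python
# Rough check of the "about 2%" figure, using only numbers quoted in the post.
# All names are illustrative; the genome size is an approximate estimate.

CONSERVED_MBP = 104.0      # DNA conserved among mammals, in millions of base pairs
CODING_OVERLAP = 0.29      # fraction of conserved DNA overlapping protein-coding genes
GENOME_SIZE_MBP = 3500.0   # approximate opossum genome size, in millions of bases


def conserved_noncoding_mbp(conserved_mbp: float, coding_overlap: float) -> float:
    """Conserved DNA that does not overlap protein-coding genes (millions of bp)."""
    return conserved_mbp * (1.0 - coding_overlap)


def percent_of_genome(part_mbp: float, genome_mbp: float) -> float:
    """Express a part of the genome as a percentage of the whole."""
    return 100.0 * part_mbp / genome_mbp


noncoding = conserved_noncoding_mbp(CONSERVED_MBP, CODING_OVERLAP)
share = percent_of_genome(noncoding, GENOME_SIZE_MBP)

print(f"Conserved non-coding DNA: {noncoding:.0f} million bp")
print(f"Share of the genome: {share:.1f}%")
```

Run as written, this gives roughly 74 million base pairs and a little over 2% of the genome. Even if the genome size estimate were off by a few hundred million bases, the share would stay in the low single digits.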

A note to science writers. There is nothing surprising about some sequences of non-coding DNA having an important function. The notion that all non-coding DNA has long been assumed to be completely functionless junk is a straw man. And to avoid misleading readers, you really need to specify that most examples of non-coding DNA with a function represent a very small portion of the total genome.



Gregory, T.R. 2005. Macroevolution and the genome. In The Evolution of the Genome (ed. T.R. Gregory), pp. 679-729. Elsevier, San Diego.

Mikkelsen, T.S., M.J. Wakefield, B. Aken, C.T. Amemiya, J.L. Chang, S. Duke, M. Garber, A.J. Gentles, L. Goodstadt, A. Heger, J. Jurka, M. Kamal, E. Mauceli, S.M.J. Searle, T. Sharpe, M.L. Baker, M.A. Batzer, P.V. Benos, K. Belov, M. Clamp, A. Cook, J. Cuff, R. Das, L. Davidow, J.E. Deakin, M.J. Fazzari, J.L. Glass, M. Grabherr, J.M. Greally, W. Gu, T.A. Hore, G.A. Huttley, M. Kleber, R.L. Jirtle, E. Koina, J.T. Lee, S. Mahony, M.A. Marra, R.D. Miller, R.D. Nicholls, M. Oda, A.T. Papenfuss, Z.E. Parra, D.D. Pollock, D.A. Ray, J.E. Schein, T.P. Speed, K. Thompson, J.L. VandeBerg, C.M. Wade, J.A. Walker, P.D. Waters, C. Webber, J.R. Weidman, X. Xie, M.C. Zody, J.A.M. Graves, C.P. Ponting, M. Breen, P.B. Samollow, E.S. Lander, and K. Lindblad-Toh. 2007. Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature 447: 167-177.

Gene number and complexity.

Leaving aside the difficulty in defining terms such as “complexity” and “gene”, there has been for many decades an underlying assumption that there ought to be some relationship between morphological complexity and the number of protein-coding genes within a genome. This is a holdover from the pre-molecular era of genetics, when it was at first thought that total genome size should be related to gene number, and thus to complexity. Indeed, the constancy of DNA content within chromosome sets (“C-values”) was taken as evidence that DNA is the substance of heredity, and yet it was recognized as early as 1951 that there is no clear relationship between the amount of DNA per genome and organismal complexity (e.g., Mirsky and Ris 1951; Gregory 2005). By 1971, this had become known as the “C-value paradox” because it seemed so self-contradictory (Thomas 1971). (The solution to the C-value paradox was that most eukaryotic DNA is non-coding, although this raises plenty of questions of its own.)

Nevertheless, one sometimes encounters arguments that there is a positive correlation between complexity and genome size, even in the scientific literature. Let me put to rest the notion that genome size is related to complexity on the broad scale of eukaryotic diversity. Here is a figure from Gregory (2005) showing the known ranges and means for more than 10,000 species of animals, plants, fungi, protists, bacteria, and archaea.

The notion that gene number and complexity should be related has survived largely intact into the post-genomic era, in no small part due to the popular tendency to describe genomes as “blueprints”. Genomes are not blueprints because there is no direct correspondence between a given bit of the genome and a particular piece of the organism. If one must have an analogy for how genomes operate, then a far more appropriate one is with recipes and cakes. No single word in a recipe specifies a particular crumb of a cake, but following the recipe correctly will result in a cake nonetheless. It probably does not need spelling out, but genomes are the recipe, development is the process of mixing ingredients and baking, and organisms are the cake.

Now, one might expect that a more complex cake would require a more verbose recipe, and indeed on a very general level this is true: viruses have very few genes, bacteria and archaea have more, and eukaryotes have more still. Beyond that, however, it is not necessarily the case that a complex cake needs a recipe with more individual instructions. If the language is very efficient — for example, if one sentence in the recipe can convey several steps, or if one can combine the same basic instructions in different ways to make different parts of the cake — then a short recipe might easily produce a more complex cake than one that goes on for several pages.

While predictions regarding human gene number varied considerably prior to the completion of the human genome sequence in 2001, it was nevertheless somewhat surprising that the gene count is only about 20,000-25,000 for a human (International Human Genome Sequencing Consortium 2004). In fact, some people started calling this the “G-value paradox” or “N-value paradox” (for Gene or Number) in reference to the older C-value paradox (Claverie 2001; Betrán and Long 2002; Hahn and Wray 2002).

Here is how Comings (1972) described the C-value paradox:

Being a little chauvinistic toward our own species, we like to think that man is surely one of the most complicated species on earth and thus needs just about the maximum number of genes. However, the lowly liverwort has 18 times as much DNA as we, and the slimy, dull salamander known as Amphiuma has 26 times our complement of DNA. To further add to the insult, the unicellular Euglena has almost as much DNA as man.

And here are Harrison et al. (2002) (probably mostly facetiously):

The sequencing of the genomes of six eukaryotes has provided us with a related quandary: namely, how is the number of genes related to the biological complexity of an organism (termed an ‘N-value’ paradox by Claverie [2001])? How can our own supremely sophisticated species be governed by just 50-100% more genes than the nematode worm?

Of course, neither the “C-value paradox” nor the “G-value paradox” is a paradox at all. As I have said elsewhere, this simply follows the common but erroneous equation of simplistic expectation + contradictory data = “paradox”. Some genes may encode multiple proteins and gene regulation may be more important than gene number, which means that constructing a complex organism does not require a large number of genes any more than it requires a large genome. No paradoxes.

But why might less complex organisms possess large numbers of genes? Rice (Oryza sativa), for example, is thought to have about 50,000 genes, or twice as many as humans (Goff et al. 2002; Yu et al. 2002). One possible explanation is that rice is an ancient polyploid whose entire genome was duplicated in its ancestry. (At least one round of genome duplication also happened early in the evolution of vertebrates, though most lineages now behave genetically as diploids).

But what about something like a purple sea urchin (Strongylocentrotus purpuratus), whose genome apparently encodes 23,300 genes? As deuterostomes, sea urchins are more closely related to vertebrates than to other invertebrates, but that alone does not explain the fact that they have a gene number roughly equivalent to humans (at least, not under the simplified view of genome evolution being discussed). Further, relatedness to self-described complex organisms certainly can’t explain why corals, which are very distant relatives of vertebrates and considered to be relatively “simple” animals, also have somewhere around 20,000 to 25,000 genes.

It turns out that genes involved in immunity are extraordinarily abundant in sea urchins and corals, and that this could account for a significant portion of their total gene number. (Sensory and developmental genes also appear to be very well represented in the sea urchin genome.) It is well known that pathogen populations can evolve rapidly and thus that a single host defense mechanism may not remain effective for long. Vertebrates handle the infectious onslaught with a two-tiered system. The first tier is “innate immunity”, which is based on non-specific immune reactions to pathogen attack and is the first response of the body’s immune system. This sort of immunity involves a suite of genes that generate a generalized but limited immune response. In this case there is something of a link with complexity, namely that in order to have a more complex set of possible responses, one would need to have more such genes. All animals possess innate immunity, but only the jawed vertebrates also exhibit “adaptive immunity”, which provides a tailored response to individual pathogens. This system does not involve an individual gene for every possible pathogen, but rather employs an array of duplicated genes that can be shuffled in an effectively limitless number of combinations, like railway cars on a long train, to produce a wide variety of antibodies.

The net result is that vertebrate immunity is more flexible, but that this is achieved not through the addition of tens of thousands of new genes, but through the evolution of a system that can recombine existing genes. Groups like echinoderms and cnidarians, by contrast, may require more immune genes to accomplish an effective level of defense because they lack this ability to use existing genes in a large number of combinations. While analogies between human inventions and biological systems can be very problematic, it does seem apt to point out that more sophisticated technologies are frequently simpler, smaller, and more efficient, with fewer parts. A large number of components and a high degree of physical complexity can represent the primitive rather than the derived state in both engineering and evolution.
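A toy calculation shows why recombination can substitute for sheer gene number. The Python sketch below compares a system that assembles each product from one segment drawn out of each of several pools with a hypothetical system that needs one gene per response. The pool sizes are invented round numbers chosen purely for illustration, not measurements from any genome, and the model ignores every other source of antibody diversity.

```python
from math import prod

# Toy model only: the pool names and sizes are hypothetical illustrations.
SEGMENT_POOLS = {"pool_a": 40, "pool_b": 25, "pool_c": 6}


def combinations_available(pools: dict[str, int]) -> int:
    """Distinct products when exactly one segment is drawn from each pool."""
    return prod(pools.values())


def segments_required(pools: dict[str, int]) -> int:
    """Total number of gene segments the genome has to carry."""
    return sum(pools.values())


distinct = combinations_available(SEGMENT_POOLS)
carried = segments_required(SEGMENT_POOLS)

print(f"Segments carried in the genome: {carried}")
print(f"Distinct combinations possible: {distinct}")
print(f"Genes needed at one gene per response: {distinct}")
```

With these made-up numbers, 71 segments yield 6,000 combinations, whereas a system without shuffling would need a separate gene for each of those 6,000 responses. The product grows far faster than the sum, which is the sense in which a shorter recipe can produce a more elaborate cake.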

More DNA generally, or more genes in particular, need not relate to morphological complexity. The more knowledge has accumulated about the size, content, and regulation of genomes, the more the basis for expecting such an association has eroded. Being shocked by, or even ashamed of, the fact that humans do not reign supreme in terms of genome size or number of genes is not the appropriate reaction. Rather, realizations such as these should be exciting and should stimulate the next generation of genomic investigation.



Betrán, E. and M. Long. 2002. Expansion of genome coding regions by acquisition of new genes. Genetica 115: 65-80.

Claverie, J.-M. 2001. What if there are only 30,000 human genes? Science 291: 1255-1257.

Comings, D.E. 1972. The structure and function of chromatin. Advances in Human Genetics 3: 237-431.

Goff, S.A. et al. 2002. A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296: 92-100.

Gregory, T.R. 2005. Synergy between sequence and size in large-scale genomics. Nature Reviews Genetics 6: 699-708.

Gregory, T.R. 2006. Genomic puzzles old and new. ActionBioScience.org.

Hahn, M.W. and G.A. Wray. 2002. The g-value paradox. Evolution & Development 4: 73-75.

Harrison, P.M., A. Kumar, N. Lang, M. Snyder, and M. Gerstein. 2002. A question of size: the eukaryotic proteome and the problems in defining it. Nucleic Acids Research 30: 1083-1090.

International Human Genome Sequencing Consortium. 2004. Finishing the euchromatic sequence of the human genome. Nature 431: 931-945.

Mirsky, A.E. and H. Ris. 1951. The desoxyribonucleic acid content of animal cells and its evolutionary significance. Journal of General Physiology 34: 451-462.

Pennisi, E. 2006. Sea urchin genome confirms kinship to humans and other vertebrates. Science 314: 908-909.

Rast, J.P., L.C. Smith, M. Loza-Coll, T. Hibino, and G.W. Litman. 2006. Genomic insights into the immune system of the sea urchin. Science 314: 952-956.

Sea Urchin Genome Sequencing Consortium. 2006. The genome of the sea urchin Strongylocentrotus purpuratus. Science 314: 941-952.

Thomas, C.A. 1971. The genetic organization of chromosomes. Annual Review of Genetics 5: 237-256.

Yu, J. et al. 2002. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296: 79-92.

Comments on "Noncoding DNA and Junk DNA" (re-post).

The following is a re-post of my comments on the recently posted Noncoding DNA and Junk DNA at Sandwalk. Needless to say, I am quite pleased to see such active discussion about non-coding DNA. Passages in italics are excerpts from the original article.

TR Gregory said…

Ryan Gregory has serious doubts about the usefulness of the term as he explains in his excellent article A word about “junk DNA”.

Just to clarify, I think the term could be useful — indeed, it was useful when Ohno coined it. The problem is that it is seldom used in an appropriate way. If the meaning were specified explicitly to be “regions strongly suspected of being non-functional with evidence to back it up” (which, incidentally, is not the original definition according to Ohno (1972) or Comings (1972)), and if people used it only in this way, then I would not have a problem with this. But given the difficulty that people seem to have in accepting that some DNA may truly not have a function at the organism level, I don’t know if we could ever get it to be used with such precision.

…a new term, Junctional DNA, to describe DNA that probably has a function but that function isn’t known… think we don’t need to go there. It’s sufficient to remind people that lots of DNA outside of genes has a function and these functions have been known for decades.

That neologism was suggested in response to Minkel’s appeal for a term that would “make the distinction between functional and nonfunctional noncoding DNA clear to a popular audience”. My main suggestion was to call DNA by what it is known to be, if at all possible, by function (“regulatory DNA”, “structural DNA”) or by type (“pseudogene”, “transposable element”, “intron”). Your definition of “junk DNA” is also more precise than most usages, meaning that you specify that the term only be applied to sequences for which there is evidence (not just assumption) of non-function. That leaves us with something in between for journalists to talk about with a catchy buzzword. “Junctional DNA” lets them specify that we’re not talking about “junk DNA” or “functional DNA”; that is, there is some evidence for function (e.g., being conserved) but no evidence of what that function is. The main utility would be to stop the very frustrating “this 1% of the genome may have a function, so the whole thing must have this function” kind of reporting. Now they could say “another 1% has moved into the category of ‘junctional DNA’”. I think that would be considerably less misleading than current wording.

Note that I’m avoiding the term “noncoding” DNA here. This is because to me the term “coding DNA” only refers to the coding region of a gene that encodes a protein … there are many genes for RNAs that are not properly called coding regions so they would fall into the noncoding DNA category … introns in eukaryotic genomes would be “noncoding DNA” as far as I’m concerned. I think that Ryan Gregory and others use the term “noncoding DNA” to refer to all DNA that’s not part of a gene instead of all DNA that’s not part of the coding region of a protein encoding gene. I’m not certain of this.

By definition, non-coding DNA is, and always has been, everything other than exons. The reason this is relevant is that early work in genome biology assumed that there should be a 1 to 1 correspondence between DNA content and protein-coding gene number. This is work that occurred for at least two decades before the discovery of introns, pseudogenes, and other non-coding DNA. Now we have more descriptive names for the categories of DNA that are not the genes, all the genes, and nothing but the genes. I actually don’t know of anyone else who would have a problem calling introns, pseudogenes, and regulatory regions “non-coding DNA”. Certainly, Ohno, Crick, and many others have historically put introns in the same non-protein-coding grouping as pseudogenes. It’s just a category — you also have more specific subcategories to apply to each of the types of non-coding DNA. Perhaps your objection relates to an undue emphasis on the distinction between exons and everything else — well, that’s the history of the past half century of this field, so it should be no surprise that the terminology reflects this.

Read Gregory’s article for the short concise version of this dispute. What it means is that junk DNA threatens the worldviews of both Dembski and Dawkins!

Not quite. What you’re leaving out of this is the possibility of multiple levels of selection. In the original edition of The Selfish Gene (1976, p.76), Dawkins argued that “the simplest way to explain the surplus DNA is to suppose that it is a parasite, or at best a harmless but useless passenger, hitching a ride in the survival machines created by the other DNA”. Cavalier-Smith (1977) drew a similar conclusion (before he had read Dawkins), and Doolittle and Sapienza (1980) and Orgel and Crick (1980) [yes, that Crick] independently developed the concept of “selfish DNA” a few years later. This is an explicitly multi-level selection approach because it specifies that non-coding DNA can be present due to selection within the genome rather than exclusively on the organism (or gene, in Dawkins’s case) (see, e.g., Gregory 2004, 2005). (Incidentally, this idea of parasitic DNA dates back at least to 1945, when Gunnar Östergren characterized B chromosomes in this fashion). Of course, they tended to do what Ohno did and applied this one idea to all non-coding DNA, which is too ambitious. The modern view is more pluralistic (see, e.g., Pagel and Johnstone 1992 vs. Gregory 2003). Some non-coding DNA is just accumulated “junk” (in the definition of evidence-supported non-function that you espouse). Some (perhaps most) is “selfish” or “parasitic” and persists because there is selection within the genome as well as on organisms (in fact, an argument could be, and has been, made that “selfish DNA” would be a much more accurate term than “junk DNA” for most non-coding DNA). Some non-coding DNA is clearly functional at the organism level, including regulatory regions and chromosome structure components. Some of these latter functional non-coding DNA sequences are derived from elements that originally were of one of the first two types, most notably transposable elements that take on a regulatory function through co-option (or, in another manner of thinking, that undergo a shift in level of selection).

Junk DNA is not noncoding DNA and anyone who claims otherwise just doesn’t know what they’re talking about.

I’m afraid I don’t follow what you mean here. By your definition, “junk DNA” is any non-functional sequence of DNA, including pseudogenes (i.e., the original meaning). Those sequences do not encode proteins. Hence, your version of junk DNA is non-coding. I think this reflects the confusion that is imposed by the term “junk DNA”, which is why I generally think it is more obfuscating than enlightening.



Cavalier-Smith, T. 1977. Visualising jumping genes. Nature 270: 10-12.

Comings, D.E. 1972. The structure and function of chromatin. Advances in Human Genetics 3: 237-431.

Dawkins, R. 1976. The Selfish Gene. Oxford University Press, Oxford.

Doolittle, W.F. and C. Sapienza. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601-603.

Gregory, T.R. 2003. Variation across amphibian species in the size of the nuclear genome supports a pluralistic, hierarchical approach to the C-value enigma. Biological Journal of the Linnean Society 79: 329-339.

Gregory, T.R. 2004. Macroevolution, hierarchy theory, and the C-value enigma. Paleobiology 30: 179-202.

Gregory, T.R. 2005. Macroevolution and the genome. In The Evolution of the Genome (ed. T.R. Gregory), pp. 679-729. Elsevier, San Diego.

Ohno, S. 1972. So much “junk” DNA in our genome. In Evolution of Genetic Systems (ed. H.H. Smith), pp. 366-370. Gordon and Breach, New York.

Orgel, L.E. and F.H.C. Crick. 1980. Selfish DNA: the ultimate parasite. Nature 284: 604-607.

Östergren, G. 1945. Parasitic nature of extra fragment chromosomes. Botaniska Notiser 2: 157-163.

Pagel, M. and R.A. Johnstone. 1992. Variation across species in the size of the nuclear genome supports the junk-DNA explanation for the C-value paradox. Proceedings of the Royal Society of London, Series B: Biological Sciences 249: 119-124.

Genomics, evolution, and health: comparisons of avian flu genomes.

An article by Steven Salzberg and colleagues is set to appear in the May issue of the journal Emerging Infectious Diseases. In it, the authors describe the results of complete genome sequence comparisons for 36 recent isolates of the avian flu virus (influenza H5N1). Their results “clearly depict the lineages now infecting wild and domestic birds in Europe and Africa and show the relationships among these isolates and other strains affecting both birds and humans”. More specifically,

The isolates fall into 3 distinct lineages, 1 of which contains all known non-Asian isolates. This new Euro-African lineage, which was the cause of several recent (2006) fatal human infections in Egypt and Iraq, has been introduced at least 3 times into the European-African region and has split into 3 distinct, independently evolving sublineages.

Figure 1. Phylogenetic tree of hemagglutinin (HA) segments from 36 avian influenza samples. A 2001 strain (A/duck/Anyang/AVL-1/2001) is used as an outgroup at top. Clade V1 comprises the 5 Vietnamese isolates at the bottom of the tree, and clade V2 comprises the 9 Vietnamese isolates near the top of the tree. The European-Middle Eastern-African (EMA) clade contains the remaining 22 isolates sequenced in this study; the 3 subclades are indicated by red, blue, and purple lines. The reassortant strain, A/chicken/Nigeria/1047–62/2006, is highlighted in red.

This is a study in phylogenetics; that is, it reconstructs evolutionary relationships among viral strains using the same tools that many evolutionary biologists use to study the relationships among species. It is well known that viruses evolve very rapidly, and tracking their past changes contributes to the ability to predict future ones. As the authors conclude,

These findings show how whole-genome analysis of influenza (H5N1) viruses is instrumental to the better understanding of the evolution and epidemiology of this infection, which is now present in the 3 continents that contain most of the world’s population. This and related analyses, facilitated by global initiatives on sharing influenza data, will help us understand the dynamics of infection between wild and domesticated bird populations, which in turn should promote the development of control and prevention strategies.

Evolution is not something that only happened to the myriad fossil specimens housed in museum drawers, and evolutionary biology is not merely relevant to academics tucked away in research labs. Evolution is both an ongoing process and an active and exciting area of research. More than ever, an understanding of the processes involved is relevant to the well-being of people from all regions of the world.

Chimps are not more evolved than humans or anyone else.

I like New Scientist. I even did a short interview with them about a cool genomics story (“How chemicals can speed up evolution”, 6 May 2006, p.16). But this headline from their news service really annoys me: Chimps ‘more evolved’ than humans.

The short news article starts out with “It is time to stop thinking we are the pinnacle of evolutionary success…”, which of course is true except that it was time to stop thinking this 150 years ago, and then continues with “… chimpanzees are the more highly evolved species, according to new research”.

What they mean is that, according to the recent study, the rate at which mutations were fixed by selection appears to have been higher in the lineage leading to chimpanzees than in the lineage leading to humans since the two split from a common ancestor several million years ago. Which lineage experienced the changes can now be inferred by comparison with the macaque genome, which is less closely related to chimps and humans than the latter two are to each other; without such an external comparison, one cannot say which lineage had changed, only that one or both of them had. Most likely, this boils down to differences in long-term historical population sizes in the two lineages (selection is more effective in large populations, genetic drift in small ones).
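The outgroup logic described above can be sketched in a few lines of Python. This is only a toy illustration: the function names and the three short sequences are made up, it assumes a simple parsimony rule (the base shared with macaque is taken as ancestral), and it counts all differences rather than only those fixed by selection, which is what the study itself tested for.

```python
def polarize_site(human: str, chimp: str, macaque: str) -> str:
    """Classify one aligned site, treating the macaque base as the outgroup state."""
    if human == chimp:
        return "no difference"   # human and chimp agree at this site
    if human == macaque:
        return "chimp lineage"   # human matches the outgroup, so chimp changed
    if chimp == macaque:
        return "human lineage"   # chimp matches the outgroup, so human changed
    return "ambiguous"           # all three differ; the outgroup cannot decide


def count_changes(human: str, chimp: str, macaque: str) -> dict:
    """Tally the classification of every site in three aligned sequences."""
    counts = {"no difference": 0, "chimp lineage": 0,
              "human lineage": 0, "ambiguous": 0}
    for h, c, m in zip(human, chimp, macaque):
        counts[polarize_site(h, c, m)] += 1
    return counts


# Hypothetical aligned sequences, invented for illustration (not real data).
human = "ACGTGCGTAC"
chimp = "ACTTACGAAC"
macaque = "ACGTACGTAT"

print(count_changes(human, chimp, macaque))
# {'no difference': 7, 'chimp lineage': 2, 'human lineage': 1, 'ambiguous': 0}
```

Without the third sequence, the three sites where human and chimp differ could only be scored as "different"; the macaque base is what assigns each change to one lineage or the other.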

Couching this interesting finding in terms of who is “more evolved” than whom is not helpful, even with the scare quotes. As someone who teaches evolution at the upper-year undergraduate level, I can tell you that students come into the class with a lot of preconceptions about evolution, one of them being the notion that some extant species can be ranked as “more evolved” than others. It is subtle misinformation like this, compounded over many years, that makes my job harder by the time they arrive in my course.

Please, please, PLEASE stop appealing to common misconceptions about evolution in news stories, even if the headline will catch the attention of (previously misinformed) readers.



Macaque genome published.

The April 13 issue of Science includes a collection of papers reporting and analyzing the sequence of the macaque (Macaca mulatta) genome. This marks the third primate genome to be sequenced (after human in 2001 and chimpanzee in 2005). Needless to say, comparisons of three genomes are far more informative than analyses involving only one or two sequences, and the papers contained in the special issue of Science already include some novel insights of evolutionary and medical significance that were previously unattainable. Carl Zimmer at The Loom provides a general summary of some key findings.

There is, rightly, a lot of interest in comparing genes among the three primate species. Non-coding DNA also gets much-deserved attention; in fact, this time we are fortunate enough to see an entire paper devoted to transposable elements. One general finding of interest is that the number of transposable elements is remarkably similar (and quite high) across the three genomes. Here is the breakdown:

No wonder Ford Doolittle once remarked, probably only half-jokingly, that “our genomes … might be ironically viewed as vehicles for the replication of Alu sequences”. They do, after all, outnumber protein-coding genes by about 50:1.

The Genomes OnLine Database (GOLD) provides a list of other completed and forthcoming genome sequences. The macaque is only the latest in a rapidly growing list of genome projects that will continue to provide exciting new information about the evolution of genomes and the organisms carrying them.

A word about "junk DNA".

“It seems as though ‘junk DNA’ has become a legitimate jargon in a glossary of molecular biology. Considering the violent reactions this phrase provoked when it was first proposed in 1972, the aura of legitimacy it now enjoys is amusing, indeed.”

– Ohno and Yomo, 1991

The origin of “junk DNA”

Two main problems struck Susumu Ohno as particularly important in his seminal work on the genetics of evolutionary diversification. The first was the lack of correspondence between genome size (amount of DNA) and morphological complexity (taken as a proxy for gene number), which was a prominent topic of discussion in the early 1970s. As he noted in 1972, “If we take the simplistic assumption that the number of genes contained is proportional to the genome size, we would have to conclude that 3 million or so genes are contained in our genome. The falseness of such an assumption becomes clear when we realize that the genome of the lowly lungfish and salamanders can be 36 times greater than our own” (Ohno 1972a). In fact, Ohno and his colleagues were well aware that much of the DNA in the mammalian genome could not code for proteins, lest the mutational load become fatally high (e.g., Comings 1972; Ohno 1972b, 1974).
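Ohno's reductio can be worked through numerically. The sketch below is illustrative only: the quoted passage gives the 3 million figure and the 36-fold ratio, but not the inputs, so the round genome size and the per-gene length used here are assumptions chosen to reproduce his number.

```python
# Assumed round figure for the haploid human genome, in base pairs.
HUMAN_GENOME_BP = 3_000_000_000

# Assumed average length per gene; picked so the result matches
# Ohno's "3 million or so genes" (not stated in the quoted passage).
ASSUMED_BP_PER_GENE = 1_000


def naive_gene_count(genome_bp: int, bp_per_gene: int = ASSUMED_BP_PER_GENE) -> int:
    """Gene number under the 'simplistic assumption' that it scales with genome size."""
    return genome_bp // bp_per_gene


human_genes = naive_gene_count(HUMAN_GENOME_BP)
# A genome 36 times larger, as in Ohno's lungfish and salamander example.
lungfish_genes = naive_gene_count(36 * HUMAN_GENOME_BP)

print(human_genes)     # 3000000
print(lungfish_genes)  # 108000000
```

Under strict proportionality, the larger genome would have to carry over a hundred million genes, which is the absurdity Ohno was pointing to.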

The second problem related to the conservative force of purifying selection and the limitations it places on the diversification of species. Ohno (1973) attempted to kill both of these vexatious birds with a single conceptual stone:

The points I wish to make are: 1) Natural selection is an extremely conservative force. So long as a particular function is assigned to a single gene locus in the genome, natural selection only permits trivial mutations of that locus to accompany evolution. 2) Only a redundant copy of a gene can escape from natural selection and while being ignored by natural selection can accumulate meaningful mutation to emerge as a new gene locus with a new function. Thus, evolution has been heavily dependent upon the mechanism of gene duplication. 3) The probability of a redundant copy of an old gene emerging as a new gene, however, is quite small. The more likely fate of a base sequence which is not policed by natural selection is to become degenerate. My estimate is that for every new gene locus created about 10 redundant copies must join the ranks of functionless DNA base sequence. 4) As a consequence, the mammalian genome is loaded with functionless DNA.

The corpulent genomes of dipnoans and urodele amphibians were similarly accounted for under this view: “Lungfish and salamanders clearly show the tragic consequences of exclusive dependence upon tandem duplication” (Ohno 1970, p.96). Of course, this differs from current thinking about lungfish and salamander genome size, but that’s another story.

To Ohno, this situation not only permitted, but also paralleled, the evolution of life at large. As he put it, “The earth is strewn with fossil remains of extinct species; is it any wonder that our genome too is filled with the remains of extinct genes?” (Ohno 1972a). The primary outcome of this gene duplication mechanism would not be the generation of new genes, but the deactivation of redundant copies – just as extinction has been the fate of more than 99% of species that have ever lived (Raup 1991). Once purifying selection ceased to shelter gene sequences from change, they would be free to mutate and, if one imagines a set of three gene copies initially sharing the same sequence, it is likely that “in a relatively short time, two of the three duplicates would join the ranks of ‘garbage DNA’” (Ohno 1970, p.62).

In Ohno’s usage, as in the vernacular, “garbage” refers to both the loss of function and the lack of any further utility (it was once useful, but now it isn’t). “Garbage DNA” proved to be an unsuccessful meme, but its essence remains in the wildly popular term coined by Ohno two years later: “junk DNA”. Thus, as Ohno (1972b) stated, “at least 90% of our genomic DNA is ‘junk’ or ‘garbage’ of various sorts”. Interestingly, Ohno mentioned “junk DNA” only in the titles of two of his papers (1972a, 1973), and invoked the term only once in passing in a third (1972b). Comings (1972), on the other hand, gave what must be considered the first explicit discussion of the nature of “junk DNA”, and was the first to apply the term to all non-coding DNA.

There are several independent mechanisms by which non-coding DNA can accumulate in the genome. Gene duplication and deactivation is one such mechanism, but this, we now know, applies to only a minority of the non-coding sequences. Nevertheless, the term “junk DNA” was used in some early general descriptions of non-coding elements, including heterochromatin. For example, Comings (1972) noted that:

It has frequently been suggested that the DNA of genetically inactive heterochromatin represents the degenerate and useless DNA of the genome. However, heterochromatin rarely constitutes more than 20% of the genome. This suggests that there are two categories of junk DNA, (1) DNA of constitutive heterochromatin which is neither transcribed nor translated, and (2) nonheterochromatic junk DNA which is probably transcribed, but not translated. This distinction adds one more dimension to the mystery of heterochromatic DNA. Why is it singled out to be nontranscribable when being nontranslatable seems adequate for most of the junk DNA? Perhaps there is clustered junk (heterochromatic DNA) and nonclustered junk, just as there is clustered repetitious DNA (satellite DNA) and nonclustered repetitious DNA.

Later, Ohno himself began applying the term “junk” to heterochromatic, intergenic, and intronic sequences: “Much of this junk DNA occurs as large heterochromatin blocks, often localized in pericentric regions of mammalian chromosomes, or as intergenic spacers and intervening sequences within genes.” (Ohno 1985).

It is clear, however, that Ohno (1982) believed all these sequences were produced by gene duplication:

This great preponderance of intergenic spacers in the euchromatic region is due mostly to the extreme inefficacy of the mechanism of gene duplication as a means of creating new genes with altered active sites. For every redundant copy of the pre-existent gene that emerged triumphant as a new gene, hundreds of other copies must have degenerated to join the rank of junk DNA.

This mechanism alone was considered capable of explaining the vast intergenic regions of eukaryotic genomes. According to Ohno (1985):

Indeed, the abundance of pseudogenes (recent degenerates) attests to the inefficacy of gene duplication as a means of acquiring new genes with novel functions. The net consequence of hundreds of millions of years of continuous gene duplication is the desertification of the euchromatic region of modern vertebrates; the average distance between still functioning gene loci becoming progressively longer.

Junk DNA, function, and non-function

“Junk DNA” had a specific meaning when it first was formulated. It was meant to describe the loss of protein-coding function by deactivated gene duplicates, which in turn were believed to constitute the bulk of eukaryotic genomes. As different types of non-coding DNA were identified, the concept of gene duplication as their source – and therefore “junk DNA” as their descriptor – found new and broader application. However, it is now clear that most non-coding DNA is not produced by this mechanism, and is therefore not accurately described as “junk” in the original sense.

The term “pseudogene” — the technical term for functionless gene copies — was not coined until 1977 (Jacq et al. 1977), and the more explicit definition of these sequences that specified non-function in terms of protein-coding emerged almost a decade later. So, although Ohno’s original description of “junk DNA” obviously involved what are now called “pseudogenes”, there was no initial requirement for non-function. As Comings (1972) put it, “Being junk doesn’t mean it is entirely useless. Common sense suggests that anything that is completely useless would be discarded.” (This is what Sydney Brenner meant by the distinction between “trash” or “rubbish”, which one throws away, and “junk”, which one keeps; Brenner 1998). Of course, Ohno did reject the notion of protein-coding function for the extinct genes. As he described it, “a functional gene locus is defined as that DNA base sequence which may sustain deleterious mutations”, and from this it followed that “a DNA base sequence in which all sorts of mutational changes are permissible is obviously not contributing to the well-being of an organism, and for this very reason, it has no function” (Ohno 1973). On the other hand, and in the same publication, Ohno (1973) suggested a different role for non-coding DNA: “The bulk of functionless DNA in the mammalian genome may serve as a damper to give a reasonably long cell generation time (12 hours or so instead of several minutes)”.

From the very beginning, the concept of “junk DNA” has implied non-functionality with regard to protein-coding, but left open the question of sequence-independent impacts (perhaps even functions) at the cellular level. “Junk DNA” may now be taken to imply total non-function and is rightly considered problematic for that reason, but no such tacit assumption was present in the term when it was coined.

Two groups of people, though maximally divergent in their reasons for so doing, have been driven by a philosophical need to identify functions for all non-coding DNA. The first includes strict adaptationists, among whom it was often assumed that all non-coding DNA, by virtue of its very existence, must be endowed with some as-yet-unknown function of critical importance: “The very fact that amplified sequences have been maintained, withstanding rigours of selection, indicates some adaptive significance” (Sharma 1985).

We may also consider the following discussion comments recorded at the end of Ohno (1973):

Yunis: “This is what I emphasized earlier, that this DNA must have a functional value since nothing is known so widespread and universal in nature that has proven useless.”

Fraccaro: “Well, there is an exception to that rule. A lot of us have permanent positions at the University but are considered by others (mainly by students) meaningless and of no utility whatsoever.”

These examples aside, it seems likely that most evolutionary biologists today could tolerate a conclusion, if such were rendered, that a significant fraction of non-coding DNA is functionless. This is not true of the second group in question, whose passion for function is unrivaled. As Dawkins (1999) suggested, “creationists might spend some earnest time speculating on why the Creator should bother to litter genomes with untranslated pseudogenes and junk tandem repeat DNA”. In fact, many have done so (e.g., Gibson 1994; Wieland 1994; Batten 1998; Jerlström 2000; Walkup 2000; Woodmorappe 2000; Bergman 2001). Although apparently “not enough is yet known about eukaryotic genomes to construct a comprehensive creationist model of pseudogenes” (Woodmorappe 2000), the theme that undergirds all of these discussions is that all non-coding DNA must, a priori, be functional.

To satisfy this expectation, creationist authors (borrowing, of course, from the work of molecular biologists, as they do no such research themselves) simply conflate the various types of non-coding DNA, and mistakenly suggest that functions discovered for a few examples of some types of non-coding sequences indicate functions for all (see Max 2002 for a cogent rebuttal to these creationist confusions). Case in point: a few years ago, much ado was made of Beaton and Cavalier-Smith’s (1999) titular proclamation, based on a survey of cryptomonad nuclear and nucleomorphic genomes, that “eukaryotic non-coding DNA is functional”. The point was evidently lost that the function proposed by Beaton and Cavalier-Smith (1999) was based entirely on coevolutionary interactions between nucleus size and cell size.

Those who complain about a supposed unilateral neglect of potential functions for non-coding DNA simply have been reading the wrong literature. In fact, quite a lengthy list of proposed functions for non-coding DNA could be compiled (for an early version, see Bostock 1971). Examples include buffering against mutations (e.g., Comings 1972; Patrushev and Minkevich 2006) or retroviruses (e.g., Bremmerman 1987) or fluctuations in intracellular solute concentrations (Vinogradov 1998), serving as binding sites for regulatory molecules (Zuckerkandl 1981), facilitating recombination (e.g., Comings 1972; Gall 1981; Comeron 2001), inhibiting recombination (Zuckerkandl and Hennig 1995), influencing gene expression (Britten and Davidson 1969; Georgiev 1969; Nowak 1994; Zuckerkandl and Hennig 1995; Zuckerkandl 1997), increasing evolutionary flexibility (e.g., Britten and Davidson 1969, 1971; Jain 1980; reviewed critically in Doolittle 1982), maintaining chromosome structure and behaviour (e.g., Walker et al. 1969; Yunis and Yasmineh 1971; Bennett 1982; Zuckerkandl and Hennig 1995), coordinating genome function (Shapiro and von Sternberg 2005), and providing multiple copies of genes to be recruited when needed (Roels 1966).

Does non-coding DNA have a function? Some of it does, to be sure. Some of it is involved in chromosome structure and cell division (e.g., telomeres, centromeres). Some of it is undoubtedly regulatory in nature. Some of it is involved in alternative splicing (Kondrashov and Koonin 2003). A fair portion of it in various genomes shows signs of being evolutionarily conserved, which may imply function (Bejerano et al. 2004; Andolfatto 2005; Kondrashov 2005; Woolfe et al. 2005; Halligan and Keightley 2006). On the other hand, the largest fraction consists of transposable elements, some of which become co-opted by the host genome, some of which play a major role in generating genomic variation, some of which may be involved in cellular stress response, and yet others of which remain detrimental to host fitness (Kidwell and Lisch 2001; Biémont and Vieira 2006). The upshot is that some non-coding DNA is most certainly functional, but when it is, this usually makes sense only in an evolutionary context, particularly through processes like co-option. More broadly, those who would attribute a universal function to non-coding DNA must bear the following in mind: any proposed function for all non-coding DNA must explain why an onion or a grasshopper needs five times more of it than anyone reading this sentence.

Should “junk” be thrown out?

There is nothing wrong with a word taking on a new meaning as knowledge changes – that is, unless reference to an original (and outmoded) sense lingers as a source of confusion, or the term expands so much as to lose contact with an initially accurate definition. Indeed, even the term “evolution” is technically a misnomer since its etymology implies an “unfolding”, as of a pre-determined developmental program (see Bowler 1975). The objection raised here is not to terms that change in usage per se, but to those whose shifting usage involves collecting or retaining unwanted conceptual baggage. This is especially relevant when the baggage is toted surreptitiously (note that no serious biologist takes “evolution” to mean a pre-determined unfolding but that ideas of inherent “progress” have been almost impossible to shake; see Gould 1996; Ruse 1996).

“Junk DNA”, which originally was coined in reference to now-functionless gene duplicates (i.e., true broken-down “junk”), is now used as “a catch-all phrase for chromosomal sequences with no apparent function” (Moore 1996). Its current usage also implies a lack of function which is accurate by definition for pseudogenes in regard to protein-coding, but which does not hold for all non-coding elements. The term has deviated from or outgrown its original use, and its continued invocation is non-neutral in its expression – and generation – of conceptual biases.

“Junk DNA” is not the only offender. Non-coding DNA has been called by many names that have had the same pejorative undertones (intentional or not) implying uselessness, if not outright wastefulness. Examples include excess DNA (Zuckerkandl 1976; Doolittle and Sapienza 1980), surplus or nonessential or degenerate or silent DNA (Comings 1972; Gilbert 1978), quiet DNA (Lefevre 1971), garbage DNA (Ohno 1970), non-informational or nonsense DNA (Ohno 1972b), worthless DNA (Ohno 1973), trivial DNA (Ohno 1974), vestigial DNA (Loomis 1973), redundant DNA (Vinogradov 1998), supplementary DNA (Hutchinson et al. 1980), secondary DNA (Hinegardner 1976), and incidental DNA (Jain 1980).

As Gould (2002, p.503) stated, “A rose may retain its fragrance under all vicissitudes of human taxonomy, but never doubt the power of a name to shape and direct our thoughts”. Because it is generally no longer applied in its original meaningful sense, because the type of DNA to which it actually relates now has a more descriptive name (pseudogenes), and because of its connotations of total phenotypic inertness, the term “junk DNA” should probably be abandoned in favour of less subjective terminology. “Non-coding DNA” serves this purpose quite well.

Concluding remarks

It is an exciting time in genome biology. Aspects of genomic form and function that were largely inconceivable only a few decades ago are now being revealed on a daily basis. It should come as no surprise (and indeed, it probably does not) that new roles are being discovered for non-coding DNA and that some of yesterday’s buzzwords — including “junk DNA” — are destined for the dustbin. However, extrapolating each report that a given small segment of DNA may be functional to mean that all non-coding DNA is vital is as counterproductive as dismissing non-coding DNA as totally non-functional. Genomes are complex, and there is little use in approaching them from a simplistic point of view.


Andolfatto, P. 2005. Adaptive evolution of non-coding DNA in Drosophila. Nature 437: 1149-1152.

Batten, D. 1998. ‘Junk’ DNA (again). Creation Ex Nihilo Technical Journal 12: 5.

Beaton, M.J. and T. Cavalier-Smith. 1999. Eukaryotic non-coding DNA is functional: evidence from the differential scaling of cryptomonad genomes. Proceedings of the Royal Society of London, Series B 266: 2053-2059.

Bejerano, G., M. Pheasant, I. Makunin, S. Stephen, W.J. Kent, J.S. Mattick, and D. Haussler. 2004. Ultraconserved elements in the human genome. Science 304: 1321-1325.

Bennett, M.D. 1982. Nucleotypic basis of the spatial ordering of chromosomes in eukaryotes and the implications of the order for genome evolution and phenotypic variation. In Genome Evolution (eds. G.A. Dover and R.B. Flavell), pp. 239-261. Academic Press, New York.

Bergman, J. 2001. The functions of introns: from junk DNA to designed DNA. Perspectives on Science and Christian Faith 53: 170-178.

Biémont, C. and C. Vieira. 2006. Junk DNA as an evolutionary force. Nature 443: 521-524.

Bostock, C. 1971. Repetitious DNA. Advances in Cell Biology 2: 153-223.

Bowler, P.J. 1975. The changing meaning of “evolution”. Journal of the History of Ideas 36: 95-114.

Bremmerman, H.J. 1987. The adaptive significance of sexuality. In The Evolution of Sex and its Consequences (ed. S.C. Stearns), pp. 135-161. Birkhauser Verlag, Basel.

Brenner, S. 1998. Refuge of spandrels. Current Biology 8: R669.

Britten, R.J. and E.H. Davidson. 1969. Gene regulation for higher cells: a theory. Science 165: 349-357.

Britten, R.J. and E.H. Davidson. 1971. Repetitive and non-repetitive DNA sequences and a speculation on the origins of evolutionary novelty. Quarterly Review of Biology 46: 111-138.

Castillo-Davis, C.I. 2005. The evolution of noncoding DNA: how much junk, how much func? Trends in Genetics 21: 533-536.

Comeron, J.M. 2001. What controls the length of noncoding DNA? Current Opinion in Genetics & Development 11: 652-659.

Comings, D.E. 1972. The structure and function of chromatin. Advances in Human Genetics 3: 237-431.

Dawkins, R. 1999. The “information challenge”: how evolution increases information in the genome. Skeptic 7: 64-69.

Doolittle, W.F. and C. Sapienza. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601-603.

Doolittle, W.F. 1982. Selfish DNA after fourteen months. In Genome Evolution (eds. G.A. Dover and R.B. Flavell), pp. 3-28. Academic Press, New York.

Gall, J.G. 1981. Chromosome structure and the C-value paradox. Journal of Cell Biology 91: 3s-14s.

Georgiev, G.P. 1969. On the structural organization of operon and the regulation of RNA synthesis in animal cells. Journal of Theoretical Biology 25: 473-490.

Gibbs, W.W. 2003. The unseen genome: gems among the junk. Scientific American 289(5): 46-53.

Gibson, L.J. 1994. Pseudogenes and origins. Origins 21: 91-108.

Gilbert, W. 1978. Why genes in pieces? Nature 271: 501.

Gould, S.J. 1996. Full House. Harmony Books, New York.

Gould, S.J. 2002. The Structure of Evolutionary Theory. Harvard University Press, Cambridge, MA.

Halligan, D.L. and P.D. Keightley. 2006. Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison. Genome Research 16: 875-884.

Hinegardner, R. 1976. Evolution of genome size. In Molecular Evolution (ed. F.J. Ayala), pp. 179-199. Sinauer Associates, Inc., Sunderland.

Hutchinson, J., R.K.J. Narayan, and H. Rees. 1980. Constraints upon the composition of supplementary DNA. Chromosoma 78: 137-145.

Jacq, C., J.R. Miller, and G.G. Brownlee. 1977. A pseudogene structure in 5S DNA of Xenopus laevis. Cell 12: 109-120.

Jain, H.K. 1980. Incidental DNA. Nature 288: 647-648.

Jerlström, P. 2000. Pseudogenes: are they non-functional? Creation Ex Nihilo Technical Journal 14: 15.

Kidwell, M.G. and D.R. Lisch. 2001. Transposable elements, parasitic DNA, and genome evolution. Evolution 55: 1-24.

Kondrashov, F.A. and E.V. Koonin. 2003. Evolution of alternative splicing: deletions, insertions and origin of functional parts of proteins from intron sequences. Trends in Genetics 19: 115-119.

Kondrashov, A.S. 2005. Fruitfly genome is not junk. Nature 437: 1106.

Lefevre, G. 1971. Salivary chromosome bands and the frequency of crossing over in Drosophila melanogaster. Genetics 67: 497-513.

Loomis, W.F. 1973. Vestigial DNA? Developmental Biology 30: F3-F4.

Makalowski, W. 2003. Not junk after all. Science 300: 1246-1247.

Max, E.E. 2002. Plagiarized errors and molecular genetics: another argument in the evolution-creation controversy. Talk.Origins Archive.

Moore, M.J. 1996. When the junk isn’t junk. Nature 379: 402-403.

Nowak, R. 1994. Mining treasures from ‘junk DNA’. Science 263: 608-610.

Ohno, S. 1970a. Evolution by Gene Duplication. Springer-Verlag, New York.

Ohno, S. 1970b. The enormous diversity in genome sizes of fish as a reflection of nature’s extensive experiments with gene duplication. Transactions of the American Fisheries Society 1970: 120-130.

Ohno, S. 1972. So much “junk” DNA in our genome. In Evolution of Genetic Systems (ed. H.H. Smith), pp. 366-370. Gordon and Breach, New York.

Ohno, S. 1973. Evolutional reason for having so much junk DNA. In Modern Aspects of Cytogenetics: Constitutive Heterochromatin in Man (ed. R.A. Pfeiffer), pp. 169-173. F.K. Schattauer Verlag, Stuttgart, Germany.

Ohno, S. 1974. Chordata 1: protochordata, cyclostomata, and pisces. In Animal Cytogenetics, Vol. 4 (ed. B. John), pp. 1-92. Gebrüder Borntraeger, Berlin.

Ohno, S. 1982. The common ancestry of genes and spacers in the euchromatic region: omnis ordinis hereditarium a ordinis priscum minutum. Cytogenetics and Cell Genetics 34: 102-111.

Ohno, S. 1985. Dispensable genes. Trends in Genetics 1: 160-164.

Patrushev, L.I. and I.G. Minkevich. 2006. Eukaryotic noncoding DNA sequences provide genes with an additional protection against chemical mutagens. Russian Journal of Bioorganic Chemistry 32: 1068-1620.

Petsko, G.A. 2003. Funky, not junky. Genome Biology 4: 104.

Raup, D.M. 1991. Extinction. W.W. Norton & Co., New York.

Roels, H. 1966. “Metabolic” DNA: a cytochemical study. International Review of Cytology 19: 1-34.

Ruse, M. 1996. Monad to Man. Harvard University Press, Cambridge, MA.

Shapiro, J.A. and R. von Sternberg. 2005. Why repetitive DNA is essential to genome function. Biological Reviews 80: 227-250.

Sharma, A.K. 1985. Chromosome architecture and additional elements. In Advances in Chromosome and Cell Genetics (eds. A.K. Sharma and A. Sharma), pp. 285-293. Oxford and IBH Publishing Co., New Delhi.

Slack, F.J. 2006. Regulatory RNAs and the demise of ‘junk’ DNA. Genome Biology 7: 328.

Vinogradov, A.E. 1998. Buffering: a possible passive-homeostasis role for redundant DNA. Journal of Theoretical Biology 193: 197-199.

Walker, P.M.B., W.G. Flamm, and A. McLaren. 1969. Highly repetitive DNA in rodents. In Handbook of Molecular Cytology (ed. A. Lima-de-Faria), pp. 52-66. North-Holland Publishing Co., Amsterdam.

Walkup, L.K. 2000. Junk DNA: evolutionary discards or God’s tools? Creation Ex Nihilo Technical Journal 14: 18-30.

Wickelgren, I. 2003. Spinning junk into gold. Science 300: 1646-1649.

Wieland, C. 1994. Junk moves up in the world. Creation Ex Nihilo Technical Journal 8: 125.

Woodmorappe, J. 2000. Are pseudogenes ‘shared mistakes’ between primate genomes? Creation Ex Nihilo Technical Journal 14: 55-71.

Woolfe, A., M. Goodson, D.K. Goode, P. Snell, G.K. McEwen, T. Vavouri, S.F. Smith, P. North, H. Callaway, K. Kelly, K. Walter, I. Abnizova, W. Gilks, Y.J.K. Edwards, J.E. Cooke, and G. Elgar. 2005. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biology 3: e7.

Yunis, J.J. and W.G. Yasmineh. 1971. Heterochromatin, satellite DNA, and cell function. Science 174: 1200-1209.

Zuckerkandl, E. 1976. Gene control in eukaryotes and the C-value paradox: “Excess” DNA as an impediment to transcription of coding sequences. Journal of Molecular Evolution 9: 73-104.

Zuckerkandl, E. and W. Hennig. 1995. Tracking heterochromatin. Chromosoma 104: 75-83.

Zuckerkandl, E. 1997. Junk DNA and sectorial gene expression. Gene 205: 323-343.


Update: At Sandwalk, Larry Moran argues that the term “junk DNA” is “a good term”, “an accurate term”, and “a useful term”. You can read my response in the comments section of the original post or in my re-post on this blog.