Is most of the human genome functional?

I first became interested in genome size because of its tie-ins with important evolutionary questions in which I was (and still am) interested, such as punctuated vs. gradual patterns, levels of selection, and adaptive vs. non-adaptive processes. What I didn’t realize was that one component of the question, the quantity of DNA that is non-functional (but not necessarily inconsequential) with regard to the phenotype of the organism, is such a hot-button issue. I had vague inklings at first that young-earth creationists would object to the idea of non-functional DNA — because God, as they say, don’t make no junk. (Why intelligent design proponents, who purport to take a strictly scientific view of the question, also assume that non-coding DNA cannot be non-functional remains unstated). And of course there has always been a persistent undertone in biology that non-coding DNA must be doing something or it would have been deleted. This latter view, which derives directly from a hardcore adaptationist approach, destroys the argument by creationists that “Darwinism” has prevented researchers from considering functions for non-coding DNA. Indeed, the main motivation for the early papers on “selfish DNA” was to counter this adaptationist assumption (Doolittle and Sapienza 1980).

Creationist nonsense about DNA does not surprise me. What has intrigued me much more is the debate among biologists about this, and the rather questionable claims, suppositions, and extrapolations that get made not just by the media but by various scientists themselves.

Take Francis Collins. He’s a major player in genome biology and led the charge by the public Human Genome Project. And yet, he makes claims that non-coding DNA may be present in the genome “just in case” it needs to be put to use in the future. This makes no sense from an evolutionary perspective. It would be tempting to attribute this to Collins’s adherence to the notion of theistic evolution, but in fact one can find this sort of fuzzy foresight argument being brought up by lots of authors. I suppose it’s just disappointing that there is not better communication between genome biology and evolutionary biology.

The case that frustrates me most is that of John Mattick. He of the worst figure ever is one of the primary promulgators of the view that scientists have overlooked possible function for non-coding DNA and that this is “one of the biggest mistakes in the history of molecular biology” that can only be corrected by a “new paradigm”, and so on. Basically, the argument seems to be that much of the non-coding portion of a given genome is involved in regulation and such. In the past, Mattick has refrained from pinning down an estimate of how much non-coding DNA he believes is functional, but his presentation of (extremely selective) data left little doubt that he considers more non-coding DNA to be correlated with greater complexity. But now we’re starting to get some more explicit and increasingly bold claims.

As Check (2007) pointed out in a news article in Nature,

Mattick thinks scientists are vastly underestimating how much of the genome is functional. He and Birney have placed a bet on the question. Mattick thinks at least 20% of possible functional elements in our genome will eventually be proven useful. Birney thinks fewer are functional.

Now consider this quote by Comings (1972), who was the first person to use the term “junk DNA” extensively (even before Ohno’s (1972) coinage appeared in print):

These considerations suggest that up to 20% of the genome is actively used and the remaining 80+% is junk. But being junk doesn’t mean it is entirely useless. Common sense suggests that anything that is completely useless would be discarded. There are several possible functions for junk DNA.

So, even if Mattick is right about 20% of the human genome being functional, which is considered a rather high estimate on the basis of available data, he still would be merely agreeing with the author of the first major discussion about junk DNA.

Now, I should point out that I do not have a vested interest in how much of the human genome is functional. 5%? Fine. 20%? Fine. 50%? Ok. I will go where the data indicate. My reason for rejecting the notion of “more complexity means more DNA” is comparative: I refer you to the “onion test” for a simple illustration. However, as readers of Genomicron already know, I find it rather irksome when people take any new finding about (potential) function in some part of the human genome and extrapolate this to mean that all DNA in every genome must be serving some role.

Anyway, back to what Mattick suggests. As noted, for the most part he has gone about arguing for large-scale function more by hint than by direct claim. However, finally he says the following (Pheasant and Mattick 2007).

Thus, although admittedly on the basis of as yet limited evidence, it is quite plausible that many, if not the majority, of the expressed transcripts are functional and that a major component of genomic information is rapidly evolving regulatory DNA and RNA. Consequently, it is possible that much if not most of the human genome may be functional. This possibility cannot be ruled out on the available evidence, either from conservation analysis or from genetic studies, but does challenge current conceptions of the extent of functionality of the human genome and the nature of the genetic programming of humans and other complex organisms. [Emphasis added]

It seems to me that “we can’t rule this out” is not a reason to think that something is plausible, let alone true. In fact, the existence of mechanisms such as transposable element spread and the pseudogenization of duplicate genes suggests that there is good reason to expect much (probably most) of the genome to be non-functional unless data show otherwise. Some TEs have taken on a function, some cause disease, some are merely benign or only slightly detrimental. The proportions of non-coding elements in each of these categories remain to be determined, but they are not all equally likely by default.

The question of which sequences are functional, and in what way, is one of the more contentious and therefore interesting ones in genome biology. On the one hand, new information from various sources including the ENCODE project indicates that much non-coding DNA is transcribed, though it remains an open question whether this has to do with function or noise. On the other hand, a recent analysis has suggested that as many as 4,000 sequences within the human genome initially thought to be genes are not really genes after all (Clamp et al. 2007), bringing the total count down to around 20,000.

Some people, mostly creationists and strict adaptationists (strange bedfellows, I agree), desperately want the vast non-coding majority of eukaryote DNA to have a function. They latch onto any new discovery of function in some segment of the genome or another (or indeed, any mere restatement of what many authors have been saying since the 1970s) and consider their position supported. The rest of us will just have to wait and see.



Check, E. 2007. Genome project turns up evolutionary surprises. Nature 447: 760-761.

Clamp, M., B. Fry, M. Kamal, X. Xie, J. Cuff, M.F. Lin, M. Kellis, K. Lindblad-Toh, and E.S. Lander. 2007. Distinguishing protein-coding and noncoding genes in the human genome. Proceedings of the National Academy of Sciences USA 104: 19428-19433.

Comings, D.E. 1972. The structure and function of chromatin. Advances in Human Genetics 3: 237-431.

Doolittle, W.F. and C. Sapienza. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601-603.

Ohno, S. 1972. So much “junk” DNA in our genome. In Evolution of Genetic Systems (ed. H.H. Smith), pp. 366-370. Gordon and Breach, New York.

Pheasant, M. and J.S. Mattick. 2007. Raising the estimate of functional human sequences. Genome Research 17: 1245-1253.

9 comments to Is most of the human genome functional?

  • Chris Harrison

    “Why intelligent design proponents, who purport to take a strictly scientific view of the question, also assume that non-coding DNA cannot be non-functional remains unstated”

    I’ve been having a discussion on a Facebook group with ID proponent Mike Gene (of fame, and the author of “A Consilience of Clues”) about this very subject. Mike stated that IDists should “acknowledge that irrational design counts against the ID hypothesis” because “we should not expect irrational design from life’s designer”.

    So I asked him “What do we do about the ~19,000 pseudogenes in the human genome?”
    To which he responded:

    “It is best explained by evolution. Their lack of any function is inconsistent with design.”

    And speaking to non-coding DNA/functionality, but ignoring ID, I find that this topic generates confusion for people who are pro-ev/anti-ID. Take this link for example:

    In the section “Dead code, bloat, comments (‘junk dna’)”, the author states that 97% of the human genome is composed of introns which are spliced out, and that the remaining 3% are the exons. Later, still talking about non-coding DNA, the author says the existence of this non-coding DNA could be explained by its impact on folding propensity. Again the author’s confusion that all non-coding DNA is intronic causes him to slip up here as well.

    Larry Moran would cry if he read the section about the central dogma of molecular biology, incidentally.


  • TR Gregory

    Well, irrational aspects of design may be arguments against a divine designer, but not against design itself, as many human-designed objects have suboptimal or irrational aspects. I do not take suboptimality alone as evidence for evolution, but rather suboptimal characters that are best explained by historical processes. I am still waiting for an unambiguous accounting for why intelligent design proponents assume that design must be perfect if they do not and cannot determine the identity, motives, or method of any designer.


  • Chris Harrison

    Yes, I see what you mean. It seems quite arbitrary for IDists to say “we should not expect irrational design from life’s designer.”

    In Mike’s case, he suggests front-loaded evolution, which he later claimed in our discussion “does not entail the non-existence of junk DNA.”

    In other words, all the “non-rational” design is clearly due to blind, mechanistic processes. The more rational the design, the more it casts doubts on the mechanistic prowess of ateleological mechanisms (mutation, selection, drift, cooption etc.)

    Yeah, not very persuasive to me either.


  • Jud

    “Why intelligent design proponents, who purport to take a strictly scientific view of the question, also assume that non-coding DNA cannot be non-functional remains unstated.”

    Certainly “unstated” in any scientific sense, but then what in ID has been “stated” in rigorous scientific terms?

    There have been plenty of statements from ID proponents in typical innuendo-and-out-the-other style. Chris alluded to the “front loading” argument that was beloved for some time on UD (and may be still – I’ve tired of reading the site). Chris has also noted the ID argument that anything that’s truly non-functional is evidence of the exclusively destructive results of mutation, leaving the functional stuff for the Designer. These two arguments are at least to some extent contradictory (is anything currently non-functional a Designer’s preparation for the future or truly junk that results from destructive processes?). If one indulges sloppy argumentation, that “explains” virtually any proportion of functional vs. non-functional DNA.

    As for why ID proponents would assume there is no non-functional DNA, or at least assume a need to explain away non-functional DNA in terms that allow a supremely competent Designer, we all know the Designer’s identity is the Mother of All ID Innuendoes.



    I can’t speak for Mattick, only for myself, and in my opinion the issue is not whether a very few people were actually researching and gaining hints into what many had written off as junk. The real issue is that even though this research was happening (as Gregory correctly cites), the status quo was largely ignoring this research. As students and especially in the general public, the message conveyed to us was that only 2-3% of our genome served a purpose while the rest was useless junk, and that this was predicted by the theory.
    It seems the vast majority were more interested in using this useless vestigial junk DNA paradigm as a poster child for bad design (and in the process got a lot of mileage out of it), as in the words of Kenneth Miller, who just a few short years ago said “Intelligent design cannot explain the presence of a nonfunctional pseudogene, unless it is willing to allow that the designer made serious errors, wasting millions of bases of DNA on a blueprint full of junk and scribbles. Evolution, however, can explain them easily. Pseudogenes are nothing more than chance experiments in gene duplication etc”
    Whether they are or not is based more on opinion than anything else. We now know that even pseudogenes seem to demonstrate some important function, and we are learning more all the time, in my opinion not only because of technological advances but also because of new interest from more and more researchers who no longer cling to this useless junk paradigm (which, one can argue, did slow down progress), as even Wikipedia has stated.
    You can find many more quotes like the one cited from Miller, and from many other so-called teachers and communicators of science, than you can for any research in the earlier days. People like PZ Myers are still perpetuating this useless vestigial paradigm myth in college lectures.
    Recently he told a group of students in a lecture that only gene-coding regions had function and that the rest was useless garbage, just as the theory had predicted it would be.
    He went on to say that those who are bellyaching about potential function are only biologists looking for job security. He said this to a group of young, impressionable atheist biology students who idolized him. What kind of message is that to give? Talk about a science stopper. Instead of stirring interest and encouraging further discovery, this man and others like him are still perpetuating this outdated paradigm. To me this is the real issue. I applaud Gregory for his work and contributions to science, especially on the subject of the C-value enigma, but as a historian you left out a lot of important facts.


    • Thanks for your comments. However, I disagree with your assessment. The claim that I am challenging is that biologists ignored possible functions for non-coding DNA. That is patently false, as I have tried to show with reference to the actual scientific literature from the 1970s to the present. What authors of popular books say about implications for design is not particularly relevant to the actual claim of interest here, which is about what scientists did or did not consider regarding possible functions in the past.



    I understand where you’re coming from and I hope you understand where I’m coming from also, but I can’t in all honesty retract my statements concerning the fact that we, as students (a former student, in my case) and as the general public, have been fed unenlightened information by many communicators of science, as documented in everything from lectures to television programs, as well as in the sources I have already cited. If it is indeed true that people who were researching this field of study considered the possibility of function, then these people I spoke of cannot claim ignorance. I will not beat a dead horse. However, I have just one question. Now that we are finding a great deal of empirical evidence for function in many of these non-protein-coding regions, can we agree that this finding did not meet a prediction that was once made, and can we also agree the same for the discovery that “At the center of the C-value enigma is the observation that genome size does not correlate with organismal complexity”?


  • Elle

    Dr. Gregory, does this represent a page turner on the subject of “junk” DNA as it has been presented by media here in Italy? 

    “Collectively, the papers describe 1,640 data sets generated across 147 different cell types. Among the many important results there is one that stands out above them all: more than 80% of the human genome’s components have now been assigned at least one biochemical function.”

    That seems to be a much higher percentage of functional components than the one suggested by previous estimates.

    Are you planning to write something about ENCODE?

