Is most of the human genome functional?

I first became interested in genome size because of its tie-ins with important evolutionary questions in which I was (and still am) interested, such as punctuated vs. gradual patterns, levels of selection, and adaptive vs. non-adaptive processes. What I didn’t realize was that one component of the question, the quantity of DNA that is non-functional (but not necessarily inconsequential) with regard to the phenotype of the organism, is such a hot-button issue. I had vague inklings at first that young-earth creationists would object to the idea of non-functional DNA because God, as they say, don’t make no junk. (Why intelligent design proponents, who purport to take a strictly scientific view of the question, also assume that non-coding DNA cannot be non-functional remains unstated.) And of course there has always been a persistent undertone in biology that non-coding DNA must be doing something or it would have been deleted. This latter view, which derives directly from a hardcore adaptationist approach, destroys the argument by creationists that “Darwinism” has prevented researchers from considering functions for non-coding DNA. Indeed, the main motivation for the early papers on “selfish DNA” was to counter this adaptationist assumption (Doolittle and Sapienza 1980).

Creationist nonsense about DNA does not surprise me. What has intrigued me much more is the debate among biologists about this, and the rather questionable claims, suppositions, and extrapolations that get made not just by the media but by various scientists themselves.

Take Francis Collins. He’s a major player in genome biology and led the public Human Genome Project. And yet he has claimed that non-coding DNA may be present in the genome “just in case” it needs to be put to use in the future. This makes no sense from an evolutionary perspective. It would be tempting to attribute this to Collins’s adherence to the notion of theistic evolution, but in fact one can find this sort of fuzzy foresight argument being brought up by lots of authors. I suppose it’s just disappointing that there is not better communication between genome biology and evolutionary biology.

The case that frustrates me most is that of John Mattick. He of the worst figure ever is one of the primary promulgators of the view that scientists have overlooked possible function for non-coding DNA and that this is “one of the biggest mistakes in the history of molecular biology” that can only be corrected by a “new paradigm”, and so on. Basically, the argument seems to be that much of the non-coding portion of a given genome is involved in regulation and such. In the past, Mattick has refrained from pinning down an estimate of how much non-coding DNA he believes is functional, but his presentation of (extremely selective) data left little doubt that he considers more non-coding DNA to be correlated with greater complexity. But now we’re starting to get some more explicit and increasingly bold claims.

As Check (2007) pointed out in a news article in Nature,

Mattick thinks scientists are vastly underestimating how much of the genome is functional. He and Birney have placed a bet on the question. Mattick thinks at least 20% of possible functional elements in our genome will eventually be proven useful. Birney thinks fewer are functional.

Now consider this quote by Comings (1972), who was the first person to use the term “junk DNA” extensively (even before Ohno’s (1972) coinage appeared in print):

These considerations suggest that up to 20% of the genome is actively used and the remaining 80+% is junk. But being junk doesn’t mean it is entirely useless. Common sense suggests that anything that is completely useless would be discarded. There are several possible functions for junk DNA.

So, even if Mattick is right about 20% of the human genome being functional, which is considered a rather high estimate on the basis of available data, he still would be merely agreeing with the author of the first major discussion about junk DNA.

Now, I should point out that I do not have a vested interest in how much of the human genome is functional. 5%? Fine. 20%? Fine. 50%? Ok. I will go where the data indicate. My reason for rejecting the notion of “more complexity means more DNA” is comparative: I refer you to the “onion test” for a simple illustration. However, as readers of Genomicron already know, I find it rather irksome when people take any new finding about (potential) function in some part of the human genome and extrapolate this to mean that all DNA in every genome must be serving some role.

Anyway, back to what Mattick suggests. As noted, for the most part he has gone about arguing for large-scale function more by hint than by direct claim. However, he has finally said the following (Pheasant and Mattick 2007).

Thus, although admittedly on the basis of as yet limited evidence, it is quite plausible that many, if not the majority, of the expressed transcripts are functional and that a major component of genomic information is rapidly evolving regulatory DNA and RNA. Consequently, it is possible that much if not most of the human genome may be functional. This possibility cannot be ruled out on the available evidence, either from conservation analysis or from genetic studies, but does challenge current conceptions of the extent of functionality of the human genome and the nature of the genetic programming of humans and other complex organisms. [Emphasis added]

It seems to me that “we can’t rule this out” is not a reason to think that something is plausible, let alone true. In fact, the existence of mechanisms such as transposable element spread and the pseudogenization of duplicate genes suggests that there is good reason to expect much (probably most) of the genome to be non-functional unless data show otherwise. Some TEs have taken on a function, some cause disease, some are merely benign or only slightly detrimental. The proportions of non-coding elements in each of these categories remain to be determined, but they are not all equally likely by default.

The question of which sequences are functional, and in what way, is one of the more contentious and therefore interesting ones in genome biology. On the one hand, new information from various sources including the ENCODE project indicates that much non-coding DNA is transcribed, though it remains an open question whether this has to do with function or noise. On the other hand, a recent analysis has suggested that as many as 4,000 sequences within the human genome initially thought to be genes are not really genes after all (Clamp et al. 2007), bringing the total count down to around 20,000.

Some people, mostly creationists and strict adaptationists (strange bedfellows, I agree), desperately want the vast non-coding majority of eukaryote DNA to have a function. They latch onto any new discovery of function in some segment of the genome or another (or indeed, any mere restatement of what many authors have been saying since the 1970s) and consider their position supported. The rest of us will just have to wait and see.



Check, E. 2007. Genome project turns up evolutionary surprises. Nature 447: 760-761.

Clamp, M., B. Fry, M. Kamal, X. Xie, J. Cuff, M.F. Lin, M. Kellis, K. Lindblad-Toh, and E.S. Lander. 2007. Distinguishing protein-coding and noncoding genes in the human genome. Proceedings of the National Academy of Sciences USA 104: 19428-19433.

Comings, D.E. 1972. The structure and function of chromatin. Advances in Human Genetics 3: 237-431.

Doolittle, W.F. and C. Sapienza. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601-603.

Ohno, S. 1972. So much “junk” DNA in our genome. In Evolution of Genetic Systems (ed. H.H. Smith), pp. 366-370. Gordon and Breach, New York.

Pheasant, M. and J.S. Mattick. 2007. Raising the estimate of functional human sequences. Genome Research 17: 1245-1253.

4 thoughts on “Is most of the human genome functional?”

  1. “Why intelligent design proponents, who purport to take a strictly scientific view of the question, also assume that non-coding DNA cannot be non-functional remains unstated”

    I’ve been having a discussion on a Facebook group with ID proponent Mike Gene (of fame, and the author of “A Consilience of Clues”) about this very subject. Mike stated that IDists should “acknowledge that irrational design counts against the ID hypothesis” because “we should not expect irrational design from life’s designer”.

    So I asked him “What do we do about the ~19,000 pseudogenes in the human genome?”
    To which he responded:

    “It is best explained by evolution. Their lack of any function is inconsistent with design.”

    And speaking to non-coding DNA/functionality, but ignoring ID, I find that this topic generates confusion for people who are pro-ev/anti-ID. Take this link for example:

    In the section “Dead code, bloat, comments (‘junk dna’)”, the author states that 97% of the human genome is composed of introns, which are spliced out, and that the remaining 3% is exons. Later, still talking about non-coding DNA, the author says the existence of this non-coding DNA could be explained by its impact on folding propensity. Here again, the author’s mistaken belief that all non-coding DNA is intronic causes him to slip up.

    Larry Moran would cry if he read the section about the central dogma of molecular biology, incidentally.

  2. Well, irrational aspects of design may be arguments against a divine designer, but not against design itself, as many human-designed objects have suboptimal or irrational aspects. I do not take suboptimality alone as evidence for evolution, but rather suboptimal characters that are best explained by historical processes. I am still waiting for an unambiguous account of why intelligent design proponents assume that design must be perfect if they do not and cannot determine the identity, motives, or method of any designer.

  3. Yes, I see what you mean. It seems quite arbitrary for IDists to say “we should not expect irrational design from life’s designer.”

    In Mike’s case, he suggests front-loaded evolution, which he later claimed in our discussion “does not entail the non-existence of junk DNA.”

    In other words, all the “non-rational” design is clearly due to blind, mechanistic processes. The more rational the design, the more it casts doubts on the mechanistic prowess of ateleological mechanisms (mutation, selection, drift, cooption etc.)

    Yeah, not very persuasive to me either.

  4. “Why intelligent design proponents, who purport to take a strictly scientific view of the question, also assume that non-coding DNA cannot be non-functional remains unstated.”

    Certainly “unstated” in any scientific sense, but then what in ID has been “stated” in rigorous scientific terms?

    There have been plenty of statements from ID proponents in typical innuendo-and-out-the-other style. Chris alluded to the “front loading” argument that was beloved for some time on UD (and may be still – I’ve tired of reading the site). Chris has also noted the ID argument that anything that’s truly non-functional is evidence of the exclusively destructive results of mutation, leaving the functional stuff for the Designer. These two arguments are at least to some extent contradictory (is anything currently non-functional a Designer’s preparation for the future or truly junk that results from destructive processes?). If one indulges sloppy argumentation, that “explains” virtually any proportion of functional vs. non-functional DNA.

    As for why ID proponents would assume there is no non-functional DNA, or at least assume a need to explain away non-functional DNA in terms that allow a supremely competent Designer, we all know the Designer’s identity is the Mother of All ID Innuendoes.
