From non-coding to coding genes.

I sometimes get asked if non-coding elements (usually “junk DNA” is what they say) can ever evolve into genes. I usually say that transposable elements, at least, can be coopted into functional roles, and that it wouldn’t be so odd if a pseudogene took on a novel function sometime through mutations. Kind of a lame answer, I know, but there haven’t been too many unambiguous examples yet, so cut me some slack.

Anyway, here’s a story in New Scientist that describes a report of three genes unique to humans that appear to have arisen from non-coding DNA. I don’t know about other researchers, but I certainly didn’t consider this “virtually impossible” (as New Scientist states), just rare.

Three human genes evolved from junk

I have to give New Scientist credit on this story for not going with the easy, lazy, and incorrect “everyone thought it was junk but now it’s all turning out to have a function!” template. As the author, Michael Le Page, writes:

The researchers conclude that three of these non-coding sequences must have mutated in humans and become capable of coding for the short proteins at some point since we diverged from chimps six million years ago. While at least half the non-coding DNA in humans is junk with no function, it is not clear whether the non-coding DNA from which the genes evolved had any function.

Such “de novo” gene evolution was once thought impossible because random mutations are highly unlikely to produce a DNA sequence that encodes a protein of any length, let alone a protein that will be transcribed by cells and do anything useful. But in 2006, several de novo genes were discovered in fruit flies. Since then, it’s become clear that genes do continually evolve in this way.

Part of the explanation might be that biological systems are very noisy: even though most of our DNA is junk, most of it still gets transcribed into RNA at times, and some of that RNA probably reaches cells’ protein-making machinery. This means that when mutations do throw up sequences capable of encoding proteins, some may get “tested” and useful ones selected for. As more primate genome data becomes available, McLysaght estimates a further 15 human genes will turn out to have evolved de novo.

Le Page, by the way, also wrote the excellent piece Evolution: 24 Myths and Misconceptions.

The abstract of the forthcoming paper by Knowles and McLysaght (2009):

The origin of new genes is extremely important to evolutionary innovation. Most new genes arise from existing genes through duplication or recombination. The origin of new genes from noncoding DNA is extremely rare, and very few eukaryotic examples are known. We present evidence for the de novo origin of at least three human protein-coding genes since the divergence with chimp. Each of these genes has no protein-coding homologs in any other genome, but is supported by evidence from expression and, importantly, proteomics data. The absence of these genes in chimp and macaque cannot be explained by sequencing gaps or annotation error. High-quality sequence data indicate that these loci are noncoding DNA in other primates. Furthermore, chimp, gorilla, gibbon, and macaque share the same disabling sequence difference, supporting the inference that the ancestral sequence was noncoding over the alternative possibility of parallel gene inactivation in multiple primate lineages. The genes are not well characterized, but interestingly, one of them was first identified as an up-regulated gene in chronic lymphocytic leukemia. This is the first evidence for entirely novel human-specific protein-coding genes originating from ancestrally noncoding sequences. We estimate that 0.075% of human genes may have originated through this mechanism leading to a total expectation of 18 such cases in a genome of 24,000 protein-coding genes.
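
As a quick sanity check on those last numbers (my own arithmetic, not anything taken from the paper): 0.075% of 24,000 genes does come out to 18, and subtracting the three reported here leaves 15, which I assume is where the “further 15” in the New Scientist piece comes from. A minimal sketch in Python, with variable names that are purely illustrative:

```python
from fractions import Fraction

# Figures quoted in the abstract; the variable names are my own.
total_genes = 24_000
de_novo_fraction = Fraction(75, 100_000)  # 0.075% as an exact fraction
reported_so_far = 3                       # the three genes described in the paper

# Expected number of de novo genes in the whole genome.
expected_de_novo = de_novo_fraction * total_genes

# Assumption: the "further 15" is simply the expectation minus the three found.
remaining = expected_de_novo - reported_so_far

print(expected_de_novo, remaining)
```

Using an exact fraction rather than a float just keeps the percentage from picking up rounding error.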

2 thoughts on “From non-coding to coding genes.”

  1. Glad you found it interesting. As for the virtually impossible bit:

    “The probability that a functional protein would appear de novo by random association of amino acids is practically zero.”

    Francois Jacob, 1977

  2. Well, overall it was a very good piece, as was your list of 24 myths (and the print article), and that was a very minor quibble only. But…

    The statement was "…via mutations in non-coding stretches of DNA, a process thought to be virtually impossible until recently"

    a) Francois Jacob in 1977 is n = 1 from 32 years ago. Is there evidence that most people thought this right up until recently?

    b) Jacob is talking about randomly assembling amino acids. That's not what happened in this case. Basically, the other apes have a few indels that we don't have, and those indels make the sequences non-coding.

    My first guess is that these were genes in the past, inactivated to pseudogenes in a distant ancestor (since they lack introns, presumably processed pseudogenes), then reactivated in humans through relatively minor mutations.

    It's very cool, and your article does a good job. So, again, it was only a very minor point.
