Here’s the abstract of a paper set to be published in Molecular Biology and Evolution. Now, I think this kind of study is interesting and important. But it’s predictable that they start out with the standard (and historically false) claim that “non-coding DNA was long dismissed as junk” (seriously, do reviewers require authors to insert this line or something?). It’s also predictable that the amount of DNA that they report as showing signs of constraint (about 5% of the genome in total, most of it non-coding) will be reported in science news as “junk DNA functional after all!”.
Distributions of selectively constrained sites and deleterious mutation rates in the hominid and murid genomes.
Eory L, Halligan DL, Keightley PD
Protein-coding sequences make up only about 1% of the mammalian genome. Much of the remaining 99% has been long assumed to be junk DNA, with little or no functional significance. Here we show that in hominids, a group with historically low effective population sizes, all classes of non-coding DNA evolve more slowly than ancestral transposable elements, and so appear to be subject to significant evolutionary constraints. Under the nearly neutral theory, we expected to see lower levels of selective constraints on most sequence types in hominids than murids, a group that is thought to have a higher effective population size. We found that this is the case for many sequence types examined, the most extreme example being 5′ UTRs, for which constraint in hominids is only about one-third that of murids. Surprisingly, however, we observed higher constraints for some sequence types in hominids, notably four-fold sites, where constraint is more than twice as high as in murids. This implies that more than about one-fifth of mutations at four-fold sites are effectively selected against in hominids. The higher constraint at four-fold sites in hominids suggests a more complex protein-coding gene structure than murids, and indicates that methods for detecting selection on protein coding sequences (e.g., using the d(N)/d(S) ratio), with four-fold sites as a neutral standard, may lead to biased estimates, particularly in hominids. Our constraint estimates imply that 5.4% of nucleotide sites in the human genome are subject to effective negative selection, and that there are three times as many constrained sites within non-coding sequences as within protein-coding sequences. Including coding and non-coding sites, we estimate that the genomic deleterious mutation rate U = 4.2. The mutational load predicted under a multiplicative model is therefore about 99% in hominids.
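For anyone wondering where that last figure comes from, it is easy to check. Under a multiplicative fitness model with a Poisson-distributed number of new deleterious mutations per genome, the textbook result is that mean fitness relative to a mutation-free genotype is e^(-U), so the load is 1 - e^(-U). The sketch below assumes this is the formula behind the abstract's number (the abstract itself doesn't spell it out), and the function name is my own.

```python
import math

def mutational_load(U):
    """Load under multiplicative fitness effects: L = 1 - exp(-U).

    Assumes the standard equilibrium result for a Poisson-distributed
    number of deleterious mutations; this is my reading of the abstract,
    not a formula quoted from the paper.
    """
    return 1.0 - math.exp(-U)

U = 4.2  # genomic deleterious mutation rate reported in the abstract
load = mutational_load(U)
print(f"U = {U}: predicted load = {load:.3f}")
```

With U = 4.2 this gives a load of roughly 0.985, which matches the "about 99%" quoted in the abstract once rounded.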