Alu taketh but ERV giveth back.

Sometimes I am asked whether a pseudogene can regain function. The answer, according to a paper by Bekpen et al. (2009), is yes. And the mechanism is cool — an Alu insertion knocked it out and an ERV insertion restored its function.

Author summary

The IRG gene family plays an important role in defense against intracellular bacteria, and genome-wide association studies have implicated structural variants of the single-copy human IRGM locus as a risk factor for Crohn’s disease. We reconstruct the evolutionary history of this region among primates and show that the ancestral tandem gene family contracted to a single pseudogene within the ancestral lineage of apes and monkeys. Phylogenetic analyses support a model where the gene has been “dead” for at least 25 million years of human primate evolution but whose ORF became restored in all human and great ape lineages. We suggest that the rebirth or restoration of the gene coincided with the insertion of an endogenous retrovirus, which now serves as the functional promoter driving human gene expression. We suggest that either the gene is not functional in humans or this represents one of the first documented examples of gene death and rebirth.

This story has already been reported by others, so I will just post links:

The death and resurrection of IRGM – the “Jesus gene” (Not Exactly Rocket Science)

First ‘resurrected’ gene found in humans (New Scientist)

The resurrection of a disease-linked gene (Nature News)

A Curious Case of Genetic Resurrection (ScienceNOW)

(Please don’t get into the whole “aha, scientists shouldn’t have dismissed Alu and ERVs as junk” trap; see the Quotes of interest series.)

Quotes of interest — Alu again.

I discussed the early papers involving the discovery of Alu elements in a previous post in the series. Unlike some transposable elements that are capable of autonomous transposition, Alu elements do not encode the requisite enzymes and depend on those of other sequences such as LINE-1 elements. Alu is restricted to primates, and its origin seems to have been a duplication and reverse transcription of a 7SL RNA gene early in primate evolution. One in ten nucleotides in each human genome is part of an Alu sequence, of which there are more than 1 million copies.

The elucidation of the evolutionary origins of Alu elements came some time after their initial discovery in 1979. Initially, it was thought that the 7SL RNA gene was derived from Alu, but the reverse conclusion was reached by Ullu and Tschudi (1984) and was discussed further by authors such as Quentin (1992). As noted, the original papers reporting the existence of Alu elements raised the question of their potential functions. However, the later articles arrived right in the middle of the supposed time when non-coding DNA was dismissed as irrelevant. Once again, the actual literature from the period does not support the notion that such a dismissal ever occurred.

Ullu and Tschudi (1984) did not discuss possible function explicitly, but they did note that “these 7SL-specific homologies may reflect a strong functional constraint acting on these sequences.” In an accompanying article in the same issue of Nature, Brown (1984) was more specific about the significance of the results. He stated,

Ullu and Tschudi suggest that Alu sequences represent defective 7SL RNA molecules that have been reverse-transcribed into DNA and inserted into the genome. An analogous origin has been suggested for alpha-globin pseudogenes in the mouse, and the multiple pseudogenes for small nuclear RNAs in man. Pseudogenes are generally thought not to play an important role in the cell. Perhaps those who have argued that Alu, by its very abundance, must have an important function will recognize that this argument has now lost some of its weight.

Two important things are expressed here. One, the assumption that Alu elements are functional because they are abundant (i.e., an adaptationist expectation that they would have been removed otherwise) was apparently common in the early 1980s. Indeed, that’s why the “selfish DNA” idea was proposed (Orgel and Crick 1980; Doolittle and Sapienza 1980). Two, pseudogenes — defunct coding genes — were indeed thought to be non-functional, for obvious reasons. These are the sequences to which the term “junk DNA” originally related.

Additional information regarding the origin of Alu sequences was provided by Quentin (1992), who said,

from the beginning, the Alu progenitor sequences could have retained the capacity to interact with cellular components, suggesting that they are functionally important for the host genome. On the other hand, this RNA secondary structure could have some affinity for reverse transcriptases or other components of the retroposition machinery, and its conservation in the monomeric and Alu dimeric sequences could be related to their mobility. Indeed, this structure is first found in the 7SL RNA sequences that are prone to retroposition, and it is also retained by the progenitor sequences of the B1 family in the rodent genomes. Nevertheless, both hypotheses (secondary structure involved in a cellular function or in the reverse transcription) are not mutually exclusive.

Yet, here is a fairly typical introduction from a recent paper about Alu (Hasler and Strub 2006):

Alu elements, as well as other repetitive elements, were at the origin considered as parasites of the genome that had no major effect on its stability and genic expression. They were thought to be ‘selfish’ or ‘junk’ DNA (6,7), but nowadays, several lines of evidence show that the presence of repetitive elements and especially of Alu elements, had a great influence on the human genome, in particular on its evolution. These effects were both negative and positive. On one hand, integration into genic regions that caused gene inactivation might often have been deleterious for the organism. On the other hand, because of their extended sequence homology, Alu elements induced a considerable number of non-allelic recombinations that lead to both duplications and deletions of DNA segments, thereby accelerating evolution by several orders of magnitude. Another function frequently attributed to Alu elements is their ability to provide new regulatory elements to neighboring genes. It was, indeed, reported several times that Alu elements became effectors of gene transcription by providing new enhancers, promoters and polyadenylation signals to many genes.

The only authors cited for the “Alu is just junk” claim are Orgel and Crick (1980) and Orgel et al. (1980). I have discussed these articles before, but will reiterate one statement from each.

Orgel and Crick (1980):

It would be surprising if the host genome did not occasionally find some use for particular selfish DNA sequences, especially if there were many different sequences widely distributed over the chromosomes. One obvious use … would be for control purposes at one level or another.

Orgel et al. (1980):

In our recent experience most people will agree, after discussion, that ignorant DNA, parasitic DNA, symbiotic DNA (that is, parasitic DNA which has become useful to the organism) and ‘dead’ DNA of one sort or another are all likely to be present in the chromosomes of higher organisms. Where people differ is in their estimates of the relative amounts. We feel that this can only be decided by experiment.

______

Part of the Quotes of interest series.
______

References

Brown, A.L. 1984. On the origin of the Alu family of repeated sequences. Nature 312: 106.

Hasler, J. and K. Strub. 2006. Alu elements as regulators of gene expression. Nucl. Acids Res. 34: 5491-5497.

Orgel, L.E. and F.H.C. Crick. 1980. Selfish DNA: the ultimate parasite. Nature 284: 604-607.

Orgel, L.E., F.H.C. Crick, and C. Sapienza. 1980. Selfish DNA. Nature 288: 645-646.

Quentin, Y. 1992. Origin of the Alu family: a family of Alu-like monomers gave birth to the left and right arms of the Alu elements. Nucl. Acids Res. 20: 3397-3401.

Ullu, E. and C. Tschudi. 1984. Alu sequences are processed 7SL RNA genes. Nature 312: 171-172.

Crankishness.

As a brief follow-up to the post about Dr. Andras Pellionisz’s Google seminar, I cannot help but quote from his website:

Since a US Government-mandated (and taxpayer paid) 4-year study (ENCODE, led by Dr. Collins) established the scientific fact that (at the least a significant part of) formerly “written off” so-called “non-coding DNA” is massively involved in genome function, US government-supported professionals who after the release of ENCODE (reversal of the Establishment in 2007) neglect to to follow Dr. Collins’ mandate that “the scientific community will have to re-think long-held beliefs” are actually liable for a hefty Class Action Lawsuit for Negligence when they disregard the established reversal of protocol and e.g. continue to “write off” investigation of 98.7% of the DNA in cases of grave genomic syndromes.

I really try to be fair with people like this, but whoa. Can researchers be charged with negligence for what they don’t study? (Assuming, that is, that there actually was a total dismissal of non-coding DNA, which there wasn’t.)

Pellionisz Google Tech presentation.

Those of you who read this blog or others that discuss non-coding DNA will, for better or worse, be familiar with regular commenter Andras Pellionisz. Many people have concluded that Dr. Pellionisz is essentially a “crank”, though I believe I have tried to give him a fair hearing on this blog (before asking him to stop repeating the same arguments over and over). Whether he has managed to convince anyone of his view that all non-coding DNA is functional is another issue, however. Readers should judge this for themselves. Thus, here are links to his website, a recent article, and a recent Google Tech presentation.

www.junkdna.com (home of the “avant-garde society that formally abandoned ‘junk DNA’”)

Pellionisz, A. 2008. The principle of recursive genome function. Cerebellum 7: 348-359.

A few more quotes about non-coding DNA.

Just for fun, here are some quotes I came across while reading a few sources for a paper I am writing.

Remember, a significant number of creationists, science writers, and molecular biologists want us to believe that non-coding DNA was totally ignored after the term “junk DNA” was published in 1972, that the authors of the “junk DNA” and “selfish DNA” papers denied any possible functions for non-coding elements, and, in the case of creationists, that “Darwinism” is to blame for this oversight. The latter of these is nonsensical as the very ideas of “junk DNA” and “selfish DNA” were postulated as antidotes to excessive adaptationist expectations based on too strong a focus on Darwinian natural selection at the organism level.

For those of you who didn’t read the earlier series, see if you can guess when these statements were made.

(A)

There is a strong and widely held belief that all organisms are perfect and that everything within them is there for a function. Believers ascribe to the Darwinian natural selection process a fastidious prescience that it cannot possibly have and some go so far as to think that patently useless features of existing organisms are there as investments for the future.

I have especially encountered this belief in the context of the much larger quantity of DNA in the genomes of humans and other mammals than in the genomes of other species.

Even today, long after the discovery of repetitive sequences and introns, pointing out that 25% of our genome consists of millions of copies of one boring sequence, fails to move audiences. They are all convinced by the argument that if this DNA were totally useless, natural selection would already have removed it. Consequently, it must have a function that still remains to be discovered. Some think that it could even be there for evolution in the future — that is, to allow the creation of new genes. As this was done in the past, they argue, why not in the future?

(B)

A survey of previous literature reveals two emerging traditions of argument, both based on the selectionist assumption that repetitive DNA must be good for something if so much of it exists. One tradition … holds that repeated copies are conventional adaptations, selected for an immediate role in regulation (by bringing previously isolated parts of the genome into new and favorable combinations, for example, when repeated copies disperse among several chromosomes). We do not doubt that conventional adaptation explains the preservation of much repeated DNA in this manner.

But many molecular evolutionists now strongly suspect that direct adaptation cannot explain the existence of all repetitive DNA: there is simply too much of it. The second tradition therefore holds that repetitive DNA must exist because evolution needs it so badly for a flexible future–as in the favored argument that “unemployed,” redundant copies are free to alter because their necessary product is still being generated by the original copy.

(C)

These considerations suggest that up to 20% of the genome is actively used and the remaining 80+% is junk. But being junk doesn’t mean it is entirely useless. Common sense suggests that anything that is completely useless would be discarded. There are several possible functions for junk DNA.

(D)

There is a hierarchy of types of explanations we use in efforts to rationalize, in neo-darwinian terms, DNA sequences which do not code for protein. Untranslated messenger RNA sequences which precede, follow or interrupt protein-coding sequences are often assigned a phenotypic role in regulating messenger RNA maturation, transport or translation. Portions of transcripts discarded in processing are considered to be required for processing. Non-transcribed DNA, and in particular repetitive sequences, are thought of as regulatory or somehow essential to chromosome structure or pairing. When all attempts to assign a given sequence or class of DNA functions of immediate phenotypic benefit to the organism fail, we resort to evolutionary explanations. The DNA is there because it facilitates genetic rearrangements which increase evolutionary versatility (and hence long-term phenotypic benefit), or because it is a repository from which new functional sequences can be recruited or, at worst, because it is the yet-to-be eliminated by-product of past chromosomal rearrangements of evolutionary significance.

(E)

This is what I emphasized earlier, that this DNA must have a functional value since nothing is known so widespread and universal in nature that has proven useless.

(F)

I’ve stopped using the term [‘junk’] …Think about it the way you think about stuff you keep in your basement. Stuff you might need some time. Go down, rummage around, pull it out if you might need it.

Answers to be provided in the comments.

Quotes of interest — satellite DNA in the news.

I have already made note of some of the coverage of noncoding DNA that appeared in Science during the 1980s, and as a sequel to that earlier installment of the series, I want to talk about the coverage in Nature from the late 1960s and early 1970s. Because SINEs, LINEs, pseudogenes, and introns were all discovered in 1977 or later, this will necessarily focus on satellite DNAs.

As mentioned previously, satellite DNAs were discovered in the early 1960s, and by the late 1960s and early 1970s there was substantial interest in these highly repetitive components of the genome. Nature published several stories about this work in their “News and Views” section, authored by various unnamed correspondents. Of course, one must not take the interpretation of anonymous science writers as definitive (after all, their contemporary counterparts do much to add to the mythology surrounding noncoding DNA), but it supports the overall contention that during this period adaptationist thinking was dominant and thus that it was taken almost as a given that functions would be elucidated for noncoding sequences. You will notice also that many of these stories report on data that contradict proposed functions, yet the expectation remained that some function exists. I am not criticizing the studies of these early authors in any way. Some satellite DNA is functional in chromosomal structure, but the point is that at the time this was an a priori assumption rather than a conclusion, and it is clearly not the case that these elements were dismissed as unimportant.

“Mouse satellite DNA”, Nature 215: 575, August 5, 1967:

What is the function of satellite DNA? It is unlikely to code for protein and yet it forms 10 per cent of the cell’s total DNA. What possible purpose is served by having so many, apparently identical, short sequences within the same genome?

“Satellite DNA”, Nature 222: 327, April 26, 1969:

Unfortunately, the group’s latest data serve only to make ideas about the function of this strange DNA fraction even more obscure.

But if satellite DNA is not transcribed, what is its function? Flamm et al. are impressed by the fact that numerous copies of a nucleotide sequence of 350 bases have been maintained during evolution in the face of the tendency to accumulate random mutations. This implies that satellite DNA has some important function. They suggest that it is required for “housekeeping”, the folding and packing of DNA in the chromosomes. In the absence of any critical data or any way of testing for the function of this DNA, that is as good a suggestion as any.

Like the Edinburgh group, Maio and Schildkraut are convinced that satellite DNA has some vital, albeit unknown, function.

“Hybridization and satellite DNA”, Nature 225: 414, January 31, 1970:

The function of this satellite DNA has always been obscure, reducing investigators to suggest, for example, that it may be involved in chromosome “housekeeping”, but Pardue and Gall claim that it is localized in the centromeres. It may therefore play a role in chromosome pairing, and this may account for the curious properties of satellite DNA, not least its peculiar base sequence.

“Mysterious satellites”, Nature 225: 899-900, March 7, 1970:

Any biologist told that 10 to 12 percent of the total DNA genome of an animal is sequestered in a chemically distinct fraction would find it hard to escape the conclusion that such DNA has some crucial cellular function. That explains why the so-called satellite DNAs are exciting so much interest…

A host of experiments and speculations leap to mind. Perhaps satellite DNA plays some part in the assembly of the mitotic spindle, for example, by influencing polymerization of spindle protein or the attachment of chromosomes to the spindle. Hybrid cells might be useful in studying the specificity of a putative interaction between satellite DNA and components of the mitotic spindle. And the chromosomes of organisms with diffuse centromeres might be useful for further testing the relationship between satellite DNA and centromeres.

The last one is interesting, because it led to a correction by one of the first people to identify satellite DNA, Waclaw Szybalski, in the correspondence segment of the April 4, 1970 issue. Did he complain about the characterization of biologists anticipating functions for satellite DNA? No, he simply noted that the author got the date of discovery wrong (Szybalski 1970; Nature 226: 89-90).

“Satellite DNA and sequence”, Nature 227: 775, August 22, 1970:

What possible function can be served by a DNA which consists of tandem duplication of a sequence of only six base pairs, and why should an animal such as the guinea-pig require some 10^7 copies of this short sequence in all its cells?

Finally, even though we now know the basic sequence unit of a satellite DNA we are no closer to explaining the function of these specialized DNAs. Since they have no role in coding protein, the most plausible suggestion is that they have some role in maintaining the integrity of the chromosome itself. The localization of satellite DNA in the centromere regions of chromosomes suggests they play a part in the functions conventionally ascribed to the centromere. But for the time being such suggestions remain speculative.

“Satellite DNA and speciation”, Nature 240: 128, November 17, 1972:

The function and evolutionary significance of satellite DNA — DNA which has a reiterated base sequence, is associated with heterochromatin and centromeres and may or may not be transcribed — remain tantalizing mysteries. It seems unlikely that these simple sequences code for any polypeptides and it has, therefore, been suggested that satellite DNA may be involved in processes such as pairing of homologous chromosomes, chromosome movement and chromosome packing, but there is little evidence in support of these speculations.

“The mystery deepens”, Nature 240: 255, December 1, 1972:

But the fact remains that one is still at a loss as to the function of satellite DNA, the chief characteristic of which is its comparatively simple and highly reiterated base sequence, and indeed the more that is learnt about the distribution of satellite DNAs the deeper the enigma of its function becomes.

“DNA dominant at Berkeley, California”, Nature 245: 183-184, September 28, 1973:

The problem of DNA redundancy continues to intrigue several teams, without finally yielding all the secrets of its function or the reason for the wide variation in amount from species to species. Some of the extra DNA is almost certainly present as spacer sequences between cistrons, but this does not account for the large amount of simple sequence DNA, present in millions of copies, in the centromeric heterochromatin. P.M.B. Walker, for whom the Medical Research Council has recently set up a unit in Edinburgh specifically devoted to research on the mammalian genome, reviewed the history of satellite DNA, but said that most investigators would still go no further than suggest that this material, which is not transcribed, has some “housekeeping” function.

Here is the take-home message. From the time it was discovered, satellite DNA was presumed to be functional on the basis of Darwinian adaptationist expectations. This stimulated intensive research on the subject, which was considered interesting enough to be reported on regularly in Nature. Some of the proposed functions, such as a structural role for some noncoding sequences, turned out to be correct, which was only shown because the Darwinian assumption prompted researchers to test functional ideas. The claim by creationists that “Darwinism” prevented such research is manifestly and demonstrably inaccurate. The problem, as I have noted, is that a strict focus on adaptive roles for noncoding DNA prevented many researchers from adopting a more balanced approach under which some of it is functional but most of it is not.

____________

Part of the Quotes of interest series.


Quotes of interest — SINEs and LINEs.

I am hopeful that our exploration of the peer-reviewed scientific literature and related news stories in scientific journals from the 1960s to the 1990s convincingly reveals that those who claim that junk DNA was “long dismissed as irrelevant” have it exactly backwards. Throughout this period, but especially before the non-adaptationist (though not exclusive) alternative offered by the selfish DNA hypothesis began to influence thinking on the topic by the mid-1980s, it was assumed, following Darwinian logic, that the very existence of so much DNA meant that it must be functional for the organism. It is only after considerable empirical investigation of potential functions that it became a common view that most (but certainly not all) noncoding DNA is unlikely to be functional at the level of the organismal phenotype.

I have already mentioned Alu elements — by far the most common single type of noncoding DNA element in the human genome. Alu elements are part of the category of repetitive DNA known as SINEs, which stands for short interspersed repeated sequences (or short interspersed nuclear elements). These sequences are now recognized as a type of transposable element that uses an RNA intermediate (i.e., undergoes retrotransposition) but which cannot do so without borrowing (some say parasitizing) the molecular transposition apparatus of other elements, namely long interspersed repeated sequences (LINEs). LINEs are not as common in the human genome as SINEs, but as they are much larger, they make up more of the total DNA. Whereas there are about 1.5 million SINEs (1 million of them Alu) making up about 13% of the genome sequence, the 870,000 or so copies of LINE elements (more than 500,000 of them LINE-1) constitute more than 20% of human DNA.
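
For readers who want to check the "fewer copies but more DNA" arithmetic, here is a rough back-of-the-envelope sketch. The copy numbers and genome fractions are the ones given above; the genome size (about 3.1 billion base pairs) and the function name are my own assumptions for illustration, so treat the outputs as approximate averages over all copies, fragments included.

```python
# Implied mean element length from copy number and genome fraction.
# GENOME_BP is an assumed haploid genome size; it is not stated in the post.
GENOME_BP = 3.1e9


def mean_length_bp(copies, genome_fraction, genome_bp=GENOME_BP):
    """Average length per copy implied by a copy count and a genome fraction."""
    return genome_fraction * genome_bp / copies


# Figures quoted in the paragraph above.
sine_len = mean_length_bp(copies=1.5e6, genome_fraction=0.13)
line_len = mean_length_bp(copies=8.7e5, genome_fraction=0.20)  # "more than 20%"

print(f"SINEs: roughly {sine_len:.0f} bp per copy on average")
print(f"LINEs: at least roughly {line_len:.0f} bp per copy on average")
print(f"LINE/SINE length ratio: about {line_len / sine_len:.1f}")
```

Under these assumptions the average LINE copy comes out well over twice as long as the average SINE copy, which is why the less numerous LINEs can still account for more of the total sequence.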

The terms SINE and LINE were coined by Maxine Singer in 1982 (Singer 1982a). By that time, the term “junk DNA” (Ohno 1972; Comings 1972) had been in circulation for a decade, and this was also two years after the “selfish DNA” hypothesis was put forward by Orgel and Crick (1980) and Doolittle and Sapienza (1980). Singer (1982b) cited these latter papers (but not Ohno’s) in her longer review of mammalian repeated DNA sequences. So once again, we have a prime candidate for assessing the general attitude in the scientific community regarding possible function of noncoding DNA sequences during the supposed period of neglect.

Were SINEs and LINEs dismissed as mere junk unworthy of further exploration?

Singer (1982a):

Function?
The critical question about SINEs and LINEs concerns their function, if they have any. The catalog of proposed functions for SINEs includes many of the unsolved problems in molecular biology, but none has been demonstrated directly. The existence of RNA transcripts from some SINE-family members is the most compelling argument available that they have a function, although functions independent of transcription (and in addition to transposition) have also been suggested. (The possibility that LINEs are transcribed requires investigation). Particularly striking is the fact that the 4.5S transcripts of Alu-like SINEs of hamster and mice are more than 95% identical in sequence, which is significantly closer than the variation among the different copies of a SINE family in a single species. If we assume that one or a few SINEs encode the 4.5S RNAs, is there any functional significance to the many other dispersed copies of family members? It seems reasonable to expect that there is some trade-off between an advantage imparted to cells by SINEs and the disadvantage of a promiscuous and abundant mobile element that is presumably destructive if implanted in an essential coding region.

Singer (1982b):
[A number of in-line citations have been omitted for clarity]

Are SINES functional?
As a background, it is interesting to recall proposals suggesting that highly repeated dispersed sequences may be without function (Orgel and Crick 1980; Doolittle and Sapienza 1980) and also disagreement concerning those proposals (Cavalier-Smith 1980; Dover 1980; T.F. Smith 1980; Orgel et al. 1980; Dover and Doolittle 1980). Specific functions that have been suggested include the control of gene expression, perhaps by involvement of transcripts of SINES in the maturation of messenger RNA, and service as origins of DNA replication.

The following additional point may be important, in view of the suggestions that highly repeated sequences have no function at all. A mobile element may generate diversity with a potential selective advantage, but it can also generate disadvantage if it moves into an essential gene. Mutation by movable elements has been demonstrated in yeast and Drosophila. The high frequency of mutation caused by the presence of large numbers of movable elements within a mammalian genome might have proven intolerable and been selected against, unless it was counterbalanced by some positive functional advantage.

Finally, the suggestion that SINES may serve as origins for DNA replication should be considered. The basis for the suggestion is the presence in SINES of a short (14bp) homology to a sequence associated with the origin of replication of murine and primate papovaviruses. Georgiev et al. (1981) describe some preliminary experiments that are consistent with this suggestion. However, in papovavirus genomes this region is part of a complex control region and may be involved in the control of transcription as well as replication. Only additional experiments will resolve these questions.

Are LINES functional?
The discovery of LINE families in mammals is recent and there is very little information available regarding function. Adams et al. (1980) found no transcripts homologous to the human Kpn-LINE family in bone marrow cells and Manuelidis [1982] also reports negative preliminary experiments. There is no information available regarding the possibility that LINES are mobile in mammalian genomes.

As noted previously, the SINE Alu was first described in 1979, and the first LINEs were discovered using similar methods around 1980. Singer (1982b) cites several publications and articles in press detailing sequences of this type from the human and mouse genomes. Most of these papers did not include any discussion one way or the other about function and focused instead on the technique used or the specific molecular characteristics of the sequences. However, one of the early papers did discuss function (and non-function).

Adams et al. (1980):

As to the function or genesis of this sequence we can make only vague hypotheses. The fact that it is not expressed into RNA, at least in bone marrow cells, at levels proportionate to its reiteration frequency, suggests that it does not code for a protein or major nuclear RNA in this tissue. However, there may be a low-level transcript which has some functional role, or there may be transcription in some other tissue. Alternatively this sequence may be a binding site for a chromosomal protein, or serve as a signal for chromosomal folding. As such it could conceivably have some role in the regulation of expression of the β-globin or other nearby genes. The interspersion of this sequence among other DNA is consistent with but not by itself supportive of such a role. Finally it is possible that this repeated sequence has no function relevant to the organism, but is carried in the genome in an essentially parasitic fashion (Doolittle and Sapienza 1980).

____________

Part of the Quotes of interest series.
____________

References cited

Adams, J.W., R.E. Kaufman, P.J. Kretschmer, M. Harrison, and A.W. Nienhuis. 1980. A family of long reiterated DNA sequences, one copy of which is next to the human beta globin gene. Nucleic Acids Research 8: 6113-6128.

Cavalier-Smith, T. 1980. How selfish is DNA? Nature 285: 617-618.

Comings, D.E. 1972. The structure and function of chromatin. Advances in Human Genetics 3: 237-431.

Doolittle, W.F. and C. Sapienza. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601-603.

Dover, G. and W.F. Doolittle. 1980. Modes of genome evolution. Nature 288: 646-647.

Georgiev, G.P., Y.V. Ilyin, V.G. Chmeliauskaite, A.P. Ryskov, D.A. Kramerov, K. G. Skryabin, A. S. Krayev, E. M. Lukanidin, and M. S. Grigoryan. 1981. Mobile dispersed genetic elements and other middle repetitive DNA sequences in the genomes of Drosophila and mouse: transcription and biological significance. Cold Spring Harbor Symposia on Quantitative Biology 45: 641-654.

Manuelidis, L. 1982. Repeated DNA sequences and nuclear structure. In Genome Evolution (eds. G. Dover and A. Flavell), pp. 263-285. Academic Press, New York.

Ohno, S. 1972. So much “junk” DNA in our genome. In Evolution of Genetic Systems (ed. H.H. Smith), pp. 366-370. Gordon and Breach, New York.

Orgel, L.E. and F.H.C. Crick. 1980. Selfish DNA: the ultimate parasite. Nature 284: 604-607.

Orgel, L.E., F.H.C. Crick, and C. Sapienza. 1980. Selfish DNA. Nature 288: 645-646.

Singer, M.F. 1982a. SINEs and LINEs: highly repeated short and long interspersed sequences in mammalian genomes. Cell 28: 433-434.

Singer, M.F. 1982b. Highly repeated sequences in mammalian genomes. International Review of Cytology 76: 67-112.

Smith, T.F. 1980. Occam’s razor. Nature 285: 620.


Quotes of interest — beware single citations and non-citations.

As readers who have been following the Quotes of interest series will know, I have been arguing that from the discovery of repetitive DNA until at least the mid-1980s, the general expectation was that it must somehow be functional for the organism. By 1989 or 1990, we start to see claims that noncoding sequences were “long dismissed” as mere “junk” or “parasites”, and that biologists are finally beginning to recognize that they are interesting. We have heard this line, more or less unaltered, ever since. This statement is usually not backed up with any reference to the literature; it is more of a “general sense” sort of claim. Everybody knows junk DNA was dismissed as irrelevant, so using the standard statement to that effect need not be supported by any actual citations. Either that, or authors who make the statement will cite Ohno (1972), and possibly also Doolittle and Sapienza (1980) and Orgel and Crick (1980), even though these original publications did not in any way dismiss possible functions for some noncoding elements.

This trend of picking one or two references to cite (if any) in support of the claim that all of biology neglected the possible functions of noncoding DNA goes back nearly two decades. I have already provided this quote in a previous post, but I want to use it as an example of how single citations (not to mention non-citations) can be extraordinarily misleading about the past state of the field when it comes to this issue. In 1990, Willard argued that,

Although it has been recognized for over 20 years that the centromeric heterochromatin in chromosomes from virtually all complex eukaryotic organisms consists of various families of satellite DNA, they have only recently been taken seriously as candidates for something other than ‘junk’ DNA or genomic ‘flotsam and jetsam’ (Miklos 1985).

I think it is obvious from the summary of the literature on satellite DNA that the claim that its possible role in chromosomal structure was neglected is incorrect. But what of the citation of Miklos (1985)? Did he represent at least a minority of biologists who long dismissed satellite DNA as nothing more than “junk” or “flotsam and jetsam”?

As it turns out, I have already cited a few of Miklos’s publications from the 1970s and 1980s. Here are some highlights:

Yamamoto and Miklos (1978):

The most important aspect of satellite DNA remains the nature of its functions. Although a large body of data has been gathered concerning its structure, distribution and properties in several different organisms, most of these results have in fact neither supported nor disproved any one of the particular hypotheses of function. The most popular hypothesis on satellite DNA function has been, and still is, that satellite DNA is involved in some aspect of chromosome mechanics such as chromosome pairing.

John and Miklos (1979):

Thus the large amount of satellite DNA in some species, its apparent rigid conservation in sequence and, as we shall see, its effects on the genome when it is altered in amount or position lead us to be unimpressed in general with the argument that most of it constitutes a functionless burden which many eukaryotes must bear. However, for the moment we will retain an open mind and examine the hard data pertaining to function before casting a final judgment at the end of this article.

So, it would appear that Miklos had an interest in the possible functions of satellite DNA, and he certainly took the possibility seriously that they could be important for the chromosome. By 1985, his view had changed to a large extent on this, but this was due to his evaluation of the data, not a long-held assumption about nonfunction.

As he wrote in 1985,

It is this blatant profligacy of repeated sequences within some eukaryotic genomes that fascinates many investigators, and has led to two general and mutually exclusive conclusions concerning their status in the genome. One is that these sequences have functions; the other is that they represent the natural outcomes of genomic turnover processes that involve selfish, ignorant, or junk DNA. The functionalists rationalize the existence of these sequences have been held to function, for example, in determining centromere strength, chromosome pairing, recombination, the three-dimensional architecture of the nucleus, genomic reorganization, and speciation.

On the other hand the proponents of the second class of hypotheses propose that these sequences are inevitable byproducts of molecular mechanisms involved in DNA replication and recombination, and that such genomic turnover is not supervised by natural selection acting through the phenotype.

These viewpoints accurately reflect other deep divisions in contemporary biology, such as those concerning selection versus neutrality and function versus junk. An overt unease is now apparent in studies on highly repetitive DNAs, since, despite focusing the awesome power of recombinant DNA technology onto the structure of these sequences, the expected biological roles have still proved elusive.

The whole problem of whether there was a function at all for highly repetitive sequences exploded into the harsh glare of criticism when the concept of selfish DNA was propounded. Doolittle and Sapienza (1980), presenting arguments based on the properties of mobile elements, and Orgel and Crick (1980), using similar arguments pertinent to highly repetitive DNAs, argued that sequences having little effect on the phenotype could be spread throughout a genome simply on the basis of their preferential replicative properties.

I have spelled out these classical hypotheses because they can be compared with the themes of the 1970s and 1980s, which are expressed in terms of DNA sequences. Thus satellite or highly repetitive DNAs have been invoked in chromosome pairing, chromosomal rearrangements, speciation, germ line processes, alteration of regulatory pathways via genomic reorganization, and the three-dimensional structure of the interphase nucleus. Highly repetitive DNA sequences may, alternatively, be thought of as examples of selfish DNA or ignorant DNA. Thus highly repetitive DNAs are considered to be subject to natural selection, to be neutral, or to be subject to molecular drive.

This litany of functional explanations led Doolittle and Sapienza (1980) and Orgel and Crick (1980) to formulate their concept of selfishness, in which no phenotypic or evolutionary function was called for.

After this introduction, Miklos (1985) spent 58 pages reviewing the available data on highly repetitive DNA sequences. In the end, it is clear that two major developments influenced Miklos’s views on satellite DNA: 1) despite considerable effort to generate supporting data for proposed functions, these remained lacking in his view, and 2) there was now an alternative, non-adaptive explanation that could account for the existence of much noncoding DNA.

Miklos (1985) further discussed the significance of the new, non-adaptationist, selfish DNA hypothesis near the end of his chapter.

Doolittle and Sapienza (1980), Doolittle (1982), and Orgel and Crick (1980) have drawn attention to what has bogged down our evolutionary concepts of the genome. We have been wedded to thinking of it in functional terms and have considered natural selection as an omnipotent reaper that would rapidly slash any nonproductive DNA from the genome. The existence of highly repetitive sequences has inevitably led to a search for their function, and this in turn has revolved around a role in chromosome mechanics or chromosome architecture. The data, however, do not support either of these theories. Consequently, evolutionary avenues have been explored. Thus, highly repetitive sequences have been invoked in facilitating chromosome rearrangements, but once again the data base does not favor such excursions. Similarly, a role has been sought for them in speciation processes, but again no unbroken chain of data leads to this as a likely possibility. Doolittle and Sapienza (1980), Doolittle (1982), stressed that most “explanations” are unrealistic. If there are DNA sequences without effects on phenotype, then they can arise, spread, and be maintained in a functional vacuum. The data indicate that satellite sequences per se have no known phenotypic effects.

Miklos was not one to simply dismiss noncoding DNA as “junk” without a thought. If anything, he was among the most ardent advocates of the need for data to test functional hypotheses about noncoding DNA, and he contributed a significant number of these data himself. His view that most noncoding DNA is non-functional for organisms came only after careful evaluation of the data. As he also noted in 1985,

The cloned sequence data together with the biological data are impressive in their implications; they have negated every functional test to which satellite DNA has been put in a mechanistic cellular domain and they have seriously compromised many of the evolutionary concepts pertaining to this class of DNA. In this field there has been an almost overpowering reluctance to lay aside hypotheses that were erected in the absence of relevant data. There has been a desire to avoid the unpalatable, or the unthinkable — that tandem arrays of highly repetitive sequences have no functional characteristics whatsoever but are the ignorant playthings of genomic turnover processes. There is a fear that if not one [adaptive] hypothesis is left standing, then little of substance will remain in this field. However, highly repetitive DNAs have provided a vanguard for the problems that must be encountered in interfacing molecular biology with developmental and evolutionary biology at any level. These sequences have revealed many of the hurdles likely to be encountered in both coding and noncoding parts of the genome. These problems will not go away, nor will our understanding of genomic relevance be helped if the hypotheses that are erected are of little direct predictive value.

I believe that an acceptable state of affairs has been reached for highly repetitive localized arrays at the cellular level. Most hypotheses have been dispensed with, and we are confronted with the stark realism that the obsessive hunt for function has been tangential and illusory.

Revising one’s view in the face of new theoretical explanations and accumulated data (and indeed, producing some of the data oneself) is exactly how science is supposed to work.

____________

Part of the Quotes of interest series.
____________

Doolittle, W.F. and C. Sapienza. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601-603.

John, B. and G.L.G. Miklos. 1979. Functional aspects of satellite DNA and heterochromatin. International Review of Cytology 58: 1-114.

Miklos, G.L.G. 1985. Localized highly repetitive DNA sequences in vertebrate and invertebrate genomes. In Molecular Evolutionary Genetics (ed. R.J. MacIntyre), pp. 241-321. Plenum Press, New York.

Ohno, S. 1972. So much “junk” DNA in our genome. In Evolution of Genetic Systems (ed. H.H. Smith), pp. 366-370. Gordon and Breach, New York.

Orgel, L.E. and F.H.C. Crick. 1980. Selfish DNA: the ultimate parasite. Nature 284: 604-607.

Willard, H.F. 1990. Centromeres of mammalian chromosomes. Trends in Genetics 6: 410-416.

Yamamoto, M. and G.L.G. Miklos. 1978. Genetic studies on heterochromatin in Drosophila melanogaster and their implications for the functions of satellite DNA. Chromosoma 66: 71-98.


Quotes of interest — 1970s edition (part one).

I have argued that prior to 1980, when the selfish DNA hypothesis was proposed, it was taken more or less as a given by most biologists that noncoding DNA had some function(s), even if the specific adaptive significance of these sequences had yet to be demonstrated. This was based on simple Darwinian, adaptationist logic: if it’s there, it must be doing something good for the organism or else natural selection would have done away with it a long time ago.

The point of the selfish DNA hypothesis was to provide an alternative view that did not preclude the potential function of some noncoding DNA, but also did not require it all to be adaptive for its presence to be accounted for. In reality, this simply shifted the process of natural selection down one level, from organisms in populations to sequences of DNA within the genome, but the consequence was important because it meant that evidence had to be provided for organism-level function — this could not just be assumed simply because noncoding DNA exists.

I have already quoted from many papers published during the 1980s, which show that biologists continued to explore possible functions for noncoding DNA of different kinds throughout that decade, as they continue to do today. In this post, I want to back up the claim that a common assumption during the 1970s was that most DNA in the genome is functional for the organism. By the late 1970s we start to see a shift away from this position and toward (untenable) ideas regarding “evolutionary” functions, but that is a topic for another time. Here, then, are some samples of the peer-reviewed literature from the period from 1970 to 1979.

At present, one can only speculate on the functions of much of the repeated DNA in eukaryotes. For example, repetitious DNA might contain binding sites for chromosomal proteins or chromosomal RNA… A certain portion of the repetitious DNA may play a role in determining how DNA fibers fold in the chromosome… A fraction of repetitious DNA may not contain any genetic “information” at all but may have evolved to function merely as space-filler in the genome.
[Note: this quote is heavily abbreviated from more than one paragraph, so please consult the original for the full text].

We can summarize these speculations about the genetic functions of repetitious DNA by concluding that, whatever they are, they are probably diverse. Regardless of the function of repetitious DNA, however, its very existence raises fundamental problems for [adaptationist] evolutionary theory.

Edelman, G.M. and J.A. Gally. 1970. Arrangement and evolution of eukaryotic genes. In The Neurosciences Second Study Program (ed. F.O. Schmitt), pp. 962-972. Rockefeller University Press, New York.

It was first demonstrated by Boivin and the Vendrelys that mammalian cells from different tissues contain the same amount of DNA, and furthermore that sperm cells contain half this amount. In several species of fungi vegetative diploid cells contain twice the amount of DNA contained by haploid ones. It is also well known that DNA is metabolically stable. These observations strongly suggest that most, if not all, the DNA has a genetic function. The difficulty arises when the actual amounts of DNA per nucleus in different species are considered.

The problem of large amounts of DNA per genome might be explained if only a proportion of the DNA had a genetic function and the rest was redundant or played some other role in the cell.

In summary, there appear to be several possible functions for DNA which does not encode for messenger RNA and protein synthesis. Direct information about such functions is negligible and one can only guess at the total amount of DNA which may be involved.

Holliday, R. 1970. The organization of DNA in eukaryotic chromosomes. Symposia of the Society for General Microbiology 20: 359-379.

Alternative hypotheses concerning the function of repetitious DNA have recently been proposed, and it seems an appropriate time to review the evidence for the existence of repetitious DNA and see how it adds to our understanding of the evolution and organization of DNA within organisms.

None of the recognition functions, i.e., recognition of centromeres, initiation sites, pairing sites, recombination sites, folding sites, or regulatory sites, that we have discussed is mutually exclusive of the others. They all relate to cellular phenomena that have been demonstrated or inferred from other data. All these phenomena probably exist within every higher organism. Therefore, DNA involved in each of these functions could contribute in varying degrees to the repeated portion of the genome.

Bostock, C. 1971. Repetitious DNA. Advances in Cell Biology 2: 153-223.

There is no doubt, however, that the presence of satellite DNA confers some selective advantage; otherwise we cannot account for its spread throughout a population as large as that of the house mouse.


The origin and spread through the population of such diverse sequences therefore present considerable problems, and I would like to suggest that the best explanation might be that satellite DNA confers a direct advantage on the chromosome which carries it, because such a chromosome survives the mechanical processes of, primarily, meiosis better than a sister chromosome not so well endowed.

Walker, P.M.B. 1971. Origin of satellite DNA. Nature 229: 306-308.

With the assumption that a portion that comprises some 10 percent of the genomes in higher organisms cannot be without a raison d’être, an extensive review led us to conclude that a certain amount of constitutive heterochromatin is essential in multicellular organisms at two levels of organization, chromosomal and nuclear. At the chromosomal level, constitutive heterochromatin is present around vital areas within the chromosomes. Around the centromeres, for example, heterochromatin is believed to confer protection and strength to the centromeric chromatin. Around secondary constrictions, heterochromatic blocks may ensure against evolutionary change of ribosomal cistrons by decreasing the frequency of crossing-over in these cistrons in meiosis and absorbing the effects of mutagenic agents. During meiosis heterochromatin may aid in the initial alignment of chromosomes prior to synapsis and may facilitate speciation by allowing chromosomal rearrangement and providing, through the species specificity of its DNA, barriers against cross-fertilization.

At the nuclear level of organization, constitutive heterochromatin may help maintain the proper spatial relationships necessary for the efficient operation of the cell through the stages of mitosis and meiosis. In the unicellular procaryotes, the presence of a small amount of genetic information in one chromosome obviates the need for constitutive heterochromatin and a nuclear membrane. At higher levels of organization, with an increase in the size of the genome and with evolution of cellular and sexual differentiation, the need for compartmentalization and structural components in the nucleus became imminent. The portion of the genome that was concerned with synthesis of ribosomal RNA was enlarged and localized in specific chromosomes, and the centromere became part of each chromosome when the mitotic spindle was developed in evolution. Concomitant with these changes in the genome, repetitive sequences in the form of constitutive heterochromatin appeared, probably as a result of large-scale duplication. The repetitive DNA’s were kept through natural selection because of their importance in preserving these vital regions and in maintaining the structural and functional integrity of the nucleus.

The association of satellite (or highly repetitive) DNA with constitutive heterochromatin is understandable, since it stresses the importance of the structural rather than transcriptional roles of these entities. Nuclear satellite DNA’s have one property in common despite their species specificity, namely heterochromatization. In this sense the apparent species specificity of satellite DNA may be the result of natural selection for duplicated short polynucleotide segments that are nontranscriptional and can be utilized in specific structural roles.

Yunis, J.J. and W.G. Yasmineh. 1971. Heterochromatin, satellite DNA, and cell function. Science 174: 1200-1209.

The great variability between species (and even populations) in the type and quantity of satellite and/or cryptic sequence DNA suggests that organisms function within wide limits for organizing the type and extent of constitutive heterochromatin. On the other hand, the striking species specificity of major satellite sequences, together with a rather low level of base sequence divergence within a species and the precise localization of much of this DNA to centric heterochromatin, suggest that these DNA’s perform important nuclear functions.

Rae, P.M.M. 1972. The distribution of repetitive DNA sequences in chromosomes. Advances in Cell and Molecular Biology 2: 109-149.

Today there are over 100 laboratories studying different aspects of the problem concerning highly repetitive sequences and, as is always the case, a new and often confusing jargon has grown up around the rapidly evolving field. It is the purpose of this chapter to define and describe these new terms and to focus on the following three questions: What are highly repetitive sequences? How did they evolve? What is their function?

Flamm, W.G. 1972. Highly repetitive sequences of DNA in chromosomes. International Review of Cytology 32: 1-51.

[Satellite DNAs in mice] are widely established in populations throughout the world and thus clearly cannot be relegated to a random and functionless quirk of evolution. Simple-sequence DNAs must obviously possess a certain role in the genome.

There is one major hope for making sense of the fact that many higher organisms seem to carry in every nucleus a large portion of their DNA that looks, superficially, to be completely worthless. This lies in the comparative approach. When do simple-sequence DNAs arise in evolution? Can we find two closely related species one with and one without a major block of heterochromatin?

Swift, H. 1973. The organization of genetic material in eukaryotes: progress and prospects. Cold Spring Harbor Symposia on Quantitative Biology 38: 963-979.

Alternatively, large variations in genome size could readily be accommodated if a high proportion of the DNA were used for functions other than coding for proteins. A number of such functions have been proposed and incorporated into hypothetical structures for the eukaryotic genome.

The outstanding problem presented by eukaryotic DNA is that of finding a role for these large fractions not used in coding for proteins or cytoplasmic RNAs.

Southern, E. 1974. Eukaryotic DNA. In MTP International Review of Science, Biochemistry Series One, Volume 6, Biochemistry of Nucleic Acids (ed. K. Burton), pp. 101-139. University Park Press, Baltimore.

Genetic redundancy is common in eukaryotes and thus must confer substantial selective advantages upon this group of organisms. One function of certain redundant DNA sequences (satellite DNAs) that are located in the centromeric heterochromatin may be to facilitate proper chromosome pairing and segregation.

Tartof, K.D. 1975. Redundant genes. Annual Review of Genetics 9: 355-385.

Proposed functions for satellite DNA were evaluated and formally set forth by Walker (1971) and have since been expanded by Mazrimas and Hatch (1972), Lagowski et al. (1973), Lee (1975), Bostock (1971), Walker (1972), and Comings (1972). In a masterful summary and evaluation of current ideas relating repeated DNA to the organization of the eukaryote chromosome (Cold Spring Harbour Symposia on Quantitative Biology 1973) Swift stated that the function of simple sequence DNAs not only appeared to have most investigators mystified, but that the present theories concerning their function were not accepted with much enthusiasm. He did, however, point out that “There is one major hope for making sense of the fact that many higher organisms seem to carry in every nucleus a large portion of their DNA that looks superficially to be completely worthless. This lies in the comparative approach. When do simple sequence DNAs arise in evolution? Can we find two closely related species one with and one without a major block of heterochromatin?”

Miklos, G.L.G. and R.N. Nankivell. 1976. Telomeric satellite DNA functions in regulating recombination. Chromosoma 56: 143-167.

Satellites constitute from 1% to 65% of the total DNA of numerous organisms, including that of animals, plants, and prokaryotes. Their existence has been known for about 15 years, but, although it is thought that they must be biologically important, with few exceptions … their functions are still largely in the realm of speculation. This remains true despite their ubiquity and, except for polytenized tissues, their constancy as a fraction of the total DNA in all tissues of the particular animal or plant species in which they are observed.

Skinner, D.M. 1977. Satellite DNA’s. BioScience 27: 790-796.

The most important aspect of satellite DNA remains the nature of its functions. Although a large body of data has been gathered concerning its structure, distribution and properties in several different organisms, most of these results have in fact neither supported nor disproved any one of the particular hypotheses of function (see Comings, 1972; Swift, 1973; Hsu, 1975; Miklos and Nankivell, 1976; for evaluations of functions). The most popular hypothesis on satellite DNA function has been, and still is, that satellite DNA is involved in some aspect of chromosome mechanics such as chromosome pairing.

Yamamoto, M. and G.L.G. Miklos. 1978. Genetic studies on heterochromatin in Drosophila melanogaster and their implications for the functions of satellite DNA. Chromosoma 66: 71-98.

In spite of the large amount of information which now exists on the structure of satellite DNA, it is clear that the central issue, namely, function, has not been directly tackled. Probably the most important reason for this unsatisfactory state of affairs has been the signal failure to approach the problem of function experimentally, despite the considerable effort that has gone toward elucidating structural properties. In part this refractory state of affairs stems from the assumption that a knowledge of function necessarily follows from a knowledge of structure. In part too it is explained by the fact that the properties of satellite DNA have been evaluated within the framework of prokaryotic dogma without sufficient consideration of the higher-order phenomena which characterize the biology of eukaryotes.

It appears very obvious that we have now reached a stage in satellite DNA research where additional structural analyses are not revealing the nature of its function — and indeed there is a very good reason for this. The initial success of the prokaryotic approach to genetic function was due to its manipulative aspects. This approach, involving perturbation of a system by mutation, deletion, substitution and translocation, proved critical. Only recently has a similar approach been applied specifically in investigating satellite DNA function, although an enormous literature exists on experimental and natural modifications of heterochromatin, which bear directly on this issue.

In the absence of experimental evidence the problem has in general been discussed in terms largely modified from earlier theoretical considerations relating to the functions of heterochromatin. A summary of the comparisons of heterochromatin and satellite DNA functions is presented in Table II. As can be seen from this table, the assumption has generally been made that there is at least one positive function. However, since similar organisms have widely different amounts of satellite DNA, and since such differences are found even between species that form viable hybrids, some investigators have suggested that these sequences are simply evolutionary by-products with no particular function. This fails to explain why so many eukaryotes have been found to contain highly repeated DNA and why its amount varies so considerably even between closely related species. Equally difficult to explain is why in some cases mechanisms have evolved to regulate replication of this DNA in particular tissues independently of the rest of the genome or indeed of other repetitious sequences in the same nucleus. Thus the large amount of satellite DNA in some species, its apparent rigid conservation in sequence and, as we shall see, its effects on the genome when it is altered in amount or position lead us to be unimpressed in general with the argument that most of it constitutes a functionless burden which many eukaryotes must bear. However, for the moment we will retain an open mind and examine the hard data pertaining to function before casting a final judgment at the end of this article.

John, B. and G.L.G. Miklos. 1979. Functional aspects of satellite DNA and heterochromatin. International Review of Cytology 58: 1-114.

____________

Part of the Quotes of interest series.


Quotes of interest — Alu.

Whereas each copy of the human genome contains about 20,000 protein-coding genes, it is also home to more than 1 million copies of a short interspersed repetitive element (SINE) known as Alu. For this reason, Doolittle (1997), perhaps only half jokingly, suggested that the genomes of humans “might be ironically viewed as vehicles for the replication of Alu sequences”.

Alu elements are now known to be transposable elements and are restricted to primate genomes, though neither of these facts was recognized until several years after they were first discovered. They are not capable of autonomous movement and replication in the genome; rather, they are “parasites” of other elements like LINE-1 which encode their own means of transposition. Their origin seems to trace to a duplication of a 7SL RNA gene near the origin of the primates. Today, some Alus are implicated in genomic functions while others continue to cause disease. As a general group of sequences, some are parasitic, some are mutualistic, and many are probably commensal, neither conferring benefit nor doing harm. For an excellent overview of the biology of Alu elements, see Batzer and Deininger (2002).

The history that is most commonly recounted when it comes to the study of genomic components like pseudogenes and transposable elements is that they were long dismissed as irrelevant “junk”. As I noted with reference to pseudogenes, they were not interpreted this way when discovered, even though this happened 5 years after the idea of “junk DNA” was proposed (Jacq et al. 1977).

Alu elements were first isolated in 1979, and are so named because this involved digestion of genomic DNA with the AluI restriction enzyme (which in turn is named for the bacterium from which it is derived, Arthrobacter luteus) (Houck et al. 1979). Again, if the typical story about noncoding DNA is true, then we should expect the discovery of these elements to have been discussed in terms of their biological insignificance.

Here is what Houck et al. (1979) actually said about the newly discovered elements:

Renatured DNA from human and many other eukaryotes is known to contain 300-nucleotide duplex regions from renatured repeated sequences. These short repeated DNA sequences are widely believed to be interspersed with single copy DNA sequences. In this work we show that at least half of these 300-nucleotide duplexes share a cleavage site for the restriction enzyme AluI. This site is located 170 nucleotides from one end. This Alu family of repeated sequences makes up at least 3% of the genome and is present in several hundred thousand copies.

Since these 300-nucleotide sequences, as well as their interspersed unique sequences, occupy such a large fraction of the genome in widely divergent eukaryotes, one imagines that they serve some important biological function. Among other possibilities, it has been proposed that they are involved in gene regulation. Unfortunately, their function remains unproven. In deciding what biological function these repeated sequences might serve, it is important to know the number of different families to which they belong.

It has been proposed that the 300-nucleotide interspersed repeated sequences perform a regulatory function either at the DNA or RNA level. The inclusion of over half of these 300-nucleotide sequences in a single family of repetitive sequences (the Alu family) would limit their ability to function as complex regulatory elements.

We have found in this work that at least half of 300-nucleotide inverted repeated DNA sequences and half of all other 300-nucleotide repeated sequences belong to one family. Comparing our independent results on inverted repeated DNA sequences, it seems likely that the heterogeneous nuclear RNA duplexes studied by Jelinek are transcribed from the Alu family of repeated sequences. We are currently testing this hypothesis by RNA-DNA hybridization and DNA sequencing. This hypothesis suggests that the function of the Alu family occurs at the level of the heterogeneous nuclear RNA. It has been proposed that such repeated sequences might be processing sites for heterogeneous nuclear RNA. Although other possibilities cannot be ruled out at this time, we find this to be an especially attractive proposal for the function of a single simple class of repeated sequences that are so widely distributed throughout the genome.

In a second paper published in the following year, Rubin et al. (1980) said:

The biological function of this family of sequences is unknown. We and our colleagues have recently noted sequence similarities between a selected portion of the Alu family and several other RNA or DNA sequences, which are known or suspected to be involved in DNA replication, transcription control, and mRNA processing. Together these observations reinforce our belief that a family of DNA sequences which includes 300,000 highly conserved members interspersed throughout much of the mammalian genome, must have an important function.

____________

Part of the Quotes of interest series.
____________

Batzer, M.A. and P.L. Deininger. 2002. Alu repeats and human genomic diversity. Nature Reviews Genetics 3: 370-380.

Doolittle, W.F. 1997. Why we still need basic research. Annals of the Royal College of Physicians and Surgeons of Canada 30: 76-80.

Houck, C.M., F.P. Rinehart, and C.W. Schmid. 1979. A ubiquitous family of repeated DNA sequences in the human genome. Journal of Molecular Biology 132: 289-306.

Jacq, C., J.R. Miller, and G.G. Brownlee. 1977. A pseudogene structure in 5S DNA of Xenopus laevis. Cell 12: 109-120.

Rubin, C.M., C.M. Houck, P.L. Deininger, T. Friedmann, and C.W. Schmid. 1980. Partial nucleotide sequence of the 300-nucleotide interspersed repeated human DNA sequences. Nature 284: 372-374.