Genome size, code bloat, and proof-by-analogy — a response.

Some of you may remember the post from Dec. 1, 2007, on Genome size, code bloat, and proof-by-analogy (which was posted on DNA and Diversity also). This post referred to a computer study published in the online, non-peer-reviewed arXiv database by Feverati and Musso (2007). Recently, Dr. Musso has been kind enough to provide some responses to my post, though of course very few people will notice because they are located within the comments section of a post that is more than two months old. So, I reprint them here in full, with my responses interspersed throughout.

As an author of the article discussed in this blog I would like to reply to Prof. Gregory criticisms. First of all I think that after writing such an harsh comment on an article, it would be a matter of good taste to inform the authors just to give them the opportunity to reply (that does not cost a great effort since email addresses are in the paper). I stumbled in this review by chance and only recently, and so my answer comes a bit late.

Fair enough (and my apologies if this caused frustration), though it was not my intent to enter into a discussion about the paper, only to post my thoughts and move on. In particular, I had been asked by a reporter for my thoughts about this paper — in the context of understanding genome size — and instead of sending an email I decided to post them.

Even if Prof. Gregory introduces our article saying that: “the authors ….decided that a computer model could provide substantive information about how genomes evolve in nature”, actually we never said that. We have a brief subsection in the conclusions (less than half a page long) where we comment on the biological relevance of our results. Such subsection begins with the following words: “In this section we put forward some biological speculations inspired by our model”. It seems to me that “biological speculations” is quite different from “substantive information”; moreover we speak only of possible advantages in terms of “evolvability”, and that’s also very different from saying “how genomes evolve in nature”.

Allow me to insert the abstract of the paper:

The development of a large non-coding fraction in eukaryotic DNA and the phenomenon of the code-bloat in the field of evolutionary computations show a striking similarity. This seems to suggest that (in the presence of mechanisms of code growth) the evolution of a complex code can’t be attained without maintaining a large inactive fraction. To test this hypothesis we performed computer simulations of an evolutionary toy model for Turing machines, studying the relations among fitness and coding/non-coding ratio while varying mutation and code growth rates. The results suggest that, in our model, having a large reservoir of non-coding states constitutes a great (long term) evolutionary advantage.

Furthermore, the first two paragraphs of the paper, and the last two (about 1/4 or more of the entire discussion and conclusion), are about genome size, and I believe that one could be forgiven for interpreting this as indicating that the authors saw a strong connection between their study and genome size evolution.

Prof. Gregory next discusses the validity of our assumptions. First of all I would like to notice that since we wrote:”For the sake of simplicity, we imposed various restrictions on our model that can be relinquished to make the model more realistic from a biological point of view”, it means that we are fully aware that our assumptions are NOT realistic. So I can’t understand what’s the point in putting such emphasis in explaining the reasons why they are not. A much briefer comment would have been: “as the authors candidly admit, their assumptions are unrealistic”.

I am glad we are in agreement that the assumptions are unrealistic. The reason I emphasized this so strongly is that this is a blog about genomes and evolution that is meant to provide information to readers with a diversity of educational backgrounds. Dr. Musso and I may know that these assumptions are very unrealistic, but many readers would not. More than a critique of this paper, I was providing details about how evolution actually operates in nature. Incidentally, these criticisms regarding the unrealistic assumptions are the same ones I would have made had I been reviewing this article for a peer-reviewed journal — at least, if any connection was attempted between this model and genome size in eukaryotes.

I would like to stress that a “model” is a simplified version of reality, while a “toy model” is oversimplified to the point that the model is just a caricature of the reality. Still toy models are precious instruments in the investigation of complex systems, and can give some hints and help comprehension on the modelized phenomenon. First example that comes to my mind is the “HPP lattice gas model” for hydrodynamics. Imposing the level of detail requested by prof. Gregory would result not in a toy model and neither in a model but in an accurate description of reality (admitting that by now we have a perfect understanding of all biological phenomena). Moreover with such level of detail it would have been impossible to reach our aim (measuring the optimal coding/non-coding ratio in our model), partly for the computational time required and partly for the impossibility to interpret unambiguously the results obtained.

I think we are in agreement on this, though my conclusion is that if a model has to be too simple to reflect reality then it is not useful, whereas Dr. Musso seems to be saying that because only simplified models can be used, they are justified. The notion that biological evolution is similar to hydrodynamics, and indeed this view of models generally, is the reason for my original post. I noted in the original post that their model may have been the greatest of its sort ever developed, but that it has no bearing on biological evolution. If we agree that it is unrealistic, then why begin and end a paper with a multi-paragraph discussion of a biological phenomenon?

I would like to stress that since, in our model, adding a new state has a NEUTRAL impact on the fitness, the process of state-increasing is, by definition, NON-adaptative. I agree with prof. Gregory that it would have been better to use “mimic Darwinian evolution” instead of “mimic biological evolution”, but I have also a provocative question: was Darwin’s theory to be rejected as a theory of biological evolution since he did not specify the exact mechanisms of mutation?

As a matter of fact, Darwin’s theory of natural selection (but not the fact of evolution) was not widely accepted in his own time in part because he lacked a basis for inheritance, and it was largely rejected in the early 1900s, in part because new knowledge about heredity (namely the rediscovery of Mendelian inheritance) seemed to contradict its assumptions. In any case, I don’t really see what the relevance of this well known history is to the discussion of these models.

In its conclusion prof. Gregory suggests that we claim that “Non-coding DNA does accumulate “so that” it will result in longer-term evolutionary advantage”. We ABSOLUTELY NEVER stated such a non-sense.

If I may quote once more from the article:

In this section we put forward some biological speculations inspired by our model. There are two way [sic] of identifying TMs [Turing machines] with biological entities and they suggest two ways up to which the accumulation of non-coding free to mutate DNA can play a role for “evolvability”. In the first one we identify TMs with organisms and coding-states with genes. We have to stress that the mechanism of transcription is different in the two contexts. For TMs transcription is serial, so that states must be transcribed, one at a time, in prescribed order, while in biological organisms transcription of genes can happen in parallel. We can interpret TMs states as genes accomplishing both a structural and regulatory function, since a coding state both affects the output tape and specifies which state has to be successively transcribed. From this point of view, we can think of TMs in our simulations as organisms trying to increase their gene pools adding new genes assembled from junk DNA. If the organisms possess more junk DNA it is possible to test more “potential genes” until a good one is found.

I may have misinterpreted what the authors meant by this, but it seems to imply that junk DNA serves as a reservoir of potential genes and that this increases evolvability. The implication drawn by many authors, including some biologists like Collins, is that this is why junk is there (“It is not the sort of clutter that you get rid of without consequences because you might need it. Evolution may need it,” [Collins] said). Either way, this served as a useful launching pad to reiterate the important point that this makes no sense evolutionarily if framed as a cause of junk DNA rather than as a potential consequence.

It is curious that the same accuse was moved by prof. Gregory in its article “Coincidence, coevolution, or causation? DNA content, cell size, and the C-value enigma”, that we cite in our paper, to an article by Jain that we also cite in our paper. So, either prof. Gregory has a very poor opinion of our intelligence, or he thinks that we do not read the articles that we cite.

I reject the dichotomy presented there. Some other possibilities, inter alia, are that the authors did not interpret the papers the same way as I did, or they read mine but disagreed with my argument, or they partially misunderstand how evolution occurs. Given that even some biologists who work on real-life genomes make this mistake, I hardly think this implies a lack of intelligence, only a lack of background.

Let us state, unambiguously, what we and Jain really say: “IF does exist a mechanism for genome size increase, THEN maybe the resulting long-term advantage can overcome the short-term disadvantage” (Jain was referring to the selfish dna as the genome increasing mechanism while we do not give any preference). Prof. Gregory reverts the implication: “IF there is a long-term advantage THEN the mechanism of genome increase is the product of selection”, and then explains us that it can’t be true. Incidentally, in the case of Jain, I think that what he was really intending can be clearly understood just by the title: “Incidental DNA”.

“Long-term advantage” and “short-term disadvantage” imply selection, and there does not seem to be much difference between the two ways of stating this. Moreover, as I noted in my original post and in more detail in an earlier post, long-term inter-lineage selection can potentially overcome short-term disadvantage, but this is not why non-coding DNA exists in the first place. If Dr. Musso and others understand it that way, then so much the better. But many people do not, and so taking an opportunity to clarify the issue once more was worthwhile.

Finally, let us state, very very briefly what in our paper we really did. We built up an abstract evolutionary model with mechanisms of mutation and genome increase, in such a way that we could exactly measure what is, in our model, the coding/non-coding ratio, and we found that it can’t be more than 2%. We were thinking that such result could be interesting also for biologists, maybe we were wrong.

Once again, this strongly indicates that Dr. Musso sees his “evolutionary model with mechanisms of mutation and genome increase” as a way of studying real biological genome size evolution, which was the entire reason for the post in the first place.

Biologists may indeed have an interest — I suggest that the paper be submitted to a peer-reviewed biological journal.

One thought on “Genome size, code bloat, and proof-by-analogy — a response.”

Fabio Musso on February 12, 2008 at 4:17 am said:

I would like to thank prof. Gregory for its very kind answer. I have only a few comments to add.
Prof. Gregory says:

“….. my conclusion is that if a model has to be too simple to reflect reality then it is not useful, whereas Dr. Musso seems to be saying that because only simplified models can be used, they are justified.”

I have nothing to object to Prof. Gregory . Let me say that, from my point of view, one is perfectly free to believe or not that the results of our model can have any relevance from the biological point of view. Indeed, we cannot give any definitive proof in this sense.

Later Prof. Gregory says:
“The notion that biological evolution is similar to hydrodynamics, and indeed this view of models generally, is the reason for my original post.”

My point was not to compare hydrodynamics and evolution. I was encouraging readers to look at how realistic HPP lattice gas is as an hydrodynamics model. I have no difficulty in saying that biological evolution is much more complex of any physical phenomenon I ever encountered.

Prof. Gregory cite a phrase from our article:

“From this point of view, we can think of TMs in our simulations as organisms trying to increase their gene pools adding new genes assembled from junk DNA. If the organisms possess more junk DNA it is possible to test more “potential genes” until a good one is found.”

Here I really have to apologize to Prof. Gregory. I admit that the natural interpretation of this unfortunate sentence is the one that he gave on his first post. I think that we have absolutely to change this sentence and I thank Prof. Gregory for drawing our attention on it.

Last comment by Prof. Gregory reads:

“Once again, this strongly indicates that Dr. Musso sees his “evolutionary model with mechanisms of mutation and genome increase” as a way of studying real biological genome size evolution, which was the entire reason for the post in the first place.”

We don’t want to use our model for “studying real biological genome size evolution”, what we are interested in is: given the actual genome size evolution that lead to the presence of a lot of junk DNA in the eukaryotes genomes, can we (even if only by analogy and very loosely) quantify the longer term advantage associated to it?

Finally I must say that when I wrote my first answer I was quite enraged, so I want to apologize if I indulged in polemic.
