There is some buzz at the moment about the latest cool-but-kinda-controversial output of Craig Venter’s research group: they assembled a synthetic Mycoplasma genitalium genome from scratch. Hidden inside the artificial DNA were runs of codons arranged so that the one-letter abbreviations of the amino acids they specify spell out messages. (See here for more about the genetic code and amino acid initials; 20 letters are represented, but there is no B, J, O, U, X, or Z.) The short messages were:
As Wired reports,
Three of the five watermarks have obvious referents as authors from the original paper (Craig Venter, Hamilton Smith, John Glass, Clyde Hutchison). In fact, the only mystery left is the reference to “Cindi” but that could be a reference to Cindi Pfannkoch, who was (is?) in the employ of Hamilton Smith, according to this New Yorker article.
If you’re wondering about the “v” in “Institvte” that’s because no amino acid correlates to the letter U, so the Venter Institute (undeterred, as always) reached back to medieval times, when orthography was less settled and U and V were interchangeable.
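To make the alphabet constraint concrete, here is a minimal Python sketch of how one might check whether a message can be spelled with amino acid initials. The function name `to_watermark` and the substitution table are my own illustrative choices, not anything the Venter Institute has described; the only facts taken from above are the 20 available letters and the U-to-V swap.

```python
import string

# One-letter codes of the 20 standard amino acids.
AMINO_ACID_LETTERS = set("ACDEFGHIKLMNPQRSTVWY")

# Letters of the alphabet with no amino acid initial: B, J, O, U, X, Z.
MISSING_LETTERS = sorted(set(string.ascii_uppercase) - AMINO_ACID_LETTERS)


def to_watermark(message, substitutions=None):
    """Rewrite a message using only amino acid letters (hypothetical helper).

    Spaces and punctuation are dropped. Letters listed in `substitutions`
    are swapped first; by default only U -> V, as in "Institvte".
    Raises ValueError if a letter still cannot be represented.
    """
    if substitutions is None:
        substitutions = {"U": "V"}
    out = []
    for ch in message.upper():
        if not ch.isalpha():
            continue  # skip spaces, digits, punctuation
        ch = substitutions.get(ch, ch)
        if ch not in AMINO_ACID_LETTERS:
            raise ValueError(f"no amino acid is abbreviated {ch!r}")
        out.append(ch)
    return "".join(out)


print(MISSING_LETTERS)
print(to_watermark("Venter Institute"))
```

Under these assumptions, "Venter Institute" comes out as VENTERINSTITVTE, while anything containing a B, J, O, X, or Z is rejected unless you supply your own substitution for it.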
But what about natural genomes? Carl Zimmer has suggested a little game of word search using the BLAST engine at NCBI. Carl thinks it’s a game, but I know this has real scientific and philosophical importance: in fact, I am prepared to state that it provides the absolute best evidence ever proposed for design in a genome.
Here is what happened when I searched for DESIGN in the protein database. I have not altered the output of the query in any way. Be warned, the result is staggering.
1 pisgiysnme dltigyqhfs kyrhylkwvp kyiahklgkv rdykelfylv npemlcllav
61 neklnynfrq ntgslmtlfk dvnyysdldt demvsfysal gihssmhmrs lsyfilnirn
121 eyylrlynip aylsdinvsn nfpffnyikn npickhvpdh nlgqfisfvn eiinydqkpk
181 pyipnryvyk npklshfvlp tnmsdktytp havigsgrtn lllytydvyr nvsrkqased
241 nvltsdvlfe yegdplifyn wlsyigdqnd mkrrnfmqki ylysdninin vvdnlinafs
301 tthytkffif dknhpvdahk hlhrtlnnfs ipiqivsfsv gnkkkitfpi lntpkidrde
361 aiayeyinry tnflqnhvir nsfytttdhn yilthktfkg yqqkavdrlr dqikvvknfi
421 nshktfnemk kalrdsfnih gtapintdny inhelgdles fveenypnpi gldegvsndd
481 ssqydlsyyd nyngtyllvn sdlklrsvyk ymlkyskiyk ntkyiefvmk nemrgdvhdq
541 lvnvengssc lfdfndnirv syiidycnyd kksyflfyke ykskniysvp sqdlcesaey
601 sylklcqnms llkkfftktl dtqlseihkd emkrmtkikn aiednidfkn ilsisndslv
661 siihdknegi ttfdinacft vsakltlgni fnvnsqidpe tartninnsi fctpvsvpva
721 vnrpimrsin dvyiraifni mkdqqfreym ripvnsnpyh sfiyffdkya yvykkrkwyk
781 nmnhvkmfip pqtikwnmfy yllrnnsqts ynnemflydf fygkksadik alsrnimkpf
841 lshftlffyl ykvdesign
That’s right. There is DESIGN found in the sequence of amino acids in this “hypothetical protein PC001346.02.0”. And not just anywhere. Right at the end, obvious for everyone to observe.
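If you would rather not squint at the output, a few lines of Python can locate the word for you. This is only a sketch for reading one numbered line of the listing above; the helper names `parse_sequence_line` and `find_word` are made up for illustration, and it assumes the leading number on each line is the 1-based position of the first residue on that line.

```python
import re


def parse_sequence_line(line):
    """Split a numbered sequence line into (start_position, residues)."""
    number, *blocks = line.split()
    return int(number), "".join(blocks)


def find_word(line, word):
    """Return the 1-based positions where `word` starts in the line's residues."""
    start, residues = parse_sequence_line(line)
    # A lookahead pattern lets overlapping occurrences be reported too.
    pattern = f"(?={re.escape(word.lower())})"
    return [start + m.start() for m in re.finditer(pattern, residues)]


# Last line of the listing, copied as shown above.
last_line = "841 lshftlffyl ykvdesign"

start, residues = parse_sequence_line(last_line)
print(find_word(last_line, "design"))  # [854]
print(residues.endswith("design"))     # True
```

On that last line the word starts at residue 854 and runs to the final residue, which is what "right at the end" means here.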
Also, I should point out that I did not search the complete database. No, this needed to be a strict test. Therefore, only one genus was searched: Plasmodium.