Richard Roberts

The Genomics of Restriction and Modification

Wednesday, 29 June 2005
15:00 - 17:00 CEST


With more than 200 bacterial and archaeal genomes completely sequenced and the total sequence content of GenBank still growing exponentially, we can now gain some impression of the distribution of Restriction-Modification (RM) systems in the real world. This has been accomplished by computational analysis of these sequences to find genes, or remnants of genes, that show clear similarity to known restriction systems in REBASE. This approach works well for identifying Type I and Type III systems, whose sequences are well conserved, but for the Type II systems only the modification (M) genes are easily identified. New restriction (R) genes show up only as genes that lie close to an M gene and have no similarity to any other gene in GenBank.
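
The neighborhood rule for Type II R genes described above can be illustrated with a minimal sketch. It is not the pipeline used for the analysis: the `Gene` record, the two similarity flags, and the 500 bp `max_gap` threshold are all hypothetical, and the similarity searches themselves are assumed to have been run beforehand.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int         # start coordinate on the genome (bp)
    end: int           # end coordinate on the genome (bp)
    is_m_gene: bool    # assumed flag: similar to a known DNA methyltransferase gene
    has_any_hit: bool  # assumed flag: similar to any known gene in the database


def gap(a: Gene, b: Gene) -> int:
    """Distance in bp between two genes; 0 if they overlap."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def candidate_r_genes(genes, max_gap=500):
    """Return genes with no database similarity that lie near an M gene.

    max_gap is an illustrative threshold, not a value from the abstract.
    """
    m_genes = [g for g in genes if g.is_m_gene]
    return [
        g
        for g in genes
        if not g.is_m_gene
        and not g.has_any_hit
        and any(gap(g, m) <= max_gap for m in m_genes)
    ]


# Toy annotation: orf2 is an M gene, orf3 is an unknown gene next to it,
# orf4 is an unknown gene far away from any M gene.
genes = [
    Gene("orf1", 100, 900, False, True),
    Gene("orf2", 1000, 2200, True, True),
    Gene("orf3", 2250, 3300, False, False),
    Gene("orf4", 9000, 9900, False, False),
]
print([g.name for g in candidate_r_genes(genes)])  # ['orf3']
```

In this toy case only `orf3` is reported: it has no similarity to anything known and sits 50 bp from the M gene, whereas `orf4` is equally anonymous but too far from any M gene to be a candidate.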

Surprisingly, these RM systems, or the relics of them, are much more abundant than might have been guessed from the classical biochemical screening of strains in the laboratory. In particular, Type I systems are widely distributed in Nature, and many instances of solitary specificity subunits are found. More than 100 potential Type III and Type IV systems have been identified, and each genome carries about 4 DNA methyltransferase genes on average. Apparently solitary M genes, for which the R gene is either missing or non-functional, seem quite common. However, accurate identification of M genes is complicated by conserved motifs that also occur in genes whose products methylate molecules other than DNA. Analyses of the many environmental samples now appearing in GenBank suggest that the rate of evolution of both M and R genes is quite high, and confirm previous findings that the direct cloning of intact RM systems into E. coli is quite difficult with current technology. Importantly, there is little reason to think that our current collection of more than 250 Type II specificities is more than a small sample of the specificities present in Nature.
