Hamilton Smith

Defining Minimal Life

Thursday, 3 July 2003
09:30 - 10:00 hrs CEST


Viruses, with as few as 3 genes, could be regarded as the simplest form of life. However, viruses are completely dependent on their host cells for energy and protein synthesis. The question thus becomes: How many genes are required to support cellular life? Recent studies suggest that as few as 300 genes may be needed. However, the gene requirements of a minimal cell are relative to the growth conditions, and a unique solution may not exist.
Prokaryotic organisms (archaea and bacteria) have all the necessary genetic and biochemical equipment for independent growth and are a logical starting point for defining minimal life. The first two bacteria to be sequenced, Haemophilus influenzae (1700 genes) and Mycoplasma genitalium (485 protein-coding genes), led to the first attempt to define the minimal set of genes for life. Computational analysis (Mushegian and Koonin, PNAS, 93, 10268, 1996) revealed 234 protein-coding genes in common (orthologs) between the two organisms. These 234 genes represented most of the known essential biochemical machinery. An additional 22 nonorthologous displacements (NODs) were added to fill gaps in essential pathways, yielding a minimal set of 256 genes.
However, purely computational approaches are not sufficient to predict a minimal gene set with confidence. An experimental approach, transposon mutagenesis, was carried out on M. genitalium, and the sites of insertion in the genome were located (Hutchison et al, Science 286, 2165, 1999). Transposons disrupt genes and generally inactivate them. A total of 140 genes could be disrupted without loss of viability. A similar study in Mycoplasma pneumoniae, a closely related bacterium, yielded 177 non-essential gene disruptions. The estimated total number of non-essential M. genitalium genes was 129, based on the combined disruption of orthologs (homologous genes between the two species). Since the mutagenesis was not exhaustive, we can estimate the total number of non-essential genes to be between 180 and 215, assuming that the number of sites hit per gene follows a Poisson distribution. A lower estimate for the number of essential genes is then 485 - 215 = 270. The upper estimate is 485 - 129 = 356.
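The arithmetic behind these estimates can be sketched in a few lines of Python. This is only an illustration, not the calculation used in the cited studies: the function names are hypothetical, and the Poisson step assumes that every non-essential gene is an equally likely target with the same mean number of hits, which the original analysis may not have assumed.

```python
import math

# Figures quoted in the text above.
TOTAL_GENES = 485              # M. genitalium protein-coding genes
OBSERVED_NONESSENTIAL = 129    # non-essential genes seen via combined ortholog disruptions
EST_NONESSENTIAL_HIGH = 215    # upper Poisson-corrected estimate of non-essential genes
ORTHOLOGS, NODS = 234, 22      # computational minimal set components


def essential_bounds(total, observed_nonessential, est_nonessential_high):
    """Return (lower, upper) bounds on the number of essential genes."""
    lower = total - est_nonessential_high   # as many non-essential genes as estimated
    upper = total - observed_nonessential   # only the observed ones are non-essential
    return lower, upper


def implied_mean_hits(observed, true_total):
    """Mean hits per gene (lambda) such that true_total * (1 - exp(-lambda)) == observed.

    Simplifying assumption (ours): hits per non-essential gene are Poisson with a
    single shared mean, so a fraction exp(-lambda) of such genes is never hit.
    """
    return -math.log(1.0 - observed / true_total)


def corrected_total(observed, mean_hits):
    """Invert the relation above: estimate the true number of non-essential genes."""
    return observed / (1.0 - math.exp(-mean_hits))


if __name__ == "__main__":
    print("computational minimal set:", ORTHOLOGS + NODS)
    print("essential gene bounds:",
          essential_bounds(TOTAL_GENES, OBSERVED_NONESSENTIAL, EST_NONESSENTIAL_HIGH))
    lam = implied_mean_hits(OBSERVED_NONESSENTIAL, EST_NONESSENTIAL_HIGH)
    print("implied mean hits per gene:", round(lam, 3))
```

Under this simplified model, the fewer hits each gene receives on average, the more non-essential genes go undetected, which is why the corrected total (180 to 215) exceeds the 129 actually observed.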
In a recent study, gene knockouts were performed in the larger, more complex bacterium Bacillus subtilis (Kobayashi et al, PNAS 100, 4678, 2003). B. subtilis is typical of many larger bacteria that contain several thousand genes and grow in a variety of environments. In these larger bacteria, a majority of the genes are non-essential for laboratory growth. In addition, a sizeable fraction of the genome may consist of accumulated phage, transposon and insertion sequences, as well as islands of specialized but non-essential genes and collections of less well-defined non-essential sequences. In the Kobayashi study, genes were considered to be essential if they could not be inactivated by insertion, and if the strains became IPTG-dependent when an intact copy of the gene was placed under control of an IPTG-inducible promoter. All 4,101 annotated B. subtilis genes were studied. Only 271 genes were found to be essential for growth in rich laboratory broth. However, some essential genes went undetected because they were duplicated or because a second gene with the same function existed. Since duplicated genes are frequent in B. subtilis, 271 must be considered a minimal estimate, and the true number of essential genes may be significantly larger.
In another study with large bacteria, a genome reduction approach has been taken. Two phylogenetically distant Escherichia coli strains have been sequenced, the K-12 strain (4.639 Mb) and the pathogenic O157:H7 strain (5.528 Mb). Comparison of these two strains reveals 3.7 Mb of 98% identical common “backbone” sequences into which hundreds of strain-specific sequences are inserted. These strain-specific islands are called K-islands and O-islands. Kolisnychenko et al (Genome Research, 2002; www.genome.org) used recombination-based precise targeted deletions to remove 12 of the largest islands, resulting in a reduced genome (4.263 Mb) with growth characteristics nearly identical to those of the wild-type strain. In a separate study, Yu et al (Nature Biotechnology, 20, 1018, 2002) used Cre/loxP Tn5-targeted deletions to remove 0.313 Mb of the E. coli genome. In the future, progressive targeted reduction of genomes may result in minimal organisms and useful new strains.
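To put these deletions in proportion, the genome sizes quoted above can be turned into fractions. The short Python sketch below is illustrative only. It assumes that both sets of deletions are measured against the 4.639 Mb K-12 genome, which the text states for the first study but not for the second, and the function name is hypothetical.

```python
# Genome sizes in megabases (Mb), as quoted in the text above.
K12_MB = 4.639          # E. coli K-12
REDUCED_MB = 4.263      # K-12 after removal of 12 large islands
CRE_LOXP_DELETED_MB = 0.313   # amount removed by the Cre/loxP Tn5-targeted deletions


def fraction_removed(original_mb, deleted_mb):
    """Fraction of the original genome that was deleted."""
    return deleted_mb / original_mb


if __name__ == "__main__":
    island_deleted = K12_MB - REDUCED_MB
    print(f"island deletions: {island_deleted:.3f} Mb "
          f"({fraction_removed(K12_MB, island_deleted):.1%} of K-12)")
    # Assumption: the 0.313 Mb figure is relative to the K-12 genome size.
    print(f"Cre/loxP deletions: {CRE_LOXP_DELETED_MB:.3f} Mb "
          f"({fraction_removed(K12_MB, CRE_LOXP_DELETED_MB):.1%} of K-12)")
```

Both reductions are well under a tenth of the genome, so they are early steps toward a minimal organism rather than an approach to it.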
