The Informational Content of a Genome

Page history last edited by Alex Backer, Ph.D. 17 years, 3 months ago

The Informational Content of a Genome or other State in an Evolutionary Process

 

The Information in a Genome increases with Evolutionary Time

A few years ago, I heard a talk at Caltech in which the speaker contrasted the amount of information in a human brain with that in the genome, and poked fun at how little information the genome actually carries. More recently, Ray Kurzweil estimated the informational content of the human genome at only 12 MB, even though the genome is estimated to be about 3 billion base pairs long and to contain 20,000-25,000 distinct genes. Yet even if these estimates correctly quantify the information needed to copy the genome, they do not correctly capture the information that the genome contains about the world. How is this so?

 

How much information does a genome carry?

 

In an evolutionary process, the state prevailing after a series of rounds of selection contains information not only about the state that survived, but also about the states that did not. In particular, a genome, or any state in an evolutionary process, expresses that its fitness is the maximum of all genomes sampled during its evolution. Thus, the informational content of a genome, or indeed of any state in an evolutionary process, is in continuous growth for as long as new states continue to be explored.

 

Since the genome does not carry the genomes that were previously tried but did not work as well, the only information it truly carries about its cousins that reached evolutionary dead ends is that it is the best among them. How much information is that, exactly? In bits, it is the base-2 logarithm of the number of different states surveyed during the evolutionary process.

 

How much is this for the human genome? The Population Reference Bureau estimates that from 50,000 BCE (when Homo sapiens first appeared) through 1995, more than 100 billion human beings were born. This makes for log2(10^11), or about 36.5 bits, accumulated during Homo sapiens evolution. Of course, the human genome is the result of a longer process of evolution, one spanning not just the time since the appearance of Homo sapiens but the evolution of all life.
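
The conversion from a count of sampled states to bits is a one-line calculation. The sketch below is only an illustration; the function name `selection_bits` is my own and does not come from any library. It applies the base-2 logarithm to the 100 billion figure quoted above.

```python
import math

def selection_bits(states_sampled: float) -> float:
    """Bits conveyed by being the best of `states_sampled` candidates."""
    return math.log2(states_sampled)

# Population Reference Bureau figure quoted in the text: ~100 billion births.
humans_ever_born = 100e9

print(f"{selection_bits(humans_ever_born):.1f} bits")
```

Because the measure is logarithmic, multiplying the number of sampled states by ten adds only about 3.3 bits.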

 

How Many Genomes Have Existed?

How many genomes of any kind have existed since the beginning of life? I have not found any calculation of this number, so I give a rough estimate below. The total number of genomes sampled is inversely proportional to the generation time and directly proportional to the duration of the evolutionary process. Thus, bacteria, which have a short replication time and have been around for a very long time, have explored an immensely larger set of genomes than humans have. This makes bacteria a good place to start our estimate. The number of genomes G is given by:

 

G = (N x T / g) x U

 

where N is the average number of individuals existing at any one time,

T is the length of time evolution has been going,

g is the average generation time, and

U is the probability that an individual is unique, i.e. that no other individual with the same genotype has existed.

 

In a study published in PNAS in 1998, a team of researchers from the University of Georgia, led by microbiologist William B. Whitman, estimated the number of bacteria on Earth to be five million trillion trillion: that's a five with 30 zeroes after it. Another way to go about this is to start with estimates of the biomass: the entire Earth is estimated to contain about 75 billion tons of biomass. The mass of a single cell of the E. coli bacterium is 665 femtograms, or 665x10^-15 g. So if the entire biomass were made of E. coli, that would yield about 10^29 individuals, which is close enough to the other estimate to make both plausible and to suggest that the total biomass is dominated by bacteria. This conclusion is consistent with a previous estimate by Stephen Jay Gould (1). With this in mind, our estimate using bacteria becomes an even better proxy for all of life: not only have bacteria been around the longest and have the shortest generation times, but they are also more abundant than any other life form. Assuming that the Earth has always had this population is probably an overestimate, but it is likely the right order of magnitude, as it has probably been a very long time since life conquered most nooks and crannies of the Earth.
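
The biomass cross-check can be reproduced in a few lines. The sketch below assumes the "tons" are metric tonnes (10^6 g each), which the text does not specify; all constant names are my own. Under that assumption the quotient comes out at roughly 1.1 x 10^29 cells.

```python
# Figures quoted in the text.
BIOMASS_TONNES = 75e9        # total biomass of the Earth
E_COLI_MASS_G = 665e-15      # mass of one E. coli cell, in grams
WHITMAN_ESTIMATE = 5e30      # Whitman et al. count of bacteria on Earth

# Assumption: metric tonnes. The text only says "tons".
GRAMS_PER_TONNE = 1e6

cells_if_all_e_coli = BIOMASS_TONNES * GRAMS_PER_TONNE / E_COLI_MASS_G
ratio = WHITMAN_ESTIMATE / cells_if_all_e_coli

print(f"cells if all biomass were E. coli: {cells_if_all_e_coli:.2e}")
print(f"Whitman estimate is {ratio:.0f}x larger")
```

The two figures differ by a factor of a few tens, which is small next to the dozens of orders of magnitude at stake in the rest of the argument.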

 

Life on Earth is theorized to have evolved from non-life sometime between 3.9 and 3.5 billion years ago, making T about 3x10^13 hours.

 

Generation times for bacterial species growing in nature may be as short as 15 minutes or as long as several days, so g ranges between 0.25 hours and, say, 48 hours.

 

U depends on the dimensionality of the space being explored, on the evolutionary process, on the mutation rate, and on G. If the mutation rate is high enough that every individual is different from its parents, we can approximate U as 1 - G/(4^L), where L is the length of the genome in base pairs; for humans, this is close to 1. Nachman and Crowell (2000) estimated the average mutation rate in humans to be ~2.5 x 10^-8 mutations per nucleotide site, or 175 mutations per diploid genome per generation, so this hypothesis holds for humans. Even if all mutations originated from the same wild type (which they don't), 3 billion choose 175 is a staggeringly large number, making each individual unique.
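
To get a feel for how large "3 billion choose 175" is, the binomial coefficient can be evaluated in log space. The sketch below is illustrative only: `log10_binomial` is a helper of my own, and the diploid genome size of 7 x 10^9 sites is an assumption chosen because it reproduces the quoted 175 mutations per generation.

```python
import math

MUTATION_RATE_PER_SITE = 2.5e-8   # quoted from Nachman and Crowell (2000)
DIPLOID_SITES = 7e9               # assumption: size that yields 175 mutations
HAPLOID_SITES = 3_000_000_000     # "3 billion" base pairs, as in the text

mutations_per_generation = MUTATION_RATE_PER_SITE * DIPLOID_SITES

def log10_binomial(n: int, k: int) -> float:
    """log10 of C(n, k), computed with log-gamma to avoid huge integers."""
    ln_value = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return ln_value / math.log(10)

exponent = log10_binomial(HAPLOID_SITES, 175)
print(f"mutations per generation: {mutations_per_generation:.0f}")
print(f"C(3e9, 175) is roughly 10^{exponent:.0f}")
```

The result has well over a thousand decimal digits, and this counts only the choice of sites, not which base each site mutates to.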

 

But for bacteria, mutation rates are about 1 every 300 chromosome replications, making 299 individuals out of every 300 identical to their parent. This gives an upper bound for U of 1/300. In reality, U is smaller, as some of the mutations will be identical to each other. How many? The size of bacterial genomes ranges approximately 20-fold, from about 500,000 bp to around 10,000,000 bp. So if all mutants originated from the same wild-type parent, our estimate for G, the number of genomes that have existed, would imply that every possible point mutant has been explored many times. But in reality, of course, bacteria evolve over time, and thus parents are not all the same, making U hard to calculate exactly. Genomic studies of intra-species diversity should soon provide additional data to further pinpoint the degree of overlap between the genomes of individuals in the same species.

 

These numbers yield an estimate of 6x10^41 to 1x10^44 individuals with a total G of 2x10^39 to 4x10^41 genomes. In other words, there have been around 10^40 genomes in the history of life.
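
The estimate can be retraced numerically from G = (N x T / g) x U. In the sketch below the function and variable names are my own. The quoted ranges are reproduced when N is taken as roughly 10^30; that value is my inference from the arithmetic, not a figure stated in the text, and using Whitman's 5 x 10^30 instead would scale every result up fivefold.

```python
HOURS_PER_YEAR = 24 * 365.25

def genomes_explored(n_individuals, duration_hours, generation_hours, unique_fraction):
    """Return (total individuals ever, distinct genomes) for G = (N*T/g)*U."""
    individuals = n_individuals * duration_hours / generation_hours
    return individuals, individuals * unique_fraction

N = 1e30                      # assumption: inferred, see the note above
T = 3.5e9 * HOURS_PER_YEAR    # about 3e13 hours since life began
U = 1 / 300                   # upper bound on the unique fraction for bacteria

# Slowest (48 h) and fastest (0.25 h) generation times from the text.
for g in (48, 0.25):
    individuals, genomes = genomes_explored(N, T, g, U)
    print(f"g = {g:>5} h: individuals = {individuals:.1e}, G = {genomes:.1e}")
```

Since g spans a factor of about 200, the result is uncertain by roughly two orders of magnitude, which is why a single round figure of 10^40 is used from here on.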

 

What Fraction of Possible Genomes of a Given Length Have Been Explored by Evolution?

Note that G is but an insignificant fraction of the number of possible genomes, even for genome lengths corresponding to bacteria with the smallest genomes: the number of genomes possible of length 500,000 bp is 10^301,030.

 

When converted to bits, G yields log2(10^40), or about 133 bits, conveyed by any extant genome that has survived such a lengthy evolution, regardless of its length, in addition to the conventional information. For lack of a better name, I shall call such information the evolutionary information of a genome.
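
Both figures in this section, the size of genome space and the evolutionary information in bits, can be checked with logarithms. The sketch below works in log space because 4^500,000 is far too large for a floating-point number; the variable names are my own, and the G values are the ones quoted above.

```python
import math

SMALLEST_GENOME_BP = 500_000
G_LOW, G_TYPICAL, G_HIGH = 2e39, 1e40, 4e41   # estimates of G from the text

# Number of possible genomes of length L is 4^L; keep only its log10.
log10_possible = SMALLEST_GENOME_BP * math.log10(4)

# Fraction of genome space explored, again as a power of ten.
log10_fraction = math.log10(G_TYPICAL) - log10_possible

print(f"possible genomes: 10^{log10_possible:,.0f}")
print(f"fraction explored: 10^{log10_fraction:,.0f}")
for label, g in (("low", G_LOW), ("typical", G_TYPICAL), ("high", G_HIGH)):
    print(f"evolutionary information ({label}): {math.log2(g):.1f} bits")
```

Across the whole quoted range of G the evolutionary information moves by fewer than 8 bits, so the figure of about 133 bits is not very sensitive to the uncertainty in G.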

 

In the average bacterial species, every point mutation will appear at least once in every generation

The concept of species is not that useful for bacteria, which reproduce asexually, but it is nevertheless useful to distinguish genetic species-like 'clusters' of genomes in genome-space. The number of bacterial "species" worldwide is estimated to be more than a thousand million (2). Taking it to be a thousand million, and dividing Whitman's estimate for the total number of bacteria, five million trillion trillion, by it, we get an average worldwide population size for each bacterial species of five billion trillion. This suggests that for the average bacterial species, every mutation that is one, two or even three mutations away from wildtype will appear at least once in every generation. This makes for the very rapid and seemingly "perfect" evolution observed in bacteria, which are continuously climbing along the gradient of fitness every generation, subject to the constraint of continuity (every individual must be only one or a few mutations away from its parent).
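
A rough way to test this kind of claim is to count how many newborn cells per generation are expected to carry one specific set of k point mutations. The sketch below rests on simplifying assumptions that are mine, not the text's: all cells descend from a single wild type, the number of new mutations per replication is Poisson-distributed with mean 1/300, mutations land uniformly across the genome, and the genome is 5 x 10^6 bp long (a mid-range value). The function name `expected_copies` is hypothetical.

```python
import math

TOTAL_BACTERIA = 5e30             # Whitman's estimate, from the text
SPECIES = 1e9                     # "a thousand million" species, from the text
MUTATIONS_PER_REPLICATION = 1 / 300
GENOME_BP = 5_000_000             # assumption: mid-range bacterial genome

population_per_species = TOTAL_BACTERIA / SPECIES   # five billion trillion

def expected_copies(k: int) -> float:
    """Expected newborns per generation carrying one specific k-mutation variant."""
    lam = MUTATIONS_PER_REPLICATION
    # Probability that a replication introduces exactly k mutations (Poisson).
    p_k = math.exp(-lam) * lam**k / math.factorial(k)
    # Distinct variants k mutations away: choose k sites, 3 alternative bases each.
    variants = math.comb(GENOME_BP, k) * 3**k
    return population_per_species * p_k / variants

for k in (1, 2, 3):
    print(f"k = {k}: {expected_copies(k):.1e} expected copies per generation")
```

Under these particular assumptions the expected count is far above one for k = 1 and still above one for k = 2, but falls below one for k = 3, so the three-mutation case is sensitive to the assumptions chosen.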

 

Alex Bäcker

 

(1): Stephen Jay Gould, "Planet of the Bacteria," Washington Post Horizon, 1996, 119 (344): H1; adapted from Full House, New York: Harmony Books, 1996, pp. 175-192.

(2): Bach HJ, Tomanova J, Schloter M, Munch JC. 2002. "Enumeration of total bacteria and bacteria with genes for proteolytic activity in pure cultures and in environmental samples by quantitative PCR mediated amplification." J Microbiol Methods; 49:235-245, as cited in http://www.actionbioscience.org/biodiversity/wassenaar.html.

 

 

 
