Density

Are there significant neutral gaps between functions of increasing
complexity? It depends upon what happens to the density of beneficial
functions in sequence space as the level of functional complexity
increases. The following is a recent exchange I had with Ian Musgrave
concerning this issue.

Link: http://groups.google.com/groups?q=g:thl3152600197d&dq=&hl=en&lr=&ie=U…
_________________

> > Sean
> Ian

Sean
_________________

> >The
> >blind random walk of random mutation simply cannot sort through this
> >pile of junk sequences in what anyone would consider to be a
> >reasonable amount of time (even given trillions upon trillions upon
> >trillions of years).

> In fact, this is not so. The question is, can one go from any given
> functional sequence to another given functional sequence via steps of
> one mutation.

Yes, this is the question. Then please, Ian, explain to me why
ebg-negative E. coli, in a large colony with over 4 million base pairs
in each genome, cannot go from anything they have in their collective
genomes to the lactase function, if this lactase function is truly
only one step away from some other beneficial sequence, or series of
one-step beneficial sequences, in these creatures’ DNA? Hmmmmmmm?
That *is* the question!

> This is similar to the game where you try to get from
> say, CAT to DOG, changing one letter at a time going in steps of real
> english words. For proteins, this is a hard question, given we have
> proteins with a number of different sequence lengths, and 20 possible
> amino acids, but it turns out to have been solved exactly for proteins
> of length 128.

Hmmmmm . . . This is most interesting, since proteins 128aa in
length are relatively short as proteins go.

> If the density of functional proteins is one in every
> 10^11 sequences, then we can form such a pathway (see Yockey, H.
> Information Theory and Molecular biology).

Yes, *if* the density were really this high I would agree with you.
Bacterial populations have more than this many individuals so they
could cover this sort of sequence space between stepping stones quite
easily and rapidly.

> So what is the density of
> functional sequences compared to non-functional sequences?

> Well it turns out to be somewhere between 1 in 10^9 and 1 in 10^11
> (recent experiments, like evolving structural proteins from random
> sequences, suggest that the density is around 1 in 10^9, evolution of
> catalytic antibodies suggests a similar density, evolution of enolase
> activity suggests that around 1 in a million random sequences is
> enolase)

A potential problem I see with this estimate is that not all proteins
that are beneficial in a “structural or catalytic way, etc.” for this
or that creature in this or that environment are beneficial for a
particular colony of creatures in a particular environment. Also, one
particular function, such as the enolase function, might be very easy
to evolve. If the enolase function were as common as 1 in a million
sequences, this would make 10^160 enolases in 128aa sequence space
(far more than your usual 10^90 sequences with a particular function,
such as the cytochrome c function). However, this is just one
function, beneficial or not. The question is, what is the average
density of all beneficial functions in sequence space? Is the average
absolute number really 10^90 in 128aa sequence space? If so, your
density estimate seems to be just a bit off. For example, let’s say we
have a function that 1 in 2 protein sequences share, as well as
another function that is shared by only 10 different proteins in all
of sequence space. The ratio of the common function in sequence space
would be 1 in 2, while the ratio of the rare function in sequence
space would be only 1 in 10^165. The common function does nothing to
raise the density of the rare one, which is still only 1 in 10^165.
You catch my point here? Just because you quickly evolve an enolase
does not mean that you will just as easily evolve from the enolase
that you have to any other function, since all the other beneficial
sequences might be relatively rare. It won’t really help you to
evolve from one enolase to another enolase either, since this will be
just as neutral with regard to selection as wandering between
completely non-functional sequences.

So, given this position of mine, I am betting that the density of
beneficial sequences in 128aa space is far less than 1 in 10^9
sequences. For example, about how many uniquely functional proteins
does an E. coli bacterium have? Around 4,200? Each of these
uniquely functional proteins has a degree of flexibility. When asked
this question you have generally come back with a flexibility of
between 10^60 and 10^90 for proteins even larger than 128aa in minimum
length. If each of these 4,200 proteins had at least 10^90 different
proteins that could perform the same function to at least some degree
of benefit, how many total proteins would be beneficial to this
particular bacterium in its current environment? Well, so far, there
would be at least 4,200 times 10^90, or roughly 10^93, different
beneficial proteins, right? How many other proteins, if this
bacterium could make them, would it be able to use in a beneficial way
if they were added to its genome? Perhaps a few, but probably not all
that many in its current environment. But let’s say that this
bacterium could use another 10,000 new single protein functions in at
least some beneficial way (which raises the question: if it needed
them, why hasn’t it evolved them by now?). Each of these functions,
of course, would have around 10^90 other protein sequences with the
same beneficial function to at least some degree, for a total of 10^94
beneficial sequences. This would give us a grand total of 10^93 plus
10^94, which, even rounding up generously, is no more than 10^95
beneficial sequences in our sequence space of potential options. Now,
what is the sequence density of all these beneficial proteins in the
potential space of 128aa sequences (about 10^166)? It is only 1 in
10^71. Certainly this is a far lower density than 1 in 10^9. So, one
of us isn’t doing our thinking and/or math properly. I’m sure that
must be me. So, please do explain where I went wrong.
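
As a rough check on the arithmetic above, here is a minimal Python
sketch. It assumes the figures quoted in this post (20 amino acids,
length 128, 4,200 existing functions, 10,000 hypothetical extra
functions, 10^90 variants per function); the helper names are my own
and purely illustrative. It works in log10 so nothing overflows.

```python
import math

AMINO_ACIDS = 20   # standard amino acid alphabet
LENGTH = 128       # sequence length discussed in the post

def log10_space(length, alphabet=AMINO_ACIDS):
    """log10 of the number of possible sequences of the given length."""
    return length * math.log10(alphabet)

def log10_sum(*logs):
    """log10(10**a + 10**b + ...) computed without building huge numbers."""
    top = max(logs)
    return top + math.log10(sum(10 ** (x - top) for x in logs))

space = log10_space(LENGTH)                     # size of 128aa space
existing = math.log10(4200) + 90                # 4,200 functions x 10^90
hypothetical = math.log10(10_000) + 90          # 10,000 functions x 10^90
beneficial = log10_sum(existing, hypothetical)  # combined total

print(f"sequence space       ~ 10^{space:.1f}")
print(f"beneficial sequences ~ 10^{beneficial:.1f}")
print(f"density              ~ 1 in 10^{space - beneficial:.1f}")
```

Without rounding, the gap comes out a little above 72 orders of
magnitude. The 1 in 10^71 figure follows from rounding the beneficial
total up to 10^95 and the space down to 10^166, so the rounding favors
the higher density.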

> So the density of functional sequences is such that we can find
> pathways from one functional sequence to any other functional
> sequence, without having to hunt for trillions of years (the very fact
> that single mutations can cause profound change of function should be
> a clue).

> The standard anti-evolutionary metaphor is that functional sequences
> are sharp peaks, isolated from each other by broad seas of
> non-functional sequences. In fact functional sequences are broad
> mesas, connected to other mesas of function by ridges of
> transitional sequences, no functional sequence is truly isolated.

Please explain this statement in the light of my previous question
concerning your estimate of 1 in 10^9 for the beneficial sequence
density of 128aa space. Also, even if your density calculations
happened to be a true reflection of reality, which I cannot fathom at
this time, you would still be in a huge mess. Going from a sequence
space of 128aa to an average-sized protein of at least 250aa increases
the potential space from 10^166 to 10^325. What do you think happens
to the density of beneficial sequences in this massively increased
potential space? Do you think it stays the same? If so, how do you
figure this?
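
To make the size of that jump concrete, here is a small Python sketch.
It assumes a 20-letter amino acid alphabet; the function names are
hypothetical. The fixed count in the last step is only a what-if
scenario reusing the 10^95 total from earlier in this post, not a
measurement.

```python
import math

def log10_space(length, alphabet=20):
    """log10 of the number of possible sequences of a given length."""
    return length * math.log10(alphabet)

def log10_density(log10_count, length):
    """log10 of (count / space size); a value of -x means '1 in 10^x'."""
    return log10_count - log10_space(length)

small = log10_space(128)   # 128aa space
large = log10_space(250)   # 250aa space
growth = large - small     # orders of magnitude added to the space

print(f"128aa space ~ 10^{small:.1f}")
print(f"250aa space ~ 10^{large:.1f}")
print(f"growth      ~ 10^{growth:.1f}")

# What-if scenario (an assumption): the absolute number of beneficial
# sequences stays at 10^95 while the space grows.
for length in (128, 250):
    print(length, f"1 in 10^{-log10_density(95, length):.1f}")
```

Any other assumption about how the beneficial count scales with
length can be tested by changing the first argument of log10_density.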

> >The main problem here is the
> >issue of neutral gaps that do seem to exist and grow exponentially
> >between functions of higher and higher levels of complexity.

> There are no neutral gaps. The metaphor is of functional sequences
> for one function being broad mesas of sequences connected by neutral
> mutation, in turn connected to other broad mesas of sequences of a
> different function by narrow ridges of transitional sequences. While
> this is a helpful metaphor, it is misleading, as it suggests that very
> few sequences are near the ridge connecting the two functional mesas,
> and a given sequence would have to have many neutral mutations before
> it reached the ridge.

> However, we have imposed 3D imagery on a system that can actually only
> be represented in hyperspatial dimensions. Imagine trying to write
> down all the three letter English words that are one letter away from
> CAT on a sheet of paper, such that each word is next to CAT. You can’t
> do it on a 2D sheet of paper, you need at least three dimensions.
> Similarly for proteins, a 3D representation doesn’t allow you to show
> all the proteins that are one step away as touching each other, you
> need more spatial dimensions. It turns out in the appropriate number
> of spatial dimensions, no sequence is very far from the “edge” that
> connects it to another set of sequences with a different function
> (Yockey again). This is illustrated by the ease with which we can
> evolve proteins with new functions (e.g., see Gerlt JA, and Babbitt PC.
> (1998 Oct). Mechanistically diverse enzyme superfamilies: the
> importance of chemistry in the evolution of catalysis. Curr Opin Chem
> Biol, 2, 607-12.).
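
The dimensionality point in the quoted passage can be made concrete
with a short Python sketch. It counts every string that is one
substitution away, whether or not it is a real word or a working
protein; which of those neighbors are functional is a separate
question. The function names are illustrative only.

```python
import string

def count_neighbors(length, alphabet_size):
    """Number of sequences exactly one substitution away."""
    return length * (alphabet_size - 1)

def neighbors(seq, alphabet):
    """Yield every sequence that differs from seq at exactly one position."""
    for i, old in enumerate(seq):
        for new in alphabet:
            if new != old:
                yield seq[:i] + new + seq[i + 1:]

cat_neighbors = list(neighbors("CAT", string.ascii_uppercase))
print(len(cat_neighbors))         # one-step neighbors of CAT: 75
print(count_neighbors(128, 20))   # one-step neighbors of a 128aa protein: 2432
```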

You are correct in your point that 3D imagery is not actually
correct. There are a huge number of dimensions in the actual problem.
However, you can quickly see, as words and phrases get longer, that
they are indeed separated from other beneficial words and phrases by
more than single mutational steps. This is what makes the 3D imagery
very helpful. It illustrates this point very nicely. A simple
thought experiment should prove this to you. For example, while short
3-letter words are all connected by single mutational steps and make
little mesa clusters, longer words, such as 7-letter words, are not so
connected or clustered. Try it and see. Pick a 7-letter word and
then see how many other 7-letter words you can evolve one mutation at
a time. See how many mesas you can evolve between, not to mention
beneficial mesas. Then, once you do this for a while, do the same
thing with a short phrase, like “Methinks it is like a weasel.” See
how many functional phrases you can evolve between. How close are
your mesas now?
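
For anyone who wants to try the word experiment mechanically, here is
a minimal Python sketch that groups a word list into clusters linked
by single-letter changes. The two sample lists are tiny and
hand-picked (an assumption for illustration, not a representative
dictionary), so their output proves nothing by itself; the result
depends entirely on the word list supplied.

```python
from collections import deque

def one_step(a, b):
    """True if a and b have equal length and differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def clusters(words):
    """Group words into sets connected by chains of single-letter changes."""
    words = list(dict.fromkeys(words))   # de-duplicate, keep order
    seen, groups = set(), []
    for start in words:
        if start in seen:
            continue
        group, queue = [], deque([start])
        seen.add(start)
        while queue:                      # breadth-first search
            word = queue.popleft()
            group.append(word)
            for other in words:
                if other not in seen and one_step(word, other):
                    seen.add(other)
                    queue.append(other)
        groups.append(group)
    return groups

# Hand-picked samples only; swap in a full dictionary for a real test.
three = ["cat", "cot", "cog", "dog", "dot", "cut"]
seven = ["weasels", "teasels", "density", "protein", "mutates"]

print(len(clusters(three)), clusters(three))
print(len(clusters(seven)), clusters(seven))
```

To run the experiment properly, pass in all the words of a given
length from a complete word list and compare the number and size of
the clusters at each length.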

I know I know, protein functions are not like the English language.
This is very interesting coming from you and other evolutionists, like
Dawkins, who use English language and computer code analogies and
experiments constantly to explain evolution when you think it serves
your purposes. The fact of the matter is, proteins work exactly like
the English language or any other information system works. There is
no fundamental difference. If you could show how English information
or computer code could evolve via evolutionary mechanisms, certainly
biological information systems would evolve the same way. You would
have solved all of your problems. But, the fact of the matter is, you
can’t explain how the mechanism of evolution works any more than I can
jump over the moon.

> Cheers! Ian

Cheers! ; )

Sean

www.naturalselection.0catch.com