Stacking

> Good gravy! That was so wrong, it feels wrong to even use the word
> “wrong” to describe it. All I can recommend is that you run, don’t
> walk, to your nearest college or university, and sign up as quickly as
> you can for a few math and/or statistics courses: I especially
> recommend courses in probability theory and stochastic modelling.
> With all due respect, Sean, I am beginning to see why the biologists
> and biochemists in this group are so frustrated with you: my
> background in those fields is fairly weak – enough to find your
> arguments unconvincing but not necessarily ridiculous – but if you are
> as weak with biochemistry as you are with statistical and
> computational problems, then I can see why knowledgeable people in
> those areas would cringe at your posts.

With all due respect, what is your area of professional training? I
mean, after reading your post I dare say that you are not only weak in
biology, but statistics as well. Certainly your numbers and
calculations are correct, but the logic behind your assumptions is
extraordinarily fanciful. You sure wouldn’t get away with such
assumptions in any sort of peer reviewed medical journal or other
statistically based science journal – that’s for sure. Of course, you
may have good success as a novelist . . .

> I’ll try to address some of the mistakes you’ve made below, though I
> doubt that I can do much to dispel your misconceptions. Much of my
> reply will not even concern evolution in a real sense, since I wish to
> highlight and address the mathematical errors that you are making.

What you ended up doing is highlighting your misunderstanding of
probability as it applies to this situation as well as your amazing
faith in an extraordinary stacking of the deck which allows evolution
to work as you envision it working. Certainly, if evolution is true
then you must be correct in your views. However, if you are correct
in your views as stated then it would not be evolution via mindless
processes alone, but evolution via a brilliant intelligently designed
stacking of the deck.

> > RobinGoodfellow <lmucd@yahoo.com> wrote in message <news:bsd7ue$r1c$1@news01.cit.cornell.edu>…

> > > It is even worse than that. Even random walks starting at random points
> > > in N-dimensional space can, in theory, be used to sample the states
> > > with a desired property X (such as Sean’s “beneficial sequences”), even
> > > if the number of such states is exponentially small compared to the
> > > total state space size.

> > This depends upon just how exponentially small the number of
> > beneficial states is relative to the state space.

> No, it does not. If you take away anything from this discussion, it
> has to be this: the relative number of beneficial states has virtually
> no bearing on the amount of time a local search algorithm will need to
> find such a state.

LOL – You really don’t have a clue how insane this statement is?

> The things that *would* matter are the
> distribution of beneficial states through the state space, the types
> of steps the local search is allowed to take (and the probabilities
> associated with each step), and the starting point.

The distribution of states has very little if anything to do with how
much time it takes to find one of them on average. The starting point
certainly is important to initial success, but it also has very little
if anything to do with the average time needed to find more and more
beneficial functions within that same level of complexity. For
example, if all the beneficial states were clustered together in one
or two areas, the average starting point, if anything, would be
farther away than if these states were distributed more evenly
throughout the sequence space. So, this leaves the only really
relevant factor – the types of steps and the number of steps per unit
of time. That is the only really important factor in searching out
the state space – on average.

> For an extreme
> example, consider a space of strings consisting of length 1000, where
> each position can be occupied by one of 10 possible characters.

Ok. This would give you a state space of 10 to the power of 1000 or
1e1000. That is an absolutely enormous number.

> Suppose there are only two beneficial strings: ABC… and
> BBC… (where the dots correspond to the same characters). The
> allowed transitions between states are point mutations, that are
> equally probable for each position and each character from the
> alphabet. Suppose, furthermore, that we start at the beneficial state
> ABC. Then, the probability of a transition from ABC… to BBC… in a
> single mutation is 1/(10*1000) = 1/10000 (assuming self-loops – i.e.
> mutations that do not alter the string, are allowed).

You are good so far. But, you must ask yourself this question: What
are the odds that out of a sequence space of 1e1000 the only two
beneficial sequences with uniquely different functions will have a gap
between them of only 1 in 10,000? Crossing this
tiny gap would take a random walk of only 10,000 steps on average.
For a decent sized population, this could be done in just one
generation.
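
The 10,000-step figure can be checked with a minimal sketch. It assumes, as the quoted example does, that every step is an independent trial with a 1 in 10,000 chance of hitting the target, so the waiting time is geometric. The function names and the population size of 100,000 are illustrative assumptions, not part of the original exchange.

```python
import random

ALPHABET_SIZE = 10   # characters per position, from the quoted example
LENGTH = 1000        # positions in the string, from the quoted example

# One of the LENGTH * ALPHABET_SIZE equally likely moves (self-loops
# included) turns ABC... into BBC...
p_hit = 1 / (ALPHABET_SIZE * LENGTH)

# Independent trials with success chance p_hit: the mean wait is 1 / p_hit.
expected_steps = 1 / p_hit


def simulate_waiting_time(p, rng):
    """Count independent trials up to and including the first success."""
    steps = 1
    while rng.random() >= p:
        steps += 1
    return steps


def mean_waiting_time(p, runs, seed=0):
    """Average simulated waiting time over `runs` repetitions."""
    rng = random.Random(seed)
    return sum(simulate_waiting_time(p, rng) for _ in range(runs)) / runs


def chance_within_population(p, population):
    """Chance that at least one of `population` independent single
    steps lands on the target."""
    return 1 - (1 - p) ** population


if __name__ == "__main__":
    print(expected_steps)
    print(mean_waiting_time(p_hit, runs=200))
    # Hypothetical population of 100,000 individuals, one step each.
    print(chance_within_population(p_hit, 100_000))
```

This only covers the single-step gap; it says nothing about how likely such a small gap is in the first place.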

Don’t you see the problem with this little scenario of yours?
Certainly this is a common mistake made by evolutionists, but it is
nonetheless a fallacy of logic. What you have done is assume that
the density of beneficial states is unimportant to the problem of
evolution since it is possible to have the beneficial states clustered
around your starting point. But such a close proximity of beneficial
states is highly unlikely. On average, the beneficial states will be
more widely distributed throughout the sequence space.

For example, say that there are 10 beneficial sequences in this
sequence space of 1e1000. Now say one of these 10 beneficial
sequences just happens to be one change away from your starting point
and so the gap is only a random walk of 10,000 steps as you calculated
above. However, on average, how long will it take to find any one of
the other 9 beneficial states? That is the real question. You rest
your faith in evolution on this inane notion that all of these states
will be clustered around your starting point. If they were, that
certainly would be a fabulous stroke of luck – like it was *designed*
that way. But, in real life, outside of intelligent design, such
strokes of luck are so remote as to be impossible for all practical
purposes. On average we would expect that the other nine sequences
would be separated from each other and our starting point by around
1e999 random walk steps/mutations (i.e., on average it is reasonable
to expect there to be around 999 differences between each of the 10
beneficial sequences). So, even if a starting sequence did happen to
be so extraordinarily lucky as to be just one positional change away from
one of the “winning” sequences, the odds are that this luck will not
hold up as well in the evolution of any of the other 9 “winning”
sequences this side of a practical eternity of time.
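
As a rough check on how far apart such sequences would typically sit, here is a small sketch. It assumes the beneficial sequences are placed independently and uniformly at random, which is my assumption for illustration. Under that assumption two sequences differ at about N * (1 - 1/S) positions, roughly 900 of 1,000 here, a little below the round figure of 999 but still nearly every position. The function names are hypothetical.

```python
import random

ALPHABET_SIZE = 10   # characters per position, from the quoted example
LENGTH = 1000        # positions in the string, from the quoted example


def expected_hamming_distance(length, alphabet_size):
    """Mean number of differing positions between two independent,
    uniformly random strings."""
    return length * (1 - 1 / alphabet_size)


def sampled_hamming_distance(length, alphabet_size, pairs, seed=0):
    """Estimate the same quantity by drawing random pairs of strings."""
    rng = random.Random(seed)
    total = 0
    for _ in range(pairs):
        a = [rng.randrange(alphabet_size) for _ in range(length)]
        b = [rng.randrange(alphabet_size) for _ in range(length)]
        total += sum(x != y for x, y in zip(a, b))
    return total / pairs


if __name__ == "__main__":
    print(expected_hamming_distance(LENGTH, ALPHABET_SIZE))
    print(sampled_hamming_distance(LENGTH, ALPHABET_SIZE, pairs=200))
```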

Real time experiments support this position rather nicely. For
example, a recent and very interesting paper was published by Lenski
et al., entitled “The Evolutionary Origin of Complex Features”, in
the May 2003 issue of Nature. In this particular experiment the
researchers studied 50 different populations, or genomes, of 3,600
individuals. Each individual began with 50 lines of code and no
ability to perform “logic operations”. Those that evolved the ability
to perform logic operations were rewarded, and the rewards were larger
for operations that were “more complex”. After only 15,873 generations,
23 of the genomes yielded descendants capable of carrying out the most
complex logic operation: taking two inputs and determining if they are
equivalent (the “EQU” function).

In principle, 16 mutations (recombinations) coupled with the three
instructions that were present in the original digital ancestor could
have combined to produce an organism that was able to perform the
complex equivalence operation. According to the researchers themselves,
“Given the ancestral genome of length 50 and 26 possible instructions
at each site, there are ~5.6 x 10^70 genotypes [sequence space]; and
even this number underestimates the genotypic space because length
evolves.”
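
The size of that genotype space is easy to reproduce. A minimal sketch, using only the two numbers in the quote (length 50, 26 instructions per site); the variable names are mine.

```python
GENOME_LENGTH = 50   # ancestral genome length, from the quoted passage
INSTRUCTIONS = 26    # possible instructions per site, from the quoted passage

# Every site can hold any instruction independently, so the count of
# fixed-length genotypes is INSTRUCTIONS ** GENOME_LENGTH (exact integer).
genotypes = INSTRUCTIONS ** GENOME_LENGTH

if __name__ == "__main__":
    # Scientific notation with two significant digits.
    print(f"{genotypes:.1e}")
```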

Of course this sequence space was overcome in smaller steps. The
researchers arbitrarily defined 6 other sequences as beneficial (NAND,
AND, OR, NOR, XOR, and NOT functions). The average gap between these
pre-defined steppingstone sequences was 2.5 steps, translating into an
average search space between beneficial sequences of only 3,400 random
walk steps. Of course, with 3,600 individuals in a
population, a random walk of 3,400 steps will be covered in short order by
at least one member of that population. And, this is exactly what
happened. The average number of mutations required to cross the
16-step gap was only 103 mutations per population.
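
For anyone who wants to see where a figure like 3,400 comes from: one reading, which is an assumption on my part, is that a gap of g steps with 26 instructions per site spans about 26^g sequences. The sketch below applies that reading to the 2.5-step average gap and to the full 16-step gap. The function name is hypothetical.

```python
INSTRUCTIONS = 26   # possible instructions per site, from the Lenski quote


def states_within_gap(gap_steps, alphabet_size=INSTRUCTIONS):
    """Number of sequences spanned by `gap_steps` freely varying sites
    (assumed model: alphabet_size ** gap_steps)."""
    return alphabet_size ** gap_steps


average_gap_states = states_within_gap(2.5)   # average stepping-stone gap
full_gap_states = states_within_gap(16)       # gap with no stepping stones

if __name__ == "__main__":
    print(f"{average_gap_states:.0f}")
    print(f"{full_gap_states:.2e}")
```

Under this reading the average stepping-stone gap is a few thousand sequences, while the unassisted 16-step gap is on the order of 1e22.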

Now that is lightning fast evolution. Certainly if real life
evolution were actually based on this sort of setup then evolution of
novel functions at all levels of complexity would be a piece of cake.
Of course, this is where most descriptions of this most interesting
experiment stop. But, what the researchers did next is the most
important part of this experiment.

Interestingly enough, Lenski and the other scientists went on to set
up different environments to see which environments would support the
evolution of all the potentially beneficial functions – to include the
most complex EQU function. Consider the following description about
what happened when various intermediate steps were not arbitrarily
defined by the scientists as “beneficial”.

“At the other extreme, 50 populations evolved in an environment where
only EQU was rewarded, and no simpler function yielded energy. We
expected that EQU would evolve much less often because selection would
not preserve the simpler functions that provide foundations to build
more complex features. Indeed, none of these populations evolved EQU,
a highly significant difference from the fraction that did so in the
reward-all environment (P = 4.3 x 10^-9, Fisher’s exact test).
However, these populations tested more genotypes, on average, than did
those in the reward-all environment (2.15 x 10^7 versus 1.22 x 10^7;
P<0.0001, Mann-Whitney test), because they tended to have smaller
genomes, faster generations, and thus turn over more quickly. However,
all populations explored only a tiny fraction of the total genotypic
space. Given the ancestral genome of length 50 and 26 possible
instructions at each site, there are ~5.6 x 10^70 genotypes; and even
this number underestimates the genotypic space because length
evolves.”

Isn’t that just fascinating? When the intermediate stepping stone
functions were removed, the neutral gap that was created successfully
blocked the evolution of the EQU function, which happened *not* to be
right next door to their starting point. Of course, this is only to
be expected based on statistical averages that go strongly against the
notion that very many possible starting points would just happen to be
very close to an EQU functional sequence in such a vast sequence
space.
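
The quoted statistics can be sanity-checked with a short sketch. It compares 23 of 50 populations evolving EQU (reward-all) with 0 of 50 (EQU-only) using a one-tailed Fisher's exact test computed from the hypergeometric distribution. Whether the paper reported a one- or two-tailed value is not stated in the quote, so the one-tailed choice is my assumption, as are the function names. The second function gives the fraction of genotype space tested.

```python
from math import comb


def fisher_one_tailed(successes_a, n_a, successes_b, n_b):
    """One-tailed Fisher's exact test: chance that group A holds at least
    `successes_a` of the pooled successes, with all margins fixed."""
    total_successes = successes_a + successes_b
    denominator = comb(n_a + n_b, total_successes)
    upper = min(n_a, total_successes)
    tail = sum(
        comb(n_a, k) * comb(n_b, total_successes - k)
        for k in range(successes_a, upper + 1)
    )
    return tail / denominator


def fraction_explored(genotypes_tested, genome_length=50, instructions=26):
    """Share of the fixed-length genotype space that was tested."""
    return genotypes_tested / instructions ** genome_length


if __name__ == "__main__":
    # 23 of 50 reward-all populations versus 0 of 50 EQU-only populations.
    print(fisher_one_tailed(23, 50, 0, 50))
    # 2.15 x 10^7 genotypes tested in the EQU-only environment.
    print(fraction_explored(2.15e7))
```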

Now, isn’t this consistent with my predictions? This experiment was
successful because the intelligent designers were capable of defining
what sequences were “beneficial” for their evolving “organisms.” If
enough sequences are defined as beneficial and they are placed in just
the right way, with the right number of spaces between them, then
certainly such a high ratio will result in rapid evolution – as we saw
here. However, when neutral non-defined gaps are present, they are a
real problem for evolution. In this case, a gap of just 16 neutral
mutations effectively blocked the evolution of the EQU function.

http://naturalselection.0catch.com/Files/computerevolution.html

> Thus, a random
> walk that restarts each time after the first step (or alternatively, a
> random walk performed by a large population of sequences, each
> starting at state ABC…) is expected to explore, on average, 10000
> states before finding the next beneficial sequence.

Yes, but you are failing to consider the likelihood that your “winning
sequence” will in fact be within these 10,000 steps on average.

> Now, below, we
> will apply your model to the same problem.

Oh, I can hardly wait!

> > It also depends
> > upon how fast this space is searched through. For example, if the
> > ratio of beneficial states to non-beneficial states is as high as say,
> > 1 in 1e12, and if 1e9 states are searched each second, how long will
> > it take, on average, to find a new beneficial state?

> OK. Let’s take my example, instead, and apply your calculations.
> There are only 2 beneficial sequences, out of the state space of
> 1e1000 sequences.

Ok, I’m glad that you at least realize the size of the state space.

> Since the ratio of beneficial sequences to
> non-beneficial ones is (2/10^1000), if your “statistics” are correct,
> then I should be exploring 10^1000/2 states, on average, before
> finding the next beneficial state. That is a huge, huge, huge number.
> So why does my very simple random walk explore only 10,000 states,
> when the ratio of beneficial sequences is so small?

Yes, that is the real question and the answer is very simple – You
either got unbelievably lucky in the positioning of your start point
or your “beneficial” sequences were clustered by intelligent design.

> The answer is simple – the ratio of beneficial states does NOT matter!

Yes it does. You are ignoring the highly unlikely nature of your
scenario. Tell me, how often do you suppose your start point would
just happen to be so close to the only other beneficial sequence in
such a huge sequence space? Hmmmm? I find it just extraordinary that
you would even suggest such a thing as “likely” with all sincerity of
belief. The ratio of beneficial to non-beneficial in your
hypothetical scenario is absolutely minuscule and yet you still have
this amazing faith that the starting point will most likely be close
to the only other “winning” sequence in an absolutely enormous
sequence space?! Your logic here is truly mysterious and your faith
is most impressive. I’m sorry, but I just can’t get into that boat
with you. You are simply beyond me.

> All that matters is their distribution, and how well a particular
> random walk is suited to explore this distribution.

Again, you must consider the odds that your “distribution” will be so
fortuitous as you seem to believe it will be. In fact, it has to be
this fortuitous in order to work. It basically has to be a set up for
success. The deck must be stacked in an extraordinary way in your
favor in order for your position to be tenable. If such a stacked
deck happened at your table in Las Vegas you would be asked to leave
the casino in short order or be arrested for “cheating” by intelligent
design since such deck stacking only happens via intelligent design.
Mindless processes cannot stack the deck like this. It is
statistically impossible – for all practical purposes.

> (Again, it is a
> gross, meaningless over-simplification to model evolution as a random
> walk over a frozen N-dimensional sequence space, but my point is that
> your calculations are wrong even for that relatively simple model.)

Come now Robin – who is trying to stack the deck artificially in their
own favor here? My calculations are not based on the assumption of a
stacked deck like your calculations are, but upon a more likely
distribution of beneficial sequences in sequence space. The fact of
the matter is that sequence space does indeed contain vastly more
absolutely non-beneficial sequences than it does those that are even
remotely beneficial. In fact, there is an entire theory called the
“Neutral Theory of Evolution”. Of all mutations that occur in every
generation in say, humans (around 200 to 300 per generation), the
large majority of them are completely “neutral” and those few that are
functional are almost always detrimental. This ratio of beneficial to
non-beneficial is truly small and gets exponentially smaller with each
step up the ladder of specified functional complexity. Truly,
evolution gets into very deep weeds very quickly beyond the lowest
levels of functional/informational complexity.

> > It will take
> > just over 1,000 seconds – a bit less than 20 minutes on average. But,
> > what happens if at higher levels of functional complexity the density
> > of beneficial functions decreases exponentially with each step up the
> > ladder? The rate of search stays the same, but the junk sequences
> > increase exponentially and so the time required to find the rarer and
> > rarer beneficial states also increases exponentially.

> The above is only true if you use the following search algorithm:

> 1. Generate a completely random N-character sequence
> 2. If the sequence is beneficial, say “OK”;
> Otherwise, go to step 1.

Actually the above is also true if you start with a likely starting
point. A likely starting point will be an average distance away from
the next closest beneficial sequence. A random mutation to a sequence
that does not find the new beneficial sequence will not be selectable
as advantageous and a random walk will begin.
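
The search-time arithmetic in the quoted passage (a density of 1 in 1e12 searched at 1e9 states per second) can be laid out as a small sketch. It assumes each examined state is an independent draw with the stated chance of being beneficial, so the mean number of states examined is 1/density. The lower densities in the loop are illustrative values I picked to show the exponential scaling; they are not figures from the thread.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def mean_search_seconds(density, states_per_second):
    """Mean time to the first beneficial state when each examined state
    is beneficial with probability `density` (assumed independent)."""
    return (1 / density) / states_per_second


if __name__ == "__main__":
    rate = 1e9  # states searched per second, from the quoted passage
    for density in (1e-12, 1e-24, 1e-36):  # last two are illustrative
        seconds = mean_search_seconds(density, rate)
        print(density, seconds, seconds / SECONDS_PER_YEAR)
```

At a density of 1e-12 the mean works out to about 1,000 seconds, or roughly 17 minutes.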

> For an alphabet of size S, where only k characters are “beneficial”
> for each position, the above search algorithm will indeed need to explore
> exponentially many states in N (on average, (S/k)^N), before finding a
> beneficial state. But, this analysis applies only to the above search
> algorithm – an extremely naive approach that resembles nothing that
> is going on in nature.

Oh really? How do you propose that nature gets around this problem?
How does nature stack the deck so that its starting point is so close
to all the beneficial sequences that otherwise have such a low density
in sequence space?
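
For reference, the two-step blind-sampling procedure quoted above, and its (S/k)^N expectation, can be sketched as follows. This is a toy version under stated assumptions: "beneficial" means every position holds one of the first k characters, and the simulation uses a tiny space (S=4, k=1, N=3) so that it finishes quickly. The function names are hypothetical, and the expectation is returned as a base-10 logarithm so that the N=1000 case does not overflow.

```python
import math
import random


def log10_expected_draws(alphabet_size, good_per_position, length):
    """log10 of (S/k)^N, the mean number of whole-sequence draws."""
    return length * math.log10(alphabet_size / good_per_position)


def blind_search(alphabet_size, good_per_position, length, rng):
    """Draw entire random sequences until one is beneficial at every
    position; return how many draws that took."""
    draws = 0
    while True:
        draws += 1
        sequence = [rng.randrange(alphabet_size) for _ in range(length)]
        if all(c < good_per_position for c in sequence):
            return draws


def mean_blind_draws(alphabet_size, good_per_position, length, runs, seed=0):
    """Average number of draws over `runs` independent searches."""
    rng = random.Random(seed)
    total = sum(
        blind_search(alphabet_size, good_per_position, length, rng)
        for _ in range(runs)
    )
    return total / runs


if __name__ == "__main__":
    print(log10_expected_draws(10, 1, 1000))   # the 1e1000-sized example
    print(10 ** log10_expected_draws(4, 1, 3)) # toy space: 4^3 draws
    print(mean_blind_draws(4, 1, 3, runs=2000))
```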

> The above algorithm isn’t even a random walk
> per se, since random walks make local modifications to the current
> state, rather than generate entire states anew.

The random walk I am talking about does indeed make local
modifications to a current sequence. However, if you want to get from
the type of function produced by one state to a new type of function
produced by a different state/sequence, you will need to eventually
leave your first state and move onto the next across whatever neutral
gap there might be in the way. If a new function requires a sequence
that does not happen to be as fortuitously close to your starting
sequence as you like to imagine, then you might be in just a bit of a
pickle. Please though, do explain to me how it is so easy to get from
your current state, one random walk step at a time, to a new state
with a new type of function when the density of beneficial sequences
of the new type of function is extraordinarily low?

> A random walk
> starting at a given beneficial sequence, and allowing certain
> transitions from one sequence to another, would require a completely
> different type of analysis. In the analyses of most such search
> algorithms, the “ratio” of beneficial sequences would be irrelevant –
> it is their *distribution* that would determine how well such an
> algorithm would perform.

The most likely distribution of beneficial sequences is determined by
their density/ratio. You cannot simply assume that the deck will be
so fantastically stacked in the favor of your neat little evolutionary
scenario. I mean really, if the deck was stacked like this with lots
of beneficial sequences neatly clustered around your starting point,
evolution would happen very quickly. Of course, there have been those
who propose the “Baby Bear Hypothesis”. That is, the clustering is
“just right” so that the theory of evolution works. That is the best
you can hope for. Against all odds the deck was stacked just right so
that we can still believe in evolution. Well, if this were the case
then it would still be evolution by design. Mindless processes just
can’t stack the deck like you are proposing.

> My example above demonstrates a problem
> where the ratio of beneficial states is extremely tiny, yet the
> search finds a new beneficial state relatively quickly.

Yes – because you stacked the deck in your favor via deliberate
design. You did not even try to explain the likelihood of this
scenario in real life. How do you propose that this is even a remote
reflection of what mindless processes are capable of? I’m talking
average probabilities here while you are talking about extraordinarily
unlikely scenarios that are basically impossible outside of deliberate
design.

> I could also
> very easily construct an example where the ratio is nearly one, yet a
> random walk starting at a given beneficial sequence would stall with a
> very high probability.

Oh really? You can construct a scenario where all sequences are
beneficial and yet evolution cannot evolve a new one? Come on now . .
. now you’re just being silly. But I certainly would like to see you
try and set up such a scenario. I think it would be most
entertaining.

> In other words, Sean, your calculations are
> irrelevant for the kind of problem you are trying to analyze.

Only if you want to bury your head in the sand and force yourself to
believe in the fairytale scenarios that you are trying to float.

> If you
> wish to model evolution as a random walk of point mutations on a
> frozen N-dimensional sequence space, you will need to apply a totally
> different statistical analysis: one that takes into account the
> distributions of known “beneficial” sequences in sequence space. And
> then I’ll tell you why that model too is so wrong as to be totally
> irrelevant.

And if you wish to model evolution as a walk between tight clusters of
beneficial sequences in an otherwise extraordinarily low density
sequence space, then I have some oceanfront property in Arizona to
sell you at a great price.

Until then, this is all I have time for today.

> Cheers,
> RobinGoodfellow.

Sean
www.naturalselection.0catch.com