1998-10-30 - Re: Shuffling (fwd)

Header Data

From: Jim Choate <ravage@einstein.ssz.com>
To: cypherpunks@einstein.ssz.com (Cypherpunks Distributed Remailer)
Message Hash: 9aa06bacc4a74ba7b8f7bedcd171a8277147e55c1f502d70d2026cb1a90c579a
Message ID: <199810300024.SAA21942@einstein.ssz.com>
Reply To: N/A
UTC Datetime: 1998-10-30 00:42:36 UTC
Raw Date: Fri, 30 Oct 1998 08:42:36 +0800

Raw message

From: Jim Choate <ravage@einstein.ssz.com>
Date: Fri, 30 Oct 1998 08:42:36 +0800
To: cypherpunks@einstein.ssz.com (Cypherpunks Distributed Remailer)
Subject: Re: Shuffling (fwd)
Message-ID: <199810300024.SAA21942@einstein.ssz.com>
MIME-Version: 1.0
Content-Type: text



Forwarded message:

> Date: Thu, 29 Oct 1998 19:22:52 +0100
> From: Anonymous <nobody@replay.com>
> Subject: Re: Shuffling

> For any N greater than two, the random-swap algorithm cannot produce
> a perfectly smooth randomization of A[N].

If by 'perfectly smooth randomization of A[N]' you mean that any two
members of A[N] have equaly likelyhood to be swapped, and hence any pattern
of A[N] members being possible with equal odds then this is incorrect.

For that arrangement to be relevant your (P)RNG function must choose any
element from 0 to N with equal odds, it's probability curve must be a
horizontal line at 1/n for each N. If it ain't you aren't going to randomly
shuffle the elements of A[] since each element doesn't have the same odds of
being selected in this (or every other) turn.

> A perfect randomization of A[N] would give equal probability to each of
> the N! (N factorial) possible rearrangements of A.

Ah, but there aren't N! rearrangements of A if we are limited to only
swapping two elements. There are at best n^2.

Allow me to explain...

> Each iteration of the random-swap algorithm makes two choices of the N
> values to swap.  There are N^2 possible ways these two choices could go.

Ok, we shoot off our first random number, 1/n. We then shoot off our second
random number, also 1/n. The combination is 1/n * 1/n or 1/n^2. Now if we
add the stipulation about the i'th and i+1'st numbers not being identical
then the odds are 1/n * 1/n-1 (note that this boundary condition is
artificial and may in fact be problematic, there is no logical reason to
keep i and i+1 from being the same n if we want truly random behaviour).

> To calculate the probabilities after an iteration, we try all the N^2
> choices and count how many instances of each arrangement are produced.

Huh? What probability are we calculating here, the odds of getting the
particular pattern we have now generated? If so that is either 1/n^2 or
1/n(n-1) depending on the boundary condition.

> This gives each possible arrangement a probability in the form of
> k/N^2, where k of the N^2 choices led to that arrangement.

What arrangement, the one we start with before or after the swap? If we
start with arrangement I and then swap two elements we get I+1. There are
only two ways to get to any particular arrangment. An identical arrangement
where we swap i with itself (ignoring the potential boundary condition)
and the arrangement where the l'th and j'th elements are swapped. As long as
we're only swapping two elements there can only be two possible parent
patterns for any given resultant pattern. Now if we swap more than 2
elements it gets big fast.

I seem to be reading an implication that given a particular pattern, I, there
is some rhyme or reason to the pattern of previous I's that got us here,
there isn't. For any given pattern there is a infinite number of potential
paths that would get to any particular.

Let's say we have:

[ 0  1  2  4  3 ]

and we shuffle and get,

[ 0  4  2  1  3 ]

or,

[ 0  1  2  4  3 ]

Now if the question is how many possible ways could be have gotten to the
original, [ 0  1  2  4  3 ], then there are two ways to calculate that,

given no stipulation on reptitions we get:

n^2

given a stipulation for no repetitions we get:

(n)(n-1)

We can generalize this somewhat if we let s represent our current sequence
and our goal is to determine the number of possible previous states the
sequence could have held.

s-1 = n^2

or,

s-1 = n(n-1)

Now if we want to calculate the number of possible parent generations for
s-1 *from the perspective of s0* we get:

s-2 = (s-1)^2 = (n^2)^2

or,

s-2 = (s-1)((s-1)-1) = (n^2)((n^2)-1)

Thus,

s-3 = (s-2)^2 = ((s-1)^2)^2 = ((n^2)^2)^2

or,

s-3 = (s-2)((s-2)-1) = ( (n^2)((n^2)-1) )( ((n^2)((n^2)-1) )-1) =

(n^4 - n^2)( (n^4 - n^2) - 1)

for the second form only,

s-4 = (s-3)((s-3)-1) = ((n^4-n^2)((n^4-n^2)-1))(((n^4-n^2)((n^4-n^2)-1))-1)

    = ((n^4-n^2)^2 - (n^4-n^2)) (n^4-n^2)^2-(n^4-n^2))-1)

(I'm going to stop here trying to resolve this equation, I don't have the
time. Note that the relation is i(i-1) for each generation so we only
need to affix a relation to i and the generation number. It's worth noting
this can be resolved recursively.)

These can be generalized for the first form to:

C(s-i) = n^(2^i), where i is the number of generations going back.

The inverse makes a handy measure of entropy as well as telling you the odds
of selecting any particular path from one pattern to another pattern given
the number of elements swaps that are allowed.


    ____________________________________________________________________
 
       To know what is right and not to do it is the worst cowardice.

                                                     Confucius

       The Armadillo Group       ,::////;::-.          James Choate
       Austin, Tx               /:'///// ``::>/|/      ravage@ssz.com
       www.ssz.com            .',  ||||    `/( e\      512-451-7087
                           -====~~mm-'`-```-mm --'-
    --------------------------------------------------------------------





Thread