1996-10-24 - Re: probability question for math-heads

Header Data

From: "Timothy C. May" <tcmay@got.net>
To: cypherpunks@toad.com
Message Hash: a43376d68ff525e0f224ddda27702127a74b5ca13873d32996265f330db83e14
Message ID: <v03007803ae95511957ea@[207.167.93.63]>
Reply To: <326EAA23.2DB1@best.com>
UTC Datetime: 1996-10-24 16:47:31 UTC
Raw Date: Thu, 24 Oct 1996 09:47:31 -0700 (PDT)

Raw message

From: "Timothy C. May" <tcmay@got.net>
Date: Thu, 24 Oct 1996 09:47:31 -0700 (PDT)
To: "cypherpunks@toad.com
Subject: Re: probability question for math-heads
In-Reply-To: <326EAA23.2DB1@best.com>
Message-ID: <v03007803ae95511957ea@[207.167.93.63]>
MIME-Version: 1.0
Content-Type: text/plain


At 11:28 PM +0000 10/23/96, geeman@best.com wrote:
>I'm too tired &/or busy to work this out, via Knuth --- maybe you can
>help, with some implications for the DES keysearch strategy.
>
>What is the expected distribution, in a "random" binary sequence  --
>with all the fuzziness that implies as to what _exactly_ is "random" --
>of gaps between runs of same-bits.
>
>i.e. what is the expected distribution of sequence length between
>occurrences of two (and only two) 1-bits in a row?  how about sequences
>of 3 1-bits?  Etc.
>
>We know that in a _truly_ random sequence, taken over a long enough
>period, there should be all possible values of "gaps".  But what is
>reasonable to expect in a "random" sequence as to how those gaps are
>distributed?  Is my question equivalent to Knuth's gap test?

It obviously depends on your definitions of period, gap, clusters,
patterns, etc. These would have to be pinned down before a precise answer
could be given; whether the sequence is 30 digits long or 5 digits long,
for example, changes the numbers.

But the general _framework_ for looking at this problem would probably be
the Poisson distribution. That is, if the _expected_ ("average") number of
gaps, runs, whatever, is "m", then the probability of actually seeing "s"
of them, for s = 0, 1, 2, 3, 4, ..., n, is given by the Poisson
distribution:

P [s; m] = (exp (-m))  (m ^ s)  / (s!)

(I read this like this: "The Poisson probability of seeing s occurrences
given that m are expected is e to the minus m times m to the s divided by s
factorial." I literally remember the formula from this verbal pattern,
having used it much in my earlier career.)

Thus, if one _expected_ to see m = 3 occurrences of some event, then one
would likely see 0 occurrences (exp (-3))  (3 ^ 0)  / (0!) = 0.0498 = 5% of
the time, 1 occurrence (exp (-3))  (3 ^ 1)  / (1!) = 0.149 = 15% of the
time, 2 occurrences (exp (-3))  (3 ^ 2)  / (2!) = 0.224 = 22.4% of the
time, and so on, even having a chance of seeing s = 10 occurrences
given by:

(exp (-3))  (3 ^ 10)  / (10!) = 0.00081

As expected, the cumulative probability over all occurrence counts is 1.00.
(In fact, the Poisson distribution can be derived from some very basic
considerations, such as that the sum over all outcomes must be 1 and that
the past history of occurrences does not affect the next outcome.)
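
(For anyone who wants to play with the numbers, here is a quick sketch in
Python; the name poisson_pmf is just my own label, not a library routine:)

import math

def poisson_pmf(s, m):
    # P [s; m] = exp(-m) * m**s / s! : the probability of exactly s
    # occurrences when m are expected.
    return math.exp(-m) * m ** s / math.factorial(s)

# Reproduce the m = 3 example above:
for s in range(11):
    print(s, round(poisson_pmf(s, 3), 5))

# And the terms over all s do sum to 1, as noted:
print(sum(poisson_pmf(s, 3) for s in range(50)))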

In this example, I would put the expected value of the "gap" between the
same values (e.g., gap between 1s, gap between 0s) at exactly 0.5, for this
random sequence. (This is the weakest part of this post, as it's possible
I'm falling into a kind of trap by putting the expected value at one half.
If anyone thinks differently, e.g. that it should be "1," we can discuss
this. But the framework is otherwise unchanged.)

Thus,

P [0; 0.5] = (exp (-0.5))  (0.5 ^ 0)  / (0!) = 0.6065

P [1; 0.5] = (exp (-0.5))  (0.5 ^ 1)  / (1!) = 0.3033

P [2; 0.5] = (exp (-0.5))  (0.5 ^ 2)  / (2!) = 0.0758

P [3; 0.5] = (exp (-0.5))  (0.5 ^ 3)  / (3!) = 0.0126

P [4; 0.5] = (exp (-0.5))  (0.5 ^ 4)  / (4!) = 0.0016

etc.
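
(If anyone wants to test the choice of m = 0.5 empirically, a simulation
would settle it. A sketch in Python, under the assumption that a "gap"
means the number of 0s between successive 1-bits, which is my reading of
the question:)

import random
from collections import Counter

def gap_counts(nbits, seed=None):
    # Tally gaps (runs of 0s) between successive 1-bits in a random
    # bit sequence of length nbits.
    rng = random.Random(seed)
    gaps = Counter()
    last_one = None
    for i in range(nbits):
        if rng.getrandbits(1):
            if last_one is not None:
                gaps[i - last_one - 1] += 1
            last_one = i
    return gaps

counts = gap_counts(1000000, seed=42)
total = sum(counts.values())
for g in range(5):
    print(g, counts[g] / total)

Whatever expected value the tallies support, the Poisson framework above
is unchanged.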

There are other ways of looking at this situation, such as the binomial
expansion (e.g., Pascal's triangle), but the results should be the same.
(Of course they should, as these distributions are all interrelated.)

(In particular, for various "clumps" and "clusters" of bits in a random
distribution, e.g., "how often does the pattern '101111011' occur in random
sequences of length 30?", one could explicitly calculate the permutations
and combinations, exactly as one would calculate hands of poker or bridge,
a well-known, easy problem in statistics and probability.)
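
(The expected number of (possibly overlapping) occurrences is easy to
compute directly: 30 - 9 + 1 = 22 starting positions, each matching with
probability 2^-9, or about 0.043 occurrences per sequence. To estimate the
probability of at least one occurrence, a sampling sketch in Python:)

import random

# Estimate how often the pattern "101111011" appears somewhere in a
# random 30-bit sequence, by straight sampling.
PATTERN = "101111011"
rng = random.Random(1)
trials = 200000
hits = 0
for _ in range(trials):
    seq = "".join(str(rng.getrandbits(1)) for _ in range(30))
    if PATTERN in seq:
        hits += 1
print(hits / trials)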

I like the Poisson approach as a first cut because it works for any form of
a random process where some value is "expected" and one wants to know the
distribution of "actuals." Clicks on a Geiger counter in some period,
number of persons ahead of one in a supermarket checkout line, number of
Prussian soldiers kicked to death each year by their horses, etc.

The Poisson also works for finite distributions, e.g., a sequence of 10
digits, whereas Gaussian probabilities are best for continuous/unlimited
distributions. (In the limit of large expected values, of course, they are
the same.)

I hope this helps.

(BTW, the Poisson distribution is what I used several months ago to show
that the probability of finding a key with a random search (no coordination
between various searchers) is sufficiently high. If the _expected_ number
of times the key is found is, say, 1 (the same effort as a coordinated
search), then the probability of _not_ finding the key is given by
P [0; 1] = exp (-1) = 36.8%. If twice the effort is put into this, then
P [0; 2] = exp (-2) = 13.5%, and so on. So, with several times the effort,
the probability the key has not been found is low.)
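
(In code, the same computation for various multiples m of the coordinated
effort:)

import math

# P [0; m] = exp(-m): the chance the key has NOT been found when the
# uncoordinated effort would find it m times on average.
for m in range(1, 6):
    print(m, round(math.exp(-m), 4))
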
--Tim May

"The government announcement is disastrous," said Jim Bidzos,.."We warned IBM
that the National Security Agency would try to twist their technology."
[NYT, 1996-10-02]
We got computers, we're tapping phone lines, I know that that ain't allowed.
---------:---------:---------:---------:---------:---------:---------:----
Timothy C. May              | Crypto Anarchy: encryption, digital money,
tcmay@got.net  408-728-0152 | anonymous networks, digital pseudonyms, zero
W.A.S.T.E.: Corralitos, CA  | knowledge, reputations, information markets,
Higher Power: 2^1,257,787-1 | black markets, collapse of governments.
"National borders aren't even speed bumps on the information superhighway."