1994-01-24 - Randomness of a bit string

Header Data

From: tcmay@netcom.com (Timothy C. May)
To: cypherpunks@toad.com
Message Hash: 4475d4d721136b2ef758a782f0f248330a912106acba68ca19d2c3799fd7226e
Message ID: <199401241857.KAA06412@mail.netcom.com>
Reply To: N/A
UTC Datetime: 1994-01-24 19:06:38 UTC
Raw Date: Mon, 24 Jan 94 11:06:38 PST

Raw message

From: tcmay@netcom.com (Timothy C. May)
Date: Mon, 24 Jan 94 11:06:38 PST
To: cypherpunks@toad.com
Subject: Randomness of a bit string
Message-ID: <199401241857.KAA06412@mail.netcom.com>
MIME-Version: 1.0
Content-Type: text/plain


Here's a short article I wrote for sci.crypt aboout "randomness" of a
bit string and the Kolmogorov-Chaitin definition that a string is
random if and only if it has no shorter description than itself.

This has some fascinating tie-ins to "cryptoregular" strings, which
are strings which appear to be "regular" (a variant of randomness,
meaning all digits are equally represented...high entropy) but which,
with the right transformation, suddenly lose their regularity. 

(For you practical engineering folks, noise sources and other physical
randomness sources will in most cases be enough, even if the
randomness can never be "proved.")

--Tim May


Newsgroups: sci.crypt
From: tcmay@netcom.com (Timothy C. May)
Subject: Re: Randomness of a bit string
Message-ID: <tcmayCK5CtF.23H@netcom.com>
Date: Mon, 24 Jan 1994 18:32:03 GMT

Bruce Grant (bgrant@umcc.umcc.umich.edu) wrote:
: The usefulness of a one-time pad seems to hinge on whether the sequence
: of key bits is really random.  Could someone post a short, not too
: technical definition of randomness of a bit string?  In particular, is
: this a mathematical property, or just a general measure of whether the
: string is "predictable"?  Does it depend on the nature of the cryptanalyst
: or only on the string of bits?  (In other words, if the key is based on
: an Albanian translation of "Mary had a little lamb" is it random if you
: don't know Albanian?)

: Could a program test a key for randomness, or is this meaningless?

A fascinating question! The answer lies at the heart of what we mean
by randomness, complexity, predictability, regularity, and falls into
the field of Kolmogorov-Chaitin complexity, or algorithmic information
theory. Also called "descriptive complexity."

Basic definition: A random string has no shorter description than
itself. That is, it is incompressible.

(Practically, we know "random strings" won't compress much...sometimes
a compressor will shorten them, sometimes it will lengthen them. The
notion above, that random strings will not compress, is very general
and applies in the limit, not for some particular instance of a
string--and some particular instance, e.g., "1 0 0 0 1 1 0" will of
course have a good chance of having some particular compressions, some
short description.)

One consequence is "regularity": all digits of a base will be equally
represented in the limit. Another consequence, as noted in one of the
other followups to this question, is unpredictability of the next
element or bit in a sequence. (Predictability of bits would imply a
compression.)

Cryptography is an interesting situtation. Charles Bennett talks about
"cryptoregular" strings in a paper in the "Physics of Computation"
Proceedings (1992, IEEE Press). A cryptoregular string _appears_ to
have high entropy ("maximum randomness") and regularity (all symbols
equally represented), and thus to be "random." But application of the
_key_ will show the string is actually low entropy ("Mary had a little
lamb, it's fleece was white as snow...") and is very compressible (the
name of the song is the compressed version, for example).

Good cryptography means cryptoregular strings.

A fascinating discovery by Chaitin and others (Kolmogorov, Solomnoff,
Martin-Lof, Levin all worked in this area) is that one can never prove
a given sequence or string is "random." As in some diabolically clever
IQ test, an apparently random sequence may have some shorter
description, or compression, that means it does not fit this
definition of randomness.

Having said this, it is clear that for practical purposes, many
sources used to generate "random numbers," e.g., noise diodes, alpha
particles, tosses of a coin, etc., are "effectively random" (don't ask
me to define this!) in that no compression/prediction will ever be
done, though we can never be absolutely certain one does not exist!

A nice book on this stuff just came out: "An Introduction to
Kolmogorov Complexity and Its Applications," by Li and Vitanyi, 1993,
Springer-Verlag. Cryptography per se is not mentioned (a disappointing
lapse), but the ideas are widely applicable.

--Tim May

-- 
..........................................................................
Timothy C. May         | Crypto Anarchy: encryption, digital money,  
tcmay@netcom.com       | anonymous networks, digital pseudonyms, zero
408-688-5409           | knowledge, reputations, information markets, 
W.A.S.T.E.: Aptos, CA  | black markets, collapse of governments.
Higher Power:2**859433 | Public Key: PGP and MailSafe available.




Thread