1993-12-01 - Re: Entropy, Randomness, etc.

Header Data

From: collins@newton.apple.com (Scott Collins)
To: jerry@terminus.us.dell.com (Jeremy Porter)
Message Hash: 466a597400e54ac6f380c8a55acf3354652e50df9bd5cc7f71e24cb8f951eb0f
Message ID: <9312010202.AA16542@newton.apple.com>
Reply To: N/A
UTC Datetime: 1993-12-01 02:07:54 UTC
Raw Date: Tue, 30 Nov 93 18:07:54 PST

Raw message

From: collins@newton.apple.com (Scott Collins)
Date: Tue, 30 Nov 93 18:07:54 PST
To: jerry@terminus.us.dell.com (Jeremy Porter)
Subject: Re: Entropy, Randomness, etc.
Message-ID: <9312010202.AA16542@newton.apple.com>
MIME-Version: 1.0
Content-Type: text/plain


Good questions.

  >My understanding of a random number is a number is generated from
  >two or more unrelated events.

No.  This may be one general category of ways to manufacture random
numbers, but specifically a random number is just an arbitrary number
typically drawn from a sequence of independent arbitrary numbers.  The
quality of 'randomness' is a measure of the independence of the elements in
the stream.  Therefor, there is no such thing as a random number except as
an element of a sequence or other context from which to establish
independence.

  >In order for this number to be most useful cryptographically, it needs
  >a even distribution.

No.  In order for this number to be cryptographically useful, it must cost
more to guess the number (perhaps knowing the numbers that came before)
than the reward for guessing it correctly.  It happens that non-flat
distribution of a sequence is a lever for cheaper guessing, thus flat
distribution is natural characteristic of high-quality random sequences.

  >Does this make this distribution a gaussian distribution?

No.  Gaussian (a.k.a. 'normal') distribution is the bell curve (and clearly
indicates a relation between samples).  Math texts describing this
distribution often use the phrase 'distribution of some random variable x',
by which they in fact mean 'distribution of samples from a varying source'.


  >Also how are these statistical measurements done?

See Knuth, "The Art of Computer Programming", Volume 2: Seminumerical
Algorithms, Chapter 3: Random Numbers.

  >Is it as simple as a histogram?

Yes.

  >Are we talking frequency analysis with FFTs and
  >more advanced things?

Yes.

  >How do we measure the entropy of a random number, or a series of
  >random numbers?

Ah.  Now we're talking.  Entropy is closely related but not equal to
'randomness'.  Entropy is a measure of information often expressed as the
fraction of information-size to data-size.  Randomness is a measure of
unpredictability.  A sufficiently random sequence will be of very high
entropy from the perspective of the 'guesser', though not necessarily from
the that of the generator (e.g. a PRNG).  The best way to measure entropy
(if that is what you want to measure), is to build a sufficiently powerful
Markov model, or the equivalent, to predict the sequence, and treat it like
a compressor.  The number of bits output is the entropy of the sequence
with respect to that model.  If you can't build a model as smart as your
presumed attackers (as smart as them, not as smart as any model they might
build), then you will have to use more tests to assure yourself of
indepence of elements in the sequence (see Knuth, et al).  In practice
however, most of these methods represent very low bars over which any RNG
_must_ jump, and which often poor ones can.  Most RNGs are broken by
understanding how they work, and exploiting weaknesses in their
construction and context (e.g. poor 'seed' selection).

  >Have people on the list done this, or is this still in the range of
  >people that do math and number theory for a living?

Yes, and Yes.

  >These topics are probably covered in some of the basic books in the field,
  >but all of the reference's I've been able to locate don't go into
  >specifics of how to measure the quality of random numbers.

See Knuth.

  >Unless some measurements are made, you can't really be sure that those
  >/dev/mem MD5 hashes don't come up the same 10%, 30%, or more of the time.
  >It seems that a lot of assumptions are being made about what is good and
  >what isn't.

You should be exactly as paranoid as it is cost effective to be.

Hope this helps.



Scott Collins         | "Few people realize what tremendous power there
                      |  is in one of these things."     -- Willy Wonka
......................|................................................
BUSINESS.   voice:408.862.0540  fax:974.6094   collins@newton.apple.com
Apple Computer, Inc.   5 Infinite Loop, MS 305-2B   Cupertino, CA 95014
.......................................................................
PERSONAL.   voice/fax:408.257.1746    1024:669687   catalyst@netcom.com







Thread