1992-10-26 - entropy

Header Data

From: Eric Hughes <hughes@soda.berkeley.edu>
To: cypherpunks@toad.com
Message Hash: 9d4dcb1012b41fda6c8f80cb56a7f2aa2e3b85fc1658466140b482591f39088b
Message ID: <9210261554.AA11701@soda.berkeley.edu>
Reply To: <9210260429.AA29646@soda.berkeley.edu>
UTC Datetime: 1992-10-26 15:55:04 UTC
Raw Date: Mon, 26 Oct 92 08:55:04 PDT

Raw message

From: Eric Hughes <hughes@soda.berkeley.edu>
Date: Mon, 26 Oct 92 08:55:04 PDT
To: cypherpunks@toad.com
Subject: entropy
In-Reply-To: <9210260429.AA29646@soda.berkeley.edu>
Message-ID: <9210261554.AA11701@soda.berkeley.edu>
MIME-Version: 1.0
Content-Type: text/plain


Re: entropy

Eric Hollander writes:
>I seem to remember that English text is about 1.5 bits per character.  I can
>find a reference if you're interested.

There are many entropies available to measure.  There is the "true"
entropy, which is a lower bound for all the other entropy measures;
it is the compressibility limit.
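
One rough way to see that bound, as a Python sketch (an illustration
added here, not part of the original analysis): any lossless
compressor gives an upper bound on the true entropy, since no
compressor can beat it.

    # Upper bound on true entropy via compression (zlib chosen for
    # illustration; any lossless compressor gives such a bound).
    import zlib

    def compressed_bits_per_char(text):
        raw = text.encode("ascii")
        return 8 * len(zlib.compress(raw, 9)) / len(raw)

    # English prose typically compresses to roughly 2-3 bits/char
    # with zlib, still above the ~1.5 bits/char figure quoted above.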

The entropy I was referring to was simply the single-character
entropy.  That is, the probabilities p_i in the entropy expression are
the probabilities that a given single character appears in the text.
This will be higher than the true entropy.  Shannon's estimate for H_1
was 4.03 bits/character, assuming a 27-character alphabet.  The
entropy for ASCII-represented English will be higher because of
punctuation and capitals.
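
The single-character entropy is H_1 = -sum_i p_i log2(p_i).  A
minimal Python sketch of the measurement (an illustration, not from
the original post):

    # Single-character entropy: H1 = -sum(p_i * log2(p_i)), where
    # p_i is the observed frequency of character i in the text.
    import math
    from collections import Counter

    def h1(text):
        counts = Counter(text)
        n = len(text)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    # On ASCII English this typically lands somewhere above Shannon's
    # 4.03 bits/char (punctuation and capitals add symbols); on
    # uniformly random bytes it approaches 8 bits/char.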

The true entropy of English is much lower than this, of course.  But
as a simple measure to automatically distinguish between plaintext
and ciphertext, it should suffice.
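
As a usage sketch (again an illustration; the 6.0 threshold is an
arbitrary choice of mine, not from the original analysis), the test
reduces to a threshold on H_1:

    # Crude plaintext/ciphertext discriminator, using h1() from the
    # sketch above.  Ciphertext should look like uniformly random
    # bytes (H1 near 8 bits/byte); ASCII English stays well below.
    def looks_like_ciphertext(data, threshold=6.0):
        # threshold is an illustrative value between the ~4-5
        # bits/char of English and the ~8 bits/byte of random data
        return h1(data) > threshold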

Re: uuencoding.  In my earlier analysis I assumed that the uuencoding
would be of random data.  If the input is not random, then the
entropy will be lower.  Thanks for the clarification.
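
Concretely (a sketch of mine, using h1() from above): uuencode maps
each 6 bits of input to one character from a 64-symbol alphabet, so
for random input every output symbol is equally likely and H_1
approaches log2(64) = 6 bits/char; uuencoded English comes out lower
because the underlying bits are biased.

    # Single-character entropy of uuencoded random data.
    import binascii, os

    lines = [binascii.b2a_uu(os.urandom(45)) for _ in range(500)]
    # Drop each line's length character and trailing newline,
    # keeping only the 64-symbol payload.
    body = b"".join(line[1:].rstrip(b"\n") for line in lines)
    print(h1(body.decode("ascii")))  # close to log2(64) = 6.0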

Eric




