1998-08-10 - Re: compression vs encryption

Header Data

From: bill.stewart@pobox.com
To: “Bernardo B. Terrado” <cypherpunks@toad.com
Message Hash: a190c56c7cea0fd4d2721b44a1972ea8f205cadff3a20ec66ddd06f49cb8b36f
Message ID: <>
Reply To: <199808031315.PAA15001@replay.com>
UTC Datetime: 1998-08-10 23:36:48 UTC
Raw Date: Mon, 10 Aug 1998 16:36:48 -0700 (PDT)

Raw message

From: bill.stewart@pobox.com
Date: Mon, 10 Aug 1998 16:36:48 -0700 (PDT)
To: "Bernardo B. Terrado" <cypherpunks@toad.com
Subject: Re: compression vs encryption
In-Reply-To: <199808031315.PAA15001@replay.com>
Message-ID: <>
MIME-Version: 1.0
Content-Type: text/plain

At 06:15 PM 8/6/98 +0800, Bernardo B. Terrado wrote:
>What's the difference between compression and encryption?
>Correct me if I'm wrong,  in compression we lose something, like in 
>image compression and audio compression,  but in encryption, we don't. 

Compression makes stuff smaller; encryption keeps stuff private.

Compression works by not sending data you don't need -
there are lossless compression systems, like simple run-length encoding
and more complex systems like Huffman and Lempel-Ziv,
which build abbreviations for common strings, 
and lossy compression systems like some JPEG and MPEG modes
and most voice compression systems which discard information
that doesn't have much effect on the user's perception of the image/sound.

You can find books on compression techniques, and can find
compression discussed in general books on algorithms.
For lossless compression, you can get by knowing simple math,
like probability and statistics, though it helps to know
about polynomials and modular arithmetic and such. For image compression, 
it helps to know about discrete cosine transforms and that sort of math.  
Audio compression includes simple techniques like
Adaptive Differential Pulse-Code Modulation, for which a little
calculus is helpful but not too critical, and complex techniques
using frequency spaces, Fourier transforms, and various models
of the human vocal tract, for which you need more math.

RLE is simple, and can work well or badly depending on the data.
For each set of repeated characters, it sends the character and count.
If the data usually has long runs of values, e.g. lots of black
and lots of white, it wins; if most values happen once in a row, it loses.
With Huffman coding, the input strings are usually the same length,
e.g. single letters, and the output strings are variable length,
with short output strings used for more frequently used inputs,
and longer output strings used for less frequently used inputs;
the more you know about the statistics of the input stream,
and the more unbalanced the statistics are, the smaller the output.
With Lempel-Ziv, the output strings are almost-constant length
indexes into a table of input strings which is built dynamically,
so the first time the compressor sees a string "ab", it creates a
table entry and sends the entries for "a" and "b", and the next time 
it sees "ab", it sends the code for "ab" (and also creates the 
code for "abc", if "c" is the next entry....)

There are also books about encryption.
Bruce Schneier's book "Applied Cryptography" is popular and good.

Most encryption systems compress their plaintext before encrypting,
partly because it's usually faster, partly because you can't
make encrypted text smaller by compressing it, and partly because
it can be more resistant to attacks like known-plaintext and chosen-plaintext.

>One sure protection againts evil hackers is encryption (?), but what if 
>that evil hacker deletes the ciphertext file and the plain text file (that
>is assuming the evil hacker just deletes every file he wants to delete).
>---at least he/she didn't read/saw the file.

If evil people can get on your system, you lose.
Encryption is part of protecting a system, but not enough.
Encrypting your disk drives helps reduce the risk if
they steal your computer (when police steal computers,
they use fancy words like "subpoena" and "confiscate", 
but it's the same thing :-).  Encrypting your email
means that evil people can't read it when it's in transit.
But if they can break in to your machine because you're using 
Windows or a badly-configured Unix system,
they may be able to read your mail or files before you encrypt them,
or read your incoming mail after you decrypt it,
or replace your encryption program with something that
also lets them decrypt the mail, and you still lose.
Bill Stewart, bill.stewart@pobox.com
PGP Fingerprint D454 E202 CBC8 40BF  3C85 B884 0ABE 4639