1994-03-01 - Re: standard for steganography?

Header Data

From: norm@netcom.com (Norman Hardy)
To: Sergey Goldgaber <sergey@delbruck.pharm.sunysb.edu>
Message Hash: bd0bfe7356b519e07ee38587e27415213834a723dcdcd16a9027a0351f596cba
Message ID: <199403010818.AAA19344@mail.netcom.com>
Reply To: N/A
UTC Datetime: 1994-03-01 08:17:41 UTC
Raw Date: Tue, 1 Mar 94 00:17:41 PST

Raw message

From: norm@netcom.com (Norman Hardy)
Date: Tue, 1 Mar 94 00:17:41 PST
To: Sergey Goldgaber <sergey@delbruck.pharm.sunysb.edu>
Subject: Re: standard for steganography?
Message-ID: <199403010818.AAA19344@mail.netcom.com>
MIME-Version: 1.0
Content-Type: text/plain


At  0:56 3/1/94 -0500, Sergey Goldgaber wrote:
>On Mon, 28 Feb 1994, Norman Hardy wrote:
>
>> Has anyone done statistical studies of low bits of pixels or sound samples?
>I suspect that they are often far from random. A flat 50% distribution in
>the low bits might stand out like a sore thumb. I can imagine the low
...
>Yes, pure white noise would be anomalous.  I have suggested that one use 
>a Mimic function with a "garbage grammar".  Implemented correctly, it should
>withstand statistical analysis.
>
>What is an AD converter?  And what are the techniques you speak of that 
>mimic those AD converters?

'AD converter' = 'Analog to Digital converter'.

Here are three schemes, each with flaws:

Consider an alphabet of 10-bit characters with a probability distribution
such that each bit has an expected value of .6 (instead of the normal .5).
The character 0000000000 has a probability of .4^10 = .000105 and
p(1111111111) = .6^10 = .006046. Do a Huffman encoding on this alphabet.
0000000000 codes as 13 bits and 1111111111 codes as 7 bits. Take the cipher
stream and execute the Huffman decode(!) operation on it. Out comes a
sequence of 10-bit characters with 60% ones. To retrieve the original
cipher stream, execute the normal Huffman encode operation on that
sequence. The flaw here is that a Huffman code, having integer code
lengths, implicitly assigns each 10-bit character a probability that is an
exact power of two (2^-7, 2^-8, ... 2^-13); the true probabilities falling
between those powers are not represented. This quantization would show up
without too much data.
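As a concrete illustration, here is a minimal Python sketch of this first
scheme (my own sketch, not from the original post; the 10-bit alphabet and
.6 bias follow the text above, all names are mine):

import heapq, random

P1 = 0.6  # target probability of a 1 bit

def build_code():
    # Heap entries are (probability, tiebreak, tree); a tree is either a
    # symbol (int) or a pair of subtrees. The unique tiebreak keeps the
    # heap from ever comparing trees.
    heap = []
    for sym in range(1024):
        ones = bin(sym).count("1")
        p = P1 ** ones * (1 - P1) ** (10 - ones)
        heap.append((p, sym, sym))
    heapq.heapify(heap)
    tie = 1024
    while len(heap) > 1:
        p0, _, t0 = heapq.heappop(heap)
        p1, _, t1 = heapq.heappop(heap)
        heapq.heappush(heap, (p0 + p1, tie, (t0, t1)))
        tie += 1
    codes = {}
    def walk(t, prefix):
        if isinstance(t, int):
            codes[t] = prefix
        else:
            walk(t[0], prefix + "0")
            walk(t[1], prefix + "1")
    walk(heap[0][2], "")
    return heap[0][2], codes

def huffman_decode(tree, bits):
    # Walk the code tree over the flat cipher stream, emitting a 10-bit
    # character each time a leaf is reached.
    out, t = [], tree
    for b in bits:
        t = t[int(b)]
        if isinstance(t, int):
            out.append(t)
            t = tree
    return out

tree, codes = build_code()
cipher = "".join(random.choice("01") for _ in range(10000))
symbols = huffman_decode(tree, cipher)
biased = "".join(format(s, "010b") for s in symbols)
print(biased.count("1") / len(biased))  # roughly 0.6
recovered = "".join(codes[s] for s in symbols)  # Huffman ENcode to undo
assert cipher.startswith(recovered)  # a trailing partial code word is lost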

Another scheme is called 'arithmetic coding'. It avoids the above
probability quantization but is tricky to program. I can't find a reference
just now, but it should appear in any modern book on information theory.
Unlike Huffman it does not code each character into a definite number of
bits but codes a sequence of several characters into a 'real number'.
Adapting this to the finite-precision numbers that real computers can use
is tricky. Again, you feed the flat cipher stream into the decoding end of
the algorithm and get biased bits out.
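Here is a toy float-based sketch of the decoding direction only (again my
own illustration; real arithmetic coders use integer arithmetic with
renormalization, and double precision limits this toy to short outputs):

import random

P1 = 0.6

def arith_decode(cipher_bits, n_out):
    # Read the flat cipher stream as the binary expansion of a real
    # number x in [0,1), then decode x against a Bernoulli(.6) model by
    # repeatedly splitting the current interval 40/60.
    x = sum(int(b) / 2 ** (i + 1) for i, b in enumerate(cipher_bits))
    low, high, out = 0.0, 1.0, []
    for _ in range(n_out):
        split = low + (1 - P1) * (high - low)  # P(0) = .4
        if x < split:
            out.append("0")
            high = split
        else:
            out.append("1")
            low = split
    return "".join(out)

cipher = "".join(random.choice("01") for _ in range(48))
biased = arith_decode(cipher, 30)
print(biased, biased.count("1") / len(biased))  # on average ~60% ones

The encoder (recovering the cipher bits) runs the same interval narrowing
in reverse and emits just enough bits to pin down the final interval; that
is the tricky part alluded to above.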

The above two schemes are information efficient. With a 60% bias you get
about 97% efficiency, since a bit that is 1 with probability .6 carries
H(.6) = .971 bits of entropy. If you are willing to settle for 80%
efficiency you can merely establish an RNG synchronized at sender and
receiver that, for each output bit, sends the next bit of the cipher
stream with probability .8 and sends a one with probability .2.
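
This third scheme is simple enough to sketch completely (my illustration;
how the shared seed is agreed on, e.g. derived from a key, is assumed):

import random

SEED = 1234  # stands in for a key-derived shared seed

def embed(cipher_bits):
    rng = random.Random(SEED)
    out, i = [], 0
    while i < len(cipher_bits):
        if rng.random() < 0.8:
            out.append(cipher_bits[i])  # this position carries cipher
            i += 1
        else:
            out.append("1")             # filler position, forced to 1
    return "".join(out)

def extract(stego_bits):
    # Replaying the same RNG identifies which positions carried cipher.
    rng = random.Random(SEED)
    return "".join(b for b in stego_bits if rng.random() < 0.8)

cipher = "".join(random.choice("01") for _ in range(1000))
stego = embed(cipher)
print(stego.count("1") / len(stego))  # about .8 * .5 + .2 = 0.6
assert extract(stego) == cipher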
