1994-03-05 - some technical steganography

Header Data

From: hughes@ah.com (Eric Hughes)
To: cypherpunks@toad.com
Message Hash: dde92c7de3bdaaa8cbcc108d82b14e24e7034775374b3e97e8b95d69fa56847b
Message ID: <9403050842.AA06538@ah.com>
Reply To: <9403042136.AA03953@bilbo.suite.com>
UTC Datetime: 1994-03-05 08:51:54 UTC
Raw Date: Sat, 5 Mar 94 00:51:54 PST

Raw message

From: hughes@ah.com (Eric Hughes)
Date: Sat, 5 Mar 94 00:51:54 PST
To: cypherpunks@toad.com
Subject: some technical steganography
In-Reply-To: <9403042136.AA03953@bilbo.suite.com>
Message-ID: <9403050842.AA06538@ah.com>
MIME-Version: 1.0
Content-Type: text/plain


>What does "appears relatively random" really mean?  How do you  
>measure the randomness of a sequence of bits?  

Randomness is the wrong measure.  Suppose I take 2^10 random bits and
prepend 16 zeros.  How random is this?  Almost as random, and this can
be made precise.  How compressible is it?  Almost incompressible.
Now, what about 2^20 bit?  2^30?

It is not randomness but recognizability which is at issue.

Then the next issue arises.

>If the reverse steg process makes it look like all, or even many,  
>files contain hidden messages, even when they don't, then you can  
>plausible deny knowledge of a suspicious bit pattern in any specific  
>file.

The situation of one file is the wrong problem.  Suppose you have a
collection of files.  What you want is deniability for the group of
files as a whole.  This is much trickier, and the obvious thing
doesn't work.

Suppose the files contain some bytes of an RSA encrypted session key
concatenated to the bytes of a file encrypted with the session key.
This is a reasonable scheme, and is basically how a stealth-PGP might
work.  Because the mode of representation is concatenation, the
session key is represented as some arbitrary number X mod N, the
public key modulus.  Recall that N is public.

Now let k be the length of N in bits, rounded up to the nearest
multiple of eight.  Since the encrypted key is represented as bytes,
the bit length is a multiple of eight.  Now the probability that a
random number between 0 and 2^k will be less than N is N/2^k.  Easy.
If N is not chosen specifically with this purpose, the fraction N/2^k
is on average about 1/4.  The important thing is not that this number
is small but that it is less than one, say p.

Now take an arbitrary string of bits and apply the (public) extraction
technique for a given public key, and from this extract a candidate
for the encrypted session key.  Now you can check the candidate
against the modulus.  If the candidate is greater than the modulus,
then you can reject that public key as being a possible recipient of
that message.

The probability that a public key rejects none of a group of files
grows exponentially small, therefore.  Each time a file is not
rejected as a possible message with respect to a particular recipient
key, the probability lowers by p.

You could even check all possible keys.  You may not be able to
identify the recipient, but in aggregate the opponent will be able to
ascertain that messages are being sent.  That is sufficient.
Steganography not only seeks to hide individual messages, but also the
fact that communication is taking place.

There are some defenses.  One can look for public keys which give high
N/2^k ratios.  Unfortunately, this almost assuredly makes factoring
the modulus easier, if only by lowering the search space.

One can make sure the collection of files contains some ringers, such
that the ratio of ringers to real messages is 2^k-N:N.  This is
certainly possible if one is simply storing files, but if the
collection of files were intercepted in transit, the sender would have
to make sure to send files in the correct ratio.  Yet this requires
that the sender look out for you and your security!

What is most broken here is the N/2^k ratio itself, that is, the
artifact of the byte-oriented encoding.  In other words, a random
modular number is not random in the byte length representation.

More to the point, one can't simply lop the front off a PGP message
and get stealth-PGP.  

So one way to solve this is to introduce some indeterminism into the
modular representation, so that the session key is evenly distributed
in all of its relevant representations.  This would mean that every
session on the range [0..2^k) was valid, and was taken mod N to
decrypt a session key.  This yields non-random session keys mod N,
which might be acceptable, since the entropy of the modular
distribution doesn't drop all that much.  Still, this requires the
sender's software to be secure.

Another way would be to use arithmetic coding to spread out the N/2^k
ration throughout the whole file.  For an exact solution, one would
have to use rational cooefficients rather than 2-adic coefficients,
but an approximate solution should be adequate.  One needs for the
approximate case, however, an estimate of the candidate acceptance
rate p above to make sure that the approximation is good enough.  This
solution doesn't require the sender's software to be any more secure
than is in the sender's interest.

In steganography, like cryptography, the different layers of
abstraction forcibly interfere with each other.  The pun here was that
an RSA key (represented by a modular integer) was being put into a
different representation where it didn't work.  These kinds of
level-shifting behavior are all-too-common, and are the cause of much
protocol failure.

Eric





Thread