1994-04-23 - Entropy, WNSTORM and steganography

Header Data

From: rishab@dxm.ernet.in
To: rarachel@prism.poly.edu
Message Hash: f2a1089ba988fc3087dcd31c1513af06aa6e51fa2c029eb04350b946b8fee0d0
Message ID: <gate.Xs38kc1w165w@dxm.ernet.in>
Reply To: N/A
UTC Datetime: 1994-04-23 12:12:06 UTC
Raw Date: Sat, 23 Apr 94 05:12:06 PDT

Raw message

From: rishab@dxm.ernet.in
Date: Sat, 23 Apr 94 05:12:06 PDT
To: rarachel@prism.poly.edu
Subject: Entropy, WNSTORM and steganography
Message-ID: <gate.Xs38kc1w165w@dxm.ernet.in>
MIME-Version: 1.0
Content-Type: text/plain


rarachel@prism.poly.edu (Arsen Ray Arachelian):

> In a previous post you mentioned that PGP does high entropy...  Do you have
> any C source code that finds the entropy of a chunk of data?  (I've written a
> cypher program that hides the cyphertext in a stream of random numbers.)

Entropy is:
 sigma(- q_i * log_2 q_i),

summed over all i, where q_i is the relative frequency (0..1) of token i in the
data stream.
I don't know where I've put my old entropy program, but I cooked one up now, 
attached to the end of the mail.
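
Written out with base-2 logarithms (an assumption, though consistent with the log_2 used in the attached program), the formula and two quick sanity checks look roughly like this:

```latex
H = -\sum_{i} q_i \log_2 q_i, \qquad \sum_{i} q_i = 1

% two equally likely tokens (q_1 = q_2 = 1/2):
H = -2 \cdot \tfrac{1}{2} \log_2 \tfrac{1}{2} = 1 \text{ bit per token}

% 256 equally likely byte values (q_i = 1/256):
H = -256 \cdot \tfrac{1}{256} \log_2 \tfrac{1}{256} = 8 \text{ bits per byte}
```

So for byte data the figure runs from 0 to 8; dividing by 8 gives a 0..1 scale.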

> Anyway, I'd like to put in an entropy checker into the program.  You may have
> seen me post a notice for it. It's called WNSTORM.  I sent it to soda, I don't

I don't get it. OK, maybe if you see "Entropy 1.0" you may feel more secure 
that the white noise is white noise, but I'm sure you're using some decent 
generator anyway. As for using entropy to try to make the input (noise) and
output (with embedded data) statistically similar, it's hardly enough.
Entropy is not the most sophisticated of analysis techniques!

If the real use of WNSTORM is to modify it for stego, to put things into the
low bits, then entropy is *definitely* not a great method of ensuring that 
your stegoed image will be statistically similar to the original. There have
been earlier discussions on methods of ensuring that the percentage of 0s and
1s remains similar before and after stegging (I just love that verb; I steg,
you steg, he stegs, thou steggeth ;-)
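
One simple way to check the proportion of 0s and 1s before and after could look like the sketch below. It is only an illustration, not the method from those earlier discussions: it assumes one sample per byte with the hidden data in the lowest bit, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Fraction of bytes whose lowest bit is 1, in the range 0..1. */
double low_bit_ones_fraction(const unsigned char *buf, size_t n) {
    size_t ones = 0, i;

    assert(buf != NULL || n == 0);
    if (n == 0) return 0.0;               /* nothing to measure */

    for (i = 0; i < n; i++) ones += buf[i] & 1u;
    return (double)ones / (double)n;
}

/* Absolute change in that fraction between original and stegoed data.
   Both buffers are assumed to hold n samples each. */
double low_bit_drift(const unsigned char *before,
                     const unsigned char *after, size_t n) {
    double d = low_bit_ones_fraction(after, n)
             - low_bit_ones_fraction(before, n);
    return d < 0 ? -d : d;
}
```

A drift near zero only says the overall 0/1 balance survived; it says nothing about where in the image the changed bits sit.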

I personally believe, based on my not inconsiderable experience working with
images both from the image-processing-programming and the digital-effect-touchup
points of view, that very minor changes in images tend to be noticeable to the
human eye, after the right preprocessing. 'Ultimate' steganography may have to 
bother about very sophisticated statistical modelling, or neural networks (I 
know that many number theorists, and Bruce Schneier, intensely dislike the 
latter. They are quite useful, however, in building complex models on data with
which one may have no idea what to do).

I'm waiting for a large collection of 'before and after' stego images, to play
with them and see what I find. (I once worked on a model to recognize faces, 
fast, by generating a pixel-density graph of monochrome edge-outlined images.
Though the project died before the computer properly recognized a face, I could
identify faces from their 'densitographs'.)

-----

> know if it's up there yet.  I haven't checked in a while.  Anyhow unfortunately
> since you're in India I can't send you a copy.  I wish I could, but I don't
> want the damned ITAR cops on my ass.  (Now if you were to obtain an account
> in the USA, or one that looks like a USA address, you could get it yourself
> without my intervention or knowledge... for all I know you probably have it
> already :-)

Probably... ;-)

-----------------------------------
// this ought to work ;-)

#include <stdio.h>
#include <math.h>

double entropy(FILE *fp) {
double count[256];	// frequency of chars
double length= 0;	// total number of chars read
int c, i;
double entr= 0;

   for (i=256; i--; count[i]=0);

   while((c=fgetc(fp)) != EOF) { // for every char, 
      count[c]++;   // inc its count
      length++;     // and the length
   }

   if (length == 0) return 0;  // empty input: nothing to measure
   
   for (i=256; i--; count[i]/= length);  // convert counts to frequencies 0..1
   
   // sigma(0..255, -q_i * log_2(q_i)), -q_i bcoz log of fraction will be 
   // negative, we'd like our entropy positive: 0..8 bits per byte
   // (divide by 8 for a 0..1 scale)

   for (i=256; i--; )
      if (count[i] > 0)   // skip unseen chars: 0 * log(0) is taken as 0
         entr+= -count[i] * (log(count[i]) / log(2.0));
  
   return entr; // bits_of_info per BYTE, as we counted 256 values.
}

-------------------------------------------   
-------------------------------------------------------------------------------
Rishab Aiyer Ghosh                            "What is civilisation
rishab@dxm.ernet.in                             but a ribonucleic
Voicemail +91 11 3760335; Vox/Fax/Data 6853410      hangover?"
H-34C Saket New Delhi 110017 INDIA
-------------------------------------------------------------------------------




