1993-03-16 - [detecting interesting lines to look at]

Header Data

From: Doug Humphrey <digex@access>
To: cypherpunks@toad.com
Message Hash: 86f0c7fa17027eb6c156e47b69289afce6943c0131480a3fb2b3df6beb5c7fb6
Message ID: <199303140418.AA28175@access>
Reply To: N/A
UTC Datetime: 1993-03-16 02:27:29 UTC
Raw Date: Mon, 15 Mar 93 18:27:29 PST

Raw message

From: Doug Humphrey <digex@access>
Date: Mon, 15 Mar 93 18:27:29 PST
To: cypherpunks@toad.com
Subject: [detecting interesting lines to look at]
Message-ID: <199303140418.AA28175@access>
MIME-Version: 1.0
Content-Type: text/plain

>   And don't tell me they can
>   build computers that can distinguish between a PGP file transmission
>   and some 
>   hormone crazed 15 year old dork downloading the latest GIF of Cindy Crawford
>   or a ZIPed ware.  I've looked at hexdumps of GIFs and ZIPs and for all
>   practical purposes they look about as random as PGP data.  If the NSA
>   can build a parellel computer that scans all the trunks in the U.S.
>   simultaneously AND can tell the difference between PGP streams and ZIP/GIF
>   file data streams, then I just might as well go and shoot myself right
>   now.
>Er.... you might want to get your gun out..... the middle of hexdumps of
>GIF's and ZIP's and PGP files may look the same, but the file headers
>are quite distinguishing.  If you want to hide encrypted data, each
>person needs to find their own way of doing it ---- if everyone hides it
>in the low bits of a GIF file, it would be very simple for the NSA to
>scan GIF files to see if the low bits looked like the header of a PGP

To some extent, this discussion is ignoring the importance of 
"context".  Yes, if you have to do detailed searches of the data
traveling down a million lines, you are likely to fail.  That is
why you don't do it.  

What you DO is look for things that look out of the ordinary,
things that alone would look fine, but within a given context 
would look wrong, and then search those exception cases in 
more detail.  Example, someone comes up with a way that voice
looks just like fax from the data spectrum standpoint.  Great,
no way that anyone can scan the line and figure out, in the 
few seconds that they are scanning, that what they are seeing 
is really voice.  So, you attack it by looking at connection 
records, and looking for what looks like fax machines from the 
data standpoint, but seems to have a usage record (times of day, 
duration of calls, time between retrys, etc) of telephones.

Remember, even though the technology has changed, the end users
of it have not, and the end users are the ones that you are 
looking for, the ones who are setting up the usage records.

So, they now have a catagory of "fax machines that behave like
fax machines" and "fax machines that behave like phones".

Wonder which ones they will use the Special Equipment on, eh?

Same goes for PGP vs. GIFS.  The guy moving 4k long GIFS is
the guy moving the PGP stuff that looks like GIFS.  It doesn't 
nail all of the possible uses, but this is all a game of the 
odds anyway, and in the long run the usage patterns, the more meta
data, can give people good clues to work with.