1994-08-19 - Re: NSA Spy Machine and DES

Header Data

From: Peter Wayner <pcw@access.digex.net>
To: perry@imsi.com
Message Hash: 4ad97201dfe656af29baf790e7a5525336b2aa0c4468041dbf962058880a57bd
Message ID: <199408191712.AA08364@access3.digex.net>
Reply To: N/A
UTC Datetime: 1994-08-19 17:12:58 UTC
Raw Date: Fri, 19 Aug 94 10:12:58 PDT

Raw message

From: Peter Wayner <pcw@access.digex.net>
Date: Fri, 19 Aug 94 10:12:58 PDT
To: perry@imsi.com
Subject: Re: NSA Spy Machine and DES
Message-ID: <199408191712.AA08364@access3.digex.net>
MIME-Version: 1.0
Content-Type: text/plain

It is entirely possible that the Cray SIMD machine will use
Xilinxs. The folks at the Supercomputing Research Center
in Bowie are also building machines with these Xilinxs.
They're known under the name "Splash" and they've built
at least two generations. One of the architects told me
that the machine was only good for "deeply pipelined"
processes. There is one preprint, for instance, that describes
how to do text searching with the machine. (Surprise.)
Much of this should be public because the folks from the SRC
often go to conferences and present information. Two names
on the Splash project that I can think of are Buell and Arnold.
If anyone can dig up papers on this topic, I would be intrested
to read them.

That being said, I still don't really see the advantages of Xilinx.
But this really could be because I've never programmed the machines
nor have I used them for anything. It just seems unlikely to 
me that DES can be done that much faster. 

But like I said, what do I know? I would be intrigued if someone
could run a back of the envelope calculation on building a machine
with Xilinx. How many processes can you do with it? How many testing
circuits can you fit on a chip? How fast will these circuits go? 
What is the big win from pipelining the process? Sure you can 
build a sixteen stage pipeline, but will you need to put copies
of the SBOXes at each stage? How much space will this take? How
deep will the gates be? What is the gate delay at each stage? 
What will be resultant speed? 

The fact is that for all of DES's bitwise 6-to-4 sboxes and other weird
stuff, it isn't that hard to implement in a RISC processor that has
XOR, AND, shifts and fast table lookup. 

Any answers out there?