1995-08-29 - SSL keyspace etc.

Header Data

From: “Vladimir Z. Nuri” <vznuri@netcom.com>
To: cypherpunks@toad.com
Message Hash: d076385e5bd3d07085105caf0b759a1241a0f62e1075732904de2a26447889b6
Message ID: <199508291940.MAA22969@netcom6.netcom.com>
Reply To: N/A
UTC Datetime: 1995-08-29 19:42:52 UTC
Raw Date: Tue, 29 Aug 95 12:42:52 PDT

Raw message

From: "Vladimir Z. Nuri" <vznuri@netcom.com>
Date: Tue, 29 Aug 95 12:42:52 PDT
To: cypherpunks@toad.com
Subject: SSL keyspace etc.
Message-ID: <199508291940.MAA22969@netcom6.netcom.com>
MIME-Version: 1.0
Content-Type: text/plain


regarding SSL challenge, I am not following this close enough 
to understand completely, but I thought I would offer a few suggestions
for tweaking the code:

--

the issue of grabbing keyspace has been raised. what if someone
malicious just yanked huge areas of keyspace and didn't search them?
it seems that the clients need to return to the server some evidence
that they have searched their keyspace in question. the server could
verify this evidence. for those that don't return the "evidence", that
keyspace could be reallocated to other comers.

the simple approach to all this, if you don't have "evidence", is to 
just have the server keep reallocating the same space over and over
to different crackers. hopefully eventually every part of the keyspace
would be allocated to a "legitimate" worker.

--

the issue of efficiency is very fascinating for this project. essentially
the server has no idea what the block size of key blocks it should dole
out. obviously the server would want to try to dole out equal *processing
chunks* such that the remote machine reports back in a certain amount
of time, no matter what architecture. the problem of course is that remote
machines all have different efficiency.

two possibilities: a sort of "bogomip" calculation is done in the client,
and its processor speed is reported to the server. the server uses
this in a calculation to determine how much to dole out. it could try 
to derive a best fit linear relationship between space covered and
processor spead, or build up a table of results and interpolate for
new requests.

note that the efficiency issue also ties into "what if people take
keys they don't solve". if the server knows roughly how long a client
should take to report back, and it never reports back, it could then
reallocate that key space.

--

another problem of efficiency is that the server is clearly a bottleneck
for servicing requests. the question arises: suppose that the server
could determine the precise interval between which machines would
go back to it for new keys. what is the optimum interval over the
whole project? in other words, give the number of machines participating,
and their processor speeds, what size of key space should be parceled
out to the next request so that the bottleneck at the server is
minimized? this optimum interval must be very hard to derive, because
it depends on the contention based on many incoming connections. it
would involve some probabilistic approximations of the likelihood
of collisions. 

to model it, you might consider a request as taking [n] seconds of
time, and consider that if any two requests are in contention, a 
retry happens after [m] seconds. you could build up models that
would try to minimize the time based on empirical simulations. 
however I would be exceedingly impressed if someone could derive
a formula for this, or give it from some textbook. 

--

adaptive algorithms for all these situations are possible. the server
could use a "hypothesis" in the sense of partitioning out a starting
size of keyspace, and then watch how long it took the client machine to respond,
and then assume a linear relationship or something to compute the size
of the next keyspace to hand out to the machine. the server could continually
watch how closely its "hypothesis" (i.e. its estimations of how long a
given machine will take) match the actual returns.

--

more on the idea of evidence: we are working with a hashing algorithm,
right? as evidence the client machines could return checksums of all
the hashes of all the keyspace it searched. it could break up its
own search space into blocks and return the checksums on the hashes
for each block. the server, if it wanted to, could verify these blocks
running its own computations. if it ever found a client was "unreliable",
it could then diminish the keys sent to the unreliable client, or even
send it areas of search space it didn't care about anymore (i.e. areas
that have already been confirmed searched by a more "reliable" client).

--

in fact all this reminds me of the process of intelligence gathering
by an agency, which could be formalized as follows: suppose that
the agency wishes to identify "quality information". it has a set
of sources, A,B,C,D....  now, it can send questions out to these
sources and get information from them. some of them however would
be "unreliable". the agency must devise some means by which it can
weed out the unreliable sources. note that this may even involve
sending them bogus instructions to keep them busy so they do not
themselves suspect they have been "discovered" and then change
their defective plans.

obviously, one of the most important intelligence tools in this
matter is that of *correlation*. you have to determine "truth"
(or "quality information") via the correlation between answers that
the different sources give you. also important to correlation is
*redundancy*. you sometimes have to ask more than one source the
same question, and test the answer. in this model, if A and B
give different answers, you know that one of A or B is "unreliable".

what is very interesting in our case of cracking keys is that the
server can verify the information on its own. in other words, it
has a *control* that it knows is correct that it can judge against
the answers "out there". unfortunately, in contrast, real intelligence
agencies are not always privy to this kind of certain "control" and
in fact have to determine "truth" entirely from a set of sources,
any of which might be unreliable. in this case one has to have
a hypothesis about what is the "truth" and test it to see if it
holds up consistently with all information.

the approaches of attackers are obvious. the most obvious is that
of collusion and infiltration. but I will save the rest for some
NSA spook to elaborate. there are certainly enough colluding and
infiltrating on this list <g>

--

one of the reasons all this interests me is that it really reminds me
of some projects I have worked on in the past. in high school I wrote
a network mandelbrot set program (client/host). the issue of contention
arose and it appeared to me to look like an upside-down parabola after
I plotted some points (curving up, that is). i.e. the optimum was at
the pit of the parabola, and when too few or too many requests happened,
the speed over the overall simulation was increased above the optimum.
some very ingenious readers may actually be able to locate this 
code, which I put in the public domain over 5 years ago.

--

another thing I worked on was trying to find the optimal block size
of communications protocols such as Zmodem, which generally instead
just pick arbitrary block sizes 2^n. I actually was able to attack
this problem analytically through the observations of the properties
of infinite series and calculus techniques. it is a similar problem
but the idea of contention really complicates this issue. (for what
I studied, there was only one client and one server, so to speak).

I still have this paper in Latex format and if anyone is interested
I would be happy to send it to you. it's a really nice example, IMHO,
of how if you use your brain and some mathematics, you can really
get a far more elegant approach than brute force, and know with 
much greater certainty  that what you are doing makes sense 
mathematically.

an awful lot of programmer just tend to bang on the keyboard with out
thinking of the theoretical implications of their work. this is
understandable given that the theoretical implications of even trivial
programs (such as the SSL client/server interactions) can be 
mathematically extremely daunting, requiring even differential 
equations to model fairly simple pieces of code.

--

well, that is my contribution of the moment into the cypherpunk 
annals. one never knows what a little combination of boredom 
and inspiration can lead to.

--V.Z.Nuri






Thread