1994-08-09 - Remailer chaining results

Header Data

From: nobody@shell.portal.comhfinney@shell.portal.com (Hal Finney)
To: cypherpunks@toad.com
Message Hash: e7ee346796e30726ec53b3fff5072918a29080a33e83dfdf3c05c0bd1aacc10b
Message ID: <199408090116.SAA05028@jobe.shell.portal.com>
Reply To: N/A
UTC Datetime: 1994-08-09 01:17:03 UTC
Raw Date: Mon, 8 Aug 94 18:17:03 PDT

Raw message

From: nobody@shell.portal.comhfinney@shell.portal.com (Hal Finney)
Date: Mon, 8 Aug 94 18:17:03 PDT
To: cypherpunks@toad.com
Subject: Remailer chaining results
Message-ID: <199408090116.SAA05028@jobe.shell.portal.com>
MIME-Version: 1.0
Content-Type: text/plain


-----BEGIN PGP SIGNED MESSAGE-----

I've done some calculations on the mixing properties of Chaum-style
networks and gotten some interesting results.

Recall that in a Chaum-type remailer network users use nested
encryption and remailing instructions to set up a chain or "cascade"
of remailers.  Each remailer strips off the encryption envelope and
sees the address of the next remailer in the chain or, for the final
remailer, the ultimate destination.  All messages are the same size
and carry no distinguishing features.  We assume that the opponent is
monitoring all messages traffic into and out of all remailers on the
net but can't see what is happening within each remailer.

Let's take a concrete example and suppose there are four remailers.
Everyone sets up a chain of 2 remailers, chosen at random from these
four.  A batch of messages is received by each remailer, which strips
off the envelope and sends them on to the next remailer in the chain,
where they are mixed with the other messages which chose that remailer
as the 2nd in the chain, then sent out to their ultimate destinations.
This model is a little artificial in that we are assuming a certain
amount of synchrony of the operation of the various remailers for
simplicity.  (Note that for this four-node network there are twelve
possible two-node chains where the nodes are different.)

There are three measures that I am interested in: bandwidth used (the
less the better); message mixing (the more the better); and immunity
to subversion (the more the better).  For bandwidth we can measure the
flow through the remailer.  Due to the symmetry of the situation, the
inflow and outflow are equal and the same for all remailers.  Message
flows per remailer are the sum of the flow into the remailer from
outside (the user messages), plus all flows into the remailer from the
other remailers.  Mixing can be measured by a probability distribution
over the outgoing messages which represents how likely they are to be
a given incoming message.  For simplicity this can be expressed simply
as the number M of messages which are equally likely to be the
original (in an earlier message I used entropy which is a log measure
of the same thing).  I am thinking of measuring immunity to subversion
in terms of how much mixing is lost by a certain number of "failed"
(that is, subverted) nodes.  Some networks are vulnerable to "single
point failures", where a single subverted node destroys all the
anonymity.  A more robust network would require multiple failures for
this to happen.  However, it turns out that even in a multiple-failure
network a single-point failure may reveal some information about the
messages, which we can express as a loss in mixing.

Let the total message bandwidth into the network be N packets per time
unit.  Due to symmetry, each node will receive N/4 packets.  With the
chains as defined above, the other three nodes will all be equally
likely to be the 2nd in the chain, so N/12 packets are sent to each of
them.  Simultaneously, N/12 packets come to this node from each of the
others.  This is a total internode bandwidth of N/4 in each direction
per node, or N total per direction.  Add this internode bandwidth to
the user-link bandwidth of N per direction and we get 2N total, or N/2
per node.

At the beginning of each chain, we have N/4 packets come in and get
mixed as each node.  As the packets go out, they are sent to the other
three remailers, and when they leave they may be any of the output of
those three.  Thus they are equally likely to be any of 3N/4 of the
packets, and this is the amount of mixing we have.

If one of the two nodes in your remailer chain is compromised, it
provides no effective mixing.  This means that your message is only
mixed at one node, where it is combined as part of a batch of N/4, so
that is the degree of mixing you have with a single failure.  If both
remailers are compromised then of course you have no mixing, which we
would write as a factor of 1 in uncertainty increase.

This can also be expressed in terms of a percentage compromise of the
network.  If 1 node is compromised, which can be represented as p=.25,
then the six of the twelve remailer paths which use that node will
have single-point failures with the comcomitant reduction in mixing.
In other words, half of the messages will have the full 3N/4 mixing
while half have N/4.  With p=.50, two nodes are compromised.  Two
paths are safe, eight have single failures, and two have
double-failures.  So we have 1/6 of the messages with 3N/4, 2/3 with
N/4, and 1/6 with only 1 mixing.  With p=.75, three nodes compromised,
there are no safe paths; half have single failures and half have
double.  So 1/2 the messages have mixing of N/4 and half have 1.  And
of course with p=1 all messages are compromised with mixing factor 1.

Let me just go on and extend this analysis in one way.  In the
discussion of the chains, we have assumed that the two nodes in the
chain would be different.  Logically though one could have chains
where both nodes were the same.  Let us compare this network with the
one we just did.

There are now 16 possible chains.  Total bandwidth is somewhat less
(since we don't count the messages which stay in one remailer).  Now
only 3/4 of the messages from each node need to get exchanged.  Per
node, there will be N/4 messages to users and 3N/16 messages to other
nodes, for a total of 7N/16 per node or 7N/4 total (above the 7's were
8's).  Mixing is actually improved; there is no limitation on which
input messages might map to which output ones, so we have full N-fold
mixing (compared to 3N/4 above).  With single-point failure mixing is
again N/4 as above.

The failure behavior is quite different.  With p=.25, 1 of the 16
paths is totally compromised, 6 of the 16 have single failures for N/4
mixing, and 9 of the 16 have no failures for N mixing.  With p=.50,
4/16 of the paths have mixing 1, 8/16 have mixing N/4, and 4/16 have
mixing N.  With p=.75, 9/16 have mixing 1, 6/16 have N/4, and 1/16
have N.

It's not clear what measure is useful to compare these failure
situations.  A double-point failure seems much worse than a single
one.  I wonder whether taking a geometric mean (which would be
equivalent to taking the arithmetic mean of the entropies) would be
valid.  If we did that for the p=.25 case, we get average mixing of
.59N^(15/16) for the self-chain network, and .27N for the network
where all chains are two different nodes.  For N less than about
250,000 packets per (network-wide) batch the self-chain network
provides superior average mixing in the p=.25 case by this measure.

Sparing the math, for p=.50 the self-chain network is superior for
batch sizes smaller than 29 packets, and for p=.75 the self-chain
network is only superior for batch sizes smaller than 16 packets.
This suggests that if the network is likely to be mostly safe then the
extra mixing allowed by same-node chains is worth the small increased
risk of exposure.  But as the chance of encountering bad nodes rises
it becomes unwise to take this chance.

Here is a quick summary of the extension of these results to larger
numbers of remailers and longer chains.  Let there be R remailers and
let the chain length be K.  Let the number of message packets per
batch (network wide) again be N.  (I will neglect the differences
between same-node chains and different-node chains as they are
generally small effects on the order of 1/R.)

Bandwidth per node is approximately KN/R.  Network wide it is
therefore KN.  Adding remailer hops increases network bandwidth loads
directly in proportion to the number of hops.  Mixing is approximately
N for K=2 and up, which is the maximum possible.  For K=1 mixing is
N/R.

Fault tolerance is interesting.  A K-length cascade is invulnerable to
up to K-2 failures!  At K-1 the mixing decreases from N to N/R, a
significant decrease.  And with K failures of course the mixing drops
to 1.  I was surprised how robust these networks are.  The reason is
that with even K-2 compromised remailers in a K-length cascade there
still remains a safe length 2 cascade, and as we saw above that provides
N-fold mixing.

This provides some guidelines on the choice of K.  First, K should
clearly be at least 2.  The increase from K=1 to K=2 increases mixing
from N/R to N, a considerable increase.  Secondly, K should probably
be at least 3.  This will provide full mixing even if you are unlucky
enough to choose a compromised remailer.  Beyond this, you can
calculate that with a chain length of K and probability p of a
compromised node, the expected number of compromised nodes in your
chain is Kp.  This suggests that you should choose K large enough that
Kp is well below K-2.  If you estimate p=.50, for example, you might
choose K=8.  The binomial theorem states that the chance of x failures
out of k nodes where the probability of each failure is p is
(p^x)*((1-p)^(k-x))*k!/x!(k-x)!.  In this example, the chance of 7
failures out of 8 is about 3% and the chances of 8/8 is about .5% for
a total risk of 3.5% that you won't be fully protected.

Now, how many people read this far? ;-)

Hal

-----BEGIN PGP SIGNATURE-----
Version: 2.1e (yikes, where'd I find this old version!)

iQCVAgUBLkbYB6gTA69YIUw3AQHligP+PBRC1pmZ6+T10WCQ91SZ2GdYX4/iEsKQ
eMfCLlQ0PFbPEWZ5TaDwbOLCCUSBAbb6OO3Y2U8SHF/zZKJLrHI09/Ssl/ZeQ3st
9G9JrncU9Wo7Z9N1zMPJuQy21qFKNOkAwVQHxThObMSxQWh+TWem8lDKzm6ea0VH
sejMQG+nVyo=
=BWsP
-----END PGP SIGNATURE-----





Thread