1996-06-01 - Re: Statistical analysis of anonymous databases

Header Data

From: tcmay@got.net (Timothy C. May)
To: cypherpunks@toad.com
Message Hash: 8b9793991a3531fb087e2707dfea8302b6fa7ffc0d1d6f000f95219d05c99451
Message ID: <add4e62c000210047c18@[205.199.118.202]>
Reply To: N/A
UTC Datetime: 1996-06-01 05:14:38 UTC
Raw Date: Sat, 1 Jun 1996 13:14:38 +0800

Raw message

From: tcmay@got.net (Timothy C. May)
Date: Sat, 1 Jun 1996 13:14:38 +0800
To: cypherpunks@toad.com
Subject: Re: Statistical analysis of anonymous databases
Message-ID: <add4e62c000210047c18@[205.199.118.202]>
MIME-Version: 1.0
Content-Type: text/plain


At 10:53 PM 5/31/96, David Wagner wrote:
>In article <v01540b02add1fc6e4658@[193.239.225.200]>,
>Clay Olbon II <Clay.Olbon@dynetics.com> wrote:
>> In medical research (this particular application - there are others I am
>> sure) it is desirable to have a large database of individual medical
>> histories available to search for correlations, risk factors, etc.  The
>> problem, of course, is that many individuals want their medical histories
>> kept private.  It is therefore necessary to maintain a database that is not
>> traceable back to individuals.  An additional requirement is that people
>> must be able to add additional information to their records as it becomes
>> available.
>
>How about a simple non-technical solution?  Each patient picks a
>random pseudonym; the database is keyed off that pseudonym, and the
>person's True Name(tm) never appears in the database.  Patients
>should remember their pseudonym (or write it down); then they can
>add information to the database.

This "leaks" too much information. It is not hard at all to figure out that
the only 32-year-old white male with appendicitis is Sidney Jackson. And so
on, for enough of the patients to effectively identify most or even all of
them.
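
A minimal sketch of the leak described above, assuming a toy table of pseudonymized records. The record layout, the pseudonyms, and the `unique_records` helper are all hypothetical, made up for illustration. It counts how many records share each combination of non-name attributes; any record whose combination occurs only once can be matched to a person by anyone who knows those attributes from another source.

```python
from collections import Counter

# Hypothetical pseudonymized records: (pseudonym, age, race, sex, diagnosis).
# No true names appear anywhere in the table.
RECORDS = [
    ("p01", 32, "white", "M", "appendicitis"),
    ("p02", 32, "white", "M", "asthma"),
    ("p03", 45, "black", "F", "asthma"),
    ("p04", 45, "black", "F", "asthma"),
    ("p05", 61, "white", "F", "diabetes"),
]


def unique_records(records):
    """Return pseudonyms whose non-name attributes are shared by no other record."""
    # Count each attribute combination, ignoring the pseudonym in position 0.
    counts = Counter(r[1:] for r in records)
    return [r[0] for r in records if counts[r[1:]] == 1]


if __name__ == "__main__":
    # Records listed here are effectively identified despite the pseudonym.
    print(unique_records(RECORDS))
```

In this toy table only p03 and p04 hide behind each other; every other record is singled out by its attributes alone.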

Blinding only the true name while leaving the essentially unique parameter
sets unchanged pretty much makes the name blinding moot. I have a hunch
it's possible to blind the individual parameters in some way so as to make
analysis possible, but I don't have any approaches in mind.

I recall that Joan Feigenbaum was working on "computing with encrypted
instances" for her Ph.D. work at Stanford (she's now at one of the AT&Ts).
The idea is quite similar to this application: transform a set of data for
analysis by a party which is not to know the nature of the work being done,
then transform the answers obtained back.

And there is another angle discussed a few years back in connection with
AIDS testing. Specifically, for door-to-door polls asking if a person has
been tested for AIDS (or whatever). One effectively "confuses" the answer
by flipping a coin or rolling a die and "switching" the answer depending
on the results. This allows any particular person to, for example, say
"Yes" to the question "Have you been tested for AIDS?" without this
actually being the case. (The tosses have to be skewed so that
statisticians can still extract/deconvolve useful information.) I think of
this as "plausible deniability."
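
A rough sketch of how the skewed toss and the deconvolution could work, under stated assumptions: each respondent reports the true answer with probability `p_truth` and the opposite answer otherwise. The function names, the 0.75 bias, the 20% true rate, and the sample size are illustrative choices, not details from the proposal being recalled. If pi is the true fraction of "yes" answers, the observed fraction is p_truth * pi + (1 - p_truth) * (1 - pi), which can be solved for pi as long as p_truth is not 0.5.

```python
import random


def randomized_response(truth, p_truth, rng):
    """Report the true answer with probability p_truth, otherwise its opposite."""
    return truth if rng.random() < p_truth else not truth


def estimate_true_rate(reports, p_truth):
    """Deconvolve the observed 'yes' rate into an estimate of the true rate."""
    if p_truth == 0.5:
        raise ValueError("a fair toss destroys the signal; p_truth must differ from 0.5")
    observed = sum(reports) / len(reports)
    # observed = p_truth * pi + (1 - p_truth) * (1 - pi); solve for pi.
    return (observed - (1 - p_truth)) / (2 * p_truth - 1)


def simulate(n=100_000, true_rate=0.2, p_truth=0.75, seed=1996):
    """Return (actual 'yes' fraction, estimate recovered from noisy reports)."""
    rng = random.Random(seed)
    truths = [rng.random() < true_rate for _ in range(n)]
    reports = [randomized_response(t, p_truth, rng) for t in truths]
    return sum(truths) / n, estimate_true_rate(reports, p_truth)


if __name__ == "__main__":
    actual, estimated = simulate()
    print(f"actual rate {actual:.3f}, estimated rate {estimated:.3f}")
```

No individual report reveals the respondent's real answer, yet the aggregate estimate lands close to the actual rate. The closer `p_truth` is to 0.5, the stronger the deniability and the more respondents are needed for a usable estimate.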

Of course, this is confusing to the average person, and maybe even to folks
like us, so this proposal (by I don't recall whom) has not gone very far.
But it shows that some semi-cryptographic protocols could be used to get
some sensitive information.

I don't know if something like this could be used for the medical database
problem, but it's interesting.

--Tim May

Boycott "Big Brother Inside" software!
We got computers, we're tapping phone lines, we know that that ain't allowed.
---------:---------:---------:---------:---------:---------:---------:----
Timothy C. May              | Crypto Anarchy: encryption, digital money,
tcmay@got.net  408-728-0152 | anonymous networks, digital pseudonyms, zero
W.A.S.T.E.: Corralitos, CA  | knowledge, reputations, information markets,
Licensed Ontologist         | black markets, collapse of governments.
"National borders aren't even speed bumps on the information superhighway."


Thread