1996-05-29 - Re: Statistical analysis of anonymous databases

Header Data

From: Adam Shostack <adam@lighthouse.homeport.org>
To: Clay.Olbon@dynetics.com (Clay Olbon II)
Message Hash: 2c4fe2a34c584eaa436d3500c6b96c1281cfb5826c4c3da04bd57f5fdb3d125c
Message ID: <199605291844.NAA10130@homeport.org>
Reply To: <v01540b02add1fc6e4658@[193.239.225.200]>
UTC Datetime: 1996-05-29 23:24:21 UTC
Raw Date: Thu, 30 May 1996 07:24:21 +0800

Raw message

From: Adam Shostack <adam@lighthouse.homeport.org>
Date: Thu, 30 May 1996 07:24:21 +0800
To: Clay.Olbon@dynetics.com (Clay Olbon II)
Subject: Re: Statistical analysis of anonymous databases
In-Reply-To: <v01540b02add1fc6e4658@[193.239.225.200]>
Message-ID: <199605291844.NAA10130@homeport.org>
MIME-Version: 1.0
Content-Type: text



One solution to this is to have a database that 'generalizes' its
answers as it provides them.  For example, rather than returning

Clay Olbon, 32, m, left handed, cholesterol 350, bp 200/160, 5'9", 175#,

it would return:

fooblat martin, 25-35, m, left handed, cholest. 300-400, 5.5-6ft, heavy.

Researchers could then provide ranges to get answers.  Thus, if I'm
very concerned about the correlation between age and weight, I could
get that information very specifically and nothing else.
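
A minimal sketch of such a generalizing filter in Python.  The field
names, the bin widths, and the `Record`, `bucket`, and `generalize`
helpers are all assumptions made for illustration; the message does not
specify how the ranges would be chosen, and these fixed-width bins do
not reproduce its 25-35 age range exactly.

```python
from dataclasses import dataclass


@dataclass
class Record:
    name: str
    age: int
    sex: str
    handedness: str
    cholesterol: int
    height_in: int  # height in inches
    weight_lb: int  # weight in pounds


def bucket(value: int, width: int) -> str:
    """Return the range [lo, lo + width) that contains value, as 'lo-hi'."""
    lo = (value // width) * width
    return f"{lo}-{lo + width}"


def generalize(rec: Record, pseudonym: str) -> dict:
    """Swap in a pseudonym and coarsen each numeric field (widths assumed)."""
    return {
        "name": pseudonym,
        "age": bucket(rec.age, 10),
        "sex": rec.sex,
        "handedness": rec.handedness,
        "cholesterol": bucket(rec.cholesterol, 100),
        "height_in": bucket(rec.height_in, 6),
        "weight_lb": bucket(rec.weight_lb, 25),
    }


row = generalize(Record("Clay Olbon", 32, "m", "left", 350, 69, 175), "fooblat martin")
print(row)
```

The real name never appears in the returned row, and each numeric value
is reported only as the bin it falls into.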

The generalization filter could be written to only allow N queries of
a given level of detail, so that the more detail you wanted in one
area, the more you give up in others.
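
One way to read that trade-off, sketched in Python under stated
assumptions: each field in a query is requested at an integer detail
level (0 = suppressed, higher = finer), the levels in one query must sum
to no more than a fixed budget, and each researcher gets a fixed number
of queries.  The `DetailBudget` class, the level scale, and the choice
not to charge for rejected queries are all hypothetical.

```python
class DetailBudget:
    """Limits both the number of queries and the detail spent per query."""

    def __init__(self, points_per_query: int, max_queries: int):
        self.points_per_query = points_per_query
        self.queries_left = max_queries

    def authorize(self, requested: dict) -> None:
        """Accept or reject a query given as {field: detail_level}.

        Raises PermissionError if the query quota is exhausted or the
        requested detail exceeds the per-query budget.  Rejected queries
        do not consume quota (an assumption of this sketch).
        """
        if any(level < 0 for level in requested.values()):
            raise ValueError("detail levels must be non-negative")
        if self.queries_left <= 0:
            raise PermissionError("query quota exhausted")
        cost = sum(requested.values())
        if cost > self.points_per_query:
            raise PermissionError(
                f"requested detail {cost} exceeds budget {self.points_per_query}"
            )
        self.queries_left -= 1


budget = DetailBudget(points_per_query=4, max_queries=2)
budget.authorize({"age": 3, "weight": 1})      # fine detail on age, coarse weight
try:
    budget.authorize({"age": 3, "weight": 3})  # too much detail in one query
except PermissionError as err:
    print("rejected:", err)
```

Asking for fine detail on age leaves only one point for every other
field, which is the "give up in others" behaviour described above.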

There could be a review committee (this is the way hospitals & medical
research work) to review requests for more specific data.

Doctors like having names, so you could generate arbitrary names for
patients, or use a syllable generator to come up with pronounceable
nonsense.
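
A syllable generator along those lines could be sketched as follows.
The letter tables, the two-word name shape, and the `syllable`,
`pseudonym`, and `assign_names` helpers are assumptions for
illustration.  The seeded `random.Random` is used here only so the
example is repeatable; a real deployment would presumably want an
unpredictable source such as `random.SystemRandom`.

```python
import random

# Small, assumed letter tables: onset + vowel + optional coda per syllable.
ONSETS = ["b", "d", "f", "g", "k", "l", "m", "n", "p", "r", "s", "t", "v", "z"]
VOWELS = ["a", "e", "i", "o", "u"]
CODAS = ["", "l", "n", "r", "s", "t"]


def syllable(rng: random.Random) -> str:
    """Build one pronounceable consonant-vowel(-consonant) syllable."""
    return rng.choice(ONSETS) + rng.choice(VOWELS) + rng.choice(CODAS)


def pseudonym(rng: random.Random, syllables: int = 2) -> str:
    """Build a two-word nonsense name such as a first and last name."""
    first = "".join(syllable(rng) for _ in range(syllables))
    last = "".join(syllable(rng) for _ in range(syllables))
    return f"{first} {last}"


def assign_names(record_ids, seed=None) -> dict:
    """Give every record id a distinct pseudonym, retrying on collisions."""
    rng = random.Random(seed)
    used = set()
    names = {}
    for rid in record_ids:
        name = pseudonym(rng)
        while name in used:
            name = pseudonym(rng)
        used.add(name)
        names[rid] = name
    return names


for rid, name in assign_names(range(3), seed=1996).items():
    print(rid, name)
```

Keeping the id-to-name table private lets a patient's record be updated
later under the same pseudonym without exposing the real name.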


Adam

Clay Olbon II wrote:

| In medical research (this particular application - there are others I am
| sure) it is desirable to have a large database of individual medical
| histories available to search for correlations, risk factors, etc.  The
| problem, of course, is that many individuals want their medical histories
| kept private.  It is therefore necessary to maintain a database that is not
| traceable back to individuals.  An additional requirement is that people
| must be able to add additional information to their records as it becomes
| available.  The researcher who initially posed the question suggested
| adding random data to "encrypt anonymity".
| 

-- 
"It is seldom that liberty of any kind is lost all at once."
					               -Hume





