1996-11-23 - Re: how much entropy in common answers

Header Data

From: Pat Farrell <pfarrell@netcom.com>
To: Hal Finney <pfarrell@netcom.com>
Message Hash: 3aa328cefa1183e63e425a331c5e4176fd09fad48ffdcfae684da39978d080e0
Message ID: <199611232225.OAA15379@netcom5.netcom.com>
Reply To: N/A
UTC Datetime: 1996-11-23 22:25:39 UTC
Raw Date: Sat, 23 Nov 1996 14:25:39 -0800 (PST)

Raw message

From: Pat Farrell <pfarrell@netcom.com>
Date: Sat, 23 Nov 1996 14:25:39 -0800 (PST)
To: Hal Finney <pfarrell@netcom.com>
Subject: Re: how much entropy in common answers
Message-ID: <199611232225.OAA15379@netcom5.netcom.com>
MIME-Version: 1.0
Content-Type: text/plain


At 01:26 PM 11/23/96 -0800, Hal Finney wrote:
>From: Pat Farrell <pfarrell@netcom.com>
>> Clearly there are cultural issues involved. The entropy in a question
>> such as "what is your favorite brother's name?" is low in an Irish
>> family like mine where names cluster around choices such as Patrick,
>> John, Sean, and Dan.
>> So how do we measure the entropy objectively?
>
>You have to estimate the probability that the attacker will guess what you
>have chosen.  This will depend on how much the attacker knows about you.
>If he knows that you're Irish, it will help in the question above.  If he
>knows the names of your brothers, it will help a lot more.  Probably
>it is best to be conservative in assuming what your attacker knows.

I was really hoping for some insight into the general problem.
If you knew that my family is Irish, that makes certain names
much more likely. Obviously if you know that I've got five brothers,
a little bit of work will probably let you know that they are Tom, Dick,
Harry, Mike, and John. But that is an example of a terrible question for Carl's
approach. I was asking the more general question.

Carl suggested that in general a first name has about eight bits of entropy.
But knowledge of the social environment can seriously reduce it. Jennifer was
a hugely popular name for girls in the US 10 to 20 years ago. You'd
expect more Juans and Joses in a Hispanic community, just like you'd expect
the Dans, Pats, and Mikes in an Irish community.

I know what the classic definition of entropy is, but without knowledge of
the statistical universe that we're dealing with, how can I measure it?
The probability that a male's first name is Harry is probably pretty low
in general, yet it is exactly 20% if you restrict the world to my brothers.
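To make the dependence on the statistical universe concrete, here is a
minimal sketch of the classic (Shannon) definition applied to two universes.
The 256-name pool is an assumption chosen only because it corresponds to the
eight-bit figure; it is not a measured name distribution.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Universe 1: the answer is one of five brothers, each equally likely (20%).
brothers = [0.2] * 5

# Universe 2 (assumed): 256 equally likely first names, which is what
# "about eight bits" would mean if every name were equally probable.
general_pool = [1 / 256] * 256

print("five brothers:", round(shannon_entropy(brothers), 2), "bits")
print("256-name pool:", round(shannon_entropy(general_pool), 2), "bits")
```

The same name, Harry, contributes very differently in the two universes,
which is the whole difficulty: the formula is easy, the distribution is not.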

Carl suggested "What was the name of the first person on whom I had a crush?"
But if 33% of the women are named "Maria" in the local universe, then
that is not much entropy. Yet a name of "Maria McGee" is probably 
fairly high entropy, as it is an unlikely combination. If you were raised
in a small rural area, there might not be all that many possible
answers to Carl's question.
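One way to put numbers on the "Maria" case is to look at the single most
likely answer instead of the average. The sketch below compares Shannon
entropy with min-entropy (the bits an attacker faces on the best first
guess). The 200 other names and the 1-in-1000 surname frequency are
invented for illustration, and treating first name and surname as
independent is also an assumption.

```python
import math

def shannon_entropy(probs):
    """Average-case entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def min_entropy(probs):
    """Worst-case entropy in bits: determined by the most likely answer."""
    return -math.log2(max(probs))

def surprisal(p):
    """Bits of information in one specific answer of probability p."""
    return -math.log2(p)

# Hypothetical local universe: 33% "Maria", the rest spread evenly
# over 200 other first names.
local_names = [0.33] + [0.67 / 200] * 200

print("Shannon entropy:", round(shannon_entropy(local_names), 2), "bits")
print("min-entropy:    ", round(min_entropy(local_names), 2), "bits")

# "Maria McGee": if the surname is independent of the first name and has
# an assumed frequency of 1 in 1000, the probabilities multiply and the
# bits add.
print("Maria McGee:    ", round(surprisal(0.33 * 0.001), 2), "bits")
```

With these made-up numbers the average looks respectable, yet a guess of
"Maria" succeeds a third of the time, so the conservative figure is the
min-entropy of roughly 1.6 bits.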

>If you have four brothers and nobody whom the attacker could ask will
>know who is your favorite, but you think he could find out their names,
>then he has probably a 1/4 chance of guessing right.  (Actually he
>might do better by preferring older brothers rather than younger, etc.)

This is exactly the type of local social bias that I want to measure.
We would expect that an older brother could be a role model, etc. and thus
be more likely to be the "favorite."
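The effect of such a bias can be sketched by comparing a uniform guess over
four brothers with a skewed one. The weights 0.4, 0.3, 0.2, 0.1 (oldest to
youngest) are purely hypothetical; nothing in this thread measures them.

```python
import math

def expected_guesses(probs):
    """Average number of guesses when the attacker tries the most likely
    answer first, then the next most likely, and so on."""
    ordered = sorted(probs, reverse=True)
    return sum(rank * p for rank, p in enumerate(ordered, start=1))

def min_entropy(probs):
    """Bits of protection against the attacker's best first guess."""
    return -math.log2(max(probs))

uniform = [0.25, 0.25, 0.25, 0.25]   # no idea who the favorite is
older_bias = [0.4, 0.3, 0.2, 0.1]    # assumed preference for older brothers

for label, dist in (("uniform", uniform), ("older-brother bias", older_bias)):
    print(label,
          "first-guess chance:", max(dist),
          "expected guesses:", round(expected_guesses(dist), 2),
          "min-entropy:", round(min_entropy(dist), 2), "bits")
```

Under the assumed weights the first guess succeeds 40% of the time instead
of 25%, and the answer is worth noticeably less than the two bits that four
equally likely brothers would give.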

How do I know when I've got Carl's 90 bits of entropy?
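As a rough bookkeeping sketch: if the answers are independent of each other
(a strong assumption, since personal facts are often correlated), their bits
add, so the number of questions needed to reach 90 bits follows from a
conservative per-answer estimate. The per-answer figures below are
illustrative, not measured.

```python
import math

def questions_needed(target_bits, bits_per_answer):
    """Smallest number of independent answers whose bits sum to the target."""
    return math.ceil(target_bits / bits_per_answer)

TARGET_BITS = 90  # the figure attributed to Carl in this thread

# Illustrative per-answer estimates, in bits.
estimates = {
    "first name, optimistic (8 bits)": 8.0,
    "favorite of four brothers (2 bits)": 2.0,
    "name where one choice has a 33% share (about 1.6 bits)": 1.6,
}

for label, bits in estimates.items():
    print(label, "->", questions_needed(TARGET_BITS, bits), "questions")
```

The count is only as good as the per-answer estimates, which brings the
problem straight back to measuring the local distribution.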

Pat

Pat Farrell    CyberCash, Inc. 			(703) 715-7834
pfarrell@cybercash.com
#include standard.disclaimer




