1993-10-29 - ID of anonymous posters via word analysis?

Header Data

From: gtoal@an-teallach.com (Graham Toal)
To: cypherpunks@toad.com
Message Hash: 094ed1d37e0d2c414f4762badfcbef4e3b3022bac8ea455ab2cccdd833fbd57b
Message ID: <4477@an-teallach.com>
Reply To: N/A
UTC Datetime: 1993-10-29 20:59:08 UTC
Raw Date: Fri, 29 Oct 93 13:59:08 PDT

Raw message

From: gtoal@an-teallach.com (Graham Toal)
Date: Fri, 29 Oct 93 13:59:08 PDT
To: cypherpunks@toad.com
Subject: ID of anonymous posters via word analysis?
Message-ID: <4477@an-teallach.com>
MIME-Version: 1.0
Content-Type: text/plain


In article <Pine.3.87.9310291032.A24998-0100000@crl.crl.com> arthurc@crl.crl.com writes:
 >   I think that identification by buzzwords, habitual misspellings, etc. 
 > could be used to identify anonymous posters. Sentence structure is also 
 > revealing. Le style, c'est l'homme, said Voltaire.  Of course, it all 
 > comes down to how much time and effort you want to put into proving, say, 
 > that SBoxx=LDetweiler.

I had a go at this just for fun when an8785 was doing his thing.  I'm
pretty sure I identified him correctly in the end.  (The guy I thought
it was, when I asked him, said 'If I were I wouldn't tell you', whereas
all the other people I suspected but not as strongly all denied it
violently, heh heh heh)

I think this sort of analysis could be automated to a reasonable
extent, to cut out the Type I errors that the guys who did Shakespeare/Bacon
analysis made.  It's very easy to fool yourself if you don't have predefined
criteria of comparison and a rigid marking scheme.
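
A minimal sketch of what "predefined criteria of comparison and a rigid
marking scheme" might look like in practice. Nothing here comes from the
original message: the feature lists (FUNCTION_WORDS, PUNCTUATION), the
function names, and the choice of cosine similarity as the score are all
assumptions made for illustration. The point is only that the features and
the scoring rule are fixed before any text is examined, so the analyst
cannot adjust them afterwards to fit a favoured suspect.

```python
import math
import re
from collections import Counter

# Hypothetical feature set, chosen in advance and never changed per suspect.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in",
                  "that", "it", "is", "i", "but", "not"]
PUNCTUATION = [",", ";", ":", "!", "?", "(", "-"]
FEATURES = FUNCTION_WORDS + PUNCTUATION


def style_profile(text):
    """Return the per-token rate of each predefined feature in `text`."""
    # Tokens are lowercase words or single punctuation marks from the list.
    tokens = re.findall(r"[a-z']+|[,;:!?()\-]", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty input
    return [counts[f] / total for f in FEATURES]


def cosine_similarity(u, v):
    """Cosine of the angle between two profiles; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(anonymous_text, known_texts):
    """Score every known author against the anonymous text, best match first.

    `known_texts` maps an author name to a sample of that author's writing.
    """
    target = style_profile(anonymous_text)
    scores = [(name, cosine_similarity(target, style_profile(sample)))
              for name, sample in known_texts.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Calling rank_candidates(anonymous_post, {"suspect1": ..., "suspect2": ...})
returns every suspect with a score, not just the best one, which makes it
harder to ignore a near-tie. A real attempt would need far more features and
much longer samples than this toy version.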

I'm fairly sure that a sufficiently detailed analysis looking at enough
different points of style would still catch someone's fingerprint even if
they went out of their way to disguise their postings.  The only approach
I can think of that would be successful in hiding individual style is for
person A to write something, person B reads it quickly, then attempts to
write something with the same semantic content, but of course it will
have B's grammar and phraseology and punctuation idiosyncrasies.  (And
this only works if B is not a net poster, otherwise you recognise B and
work out who his friends are :-) )

G
-- 
Personal mail to gtoal@gtoal.com (I read it in the evenings)
Business mail to gtoal@an-teallach.com (Be careful with the spelling!)
Faxes to An Teallach Limited: +44 31 662 4678  Voice: +44 31 668 1550 x212

Thread