1993-10-29 - Style Analysis

Header Data

From: Dark <unicorn@access.digex.net>
To: cypherpunks@toad.com
Message Hash: b631137e5df9d15561f07d24889f7ea03fd09c53a5bcd8b2ebaf53d7c8a15dad
Message ID: <199310292142.AA22002@access.digex.net>
Reply To: N/A
UTC Datetime: 1993-10-29 21:44:07 UTC
Raw Date: Fri, 29 Oct 93 14:44:07 PDT

Raw message

From: Dark <unicorn@access.digex.net>
Date: Fri, 29 Oct 93 14:44:07 PDT
To: cypherpunks@toad.com
Subject: Style Analysis
Message-ID: <199310292142.AA22002@access.digex.net>
MIME-Version: 1.0
Content-Type: text/plain


-----BEGIN PGP SIGNED MESSAGE-----
 
- -> Back when jpinson@fcdarwin.org.ec said....
 
[Stuff deleted, no value judgment implied]
 
The researchers analyzed the frequency distribution of words
found in the works of Shakespeare and compared it with that of
other writers of the day. I don't recall the results of the
project, but that kind of research would have implications for
anonymous postings.
 
It is not too difficult to see how certain spelling errors, word
frequency (how often do you say 'I'? :-), choice of wording, and the
working vocabulary of an individual could allow you to
identify an anonymous poster. This would be particularly easy if the
individual also posted under their real name.
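
A minimal sketch of the word-frequency comparison described above, in Python. It builds a relative-frequency profile for each text (so the rate of "I" is simply the profile's "i" entry) and scores two profiles with cosine similarity. Cosine similarity, the function names, and the sample texts are illustrative assumptions, not the method the Shakespeare researchers used.

```python
import math
import re
from collections import Counter


def word_profile(text: str) -> dict[str, float]:
    """Relative frequency of each lowercased word in text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def cosine_similarity(p: dict[str, float], q: dict[str, float]) -> float:
    """Cosine of the angle between two sparse frequency vectors."""
    dot = sum(p[w] * q.get(w, 0.0) for w in p)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def most_similar_author(anonymous: str, samples: dict[str, str]) -> str:
    """Name of the known author whose sample profile is closest to the anonymous text."""
    target = word_profile(anonymous)
    return max(samples, key=lambda name: cosine_similarity(target, word_profile(samples[name])))
```

For example, given hypothetical samples `{"alice": "I think I should say that I like it", "bob": "we believe our results show the method works"}`, the anonymous text `"I think I like it"` is attributed to `"alice"`. Real samples would need to be far longer before such scores meant much.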
 
[Stuff deleted, no value judgment implied]
 
This brings up the subject of how one can post without
leaving an "ASCII fingerprint". I suspect that using a spelling
checker and a grammar checker would help. Perhaps running
your text through a translation program (say, English to French)
and then back would remove many identifying characteristics.
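
The spelling-checker half of this idea can be sketched with Python's standard `difflib` module, which picks the closest dictionary word by similarity ratio. The short `VOCABULARY` list, the function name, and the 0.8 cutoff are assumptions for illustration. The round-trip translation idea is not shown, since it depends on an external translation system.

```python
import difflib
import re

# Hypothetical stand-in vocabulary; a real spelling checker would load a
# full dictionary rather than this short list.
VOCABULARY = ["i", "will", "receive", "the", "message", "separate", "definitely"]


def normalize_spelling(text: str, vocabulary: list[str], cutoff: float = 0.8) -> str:
    """Replace unknown words with their closest vocabulary match, if one is close enough."""
    known = set(vocabulary)

    def fix(match: re.Match) -> str:
        word = match.group(0)
        if word.lower() in known:
            return word
        # get_close_matches ranks candidates by difflib.SequenceMatcher ratio.
        best = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
        if not best:
            return word  # nothing close enough; leave the word alone
        # Keep an initial capital so sentence starts still look right.
        return best[0].capitalize() if word[0].isupper() else best[0]

    return re.sub(r"[A-Za-z]+", fix, text)
```

With this list, `normalize_spelling("I will recieve the mesage", VOCABULARY)` gives `"I will receive the message"`. A correctly spelled word missing from the list can be "corrected" into a different word, which is one reason a real checker needs a complete dictionary.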
 
 
Jim Pinson                     Galapagos Islands
PGP key available by finger    jpinson@fcdarwin.org.ec
 
 
- -> to which I reply:
 
It seems to me that software to "filter" a message, removing
anomalies, standardizing punctuation, replacing words of more
than five letters with more common words, and so on, would have
some utility. I particularly like the idea of a two-pass
translation program. If enough people used this software, it
would become meaningless to attempt this kind of analysis, which
looks straightforward enough to give any persistent
investigator a "gut feel" for the identity of an otherwise
anonymous poster.
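
A minimal sketch of such a filter, in Python. The substitution table `PLAIN_WORDS` and the function names are hypothetical; a usable filter would need a far larger table of plain substitutes, and the punctuation rules are only examples of what "standardize" might mean. The two-pass translation step is not implemented here.

```python
import re

# Hypothetical substitution table: longer words mapped to plainer ones.
# A usable filter would need a much larger, carefully chosen table.
PLAIN_WORDS = {
    "particularly": "really",
    "utility": "use",
    "meaningless": "pointless",
    "straightforward": "simple",
}


def standardize_punctuation(text: str) -> str:
    """Apply a few example punctuation rules."""
    text = re.sub(r"\.{2,}", ".", text)          # ".." or "...." -> "."
    text = re.sub(r"([!?]){2,}", r"\1", text)    # "!!!" -> "!"
    text = re.sub(r"\s*:-?\)", "", text)         # drop ":-)" and ":)" smileys
    text = re.sub(r"[ \t]+", " ", text)          # collapse runs of spaces/tabs
    text = re.sub(r" +([,.;:!?])", r"\1", text)  # no space before punctuation
    return text.strip()


def replace_long_words(text: str, table: dict[str, str], min_len: int = 6) -> str:
    """Swap words of at least min_len letters (i.e. over five) for plainer ones from table."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        plain = table.get(word.lower()) if len(word) >= min_len else None
        if plain is None:
            return word
        # Keep an initial capital so sentence starts still look right.
        return plain.capitalize() if word[0].isupper() else plain

    return re.sub(r"[A-Za-z]+", swap, text)


def filter_message(text: str) -> str:
    """Standardize punctuation, then replace long words."""
    return replace_long_words(standardize_punctuation(text), PLAIN_WORDS)
```

For instance, `filter_message("This has a kind of utility..  I particularly like it :-)")` returns `"This has a kind of use. I really like it"`.
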

It seems that the most solid basis for this kind of message
analysis is non-standard use of grammar, spelling, and
punctuation. I, for example, use too many commas.
Does anyone have any information on which factors identify
posters? Is it just word frequency analysis, or...? If so, it would
be easy enough to correct for that.
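
Beyond word frequency, habits like the ones mentioned above (heavy comma use, sentence length, how often "I" appears) are easy to measure. A minimal sketch, assuming these three features purely for illustration; it says nothing about which factors real analysts actually rely on, and the function name is hypothetical.

```python
import re


def style_features(text: str) -> dict[str, float]:
    """Compute a few simple, illustrative style measurements for a text."""
    # Treat runs of ., ! or ? as sentence boundaries (a rough approximation).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_sentences = max(len(sentences), 1)  # avoid division by zero
    n_words = max(len(words), 1)
    return {
        "commas_per_sentence": text.count(",") / n_sentences,
        "words_per_sentence": len(words) / n_sentences,
        # Counts the standalone word "I" only; "I'm" is not included.
        "i_per_100_words": 100 * sum(w.lower() == "i" for w in words) / n_words,
    }
```

Applied to "I, for example, use too many commas. Anyone have any information?", this reports 1.0 commas per sentence and 5.5 words per sentence. Comparing such numbers between a poster's signed and anonymous messages is one way to test the idea; a filter could equally aim to push them toward some common average.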
 
- -uni- (Dark)
 
 
 
-----BEGIN PGP SIGNATURE-----
Version: 2.3
 
iQCVAgUBLNGcxxibHbaiMfO5AQFVIwP+JsuNvRmE1WlFZ7wxvIybg1bTa0FO5/N7
4XrHQ0On1avtoFDjPAmA7dqgrHHscz8LiwYEx1eXx/exOPmZkA2sCg5/AVo61zv6
iBjsqd3o5IgV9L+uXmzl2+OBJ0zpdTyNxiV7VzrKjJqKVlzZgCqbYCB8tN5cOpFj
M3FnGQZfSsg=
=a1Hf
-----END PGP SIGNATURE-----




