1993-03-03 - ANON: Textual analysis

Header Data

From: root@rmsdell.ftl.fl.us (Yanek Martinson)
To: cypherpunks@toad.com
Message Hash: be32d201d8895169a88537f05cc3016c638c5a98e317ff74a563956949bdc14b
Message ID: <m0nTk5E-0002TpC@rmsdell.ftl.fl.us>
Reply To: N/A
UTC Datetime: 1993-03-03 03:49:14 UTC
Raw Date: Tue, 2 Mar 93 19:49:14 PST

Raw message

From: root@rmsdell.ftl.fl.us (Yanek Martinson)
Date: Tue, 2 Mar 93 19:49:14 PST
To: cypherpunks@toad.com
Subject: ANON: Textual analysis
Message-ID: <m0nTk5E-0002TpC@rmsdell.ftl.fl.us>
MIME-Version: 1.0
Content-Type: text/plain

> This reveals a minor and probably obvious weakness of pseudonyms--writing
> styles.

We probably need "rephrasing remailers" which do some rudimentary
grammar parsing on input text, and randomly substitute equivalent
constructs such as switching active/passive voice, synonyms, changing
the word order where it is insignificant, joining/splitting sentences,
etc.  Does anyone here have any experience in NLP (natural language
processing), specifically parsing English?
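
As a rough illustration of the simplest of these substitutions
(synonyms only), here is a minimal Python sketch.  The SYNONYMS table
and the rephrase() function are made up for this example; real voice
switching or sentence splitting would need an actual parser.

```python
import random
import re

# Hypothetical synonym table, for illustration only. A usable
# rephrasing remailer would need a far larger lexicon.
SYNONYMS = {
    "probably": ["likely", "presumably"],
    "minor": ["small", "slight"],
    "obvious": ["evident", "plain"],
}

def rephrase(text, rng):
    """Randomly replace each known word with an equivalent (or keep it)."""
    def swap(match):
        word = match.group(0)
        options = SYNONYMS.get(word.lower())
        if not options:
            return word  # unknown word: leave untouched
        choice = rng.choice(options + [word.lower()])
        # Keep a leading capital if the original word had one.
        return choice.capitalize() if word[0].isupper() else choice
    return re.sub(r"[A-Za-z]+", swap, text)

rng = random.Random()  # pass a seeded Random for repeatable output
print(rephrase("This reveals a minor and probably obvious weakness", rng))
```

Because the choice is random, two copies of the same message would
usually leave the remailer with different wording.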

A possible start would be to look at "grammar checker" programs that 
check for various grammatical mistakes/misusages and suggest improvements.

Another starting point is language translation software.  After your text
has been translated automatically to Spanish -> French -> German -> English,
not much of the original style will remain.  Hopefully, enough meaning
will be preserved to allow understanding.
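
The chaining itself is easy to sketch in Python.  This assumes some
translate(text, src, dst) function is available; that function is
hypothetical and is passed in by the caller, since no particular
translation program is assumed here.

```python
def round_trip(text, translate, chain=("en", "es", "fr", "de", "en")):
    """Pass text through each consecutive language pair in chain.

    translate is a caller-supplied function (hypothetical here) taking
    (text, source_language, target_language) and returning the
    translated text.
    """
    for src, dst in zip(chain, chain[1:]):
        text = translate(text, src, dst)
    return text

# Stand-in translator that only tags each hop, to show the call order.
def fake_translate(text, src, dst):
    return text + " [" + src + "->" + dst + "]"

print(round_trip("hello", fake_translate))
```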

Are there any public domain programs that do one of the above?

One constraint on these is that the message must be present in clear text,
so the rephrasing remailer must be the last one in the chain.

> examination of punctuation styles (e.g., some people use _this_ for
> emphasis while others use *this*)

This could be alleviated by using a standard markup format, such as
MIME RichText, or the simpler markup convention recently proposed
on the mime list.
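
A minimal Python sketch of that normalization: both _this_ and *this*
are rewritten to one canonical form.  The <emph> tags are placeholders
chosen for this example, not the actual syntax of MIME RichText or of
the other proposal.

```python
import re

# An emphasis marker (_ or *) that starts at a word boundary, wraps
# text with no leading or trailing space, and closes with the same
# marker.
EMPHASIS = re.compile(r"(?<!\w)([_*])(\S(?:.*?\S)?)\1(?!\w)")

def normalize_emphasis(text, open_tag="<emph>", close_tag="</emph>"):
    """Rewrite _word_ and *word* emphasis to a single markup style."""
    return EMPHASIS.sub(lambda m: open_tag + m.group(2) + close_tag, text)

print(normalize_emphasis("some use _this_ while others use *this*"))
```

Identifiers such as foo_bar_baz are left alone, because the marker
must not be preceded by a letter, digit, or underscore.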

Yanek Martinson