1993-03-03 - ANON: Textual analysis

Header Data

From: jthomas@mango.mitre.org (Joe Thomas)
To: yanek@novavax.nova.edu
Message Hash: 0a0e40f81f98c807f54aa47109b8717d3e04bc9125d581a0ce8dc0da18b42201
Message ID: <9303031439.AA22164@mango>
Reply To: N/A
UTC Datetime: 1993-03-03 14:43:02 UTC
Raw Date: Wed, 3 Mar 93 06:43:02 PST

Raw message

From: jthomas@mango.mitre.org (Joe Thomas)
Date: Wed, 3 Mar 93 06:43:02 PST
To: yanek@novavax.nova.edu
Subject: ANON: Textual analysis
Message-ID: <9303031439.AA22164@mango>
MIME-Version: 1.0
Content-Type: text/plain


> > This reveals a minor and probably obvious weakness of pseudonyms--writing
> > styles.

>We probably need "rephrasing remailers" which do some rudimentary
>grammar parsing on input text, and randomly substitute equivalent
>constructs such as switching active/passive voice, synonyms, changing
>the word order where it is insignificant, joining/splitting sentences,
>etc.  Anyone here have any experience in NLP (natural language processing),
>specifically parsing English?
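For illustration only, here is a minimal sketch of just one of the proposed transformations, random synonym substitution. It assumes a tiny hand-written synonym table (`SYNONYMS` and `substitute_synonyms` are hypothetical names) and does no grammar parsing, so it cannot handle voice changes, word order, or sentence joining/splitting, and it will happily pick a synonym that is wrong for the context.

```python
import random
import re

# Hypothetical, hand-written synonym table. A real rephrasing remailer would
# need a full thesaurus plus part-of-speech tagging to avoid bad substitutions.
SYNONYMS = {
    "probably": ["likely", "presumably"],
    "weakness": ["flaw", "vulnerability"],
    "minor": ["small", "slight"],
    "obvious": ["evident", "apparent"],
}


def substitute_synonyms(text: str, rng: random.Random) -> str:
    """Replace each word found in SYNONYMS with a randomly chosen equivalent."""

    def swap(match: re.Match) -> str:
        word = match.group(0)
        choices = SYNONYMS.get(word.lower())
        if not choices:
            return word  # not in the table: leave the word unchanged
        replacement = rng.choice(choices)
        # Preserve an initial capital letter from the original word.
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", swap, text)


if __name__ == "__main__":
    rng = random.Random(1993)  # seeded only so the run is repeatable
    print(substitute_synonyms("This reveals a minor and probably obvious weakness.", rng))
```

Because the choice is random per message, two messages by the same author would not share the same substitutions, which is the point of the proposal; the table size is what limits how much style it actually hides.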

...

>Another starting point is language translation software.  After your text
>has been translated automatically to Spanish -> French -> German -> English,
>not much of the original style will remain.  Hopefully, enough meaning
>will be preserved to allow understanding.

This whole problem looks to me to be AI-complete.  I mean, I can't
understand the manual from my Roland synth without a whole lot of
head-scratching, and that was translated by a human!  I don't think
you're going to see a computer program giving intelligible rephrasing
any time soon.  The burden of disguising writing style may continue
to fall on the author, but if everyone has the tools to statistically
analyze their own messages before they send them, they'll at least
see what they need to change.  [I, for example, might decide to use
sentences with fewer than three clauses...]
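As a rough illustration of the kind of self-check described above, the sketch below computes a few simple style statistics for a message. The metric choices and the names `StyleProfile` and `profile` are assumptions, not an established tool; sentence splitting is naive (abbreviations will fool it), and commas per sentence is only a crude stand-in for clauses per sentence.

```python
import re
from dataclasses import dataclass


@dataclass
class StyleProfile:
    sentences: int
    mean_words_per_sentence: float
    mean_word_length: float
    commas_per_sentence: float  # crude proxy for clauses per sentence


def profile(text: str) -> StyleProfile:
    """Compute a few simple stylometric statistics for one message."""
    # Naive split on terminal punctuation; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n = len(sentences) or 1  # avoid division by zero on empty input
    return StyleProfile(
        sentences=len(sentences),
        mean_words_per_sentence=len(words) / n,
        mean_word_length=sum(map(len, words)) / (len(words) or 1),
        commas_per_sentence=text.count(",") / n,
    )


if __name__ == "__main__":
    print(profile("One two three. Four, five!"))
```

An author could run this on a draft and on their past signed posts, then adjust the draft until the numbers drift apart; real stylometry uses many more features (function-word frequencies, punctuation habits), so matching a few averages is no guarantee of anonymity.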

Joe
