1997-04-28 - Textual Analysis of Spam

Header Data

From: Mark Grant <mark@unicorn.com>
To: cypherpunks@toad.com
Message Hash: 17ff19638bde2fbf735c2a284b227afa071646c9d58845e97157334d0e837173
Message ID: <Pine.SOL.3.96.970428023215.21960A-100000@sirius.infonex.com>
Reply To: N/A
UTC Datetime: 1997-04-28 09:41:49 UTC
Raw Date: Mon, 28 Apr 1997 02:41:49 -0700 (PDT)

Raw message

From: Mark Grant <mark@unicorn.com>
Date: Mon, 28 Apr 1997 02:41:49 -0700 (PDT)
To: cypherpunks@toad.com
Subject: Textual Analysis of Spam
Message-ID: <Pine.SOL.3.96.970428023215.21960A-100000@sirius.infonex.com>
MIME-Version: 1.0
Content-Type: text/plain

I'm currently writing a spam filter that bounces messages based on header
information and keyword counts. It catches about 80% of the spam, but also
gives about 1% false positives. Has anyone done any more sophisticated
analysis of spam text than this? I can't find anything on the Web.
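
For readers who want something concrete, here is a minimal sketch of what
scoring on header information and keyword counts could look like. It is not
the filter described above, whose rules are not given in the message. The
keyword list, the weights, the header checks, the threshold, and the names
spam_score and is_spam are all assumptions made up for illustration.

```python
import re
from email import message_from_string

# Hypothetical keyword weights; the original message does not list any.
KEYWORDS = {"free": 1, "money": 1, "guaranteed": 2, "click here": 2}

# Hypothetical cut-off: messages scoring at or above this are treated as spam.
THRESHOLD = 3


def spam_score(raw_message: str) -> int:
    """Return a crude spam score from header checks plus keyword counts."""
    msg = message_from_string(raw_message)
    score = 0

    # Header checks (assumed examples): missing headers look suspicious.
    if msg.get("Message-ID") is None:
        score += 2
    if msg.get("To") is None:
        score += 1

    # Only handle simple single-part text bodies in this sketch.
    body = msg.get_payload()
    if not isinstance(body, str):
        body = ""

    # Count whole-word keyword hits in the subject and body.
    text = (msg.get("Subject", "") + " " + body).lower()
    for phrase, weight in KEYWORDS.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        score += weight * len(re.findall(pattern, text))

    return score


def is_spam(raw_message: str) -> bool:
    """Decide whether to bounce a message, using the assumed threshold."""
    return spam_score(raw_message) >= THRESHOLD
```

A fixed threshold like this trades caught spam against false positives:
lowering it catches more spam but bounces more legitimate mail, which is the
kind of trade-off behind the 80% and 1% figures quoted above.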