1997-01-03 - Re: OCR and Machine Readable Text

Header Data

From: Hal Finney <hal@rain.org>
To: cypherpunks@toad.com
Message Hash: 071871fe1710cc8192d902f4812fe65eff00c0163de0fbdf87099e6b6285a209
Message ID: <199701031704.JAA08887@crypt.hfinney.com>
Reply To: N/A
UTC Datetime: 1997-01-03 17:06:04 UTC
Raw Date: Fri, 3 Jan 1997 09:06:04 -0800 (PST)

Raw message

From: Hal Finney <hal@rain.org>
Date: Fri, 3 Jan 1997 09:06:04 -0800 (PST)
To: cypherpunks@toad.com
Subject: Re: OCR and Machine Readable Text
Message-ID: <199701031704.JAA08887@crypt.hfinney.com>
MIME-Version: 1.0
Content-Type: text/plain


Before you get too carried away by the capabilities of off the shelf
character recognition, read Phil Karn's experience trying to produce
a working DES program from the pages of Applied Cryptography.  It was
easier than writing it himself, but it was still far from a turnkey
operation.  This is from <URL:
http://www.qualcomm.com/people/pkarn/export/karndecl.html >


> 5. I began by first photocopying, on a standard office photocopier, the
> 18 pages containing the Triple DES source code listing from Part V of
> the Book. This took about 5 minutes. Second, I scanned in the 18 sheets
> on a Macintosh Quadra 610 computer system equipped with an HP ScanJet II
> flatbed scanner and Omnipage Professional optical character recognition
> (OCR) software. The computer, scanner, and software are all readily
> available through normal consumer computer supply channels. The total
> scanning process took about one and a half hours. About an hour of this
> time was spent learning to use the scanning system and conducting trial
> runs, as I had only used it briefly some time ago. The actual scan of the
> 18 pages took about 15-20 minutes. Third, I transferred the resulting
> machine-readable file from the Macintosh to my own personal computer
> and brought it up under GNU EMACS, a popular and widely available text
> editing program that I have used for many years. In EMACS I compared,
> by eye, the scanned file displayed on my screen against the printed
> listing in the Book. I began correcting the scanner's many errors, such
> as mistaking the digit '0' for the letter 'O' or mistaking the vertical
> bar '|' for the letter 'I'.
> 
> 6. After manually correcting those errors noticed through visual
> comparison with the Book, I invoked the "C" language compiler on the
> (partially) corrected file. The compiler immediately pointed out
> additional errors I had overlooked in my visual inspection so I could
> also correct them by reference to the Book. I also noticed several errors
> in the listing printed in the Book. However, the programmer's intentions
> were obvious from the context of each error and were easily fixed. About
> fifty minutes later, I successfully compiled the file without error.
> 
> 7. The fourth step was to write a small test program to execute
> the DES code with the test vectors given at the end of the source
> code listing. This trivial program took less than 5 minutes to
> write. Unfortunately, the test did not succeed, meaning that at
> least one error went undetected by the compiler in either the code as
> printed in the Book or as scanned. Scrutinizing the code more closely,
> I quickly found another error in the printed version that was easily
> corrected. However, it still did not produce correct results. After about
> an hour of searching, I finally located the error in a list of numbers
> in a table -- another error in the printed version. By reference to the
> DES algorithm description in the first part of the Book, which includes
> the correct numbers in tabular form, I found and corrected the error.
> 
> 8. At this point the test finally succeeded, so I knew I had a correct
> program. However, to increase my confidence further I tried a few
> other DES test vectors that were not included in the source code, but
> were openly published by the US National Institute of Standards and
> Technology (NIST). All passed. At this point it was beyond doubt that I
> had a correct, working copy of the DES source code identical to that on
> the Disk with all errors removed, including those printed in the Book
> as well as those added by the scanning process.
> 
> 9. Finally, in about 40 minutes I wrote and debugged a "driver" program
> analogous to that included in the Crowell declaration. This driver program
> takes a sample plaintext file, encrypts it, displays the encrypted file
> in hexadecimal and then decrypts it.





Thread