1997-01-04 - Re: OCR and Machine Readable Text

Header Data

From: Kent Crispin <kent@songbird.com>
To: panther@iglou.com
Message Hash: 1a7d5edc9d69f6adf0a7a7efaabdbd570ec2ee6a668bc87cfd0b41fb737f63dc
Message ID: <199701040726.XAA17982@songbird.com>
Reply To: <32CD435E.24DC@iglou.com>
UTC Datetime: 1997-01-04 06:23:21 UTC
Raw Date: Fri, 3 Jan 1997 22:23:21 -0800 (PST)

Raw message

From: Kent Crispin <kent@songbird.com>
Date: Fri, 3 Jan 1997 22:23:21 -0800 (PST)
To: panther@iglou.com
Subject: Re: OCR and Machine Readable Text
In-Reply-To: <32CD435E.24DC@iglou.com>
Message-ID: <199701040726.XAA17982@songbird.com>
MIME-Version: 1.0
Content-Type: text


/**\\anonymous/**\\ allegedly said:
> 
> Alan Olsen wrote:
> > I used to work for a company that would transfer entire archives of medical
> > journals.  Much of it we would just OCR.  Some of it we would send
> > offshore.  The OCR software was about 95% reliable and this was over 5 years
> > ago.  (And we were using 286 boxes for much of the OCR work.  Not a heavy
> > technological investment.)  I am sure that things have improved a great
> > deal since then.  (My new scanner included OCR software.  I will have to
> > run a test and report the findings.)
> 
> 	I'd like to know what OCR software you were using.  All tests we
> completed at my place of employment were very poor quality-wise.  We
> showed a 65% accuracy rate.  Not very good when you need to transfer a
> five-year backlog of medical and technical journals.  This was using a
> high-resolution scanner with a package that was bundled along with it.
> About a year ago, my employer considered transferring data taken off of
> forms into a relational database using an OCR program.  Again, we found
> the findings to be too inaccurate for our needs.  I may have just been
> using the wrong programs for the job, but the findings were depressing...

My understanding is that the most efficient way of inputting text is 
"double typing" where two people type the same document, and a 
mechanical comparison of the result is used to find errors.  
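
For illustration only, here is a rough sketch of what that "mechanical
comparison" step might look like.  It is not any particular shop's
tooling: the function name and the sample sentences are made up, and it
assumes the two typings are compared word by word using Python's
standard difflib module.

```python
import difflib


def find_discrepancies(first_typing, second_typing):
    """Return (word_index, first_words, second_words) for each differing span.

    Hypothetical helper: word_index is the position in the first typing
    where the two versions stop agreeing.
    """
    a = first_typing.split()
    b = second_typing.split()
    # autojunk=False keeps difflib from discarding very common words
    # as "junk" when the documents are long.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((i1, a[i1:i2], b[j1:j2]))
    return diffs


# Made-up sample data: each typist makes one different mistake.
typist_a = "The patient recieved 50 mg of the drug daily"
typist_b = "The patient received 50 mg of the drug daly"

for index, left, right in find_discrepancies(typist_a, typist_b):
    print(index, left, right)
```

The comparison only flags where the two typings disagree; a person
still has to check the original to decide which version is right, and
an error that both typists make identically would go unnoticed.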

-- 
Kent Crispin				"No reason to get excited",
kent@songbird.com,kc@llnl.gov		the thief he kindly spoke...
PGP fingerprint:   5A 16 DA 04 31 33 40 1E  87 DA 29 02 97 A3 46 2F




