Message Hash: f975c476221f61058ac3776bc687ed63981a2dffa8bca55597d33ceaa7945d4a
Message ID: <9310141748.AA11655@alumni.cco.caltech.edu>
Reply To: N/A
UTC Datetime: 1993-10-14 17:52:16 UTC
Raw Date: Thu, 14 Oct 93 10:52:16 PDT
From: firstname.lastname@example.org Date: Thu, 14 Oct 93 10:52:16 PDT To: email@example.com Subject: Re: Generating random numbers from english text Message-ID: <9310141748.AA11655@alumni.cco.caltech.edu> MIME-Version: 1.0 Content-Type: text/plain On Thu, 14 Oct 93 09:29:16 PDT., baldwin@LAT.COM (Bob Baldwin) writes: < With the current random number discussion going on, I thought < I would point out one convenient way to generate random numbers in < the situation where you do not need to generate them frequently. I believe < that Claude Shannon (Dr. Information Theory) proposed this: < < 1. Ask a person to speak about any topic for a paragraph or two. < Instruct them to generate original sentences, not just repeats of < some passage of a written work. Write down what they say. < < 2. The english language has an entropy of 1.0 to 1.5 bits per letter, < so it is safe to extract one bit per character. If you read Shannon's < work and its modern interpretation, you will understand that this < bit per character is truely random. It is uncorrelated with all < the other bits. The reason for avoiding a pre-existing written < work (poem, article, story, etc), is to avoid a brute force search < of the source language space. can you provide a citation? i recently read a paper Shannon wrote  on a study of randomness in english text, and this is definitely not the impression i got from it. in this study Shannon demonstrates that english-speaking humans are very capable of predicting the next letter of an unknown text given the letters up to that point (he also shows this for the reverse direction; that is, starting at the end of the text and working forward, subjects were almost as capable of predicting correctly as they were starting from the front of the text). this seems to imply that there is some non-random relationship (statistical or otherwise) among the letters in a text. might this redundancy be carried over into any encoding (MD5 hash, e.g.) of the text? theoretically, at least, this may compare to Vigne're ciphers; the key is based on some text other than the plaintext. < The modern version of this algorithm, is to compute the < MD5 digest of a long text string. For a 128 bit digest, you need < at least 128 characters of source language. For a larger random number, < you can concatenate multiple MD5 values from multiple pieces of source text. could someone here describe how this compares to other pseudo-random sources? how well would this function as a seed to some other pseudo-random number generation process?  Shannon, Claude E., "Prediction and Entropy of Printed English," Bell System Technical Journal, vol. XXX, No. 1, Jan. 1951, pp. 50-64.