1998-08-06 - Re: text analysis

Header Data

From: Mok-Kong Shen <mok-kong.shen@stud.uni-muenchen.de>
To: cypherpunks@toad.com
Message Hash: add19c61504a48a623319a6c2312b23fc9bd2b112e125f05eafcf3cbd45403d7
Message ID: <35C99712.A1572FA0@stud.uni-muenchen.de>
Reply To: <Pine.LNX.3.96.980806105040.20223u-100000@freenet.bishkek.su>
UTC Datetime: 1998-08-06 11:44:34 UTC
Raw Date: Thu, 6 Aug 1998 04:44:34 -0700 (PDT)

Raw message

From: Mok-Kong Shen <mok-kong.shen@stud.uni-muenchen.de>
Date: Thu, 6 Aug 1998 04:44:34 -0700 (PDT)
To: cypherpunks@toad.com
Subject: Re: text analysis
In-Reply-To: <Pine.LNX.3.96.980806105040.20223u-100000@freenet.bishkek.su>
Message-ID: <35C99712.A1572FA0@stud.uni-muenchen.de>
MIME-Version: 1.0
Content-Type: text/plain

CyberPsychotic wrote:

> text). Anyway, when it comes to 2-character sets, I have to get a 1024-element
> set, and so on, which seems quite unreasonable to me: allocating
> memory for elements which will probably never be found in the text... I was
> thinking of another solution and came to two-way connected lists (correct
> term?), i.e. I have some structure like:
> struct element {
>     char value[ELEMENT_LENGTH];
>     unsigned int frequency;
>     struct element *previous;
>     struct element *next;
> };
> and could dynamically allocate memory for each newly found element, but
> this would slow down the whole code as the list of new elements grows.

I think memory is cheap enough nowadays that you could do frequency
counts of at least trigrams with a one-dimensional array.

M. K. Shen