From: Mok-Kong Shen <mok-kong.shen@stud.uni-muenchen.de>
To: cypherpunks@toad.com
Message Hash: add19c61504a48a623319a6c2312b23fc9bd2b112e125f05eafcf3cbd45403d7
Message ID: <35C99712.A1572FA0@stud.uni-muenchen.de>
Reply To: <Pine.LNX.3.96.980806105040.20223u-100000@freenet.bishkek.su>
UTC Datetime: 1998-08-06 11:44:34 UTC
Raw Date: Thu, 6 Aug 1998 04:44:34 -0700 (PDT)
Subject: Re: text analysis
CyberPsychotic wrote:
> text). Anyway, when it comes to 2-character sets, I have to get a 1024
> character set, and so on, and it seems quite unreasonable to me to allocate
> memory for elements which will probably never be found in the text... I was
> thinking of another solution and came to two-way connected lists (correct
> term?), i.e. I have some structure like:
>
> struct element {
>     char value[ELEMENT_LENGTH];
>     unsigned int frequency;
>     struct element *previous;
>     struct element *next;
> };
> and could dynamically allocate memory for each newly found element, but
> this would slow down the whole code as the list of new elements grows.
I think memory is now cheap enough that you could do frequency
counts of at least trigrams with a one-dimensional array.
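
A minimal sketch of what I mean, under some assumptions of mine: the
alphabet is just the 26 letters a-z (case folded, ASCII), anything else
breaks a run, and the names ALPHABET, trigram_index and count_trigrams
are made up for illustration. Each trigram is treated as a three-digit
number in base 26 and used directly as the array index, so there is no
list to search. That gives 26^3 = 17576 counters, roughly 70 KB with
4-byte counters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define ALPHABET  26                                 /* assumption: letters a-z only */
#define NTRIGRAMS (ALPHABET * ALPHABET * ALPHABET)   /* 17576 counters */

/* One counter per possible trigram; static storage starts at zero. */
static unsigned int trigram_count[NTRIGRAMS];

/* Map three letter codes (each 0..ALPHABET-1) to a slot in the flat array. */
static size_t trigram_index(int a, int b, int c)
{
    assert(a >= 0 && a < ALPHABET);
    assert(b >= 0 && b < ALPHABET);
    assert(c >= 0 && c < ALPHABET);
    return ((size_t)a * ALPHABET + (size_t)b) * ALPHABET + (size_t)c;
}

/* Count every trigram of consecutive letters in text. A non-letter
   resets the window, so trigrams never span spaces or punctuation. */
static void count_trigrams(const char *text)
{
    int prev2 = -1, prev1 = -1;   /* codes of the two preceding letters */

    for (const unsigned char *p = (const unsigned char *)text; *p; ++p) {
        int cur = tolower(*p) - 'a';          /* assumes ASCII letter order */
        if (cur < 0 || cur >= ALPHABET) {     /* not a letter: start over */
            prev2 = prev1 = -1;
            continue;
        }
        if (prev2 >= 0)                       /* two letters already seen */
            trigram_count[trigram_index(prev2, prev1, cur)]++;
        prev2 = prev1;
        prev1 = cur;
    }
}
```

After calling count_trigrams() on the text, the count for a trigram
such as "the" is read back with
trigram_count[trigram_index('t' - 'a', 'h' - 'a', 'e' - 'a')].
Most slots stay zero, but every update is a single array access no
matter how many distinct trigrams have turned up.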
M. K. Shen