1996-01-16 - Re: Spiderspace

Header Data

From: m5@dev.tivoli.com (Mike McNally)
To: ecarp@tssun5.dsccc.com (Ed Carp @ TSSUN5)
Message Hash: ff5e91bcaceceb8cbd0d6e000ff13a488455ce6733b437aa957d2430037b34f1
Message ID: <9601161922.AA13227@alpha>
Reply To: <9601161853.AA13284@tssun5.>
UTC Datetime: 1996-01-16 21:10:51 UTC
Raw Date: Wed, 17 Jan 1996 05:10:51 +0800

Raw message

From: m5@dev.tivoli.com (Mike McNally)
Date: Wed, 17 Jan 1996 05:10:51 +0800
To: ecarp@tssun5.dsccc.com (Ed Carp @ TSSUN5)
Subject: Re: Spiderspace
In-Reply-To: <9601161853.AA13284@tssun5.>
Message-ID: <9601161922.AA13227@alpha>
MIME-Version: 1.0
Content-Type: text/plain



Ed Carp writes:
 > ... I was under the impression that the only documents that most web crawlers
 > will search are documents that are link-accessible.  Are you saying that this
 > isn't true?  Are you saying that Alta-Vista will search EVERYTHING that's
 > publicly accessible, whether by anonymous FTP or web?

Ah, but if it hits a site that's set up with a top-level directory
which *does* contain an "index" page but whose server *doesn't*
recognize the index page name, then when you hit the site you
(probably) get one of those server-generated indices.  Those things
generally have *everything* in the directory visible (except those
files blocked by the server configuration, usually stuff like emacs
temp files), and so there you go...
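
The behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not any particular server's implementation: the function name `resolve_directory`, the recognized index name `index.html`, the blocked patterns, and the sample file names are all hypothetical.

```python
import fnmatch
import os
import tempfile

# Hypothetical server settings; real servers have their own defaults.
INDEX_NAMES = ["index.html"]             # names the server treats as an index page
BLOCKED_PATTERNS = ["*~", "#*#", ".*"]   # e.g. emacs backup/autosave files, dotfiles


def resolve_directory(path, index_names=INDEX_NAMES, blocked=BLOCKED_PATTERNS):
    """Decide what a request for a directory returns.

    Returns ("index", name) if a recognized index page exists, otherwise
    ("listing", visible_files), mimicking a server-generated index.
    """
    entries = sorted(os.listdir(path))
    for name in index_names:
        if name in entries:
            return ("index", name)
    # No recognized index page: expose everything not explicitly blocked.
    visible = [
        entry for entry in entries
        if not any(fnmatch.fnmatchcase(entry, pattern) for pattern in blocked)
    ]
    return ("listing", visible)


def demo():
    """Build a site whose index page has a name the server does not recognize."""
    with tempfile.TemporaryDirectory() as root:
        for name in ["home.html", "private-notes.txt", "draft.txt~"]:
            open(os.path.join(root, name), "w").close()
        return resolve_directory(root)
```

In this sketch the site author meant `home.html` to be the front page, but since the server only looks for `index.html`, `demo()` falls through to the listing branch and every unblocked file in the directory becomes link-accessible to a crawler.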

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| Nobody's going to listen to you if you just | Mike McNally (m5@tivoli.com) |
| stand there and flap your arms like a fish. | Tivoli Systems, Austin TX    |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~




