1996-09-23 - possible solution to cyber S/N

Header Data

From: “Vladimir Z. Nuri” <vznuri@netcom.com>
To: cypherpunks@toad.com
Message Hash: 3d544115bf6b2adf2817e55f2fd59939d8df40defc136579259ee0aa1c6852a4
Message ID: <199609231847.LAA29832@netcom5.netcom.com>
Reply To: N/A
UTC Datetime: 1996-09-23 22:42:01 UTC
Raw Date: Tue, 24 Sep 1996 06:42:01 +0800

Raw message

From: "Vladimir Z. Nuri" <vznuri@netcom.com>
Date: Tue, 24 Sep 1996 06:42:01 +0800
To: cypherpunks@toad.com
Subject: possible solution to cyber S/N
Message-ID: <199609231847.LAA29832@netcom5.netcom.com>
MIME-Version: 1.0
Content-Type: text/plain



I've written many times on the problem of signal to noise
here in cyberspace and have tried to think through some
elegant solutions to this very vexing problem.

moments ago I came up with an intense brainstorm fully worthy
of sharing with the list <g>.


it's clear that the web functions very much like a directed
graph. sites and the traffic between them form one such graph.
another graph comes from thinking of each page or "article" on
the web as a single node, with hyperlinks as the directed edges
between the nodes.

what we have today is a directed graph in which one can
follow the forward direction of hyperlinks very readily.
however, it is much more difficult to follow the reverse
direction, i.e. to find "all the sites that reference this paper".
of course this is part of the beauty of the web: anyone
can link to anyone without having to register the link
anywhere, or anything like that.
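
to make the forward/reverse asymmetry concrete, here is a rough
python sketch (the URLs are made up, and the dictionaries stand in
for data that no single party actually has today): the forward link
map is trivial to walk, while answering "who links here" means
building a reverse index.

from collections import defaultdict

# forward map: page -> pages it links to (what a crawler can see by
# following hyperlinks). all URLs here are hypothetical.
forward_links = {
    "http://example.org/paper.html":   ["http://example.org/refs.html"],
    "http://siteA.example/index.html": ["http://example.org/paper.html"],
    "http://siteB.example/links.html": ["http://example.org/paper.html",
                                        "http://siteA.example/index.html"],
}

# derived reverse index: page -> pages that link to it.
reverse_links = defaultdict(list)
for source, targets in forward_links.items():
    for target in targets:
        reverse_links[target].append(source)

# "all the sites that reference this paper" -- trivial once the
# reverse index exists, hard if all you can do is follow links forward.
print(reverse_links["http://example.org/paper.html"])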

the "referrer log" feature on a web server is a mechanism that
does allow a server to get some idea of which pages link
to its site.
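
as a rough illustration of what a referrer log yields, here is a
python sketch. the log line format is an assumption (loosely modeled
on the common "combined" server log style, with the referring page as
the second-to-last quoted field); servers of the era varied, so treat
this as a sketch rather than a parser for any particular product.

import re
from collections import Counter

sample_log = [
    '10.0.0.1 - - [23/Sep/1996:18:47:00 -0700] "GET /paper.html HTTP/1.0" 200 5120 "http://siteA.example/index.html" "Mozilla/3.0"',
    '10.0.0.2 - - [23/Sep/1996:18:48:12 -0700] "GET /paper.html HTTP/1.0" 200 5120 "http://siteB.example/links.html" "Mozilla/2.0"',
    '10.0.0.3 - - [23/Sep/1996:18:49:30 -0700] "GET /paper.html HTTP/1.0" 200 5120 "-" "Lynx/2.6"',
]

# tally which outside pages are sending traffic to this server
referrers = Counter()
for line in sample_log:
    quoted = re.findall(r'"([^"]*)"', line)
    if len(quoted) >= 2 and quoted[-2] not in ("-", ""):
        referrers[quoted[-2]] += 1

# note: each server only sees referrals into its *own* pages; nobody
# gets the whole reverse graph this way.
print(referrers.most_common())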


the basic idea I have is that references are an excellent
way of discriminating signal from noise, and they are used
routinely in the scientific arena. a paper that is pivotal and
influential is referred to ad infinitum; obscure papers are
forgotten and never referred to in subsequent literature.

taking this idea to the cyberspace arena, the application is
immediately obvious-- pages that are linked to by a lot of
other pages are valuable; those that are not are less valuable.

(another closely related idea is how much hit traffic a web site
gets-- wouldn't it be a *tremendous* improvement to the current
search engines if they returned pages ranked according to how
many hits they get per unit time?)
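
as a toy sketch of both ranking ideas (the page names and numbers are
invented; actually collecting such figures is the hard part discussed
below), a search engine could re-rank the pages matching a query by
inbound-link count and hits per unit time:

# hypothetical per-page metrics a search engine might somehow obtain
pages = {
    "http://example.org/paper.html":   {"inbound_links": 120, "hits_per_day": 900},
    "http://siteA.example/index.html": {"inbound_links": 3,   "hits_per_day": 40},
    "http://siteB.example/links.html": {"inbound_links": 17,  "hits_per_day": 250},
}

search_results = list(pages)  # pretend these pages all matched the query

# rank primarily by how many pages link in, then by hit rate
ranked = sorted(
    search_results,
    key=lambda url: (pages[url]["inbound_links"], pages[url]["hits_per_day"]),
    reverse=True,
)

for url in ranked:
    print(url, pages[url])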

there are some problems with all of the above, however. currently
only the site that actually houses the pages can keep track
of hits, and referrers are hardly tracked at all. in a robust
system, cheating, such as reporting more hits than one is
actually getting, would not be possible.

anyway, I tend to think that future S/N problems in cyberspace
are increasingly going to be solved in some particular ways
that are just now being tried out:

1. rating agencies. agencies that both reject noise and find
"cool stuff" (like Point Communications etc.)

2. hit statistics. how many people are hitting various pages? if
we could get an Arbitron-like system that works the same way that
newsgroup readerships are now reported (each site compiles statistics
and sends them to centralized databases), we could have search
engines that rank results according to hit statistics (see the
sketch after this list). I'm not saying it would be perfect, and
there are all kinds of obvious nitpicky things that people here
will harp on, but I still insist it would be better than no
statistics at all.

eventually a system like this, involving voluntary submission of
hit counts to centralized servers (or just some way for search
engines to get hit statistics from the servers about the pages they
own/serve), may evolve into a more robust system that makes cheating
(such as falsely reporting a high hit count to attract people to the
site) impossible.

3. linking statistics. again, I think this is an extremely powerful
way of separating signal from noise-- how many other sites link
to the page in question? if it has few inbound links, it isn't
as interesting; if a lot of other pages point to it, it's far more
interesting.
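
as a sketch of the voluntary-submission collector mentioned in point
2, here is some python. the report format and the trust model are
entirely assumed, and as noted above nothing in this naive version
stops a site from inflating its own numbers -- that is exactly the
robustness problem still to be solved.

from collections import Counter

class HitCollector:
    """central database that merges voluntarily submitted hit counts."""

    def __init__(self):
        self.hits = Counter()

    def submit(self, site_report):
        # site_report maps page URL -> hits that site's server observed
        self.hits.update(site_report)

    def rank(self, urls):
        # order candidate search results by reported hit count
        return sorted(urls, key=lambda u: self.hits[u], reverse=True)

collector = HitCollector()
collector.submit({"http://example.org/paper.html": 900,
                  "http://example.org/refs.html": 55})
collector.submit({"http://siteB.example/links.html": 250})

print(collector.rank(["http://example.org/refs.html",
                      "http://siteB.example/links.html",
                      "http://example.org/paper.html"]))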


I encourage system designers to keep some of these ideas in mind
when they are working on the current generation of software tools
such as search engines. in particular there is a tremendous amount
of innovation going on right now among search engine designers, and
adopting some of the above ideas into a search engine might be a very
powerful way of distinguishing it from the "competition" by returning
more usable search results to the end user.

again, I suspect we will be seeing increasingly ingenious and efficacious
ways of dealing with what today is the horrible S/N problem. perhaps
today will be thought of as the dark age of cyberspace because of
all the muck we are routinely wading through <g>






