HOW TO SEARCH THE WEB
by fravia+
~
Letter 007 - (March - November) 1997

__The W3gate (image fetching)__
Search engines battles
Common errors
and also How to evaluate the results of a search
Fetching sites and images: the W3gate

	A very interesting possibility is offered by the fantastic W3gate,
a German server (how come German FTP services are so developed?)
that allows you INCREDIBLE sniffing on the WWW.

	Try for instance to send it the following email and you'll at once
(well, as soon as you get the answer, say half an hour) understand what
I mean:



To:	  w3mail@gmd.de

Subject:  nothing here

Text:	  get -a -img -l http://fravia.org/index.html 



This IS the web-fetcher for all those who have slow connections or who
have been 'banned' from the Web for whatever reason.

If you need more info about the W3gate, just send a "help" message to the
same address as above: w3mail@gmd.de
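
The request shown above can also be composed by a small script. What follows is only a sketch in Python: it builds the message but does not send it. The address, the subject and the command line are copied from the example above; the function name and the sender address are made up for illustration, and the flags are passed through unchanged because this letter does not explain them.

```python
from email.message import EmailMessage

W3GATE_ADDRESS = "w3mail@gmd.de"  # address given in the text above


def build_w3gate_request(url: str, sender: str) -> EmailMessage:
    """Build (but do not send) a W3gate fetch request for `url`."""
    msg = EmailMessage()
    msg["From"] = sender  # hypothetical sender, supply your own
    msg["To"] = W3GATE_ADDRESS
    msg["Subject"] = "nothing here"  # subject used in the example above
    # Command copied verbatim from the example above; the flags are not
    # documented in this letter, so they are kept exactly as given.
    msg.set_content(f"get -a -img -l {url}")
    return msg


if __name__ == "__main__":
    request = build_w3gate_request("http://fravia.org/index.html", "you@example.com")
    print(request)
```

Handing the message to a mail server (for instance with smtplib) is left out on purpose, since that part depends entirely on your own mail setup.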


__Search engines battles and spiders__

Each search engine uses a "crawler" or "spider" agent to gather web pages. Most have nicknames. You can tell if you have been visited by a crawler by checking your logs and looking for the various names which are often part of the crawler's host name.
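
Checking your logs for crawler visits can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the log lines are invented samples, the host is assumed to be the first field of each line, and the host names are hypothetical. The three robot names are taken from the list in this section.

```python
# Robot names taken from the list in this section (matched case-insensitively).
CRAWLER_NAMES = ("scooter", "architext", "spidey")


def find_crawler_visits(log_lines):
    """Return (host, crawler_name) pairs for lines whose host contains a robot name."""
    visits = []
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        host = fields[0].lower()  # assumption: remote host is the first field
        for name in CRAWLER_NAMES:
            if name in host:
                visits.append((host, name))
                break
    return visits


# Invented sample lines, only to show the idea; real logs will differ.
sample_log = [
    'scooter.example.net - - [12/Mar/1997:04:12:09 +0100] "GET /index.html HTTP/1.0" 200 5120',
    'dialup42.example.org - - [12/Mar/1997:04:15:31 +0100] "GET /index.html HTTP/1.0" 200 5120',
    'crawl3.architext.example.com - - [12/Mar/1997:05:02:44 +0100] "GET /robots.txt HTTP/1.0" 404 210',
]

for host, name in find_crawler_visits(sample_log):
    print(f"{name}: {host}")
```

A plain substring match is crude and can give false alarms (any host that happens to contain one of the names), but it is enough for a first look at who has been sniffing around.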

Do not believe that the better known search engines are
also the best ones... alliances (and money) unfortunately play
a huge role in these matters. For example, Infoseek's strong tie
to Netscape guarantees that many people use the service. The
World Wide Web Worm has no Netscape tie and no major commercial
backing, so fewer people use it.



AltaVista partnered with Yahoo in June 1996, becoming the
"preferred" search engine (see below). AltaVista is very
vulnerable to spammers because of its near real-time indexing.
This makes it easy to submit slightly different variations of
the same page in an attempt to block others from the
top ten. ROBOT NAME: SCOOTER



Excite was launched in late 1995 and grew quickly, eating
its competitors. In July 1996, Excite purchased the Magellan
search engine and directory. In November 1996, it acquired
Webcrawler. However, Magellan and Webcrawler have not yet
been merged with Excite (eventually Magellan will be: on January 22
Webcrawler took over Magellan's top spot on the Netscape
search page, where Excite also has a spot, giving it two
of the five top slots). ROBOT NAME: ARCHITEXT



HotBot was launched in May 1996 and represents Wired's entry
into the search engine competition. The site is powered by
the Inktomi search engine, but that does not mean that it is
the same as the UC Berkeley Inktomi catalog: it just uses the
same technology that created that catalog. ROBOT NAME: SPIDER



InfoSeek, around since early 1995, is well known and well
connected. In fall 1996 the new 'Ultrasmart / Ultraseek'
index (the commercial idiots always choose awful stupid
names), with 50 million URLs, was introduced. Ultraseek is
the same as Ultrasmart, plus some additional information
on the found sites. ROBOT NAME: SLURP THE WEB



Lycos, around since May 1994, is one of the oldest search
engines. It was the FIRST engine to combat spamming attempts,
in May 1996. ROBOT NAME: HOUND



Open Text is an index that has been around since early 1995,
and until June 1996 was Yahoo's preferred search engine partner.
It's a search engine "in decline". ROBOT NAME: xxx



Webcrawler opened to the public in April 1994, and started as a
research project at the University of Washington. It was purchased by
AOL in March 1995, which used it as its preferred service until
November 1996, when Excite, a Webcrawler competitor, acquired
the service. ROBOT NAME: SPIDEY



Yahoo, around since late 1994, may be the oldest major web site
directory. It is a directory (not a search engine) based on
user submissions. If a search of Yahoo's catalog catches nothing,
users should then consult a search engine: Yahoo pipes the
query to any of the major search engines with a click. There
are so many people using Yahoo that the search engines listed
FIRST on Yahoo's page have a strategic advantage over others.
AltaVista is its preferred search engine.



Since Netscape Navigator is the browser that people use, and since
browsers have a search button that connects to a pre-defined page,
and since people are idiots who would not know how to change
such a setting even if you explained it to them, the page connected
there IS important. (Of course you have YOUR OWN search engine page
on YOUR HARDDISK connected to that button; if you do not, be ashamed
and copy my searengi.htm to your harddisk at once, you'll modify it
later as you fancy.) Millions push that button daily... search engines
and directories had to pay Netscape 5 million dollars each to have a
top spot on that page. AOL directs its suckers to Excite (strategic
partner) and Webcrawler (formerly owned); CompuServe sends its suckers
to Lycos.


 
__Common errors__

[ERROR 400]

YOUR REQUEST COULD NOT BE UNDERSTOOD BY THE SERVER

Either your browser is malfunctioning or your Internet 

connection is unreliable



[ERROR 401]

YOU ARE UNAUTHORIZED TO ACCESS THAT DOCUMENT/WEBSITE

Proper authentication is required: ask the root organisation



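[ERRORS 403, 404, 505]

ACCESS TO THAT DOCUMENT/WEBSITE IS FORBIDDEN (403), THE DOCUMENT
WAS NOT FOUND (404), OR THE SERVER DOES NOT SUPPORT THE HTTP VERSION
OF YOUR REQUEST (505)

Check the URL you typed (punctuation AND capitalisation)

Slashes MUST be forward-facing (/)

Contact the site maintainers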
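
If you want to look up what a numeric error code officially stands for, Python's standard library carries the registered reason phrases. A minimal sketch, assuming you only need the codes mentioned in this section; note that these are the standard phrases, not necessarily the exact wording a given server or browser will show you.

```python
from http import HTTPStatus

# The codes mentioned in this section.
CODES_FROM_TEXT = (400, 401, 403, 404, 505)


def describe(code: int) -> str:
    """Return the code followed by its standard reason phrase."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


for code in CODES_FROM_TEXT:
    print(describe(code))
```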



__How to evaluate the results of a search__

This is usually the hardest and most time-consuming part of a search. The number of hits you obtain can range from none to hundreds of thousands, and their relevance or usefulness can vary from considerable to negligible. There are some things you can do to get more relevant hits out of fewer total results.

Success in any particular search query is usually more a question of which search tool has the best database for the subject and how the information is organized for retrieval. This is why it is often necessary to try a number of different search tools when searching for obscure information.

Some search engines list the hits by titles, some by brief text and some give you a choice. When available, choose the brief text, as it is easier to evaluate. Even so, it is often necessary to click the link to see the entire document before you can assess its content. Some sites may not be of apparent interest, but will contain links of great relevance. Some searches yield the desired information quickly, and some you may just have to plod through. Another problem is caused by search engines that DO NOT list the DATES of the retrieved pages. This is VERY BAD, because the 'volatility' of the Internet will probably have caused the disappearance of many of those sites (I for instance don't even bother to check pages with a 'fetch-date' older than three months when I am confronted with many hits).
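
The three-months rule of thumb is easy to apply mechanically once you have the hits and their fetch dates in hand. A rough sketch in Python, with several assumptions of mine: the hits are plain dictionaries with invented URLs and dates, "three months" is approximated as 90 days, and hits without any date are dropped as well.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=90)  # "three months", approximated as 90 days


def fresh_hits(hits, today):
    """Keep hits whose fetch date is known and not older than MAX_AGE."""
    return [
        hit for hit in hits
        if hit["fetched"] is not None and today - hit["fetched"] <= MAX_AGE
    ]


# Invented sample data, only to show the idea.
sample_hits = [
    {"url": "http://example.org/recent.htm", "fetched": date(1997, 9, 15)},
    {"url": "http://example.org/stale.htm", "fetched": date(1997, 5, 1)},
    {"url": "http://example.org/undated.htm", "fetched": None},
]

for hit in fresh_hits(sample_hits, today=date(1997, 11, 1)):
    print(hit["url"])
```

Dropping the undated hits is a choice, not a law: when a search returns only a handful of results you may prefer to keep them and check them by hand.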

As you gain experience, you will learn which search tools are most appropriate for your particular interests and how best to evaluate the hits.

Go ahead, enjoy!

fravia+, February-November 1997



FraVia 20 Feb 1997