Using Search Engines Efficiently

Acquiring skill in using search engines is the key to efficient use of the Internet. Search engines send out programs called spiders or crawlers (also robots or worms) to index web pages. Indexing is the extraction of certain features of a web page, such as its title or keywords. The search engines hold the results in databases and provide a means for you to search them. Search interfaces usually consist of one or more input boxes (where you type in text), perhaps with additional options such as choosing the language or dates of the web pages found.

When you perform a search the engine is searching its own database, and not the whole web. Hence, the scope of your search is limited by the number of web pages that the search engine has already indexed (size matters!).

Another crucial factor is how the search engine sorts the results to present them to you. You are only likely to look at the first 20 - 50 hits that you get, even if the search engine finds several thousand hits. How the search engine decides which are the most important hits to show you first (usually called ranking or relevance) is therefore of immense importance.

Indexing

The three main types of index held on search engines are: -

Keyword

Search engines try to discover the most important words on a web page by several means. Some may count the occurrences of certain words, and others rely on information that the web page author puts in the <Head> section of a web page (a section of the web page which is not normally displayed in your browser but which contains information about the web page itself). Robot or spider programs are used by the search engines to roam the Internet and extract the keywords for indexing.

Excite and Lycos are keyword indexed.

Keyword indexing is of greatest value in conceptual (general or broad topic) searches, as it reduces the number of false hits.
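
The word-counting approach described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the method used by Excite or Lycos; the function name extract_keywords and the short STOP_WORDS list are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real engine would use a far longer one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}

def extract_keywords(text, top_n=3):
    """Return the top_n most frequent words in text, ignoring stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]

page = "Search engines index the web. A search engine sends spiders to index pages."
print(extract_keywords(page, top_n=2))
```

Only the handful of words returned would be stored against the page, which is why a keyword index is smaller, and gives fewer false hits, than a full text index.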

Full Text

A full text index holds every word on the web page in the search engine database. Again, robot or spider programs do all of the data extraction and indexing.

AltaVista, Fast, HotBot, Go, Google, and Northern Light are all full text search engines.

Full text indexing can give very comprehensive searches, at the expense of a large number of false hits. Full text engines are of great value where you want to find references to specific names or terms.
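
A full text index can be pictured as a lookup table from every word to the pages that contain it. The following Python sketch shows the idea under simplifying assumptions; the page names and the functions build_full_text_index and search_all_words are invented for the example and do not describe any particular engine.

```python
import re
from collections import defaultdict

def build_full_text_index(pages):
    """Map every word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(name)
    return index

def search_all_words(index, query):
    """Return the pages containing every word in the query, sorted by name."""
    words = query.lower().split()
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)

# Hypothetical pages standing in for the engine's database.
pages = {
    "a.html": "Treatment of Parkinson's disease",
    "b.html": "Parkinson's law and office work",
}
index = build_full_text_index(pages)
print(search_all_words(index, "parkinson's"))
print(search_all_words(index, "parkinson's disease"))
```

Because every word is indexed, a single specific name finds every page that mentions it, relevant or not, which is where the false hits come from.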

On the other hand, conceptual searches are often not very successful in these engines (too many hits). If you still wish to try a broad topic search in them, then I would recommend using: -

* Northern Light - organises results into folders
* Google - ranks search results according to the number of links to the web page and has a 'find similar pages' facility
* Go - has a 'find similar pages' facility
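
Ranking by the number of links to a page can be illustrated with a very rough Python sketch. It simply counts inbound links and is a deliberate simplification, not Google's actual algorithm; the function rank_by_inbound_links and the single-letter page names are assumptions for the example.

```python
from collections import Counter

def rank_by_inbound_links(links, candidates):
    """Sort candidate pages by how many other pages link to them."""
    # Each link is a (source, target) pair; a page linking to itself is ignored.
    inbound = Counter(target for source, target in links if source != target)
    return sorted(candidates, key=lambda page: (-inbound[page], page))

# Hypothetical link data: "a" and "b" both link to "c", and "a" links to "b".
links = [("a", "c"), ("b", "c"), ("a", "b"), ("c", "c")]
print(rank_by_inbound_links(links, ["a", "b", "c"]))
```

Pages with equal link counts are ordered alphabetically here purely to keep the result predictable.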

Human

Human beings (rather than computer programs) read the page and designate key words/phrases to categorise it. This type of indexing is the distinguishing feature of the directory-based web databases.

Yahoo is the classic Internet human-indexed database, but a recent excellent innovation is the Open Directory Project. You can access the latter directly, or alternatively you can try an interesting search engine called Oingo, which indexes the pages of the Open Directory, but which uses a 'meaning-based' search. When you enter a search string Oingo will also search for words that it considers to belong to the same 'semantic space' and hence, hopefully will search for the concept that you want, rather than just the words you enter.
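
The idea of searching for related words as well as the words you type can be sketched as simple query expansion. How Oingo actually builds its 'semantic space' is not described here, so the hand-written RELATED_WORDS table and the function expand_query below are purely hypothetical.

```python
# Hypothetical table of related words, hand-written for illustration only.
RELATED_WORDS = {
    "car": {"automobile", "vehicle"},
    "doctor": {"physician"},
}

def expand_query(query):
    """Return the query words, each followed by any related words from the table."""
    expanded = []
    for word in query.lower().split():
        expanded.append(word)
        expanded.extend(sorted(RELATED_WORDS.get(word, set())))
    return expanded

print(expand_query("car insurance"))
```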

If you can find your topic in a human-indexed search engine you will normally get off to a good start, particularly with conceptual searches. The disadvantage of directory-based sites is that the indexing is time-consuming, and therefore the databases tend to be smaller, and updated less frequently.


The boundaries between search engines and directories are becoming blurred as the service providers try to provide the ultimate one-stop information shop. There is a directory service at AltaVista (shared with LookSmart) and Lycos is now offering the Open Directory. Go offers a comprehensive directory service, which Infoseek also utilises.


Using the Appropriate Tool

Probably the most fundamental concept in searching the Internet is to understand that the type of question you are asking determines the type of search strategy you should employ. All of us are used to doing this in our everyday lives, without even realising we are employing a search strategy. For example: -

Question | Answer
I want to know the meaning of a word | Use a dictionary
I want to know the dosage regime of a drug | Look it up in a drug formulary
I want to know the diagnostic features of a disease | Look it up in a medical textbook
I want to know more about the Internet | Borrow a book from the library

Once you realise that the different search engines and directories on the Internet have different strengths and weaknesses, the importance of knowing what type of question you are asking becomes clear. It is quite useful to imagine that you are asking an actual person the question, and to imagine how the answer would be formulated.

Type of question | Example | Description | Imaginary person's reply
Concepts | How do I use the Internet? | Broad topic involving multiple specific topics | A wide ranging discussion involving several areas. 'You could write a book on this'
Specific topics | How do I use Boolean logic in constructing my searches? / I want to find this file/program on the Internet | Circumscribed area | Brief account. 'Have a look at this A4 information sheet' / 'Download it from this site'
Facts | How many web pages does AltaVista index? | Data | 'More than 300 million'


Although the three categories are not mutually exclusive, and not all questions are easily classified into one of the categories, I nevertheless find this a useful classification. The types of resource you should turn to are listed below.

* Concepts: Your first port of call should be a directory-based resource such as Yahoo, Open Directory Project (possibly via Oingo) or About. The Argus Clearinghouse and Infomine are academically orientated. The Librarians' Index to the Internet, the Internet Public Library and the WWW Virtual Library fall somewhere in between the popular and the academic poles.

Directories may often list an entire site devoted to your topic (rather than the individual web pages thrown up by search engines, and scattered throughout your search results).

Oingo is an interesting search engine as it attempts to decide the meaning of your question, rather than just looking for the words you input. It may, therefore, generate hits for words that you did not even enter (but which it thinks are synonymous). This may be valuable if you are doing a concept search.

One resource worth remembering is the old-fashioned encyclopaedia. The Encyclopaedia Britannica is available on-line (with excellent coverage in arts, history, geography and science) and provides both its own content and subject sites reviewed by the Britannica experts.

If you have no luck then try keyword-indexed search engines such as Excite and Lycos. In addition, you can use engines such as Google for its 'similar pages' facility, and Northern Light or Inference Find for their folders classification.

 


* Specific topics: The full text indexed search engines (and the metasearch engines) come into their own here.

Metasearch engines do not maintain their own databases, but pass your search string onto the major search engines, and then collate and organise the answers. Metasearch engines which do the job of ranking the search results and eliminating the duplicates are the best to use, which is why I recommend Ixquick, Dataware and Inference Find.
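
The collating step, in which results from several engines are merged, ranked and stripped of duplicates, might look something like the Python sketch below. The scoring scheme (each engine awards 1/position points) is an assumption chosen for simplicity, not the method used by Ixquick, Dataware or Inference Find, and the engine names and URLs are invented.

```python
def merge_results(results_by_engine):
    """Combine ranked URL lists from several engines into one ranked list.

    Each URL earns 1/position points from every engine that returned it,
    so duplicates collapse into a single entry with a higher score.
    """
    scores = {}
    for urls in results_by_engine.values():
        seen = set()
        for position, url in enumerate(urls, start=1):
            if url in seen:  # ignore repeats within one engine's own list
                continue
            seen.add(url)
            scores[url] = scores.get(url, 0.0) + 1.0 / position
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical results from three engines for the same search string.
results = {
    "engine_one": ["x.org/a", "y.org/b", "z.org/c"],
    "engine_two": ["y.org/b", "x.org/a"],
    "engine_three": ["y.org/b", "w.org/d"],
}
print(merge_results(results))
```

A page returned near the top by several engines rises to the top of the merged list, while each address appears only once.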

If your topic is narrow or obscure (in other words you doubt that there will be many web pages devoted to it on the Internet) then you will want to search the full text of as many web pages as possible, so use a full text indexed engine. Metasearch engines will search several other engines for you, and can be valuable if you find very few hits using an ordinary search engine.

If you want to find a particular file on the Internet then use a full text indexed engine (NB if you are looking for a particular computer program, Lycos has an engine dedicated to this purpose).

My current favourite search engines are AltaVista, Google, and Northern Light, and of the metasearch engines Ixquick, Dataware and Inference Find.

 


* Facts: Ask Jeeves! is a favourite when it comes to asking a factual question, because the interface has been designed to cope with users who ask a question in plain language. The answers are grouped into several categories which relate to the questions that Ask Jeeves! thinks you are trying to ask.

If you have no luck then it is best to try the 'Specific Topic' search engines (AltaVista, Google, Northern Light, Dataware, Ixquick, and Inference Find).

One tip is to try a phrase search such as "web pages indexed", where you enclose your search string in quote marks to force the search engine to look for all of the words occurring together, rather than finding them separately.
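
The difference between matching the words separately and matching them as a phrase can be shown with a small Python sketch. The two sample sentences and the function names are invented for the illustration; real engines implement phrase matching inside their indexes rather than by scanning text like this.

```python
import re

def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9']+", text.lower())

def contains_all_words(text, query):
    """True if every query word appears somewhere in the text."""
    words = set(tokenize(text))
    return all(w in words for w in tokenize(query))

def contains_phrase(text, phrase):
    """True only if the query words appear together, in order."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    return n > 0 and any(
        words[i:i + n] == target for i in range(len(words) - n + 1)
    )

doc_a = "AltaVista reports the number of web pages indexed each month."
doc_b = "These pages are indexed by a web crawler."

for doc in (doc_a, doc_b):
    print(contains_all_words(doc, "web pages indexed"),
          contains_phrase(doc, "web pages indexed"))
```

Both sentences contain all three words, but only the first contains them as a phrase, so the phrase search discards the second.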

The facts you may be looking for may be held on databases within websites - these are not indexed by search engines. Some people call this the Invisible Web. Finding these databases can be a bit tricky, and oddly enough a directory-based engine may give the best results (because a human being has spotted that a useful database is being held on a particular website).

The InvisibleWeb is a website devoted to providing links to these on-line databases (and currently links to about 10,000). Other sites which provide links to these databases are Lycos Searchable Databases, Search IQ's Subject Directories, and The BigHub.com.


Conducting a Search

Much of the advice that you will come across on how to search will initially appear to be too complex and technical (articles about Boolean logic etc.). Following four simple rules will get you a long way towards what you want (there is no need to follow all four rules all of the time, as following Rule 1 will often get you what you want straight away).

* Rule 1: Be as specific as you can
* Rule 2: Know which search engines/directories are best suited to your question
* Rule 3: Perform your search in several search engines simultaneously
* Rule 4: Log off and analyse your search results before clicking on the links

 

1

Launch a text editor or word processor and type in a number of words which relate to your search.  Doing this will: -

* Focus your mind on exactly what your question is
* Enable you to copy and paste the words (correctly spelt) into several search engine inputs in quick succession
* Allow you to construct several different versions of your search string in the text editor, and remember which strings you have used before.

Be as specific as you can.

For example, if you want to know about advances in the treatment of Parkinson's disease then the keywords might be "Parkinson's disease", treatment, advances, drugs. Note the use of quote marks to make Parkinson's disease into a phrase.

If you want to find pages that have all the words you enter then use the + symbol e.g. +Parkinson's +disease +treatment

If you want to exclude specific words, you can put a minus symbol in front of them.
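
The effect of the + and - symbols can be sketched in Python as follows. This is a simplified model in which words without a symbol are ignored altogether; the functions parse_query and matches are assumptions for the example, and individual search engines may treat the symbols differently.

```python
import re

def parse_query(query):
    """Split a query into required (+word) and excluded (-word) terms."""
    required, excluded = set(), set()
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.add(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])
        # Words with no symbol are ignored in this sketch.
    return required, excluded

def matches(text, query):
    """True if the text has every required word and no excluded word."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    required, excluded = parse_query(query)
    return required <= words and not (excluded & words)

print(matches("New treatment for Parkinson's disease",
              "+Parkinson's +disease +treatment"))
print(matches("Parkinson's disease and surgery",
              "+Parkinson's +disease -surgery"))
```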

2

Decide on the type of question you are asking and use the search engines/directories which are best suited to your task.

See 'Using the Appropriate Tool' above.

3

Perform your search in several search engines simultaneously

Open up three or four of your favourite search engines in separate windows with your browser (<Ctrl N> is the quick way to open new windows with Internet Explorer).

If you have used your search engines recently this may even be done off-line, as the search pages will still be cached on your hard drive. You can then enter the search string before pressing the Search button to go on line.

Copy and paste the keywords into each of the search engine input boxes and perform the search.

4

Once the search has been run on each of the search engines log off the Internet and analyse the search results.

The only clues available to you prior to clicking on the hyperlink are the title of the web page, the page summary (usually written by the page author) and the web page address (the URL). Assessing the first two with respect to your search should be fairly self evident.

In addition, you should become accustomed to quickly assessing the URL to decide on the likely nature of the author. Read the section on URLs in About the Internet.

When you have decided which links are promising, click on them and log on again. If your search is not looking too promising at this stage, now may be the time to look at Refining your Search.
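
Assessing a URL by eye amounts to picking out the host name and looking at how it ends. The Python sketch below does the same thing with the standard urllib.parse module; the SUFFIX_HINTS table is a small illustrative assumption, the example addresses are made up, and a domain ending only suggests (it does not prove) who the author is.

```python
from urllib.parse import urlsplit

# Illustrative hints only; this table is far from complete.
SUFFIX_HINTS = {
    ".ac.uk": "UK academic institution",
    ".gov.uk": "UK government",
    ".edu": "US educational institution",
    ".gov": "US government",
    ".org": "organisation (often non-profit)",
    ".com": "commercial",
}

def describe_url(url):
    """Return the host name and a rough guess at the kind of author."""
    host = (urlsplit(url).hostname or "").lower()
    # Check longer suffixes first so ".ac.uk" is preferred to shorter matches.
    for suffix in sorted(SUFFIX_HINTS, key=len, reverse=True):
        if host.endswith(suffix):
            return host, SUFFIX_HINTS[suffix]
    return host, "unknown"

print(describe_url("http://www.example.ac.uk/medicine/index.html"))
print(describe_url("http://www.example.com/~jsmith/page.html"))
```

The rest of the address is also worth a glance: in the second example the ~jsmith part hints at a personal page hosted on a commercial site.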

 

Further Reading

A Helpful Guide to Web Search Engines -- How Search Engines Work

The Spider's Apprentice site is a useful source of information about searching the Internet

Ask Scott

This is a superb site covering all aspects of searching, and extending many aspects of this tutorial. If you are keen to improve your searching this site is a gold mine.

Raouf Allim
22 Benjamin Road
High Wycombe
Bucks. HP13 6SR
raouf@wycombe.com
22nd June 2000