Using Search Engines Efficiently
Acquiring skill in using search engines is the key to efficient use of the Internet. Search engines send out programs called spiders or crawlers (also robots or worms) to index web pages. Indexing is the extraction of certain features of a web page, such as the title or keywords. The search engines hold the results in databases and provide a means for you to search them. Search interfaces are usually composed of one or several input boxes (where you type in text), perhaps with additional options such as choosing the language or dates of the web pages found.
When you perform a search the engine is
searching its own database, and not the whole web. Hence, the scope of your search is
limited by the number of web pages that the search engine has already indexed (size
matters!).
Another crucial factor is how the
search engine sorts the results to present them to you. You are only likely to look at the
first 20 - 50 hits that you get, even if the search engine finds several thousand hits.
How the search engine decides which are the most important hits to show you first (usually
called ranking or relevance) is therefore of immense importance.
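As a rough illustration of what "extracting features such as the title or keywords" can mean, here is a minimal Python sketch that pulls the title and the keywords meta tag out of a page. The sample page, the `HeadExtractor` class and the `record` layout are assumptions made for illustration; real spiders are far more elaborate and each engine uses its own methods.

```python
from html.parser import HTMLParser


class HeadExtractor(HTMLParser):
    """Collect the <title> text and the 'keywords' meta tag from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.keywords = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            # Keywords are conventionally separated by commas.
            self.keywords = [k.strip() for k in content.split(",") if k.strip()]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# A made-up page; a real spider would download this from the web.
PAGE = """<html><head>
<title>Parkinson's Disease Information</title>
<meta name="keywords" content="Parkinson's disease, treatment, drugs">
</head><body><p>Body text is ignored by this sketch.</p></body></html>"""

extractor = HeadExtractor()
extractor.feed(PAGE)

# The kind of record an engine might store in its database for this page.
record = {"title": extractor.title.strip(), "keywords": extractor.keywords}
print(record)
```

A search then runs against stored records like this one, not against the live web page.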
Indexing
The three main types of index held on
search engines are: -
Keyword | Search engines try to discover the most important words on a web page by several means. Some may count the occurrences of certain words, and others rely on information that the web page author puts in the <head> section of a web page (a section of the web page which is not normally displayed in your browser but which contains information about the web page itself). Robot or spider programs are used by the search engines to roam the Internet and extract the keywords for indexing. Excite and Lycos are keyword indexed.
Keyword indexing is of greatest value in conceptual (general or broad topic) searches, as it reduces the number of false hits. |
Full Text | A full text index holds every word on the web page in the search engine database. Again, robot or spider programs do all of the data extraction and indexing. AltaVista, Fast, HotBot, Go, Google, and Northern Light are all full text search engines.
Full text indexing can give very comprehensive searches, at the expense of a large number of false hits. It is of great value where you want to find references to specific names or terms.
On the other hand, conceptual searches are often not very successful (too many hits). If you still wish to try a broad topic search in these engines, then I would recommend using: -
Northern Light - organises results into folders
Google - ranks search results according to the number of links to the web page and has a 'find similar pages' facility
Go - has a 'find similar pages' facility |
Human | Human beings (rather than computer programs) read the page and designate key words/phrases to categorise it. This type of indexing is the distinguishing feature of the directory-based web databases. Yahoo is the classic Internet human-indexed database, but a recent excellent innovation is the Open Directory Project. You can access the latter directly, or alternatively you can try an interesting search engine called Oingo, which indexes the pages of the Open Directory, but which uses a 'meaning-based' search. When you enter a search string Oingo will also search for words that it considers to belong to the same 'semantic space' and hence, hopefully, will search for the concept that you want, rather than just the words you enter.
If you can find your topic in a human-indexed search engine you will normally get off to a good start, particularly with conceptual searches. The disadvantage of directory-based sites is that the indexing is time-consuming, and therefore the databases tend to be smaller, and updated less frequently. |
The boundaries between search engines and directories are becoming blurred as the service
providers try to provide the ultimate one-stop information shop. There is a directory
service at AltaVista (shared with LookSmart) and Lycos
is now offering the Open Directory. Go offers a comprehensive directory service, which Infoseek also utilises.
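The difference between keyword and full text indexing can be sketched in a few lines of Python. The two sample pages, the function names, and the rule "keywords are the three most frequent words of four or more letters" are all assumptions chosen for illustration; they are not how any particular engine works.

```python
import re
from collections import Counter, defaultdict

# Hypothetical pages standing in for documents a spider has fetched.
PAGES = {
    "page1.html": "Parkinson's disease treatment: new drugs for treatment of the disease",
    "page2.html": "A history of the disease James Parkinson described in 1817",
}


def words(text):
    """Split text into lower-case words, keeping internal apostrophes."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def full_text_index(pages):
    """Map every word on every page to the set of pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in words(text):
            index[word].add(url)
    return index


def keyword_index(pages, top_n=3, min_length=4):
    """Map only each page's most frequent longer words to that page."""
    index = defaultdict(set)
    for url, text in pages.items():
        counts = Counter(w for w in words(text) if len(w) >= min_length)
        for word, _ in counts.most_common(top_n):
            index[word].add(url)
    return index


full = full_text_index(PAGES)
keys = keyword_index(PAGES)

# The full text index can find a word that appears only once on a page...
print(sorted(full["drugs"]))
# ...while the keyword index is much smaller and holds fewer incidental words.
print(len(full), len(keys))
```

The full text index answers searches for rare names and terms, at the cost of also matching pages where the word is incidental; the keyword index is smaller and less noisy, but misses anything not judged important.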
Using the Appropriate Tool
Probably the most fundamental concept in searching the Internet is to understand that the type of question you are asking determines the type of searching strategy you should employ. All of us are used to doing this in our everyday lives, without even realising we are employing a search strategy. For example: -
Question | Answer |
I want to know the meaning of a word | Use a dictionary |
I want to know the dosage regime of a drug | Look it up in a drug formulary |
I want to know the diagnostic features of a disease | Look it up in a medical textbook |
I want to know more about the Internet | Borrow a book from the library |
Once you realise that the different search engines and directories on the Internet have different strengths and weaknesses, you will see the importance of knowing what type of question you are asking. It is quite useful to imagine that you are asking an actual person the question, and to imagine how the answer would be formulated.
Type of question | Example | Description | Imaginary person's reply |
Concepts | How do I use the Internet? | Broad topic involving multiple specific topics | A wide ranging discussion involving several areas. 'You could write a book on this' |
Specific topics | How do I use Boolean logic in constructing my searches? I want to find this file/program on the Internet | Circumscribed area | Brief account. 'Have a look at this A4 information sheet' 'Download it from this site' |
Facts | How many web pages does AltaVista index? | Data | 'More than 300 million' |
Although the three categories are not mutually exclusive, and not all questions are easily
classified into one of the categories, I nevertheless find this a useful classification.
The types of resource you should turn to are listed below.
Concepts | First port of call should be a directory-based resource such as Yahoo, Open Directory Project (possibly via Oingo) or About. The Argus Clearinghouse and Infomine are academically orientated. The Librarians' Index to the Internet, the Internet Public Library and the WWW Virtual Library fall somewhere in between the popular and the academic poles. Directories may often list an entire site devoted to your topic (rather than the individual web pages thrown up by search engines, and scattered throughout your search results).
Oingo
is an interesting search engine as it attempts to decide the meaning of your question,
rather than just looking for the words you input. It may, therefore, generate hits for
words that you did not even enter (but which it thinks are synonymous). This may be
valuable if you are doing a concept search.
One resource worth remembering is the old-fashioned encyclopaedia. The Encyclopaedia Britannica is available on-line (with excellent coverage in arts, history, geography and science) and provides both its own content and subject sites reviewed by the Britannica experts.
If you have no luck then try keyword-indexed search engines such as Excite and Lycos. In addition, you can use engines such as Google for its 'similar pages' facility, and Northern Light or Inference Find for their folders classification. |
Specific topics | The full text indexed search engines (and the metasearch engines) come into their own here. Metasearch engines do not maintain their own databases, but pass your search string onto the major search engines, and then collate and organise the answers. Metasearch engines which do the job of ranking the search results and eliminating the duplicates are the best to use, which is why I recommend Ixquick, Dataware and Inference Find.
If your topic is narrow or obscure (in other words you doubt that there will be many web pages devoted to it on the Internet) then you will want to search the full text of as many web pages as possible, so use a full text indexed engine. Metasearch engines will search several other engines for you, and can be valuable if you find very few hits using an ordinary search engine.
If you want to find a particular file on the Internet then use a full text indexed engine (NB if you are looking for a particular computer program, Lycos has an engine dedicated to this purpose).
My current favourite search engines are AltaVista, Google, and Northern Light, and of the metasearch engines Ixquick, Dataware and Inference Find.
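The collating step that a metasearch engine performs (combining several ranked lists and eliminating duplicates) can be sketched as below. The scoring rule (each engine contributes 1/rank for a page), the function name and the sample result lists are assumptions for illustration only; this is not a description of how Ixquick, Dataware or Inference Find actually rank results.

```python
from collections import defaultdict


def merge_results(results_by_engine):
    """Combine ranked URL lists from several engines into one ranked list.

    Each URL scores 1/rank for every engine that returned it, so pages
    placed near the top by several engines rise to the top. Duplicates
    collapse because scores are accumulated per URL.
    """
    scores = defaultdict(float)
    for urls in results_by_engine.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / rank
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Made-up result lists; a real metasearch engine would fetch these live.
example = {
    "engine_a": ["http://example.org/a", "http://example.org/b", "http://example.org/c"],
    "engine_b": ["http://example.org/b", "http://example.org/d"],
    "engine_c": ["http://example.org/b", "http://example.org/a"],
}

merged = merge_results(example)
for url in merged:
    print(url)
```

Seven hits across three engines collapse to four distinct pages, with the page found by all three engines listed first.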
|
Facts |
Ask Jeeves! is a favourite when it comes to asking a factual question, because the interface has been designed to cope with users who ask a question in plain language. The answers are grouped into several categories which relate to the questions that Ask Jeeves! thinks you are trying to ask. If you have no luck then it is best to try the 'Specific Topic' search engines (AltaVista, Google, Northern Light, Dataware, Ixquick, and Inference Find).
One tip is to try a phrase search such as "web pages indexed", where you enclose your search string in quotation marks to force the search engine to look for all of the words occurring together, rather than finding them separately.
The facts you may be looking for may be
held on databases within websites - these are not indexed by search engines. Some people
call this the Invisible Web. Finding these databases can be a bit tricky, and oddly enough
a directory-based engine may give the best results (because a human being has spotted that
a useful database is being held on a particular website).
The
InvisibleWeb is a website devoted to providing links to these on-line databases (and
currently links to about 10,000). Other sites which provide links to these databases are Lycos Searchable Databases,
Search IQ's Subject Directories, and The BigHub.com. |
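The recommendations in this section can be summarised as a simple lookup, sketched here in Python. The resource names are taken from the text above; the dictionary layout, the "first"/"then" split and the `suggest` function are assumptions made only to show the idea of matching the tool to the type of question.

```python
# Resources named in the text, grouped by the type of question.
RECOMMENDED = {
    "concept": {
        "first": ["Yahoo", "Open Directory Project", "About"],
        "then": ["Excite", "Lycos"],
    },
    "specific topic": {
        "first": ["AltaVista", "Google", "Northern Light"],
        "then": ["Ixquick", "Dataware", "Inference Find"],
    },
    "fact": {
        "first": ["Ask Jeeves!"],
        "then": ["AltaVista", "Google", "Northern Light",
                 "Dataware", "Ixquick", "Inference Find"],
    },
}


def suggest(question_type):
    """Return the resources to try, in order, for a type of question."""
    choice = RECOMMENDED[question_type.lower()]
    return choice["first"] + choice["then"]


print(suggest("Fact"))
```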
Conducting a Search
Much of the advice that you will come across on how to search will initially appear to be too complex and technical (articles about Boolean logic etc.). Following four simple rules will get you a long way towards what you want (there is no need to follow all four rules all of the time, as following Rule 1 will often get you what you want straight away).
Rule 1: Be as specific as you can
Rule 2: Know which search
engines/directories are best suited to your question
Rule 3: Perform your search in several
search engines simultaneously
Rule 4: Log off and analyse your
search results before clicking on the links
1 | Launch a text editor or word processor and type in a number of words which relate to your search. Doing this will: -
Focus your mind on exactly what your question is
Enable you to copy and paste the words (correctly spelt) into several search engine inputs in quick succession
Allow you to construct several different versions of your search string in the text editor, and remember which strings you have used before.
Be as specific as you can.
For example, if you want to know about
advances in the treatment of Parkinson's disease then the keywords might be 'Parkinson's
disease', treatment, advances, drugs. Note the use of quote marks to make Parkinson's
disease into a phrase.
If you want to find pages that have all
the words you enter then use the + symbol e.g. +Parkinson's +disease +treatment
If you want to exclude specific words,
you can put a minus symbol in front of them. |
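Building several versions of a search string, as suggested above, can be sketched with a small helper. The `build_query` function and its parameter names are hypothetical, and the sketch assumes the engine accepts double quotation marks for phrases and the + and - prefixes; this was common but not universal, so check each engine's help page.

```python
def build_query(phrases=(), required=(), excluded=(), optional=()):
    """Build a search string from phrases and +required / -excluded words."""
    parts = [f'"{p}"' for p in phrases]      # words that must occur together
    parts += [f"+{w}" for w in required]     # words that must be present
    parts += [f"-{w}" for w in excluded]     # words that must be absent
    parts += list(optional)                  # plain keywords
    return " ".join(parts)


query = build_query(
    phrases=["Parkinson's disease"],
    required=["treatment"],
    optional=["advances", "drugs"],
)
print(query)  # "Parkinson's disease" +treatment advances drugs
```

The resulting string can be pasted unchanged into each search engine's input box.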
2 | Decide on the type of question you are asking and use the search engines/directories which are best suited to your task. See 'Using the Appropriate Tool' above. |
3 | Perform your search in several search engines simultaneously. Open up three or four of your favourite search engines in separate browser windows (<Ctrl N> is the quick way to open new windows with Internet Explorer).
If you have used your search engines
recently this may even be done off-line, as the search pages will still be cached on your
hard drive. You can then enter the search string before pressing the Search button to go
on line.
Copy and paste the keywords into each
of the search engine input boxes and perform the search. |
4 | Once the search has been run on each of the search engines, log off the Internet and analyse the search results. The only clues available to you prior to clicking on the hyperlink are the title of the web page, the page summary (usually written by the page author) and the web page address (the URL). Assessing the first two with respect to your search should be fairly self evident.
In addition you should become accustomed to quickly assessing the URL to decide on the likely nature of the author. Read the section on URLs in About the Internet.
When you have decided on which links
are promising then click on them and log on again. If your search is not looking too
promising at this stage, now may be the time to look at Refining your
Search |
|
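The quick assessment of a URL described in step 4 might be sketched as follows. The table of domain suffixes and what they suggest is a rough assumption for illustration (it is neither complete nor reliable), and `describe_url` is a hypothetical helper; treat the output as a hint, not a verdict on the author.

```python
from urllib.parse import urlparse

# Rough, assumed hints about who is likely to be behind a host name.
SUFFIX_HINTS = {
    ".ac.uk": "UK academic institution",
    ".gov.uk": "UK government body",
    ".co.uk": "UK commercial site",
    ".edu": "US educational institution",
    ".gov": "US government body",
    ".org": "organisation, often non-profit",
    ".com": "commercial site",
}


def describe_url(url):
    """Return (host, hint about the author, looks like a personal page)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    hint = "unknown"
    # Check longer suffixes first so the most specific match is used.
    for suffix in sorted(SUFFIX_HINTS, key=len, reverse=True):
        if host.endswith(suffix):
            hint = SUFFIX_HINTS[suffix]
            break
    # A "~" directory often marks an individual's personal pages.
    personal = "/~" in parsed.path
    return host, hint, personal


print(describe_url("http://www.example.ac.uk/~jsmith/parkinsons.html"))
print(describe_url("http://www.example.com/products/index.html"))
```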