Advanced Search Syntax

Using advanced search syntax requires you to know a considerable amount about the particular search engine you are using, so you will definitely need to read its help file. If you are willing to tackle this level of expertise then you are seriously into searching! I would recommend starting with the AltaVista Advanced search, as the AltaVista tutorial is very helpful. AltaVista have also recently introduced a Power Search, which uses a form to substitute for some of the syntax (i.e. you select options rather than typing in syntax words). This is similar to the HotBot advanced search concept.

The main reason for using advanced search syntax is to refine a search on a full-text indexed search engine, typically because you have had too many hits, with nothing useful among the first 10-20.

The topics covered in the next few sections are:

* Boolean Logic - Introduction
* Boolean Operators
* Parentheses
* Using Fields (covered on the next page)

If you have had too few hits then advanced syntax is usually not the answer. Instead, you should:
* Check your spelling (remember American variants)
* Use truncation for stemming (bringing in plurals and different verb endings), and wildcards if unsure of spelling
* Combine search words with OR
* Remove any minus signs, ANDs or AND NOTs
* Decrease the number of keywords
* Consider whether you need an on-line database, and if so how to find it
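
As a rough illustration of the truncation point above, here is a minimal Python sketch of how a truncated term might be expanded against a list of known words. The function name, the `*` wildcard character and the vocabulary are all my own assumptions for the example; real search engines each have their own wildcard rules, so check the help file.

```python
from fnmatch import fnmatchcase

def expand_truncation(pattern, vocabulary):
    """Return the vocabulary words matched by a truncated term such as 'ovar*'.

    Hypothetical helper: '*' stands for any run of characters, as in
    shell-style wildcards. A real engine may use a different symbol.
    """
    pattern = pattern.lower()
    return sorted(word for word in vocabulary if fnmatchcase(word.lower(), pattern))

# Made-up vocabulary for illustration only.
vocabulary = {"ovary", "ovaries", "ovarian", "oval", "cancer", "cancers"}

matches = expand_truncation("ovar*", vocabulary)
print(matches)  # ['ovarian', 'ovaries', 'ovary']
```

The single truncated term brings in the plural and the adjective, which is why truncation tends to increase the number of hits.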



Boolean Logic - Introduction

Boolean expressions are produced by combining Boolean operators with the correct syntax. The common operators are listed in the table below.

Operator / Example / Description

Operator: AND
Example: "search engine" AND tutorial
Description: Search words on both sides of this operator must be present in the document to score a hit.

Operator: OR
Example: "search engine" (guide OR tutorial)
Description: Search words on either side of this operator are sufficient to score a hit.

Operator: AND NOT
Example: "search engine" (guide OR tutorial) AND NOT beginner
Description: Search words after this operator make the search engine exclude the web page from the hits.

Operator: NEAR
Example: "search engine" NEAR AltaVista (guide OR tutorial)
Description: Search words have to be within a certain number of words from one another in order to score a hit. NB - BEFORE/AFTER are similar to NEAR but specify the order of the words as well.

Operator: PARENTHESES (brackets)
Example: "search engine" AND AltaVista AND (guide OR tutorial)
Description: Parentheses () are often neglected in discussions of Boolean logic, but they are an integral part of the logic, as they tell the search engine in what order to process the operators.
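
To make the NEAR operator a little more concrete, the following is a minimal Python sketch of a proximity test. The function name and the default window of 10 words are assumptions for the purpose of illustration; each search engine decides its own distance, so treat the number as a placeholder.

```python
import re

def near(text, first, second, max_distance=10):
    """Return True if `first` and `second` occur within `max_distance` words.

    Hypothetical sketch: words are split on anything that is not a letter
    or digit, and the comparison ignores case and word order.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(a - b) <= max_distance
               for a in first_positions
               for b in second_positions)

close = near("AltaVista publishes a tutorial", "altavista", "tutorial", max_distance=5)
far = near("tutorial one two three four five six AltaVista",
           "altavista", "tutorial", max_distance=5)
print(close, far)  # True False
```

A BEFORE or AFTER style operator would differ only in comparing the positions with a sign (a before b, or b before a) rather than taking the absolute distance.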

Probably one of the most frustrating features of using Boolean logic with search engines is that the search engines themselves (in an effort to be user-friendly) apply the Boolean syntax rather loosely. To a certain extent this makes the whole business a bit pointless, and I would recommend using Boolean logic only if really necessary, as it usually offers little more than search engine arithmetic. The following two sections discuss Boolean Operators and Parentheses in some detail.



Boolean Operators

AND


Search words on both sides of this operator must be present in the document to score a hit.

Example: ovary AND cancer will return web pages in which both the word ovary and the word cancer are present. Theoretically, the search should not return web pages in which only one of the words is present.

The result is the same as if you had used the + sign, as previously described in the section on search engine arithmetic.

Example: +ovary +cancer is equivalent to ovary AND cancer
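
For readers who like to see the logic spelled out, here is a small Python sketch of a strict AND, treated as the intersection of the sets of pages containing each word. The three pages and the function names are invented for the example, and this is not a description of how any particular search engine is built.

```python
# Toy collection: page id -> page text (made-up data for illustration).
pages = {
    1: "ovary anatomy and function",
    2: "cancer screening guidelines",
    3: "cancer of the ovary treatment options",
}

def build_index(pages):
    """Map each word to the set of page ids that contain it."""
    index = {}
    for page_id, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(page_id)
    return index

def search_and(index, *words):
    """Strict AND: keep only the pages that contain every word."""
    result = None
    for word in words:
        hits = index.get(word.lower(), set())
        result = hits if result is None else result & hits
    return result or set()

index = build_index(pages)
both = search_and(index, "ovary", "cancer")
print(both)  # {3}
```

Only page 3 contains both words, so it is the only hit; pages 1 and 2 each contain one word and are left out.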


AND will narrow or focus your search.  I mentioned above that the search should not return web pages in which only one of the words is present. In practice most search engines will actually also return web pages in which either word is present (theoretically the result of using the OR operator) but they will rank such pages much lower in the results list.

An interesting variation is the search +ovary cancer which should return all of the web pages containing ovary, and will rank higher any pages which also contain the word cancer.
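
The behaviour of +ovary cancer can be sketched in Python as a required word plus an optional word that only affects the ranking. The scoring rule here (count the occurrences of the optional word) is purely an assumption to show the idea; real engines use their own, usually unpublished, ranking formulas.

```python
def rank_plus_query(pages, required, optional):
    """Return ids of pages containing `required`, best match first.

    Hypothetical scoring: pages are ordered by how many times the
    optional word appears, with ties broken by page id.
    """
    scored = []
    for page_id, text in pages.items():
        words = text.lower().split()
        if required not in words:
            continue  # the + sign makes this word compulsory
        scored.append((words.count(optional), page_id))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [page_id for _, page_id in scored]

# Made-up pages for illustration only.
pages = {
    "a": "ovary anatomy overview",
    "b": "ovary cancer staging and ovary cancer treatment",
    "c": "lung cancer statistics",
    "d": "ovary cysts and cancer risk",
}

ranking = rank_plus_query(pages, "ovary", "cancer")
print(ranking)  # ['b', 'd', 'a']
```

Page c is dropped because it lacks the required word, while page a is still returned (at the bottom) even though it never mentions cancer.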


OR


Search words on either side of this operator are sufficient to score a hit.

Example: ovary OR cancer will return web pages in which either the word ovary or the word cancer is present. It will also return web pages in which both the word ovary and the word cancer are present.

Because OR is the default operator for most search engines, the result is the same as if you entered the words without any operator.

Example: ovary cancer is equivalent to ovary OR cancer


OR will broaden your search. One of the most important uses of OR is to tell the search engine about synonyms.

Example: (ovary) AND (cancer OR carcinoma OR neoplasm) will return pages containing the word ovary together with any of the three words in brackets. You may wonder why I have put the word ovary in brackets; this is discussed later in the section on parentheses.
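
The synonym search can be sketched in Python as one compulsory word combined with "any of" a list of alternatives. The function name and the sample pages are my own inventions for illustration.

```python
def matches_with_synonyms(text, required, synonyms):
    """True if the page has the required word AND at least one synonym."""
    words = set(text.lower().split())
    return required in words and any(s in words for s in synonyms)

synonyms = ["cancer", "carcinoma", "neoplasm"]

# Made-up pages for illustration only.
pages = {
    "a": "serous carcinoma of the ovary",
    "b": "ovary anatomy overview",
    "c": "benign neoplasm of the ovary",
    "d": "carcinoma of the lung",
}

hits = sorted(page_id for page_id, text in pages.items()
              if matches_with_synonyms(text, "ovary", synonyms))
print(hits)  # ['a', 'c']
```

A search for ovary AND cancer alone would have missed both of these pages, because neither uses the word cancer.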



AND NOT


Search words after this operator make the search engine exclude the web page from the hits.

Example: ovary AND NOT cancer will return web pages in which the word ovary is present, but the word cancer is not.

The result is the same as if you had used the + and - signs as previously described in the section on search engine arithmetic.

Example: +ovary -cancer is equivalent to ovary AND NOT cancer

Use this operator with caution as it can often exclude a number of otherwise useful hits. For example, if the author has written 'This article is about the human ovary, but does not include any discussion about cancer of the ovary' then this might be exactly what you want. However, the search ovary AND NOT cancer will exclude this web page, because it contains the word cancer.
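
The pitfall can be demonstrated with a few lines of Python. This is only a sketch of a literal AND NOT test on the words of a page (the function name is hypothetical), but it shows why the sentence quoted above is rejected.

```python
import re

def and_not(text, include, exclude):
    """True if `include` appears in the text and `exclude` does not."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return include in words and exclude not in words

sentence = ("This article is about the human ovary, but does not include "
            "any discussion about cancer of the ovary")

kept = and_not(sentence, "ovary", "cancer")
print(kept)  # False
```

The test cannot tell that the author is saying the article is not about cancer; the mere presence of the word is enough to exclude the page.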





Parentheses

Parentheses () are often neglected in discussions of Boolean logic, but they are an integral part of the logic, as they tell the search engine in what order to process the operators.

They also significantly affect the order in which the hits are displayed. Consider the three searches below.

ovary AND cancer OR carcinoma

ovary AND (cancer OR carcinoma)

(ovary) AND (cancer OR carcinoma)


The first string is somewhat ambiguous. The AND operator technically takes precedence over the OR operator (in much the same way that multiplication takes precedence over addition in simple arithmetic, so that 2 x 3 + 3 = 9 and not 12).

The result would be that the search engine would find pages containing ovary and cancer together, or containing carcinoma (with or without ovary).
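
Python happens to follow the same precedence rule (its `and` binds more tightly than its `or`), so the difference between the first two strings can be checked with a short sketch. The function names are my own, and the three True/False arguments simply stand for whether a page contains each word.

```python
from itertools import product

def first_string(ovary, cancer, carcinoma):
    # Read as (ovary AND cancer) OR carcinoma, because AND binds first.
    return ovary and cancer or carcinoma

def second_string(ovary, cancer, carcinoma):
    # The parentheses force the OR to be processed first.
    return ovary and (cancer or carcinoma)

# Every combination of (ovary, cancer, carcinoma) on which the two disagree.
differences = [combo for combo in product([False, True], repeat=3)
               if first_string(*combo) != second_string(*combo)]
print(differences)
# [(False, False, True), (False, True, True)]
```

The two searches disagree only for pages that contain carcinoma but not ovary: the first string returns them and the second does not.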


The second string attempts to correct this problem by instructing the search engine to process cancer OR carcinoma as one item, and then to combine the result with ovary in an AND relationship. However, one unforeseen consequence of this is that because the search engine processes the contents of the parentheses first, it uses the results of the string within the brackets as the most important keyword term when it comes to ranking results. The ranking order is equivalent to a search of the type cancer AND ovary as opposed to ovary AND cancer.


What you might find is that a web page which mentions cancer many times, and ovary infrequently, would be favoured over one which mentions ovary frequently and cancer less often. This might be what you want, but you should remember that the normal ranking convention, in which the leftmost keyword is treated as the most important, is overridden when parentheses are present. The search engine is obliged to follow the rules of nesting when there are brackets - the deepest nest is interpreted first.

The third string, (ovary) AND (cancer OR carcinoma) compels the search engine to interpret ovary first (as the first of two equally nested brackets) and to combine it in an AND relationship with the results of the second set of brackets. The ranking order of results would be equivalent to the ovary AND cancer search string.


The general order of interpretation of parentheses, therefore, follows basic mathematical rules. See if you can follow the logic of the example below.

FOURTH to be (THIRD to be (FIRST to be evaluated) (SECOND to be evaluated) evaluated) evaluated
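
If you want to check your answer, the nesting above can be written as nested Python tuples and walked with a short recursive function. This is only a sketch of the general rule (inner brackets before outer, left before right at the same depth); the representation and function name are my own.

```python
def evaluation_order(group, order=None):
    """List group labels in the order they finish being evaluated.

    Each group is a tuple: (label, nested_group, nested_group, ...).
    Every nested group is completed before the group that contains it.
    """
    if order is None:
        order = []
    label, *children = group
    for child in children:
        evaluation_order(child, order)
    order.append(label)
    return order

# The example string above, written as nested tuples.
expression = ("FOURTH", ("THIRD", ("FIRST",), ("SECOND",)))

order = evaluation_order(expression)
print(order)  # ['FIRST', 'SECOND', 'THIRD', 'FOURTH']
```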

Caution must be taken not to promote the most deeply nested terms into being the most important terms when the results are ranked.

Having read and understood all of the points above, it will disappoint you to know that the search engines often apply Boolean logic rather loosely, effectively trying to make guesses at the logic you really want, as opposed to what is in the search string. Whilst this is helpful to inexperienced searchers, it is rather irritating if you have carefully constructed a search, only to find that the search engine treats it as if the parentheses weren't there!!

 



Raouf Allim
22 Benjamin Road
High Wycombe
Bucks. HP13 6SR
raouf@allim.tc
2nd August 2000