The preferred way of preventing parts of your site from being indexed
is to use the Control Center page exclusion mechanism. This is covered in
How to Exclude Pages from Search.
You should read that "how to" first.
The only reason you might need to use a robots.txt
file is if you want to prevent someone else from using this search engine
to index your site.
This tutorial is not a web/html primer and assumes that you already know
how the process of "web surfing" is accomplished (i.e. a browser requests a page from a server which
then returns the page to be viewed), what an HTML "form" is and how it works,
and what a link "target" is.
If you are not familiar with these concepts please read a basic web/html primer.
Using a robots.txt file is easy,
but it does require access to your server's root location.
For instance, whatever address your site is located at,
you will need to be able to create a file named robots.txt at the top level of that server.
If you cannot access your server's root location you will not be able to use a
robots.txt file to exclude pages from your index.
robots.txt is a TEXT file (not HTML!)
which has a section for each robot to be controlled.
Each section has a
user-agent line which names the robot to be controlled and has a list of "disallows" and "allows".
Each disallow will prevent any address that starts with the disallowed string
from being accessed.
Similarly, each allow will permit any address that starts with the allowed string
to be accessed.
The (dis)allows are scanned in order, with the last match encountered determining
whether an address is allowed to be used or not.
If there are no matches at all then the address will be used.
Here's an example:
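As a minimal sketch of such a file, assuming the FreeFind spider is addressed by the user-agent name FreeFind, and using made-up directory names (/cgi-bin/ and /tmp/ are illustrative only, not taken from any real site):

```text
# Hypothetical paths, for illustration only.
# /cgi-bin/search.cgi starts with a disallowed string, so it would be skipped.
# /index.html matches nothing, so it would be used.
user-agent: FreeFind
disallow: /cgi-bin/
disallow: /tmp/
```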
In this example, any address that starts with one of the disallowed strings would be
ignored by the spider,
and all other addresses would be allowed.
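It is also possible to use an "allow" in addition to disallows.
A robots.txt file that follows a disallow with one or more allows prevents the spider from
accessing any address that starts with the disallowed string, except the addresses that the later allows match.
Using allows can often simplify your robots.txt file.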
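A small sketch of a disallow combined with an allow might look like the following. The paths are hypothetical, and the user-agent name FreeFind is assumed; the comments trace the last-match rule described earlier.

```text
# Hypothetical paths, for illustration only.
# /cgi-bin/help.cgi matches both lines; the allow is the last match, so it is accessed.
# /cgi-bin/other.cgi matches only the disallow, so it is skipped.
user-agent: FreeFind
disallow: /cgi-bin/
allow: /cgi-bin/help.cgi
```

Because the last match wins, the allow has to come after the disallow it is meant to override.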
Here's another example which shows a robots.txt file with
two sections in it: one for "all" robots, and one for the FreeFind spider:
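A two-section file of this shape could be sketched as follows, assuming "*" is used to name all robots and FreeFind names the FreeFind spider. The directory names are invented for illustration.

```text
# Hypothetical directory names, for illustration only.
user-agent: *
disallow: /images/
disallow: /private/

user-agent: FreeFind
disallow:
```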
In this example all robots except the FreeFind spider will be prevented from
accessing files in the directories disallowed in the "all" section.
FreeFind will be able to access all files (a disallow with nothing after it
means "allow everything").
This section has a few handy examples.
To prevent FreeFind from indexing your site at all:
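One way this might be written, assuming the spider is addressed by the user-agent name FreeFind; since every address starts with "/", a single disallow covers the whole site:

```text
user-agent: FreeFind
disallow: /
```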
To prevent FreeFind from indexing common FrontPage image map junk:
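A possible sketch, under the assumption that the unwanted image map links on your site all start with /_vti_bin/shtml.exe/ (check the addresses your own server actually produces before relying on this):

```text
# Assumes image map links are served under /_vti_bin/shtml.exe/
user-agent: FreeFind
disallow: /_vti_bin/shtml.exe/
```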
To prevent FreeFind from indexing a test directory and a private file:
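A sketch using placeholder names (/test/ and /private.html are hypothetical; substitute your own directory and file):

```text
# /test/ and /private.html are placeholder names.
user-agent: FreeFind
disallow: /test/
disallow: /private.html
```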
To let FreeFind index everything but prevent other robots from accessing certain files:
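This could be sketched with one section for all robots and one for FreeFind, again with invented paths and the assumed user-agent name FreeFind:

```text
# Hypothetical paths, for illustration only.
user-agent: *
disallow: /cgi-bin/
disallow: /downloads/

user-agent: FreeFind
disallow:
```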