Build Index Help
This page controls the search engine spider (indexer). The spider is an "automatic browser" that, when instructed, visits your site, finds all the pages linked within it, and indexes them.

Every time your site changes and you want your search engine index to reflect those changes, you need to have your site re-spidered to update the index.

In general, when you click an option link in the Control Center, a "dialog" with controls appears. After setting the control values, you can save your changes by pressing finish or abandon them by pressing cancel.

 
Contents
Index now
Schedule re-indexing
Set starting points
Exclude pages
PDF indexing
Indexing speed
Relevance options
Define subsections
Password protected areas
Result link target

The sections below correspond to the available commands on the page.

Index now

Use this link to tell the spider to re-index your site. Every time your site changes you will want to do this!

Schedule re-indexing

If you regularly update your site, you can use this link to tell the spider to automatically run on a pre-determined schedule.

Set starting points

If you want to include additional sites in your index, or if the spider cannot locate all of the pages in your site, you can give it additional starting points. The spider uses these additional starting points to locate pages to index just like it does with the main address of your account.

The addresses need to be entered one per line (wrapping by the browser can be ignored), and should include the starting "http://...".

Note that these addresses are not treated as individual pages, but as sites. The spider will follow the links on each page listed in order to locate even more pages to include in the index.
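To make the one-address-per-line format concrete, here is a minimal Python sketch of how such a list could be split and checked. The helper name parse_starting_points is hypothetical and is not part of FreeFind; it only applies the two rules stated above (one address per line, each beginning with "http://") and additionally assumes that blank lines should be skipped.

```python
from typing import List


def parse_starting_points(text: str) -> List[str]:
    """Split a starting-points list into addresses, one per line.

    Hypothetical helper: blank lines are skipped (an assumption), surrounding
    whitespace is trimmed, and every address must begin with "http://".
    """
    points = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        address = line.strip()
        if not address:
            continue  # ignore empty lines
        if not address.lower().startswith("http://"):
            raise ValueError(
                f'line {lineno}: expected an address starting with "http://": {address!r}'
            )
        points.append(address)
    return points
```

Browser wrapping does not insert real line breaks, so only the newlines you type separate one address from the next.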

If you want to add just a single page to the index, you can first prevent the spider from following the links on that page by using an exclusion like:
   http://somesite.com/justthispage.html  index=yes follow=no
and then add it to the list of additional starting points:
   http://somesite.com/justthispage.html
See the next section for information on exclusions.

Exclude pages

Use this to prevent the spider from indexing certain parts of your site and/or from following the links on specified pages. If you are looking for more information than this summary provides, see How to Exclude Pages from Search.

This dialog contains a simple list of file "exclusions", one per line (browser wrapping may be ignored). Each exclusion consists of a "URL mask" optionally followed by one or more exclusion modifiers.

The URL mask is simply a standard web address, but may contain the common wildcards "*" and "?" to make it match more than one web address. The "*" will match any number of any character and the "?" will match any single character. Non-wildcard characters are matched without regard to case (case-insensitive). URL masks which do not begin with "http://" are treated as if they begin with "*". Because of this it is recommended that you include the "http://" in your URL masks.
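The matching rules above can be sketched in a few lines of Python. This is an illustration under the stated rules, not FreeFind's actual code: mask_matches is a hypothetical name, and the sketch assumes the mask must match the whole URL (not just a prefix of it).

```python
import re


def mask_matches(mask: str, url: str) -> bool:
    """Return True if url matches a FreeFind-style URL mask (sketch).

    '*' matches any run of characters (including none), '?' matches exactly
    one character, and all other characters match themselves, ignoring case.
    """
    # A mask that does not begin with "http://" is treated as if it began with "*".
    if not mask.lower().startswith("http://"):
        mask = "*" + mask
    # Translate the wildcards to a regular expression; escape everything else.
    pattern = "".join(
        ".*" if c == "*" else "." if c == "?" else re.escape(c) for c in mask
    )
    # Assumption: the mask must cover the entire URL.
    return re.fullmatch(pattern, url, re.IGNORECASE) is not None
```

The implicit leading "*" is why including "http://" is recommended: the mask mysite.com/* also matches http://notmysite.com/x, while http://mysite.com/* does not.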

The URL mask may be followed by exclusion modifiers. There are two:
   index=no/yes
   follow=no/yes
The "index" modifier specifies whether pages matching the mask will be included in the index. The "follow" modifier specifies whether pages matching the mask will have their links followed in order to locate other pages to index. The default values are:
   index=no follow=no

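When determining which exclusion to apply, the entire list of exclusions is considered and the last matching exclusion is used. This allows convenient expression of "exclude everything but..." logic. For example, to prevent everything in your "http://mysite.com/cgi-bin/" directory from being indexed except pages generated by the CGI script "content.cgi", you can use the following:
   http://mysite.com/cgi-bin/*
   http://mysite.com/cgi-bin/content.cgi* index=yes follow=yes
The first line uses the default modifiers (index=no follow=no), so it excludes the whole directory; the second line also matches content.cgi pages and, because it comes later in the list, overrides the first for them.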
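The "last matching exclusion wins" rule can be sketched in Python as follows. This is a minimal illustration, not FreeFind's implementation: the names Exclusion, parse_exclusions, and decide are hypothetical, unknown modifiers are simply ignored, and pages that match no exclusion are assumed to be indexed and followed.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


def mask_matches(mask: str, url: str) -> bool:
    """Wildcard URL-mask match: '*' = any run, '?' = one char, case-insensitive."""
    if not mask.lower().startswith("http://"):
        mask = "*" + mask  # masks without "http://" behave as if prefixed by "*"
    pattern = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c) for c in mask)
    return re.fullmatch(pattern, url, re.IGNORECASE) is not None


@dataclass
class Exclusion:
    mask: str
    index: bool = False   # documented default: index=no
    follow: bool = False  # documented default: follow=no


def parse_exclusions(text: str) -> List[Exclusion]:
    """Parse one exclusion per line: a URL mask, then optional modifiers."""
    rules = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rule = Exclusion(fields[0])
        for modifier in fields[1:]:
            name, _, value = modifier.lower().partition("=")
            if name in ("index", "follow"):
                setattr(rule, name, value == "yes")
        rules.append(rule)
    return rules


def decide(url: str, rules: List[Exclusion]) -> Tuple[bool, bool]:
    """Return (index, follow) for url; the last matching exclusion wins."""
    decision = (True, True)  # assumption: unmatched pages are indexed and followed
    for rule in rules:
        if mask_matches(rule.mask, url):
            decision = (rule.index, rule.follow)
    return decision
```

Order matters: if the two cgi-bin lines were swapped, the broad mask would match last and content.cgi pages would be excluded as well. The same sketch covers the single-page trick from "Set starting points", where index=yes follow=no yields (True, False).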

PDF indexing

By default, FreeFind indexes HTML and text pages. If you also want your PDF files indexed, choose this option. The setting takes effect the next time your site is indexed.

Indexing speed

This controls how fast your site is indexed and, in turn, the load placed on your server. The indexing speed sets how long the indexer pauses between page reads as it scans your site. The available indexing speeds are:

  • slow: 3 seconds
  • standard: about a second
  • fast: 1/3 second (subscribers only)
In addition, subscribers with professional accounts can select the "simultaneous access" option to have their site indexed in parallel. This can make re-indexing go 3 times faster.
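Because the speed setting is a pause per page read, its effect on a full re-index is simple arithmetic: pages × pause. The Python sketch below estimates only the time spent pausing; it ignores download and processing time, treats "about a second" as exactly 1 second, and assumes the best-case 3× speedup for simultaneous access. The function name is hypothetical.

```python
from fractions import Fraction

# Pause between page reads, from the speed list above. "standard" is documented
# only as "about a second", so 1 second is an approximation.
PAUSE_SECONDS = {
    "slow": Fraction(3),
    "standard": Fraction(1),
    "fast": Fraction(1, 3),  # subscribers only
}


def estimated_pause_time(pages: int, speed: str, simultaneous: bool = False) -> float:
    """Rough estimate (seconds) of time spent pausing during one full re-index.

    Download and processing time are ignored. When simultaneous is True, the
    best-case 3x speedup mentioned above is assumed.
    """
    total = pages * PAUSE_SECONDS[speed]  # exact rational arithmetic
    if simultaneous:
        total /= 3
    return float(total)
```

For a hypothetical 600-page site, the slow setting spends about 1800 seconds (30 minutes) just pausing, compared with about 200 seconds at the fast setting.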

Relevance options

When the search engine returns the results of a user's query, they are typically ordered by "relevance score", with the document the search engine believes to be most relevant placed first.

The search engine automatically determines relevance score and, by default, it is configured to work well with a wide variety of websites.

You can also refine the relevance scoring for your website by using the relevance controls in the FreeFind control center.

Both page and text relevance scoring are detailed in the relevance scoring reference.

Define subsections

Use this to sub-divide your index into separately searchable sections. If you are looking for more information than this summary provides, see How to Use Sections.

This dialog contains a simple list of file "section specifications", one per line (browser wrapping may be ignored). Each section specification consists of a "URL mask" and a list of single-word section names each with an optional modifier. Here are a few quick examples:
   http://mysite.com/store/* products
   http://mysite.com/store/test/* products=exclude
   http://mysite.com/store/art/* products artsupplies
   http://mysite.com/articles/* howto

The URL mask is simply a standard web address, but may contain the common wildcards "*" and "?" to make it match more than one web address. The "*" will match any number of any character and the "?" will match any single character. Non-wildcard characters are matched without regard to case (case-insensitive). URL masks which do not begin with "http://" are treated as if they begin with "*". Because of this it is recommended that you include the "http://" in your URL masks.

The URL mask is followed by one or more single-word section names. Your visitors will never see these names; they are used only by the search engine to identify each section. Each section name may be followed by an equals sign ("=") and then one of the modifiers:
   include
   exclude
to control whether web addresses which match the URL mask are included or excluded from that section. The default is "include".

Note: The section name "web" is reserved and cannot be used as your own section name. Search panels use it to indicate that a web search (not a site search) should be performed.

When determining which section specification to apply, the entire list is considered and the last matching section specification is used. This allows convenient expression of "include everything but..." logic. For example, to include everything in your "http://mysite.com/store/" directory in a section except pages in the "/store/test/" subdirectory, you can use the following:
   http://mysite.com/store/* products
   http://mysite.com/store/test/* products=exclude
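The section rules can be sketched in Python as follows. This is an illustration, not FreeFind's code: parse_sections and sections_for are hypothetical names, and the sketch reads "the last matching section specification is used" literally, so a URL belongs to the sections marked include (explicitly or by default) on the last line whose mask matches it.

```python
import re
from typing import List, Set, Tuple


def mask_matches(mask: str, url: str) -> bool:
    """Wildcard URL-mask match: '*' = any run, '?' = one char, case-insensitive."""
    if not mask.lower().startswith("http://"):
        mask = "*" + mask  # masks without "http://" behave as if prefixed by "*"
    pattern = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c) for c in mask)
    return re.fullmatch(pattern, url, re.IGNORECASE) is not None


def parse_sections(text: str) -> List[Tuple[str, Set[str]]]:
    """Parse "mask name[=include|exclude] ..." lines into (mask, included names)."""
    specs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and masks without section names
        included = set()
        for item in fields[1:]:
            name, _, modifier = item.partition("=")
            if name.lower() == "web":
                raise ValueError('"web" is a reserved section name')
            if modifier.lower() != "exclude":  # the default modifier is include
                included.add(name)
        specs.append((fields[0], included))
    return specs


def sections_for(url: str, specs: List[Tuple[str, Set[str]]]) -> Set[str]:
    """Return the sections containing url; the last matching line wins."""
    result: Set[str] = set()
    for mask, included in specs:
        if mask_matches(mask, url):
            result = included  # a later match replaces any earlier one
    return set(result)
```

Applied to the four example lines at the start of this section, a page under /store/ lands in products, a page under /store/test/ lands in no section, and a page under /store/art/ lands in both products and artsupplies.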

After you have specified all of your sections, your site will be re-indexed before the new sections become active.

Now that you have an index with various sections, you need an appropriate search panel to let your visitors use those sections. To do this, go to the html page and choose the panel with sections, then add it to your web site in the usual manner. To review the instructions for this, see the chapter Adding your Panel to your Site in the tutorial Page Search Setup.

You may want to change the labels of the sections as they appear in the drop-down list. This is fine; just be sure to change only the option text, not the option value itself. To see an example of this, and for more information on customizing any search panel to support sections, see the chapter Customizing Your Search Panel with Sections in How to Use Sections.

Password protected areas

Use this option to set up your account to index password-protected areas of your site. HTTP Basic Authentication is supported. When you click on the link the dialog page that appears has extensive documentation, plus you can refer to How to Index Password Protected Pages.

Result link target

Use this selection to set the target of the links in the search results that lead back to your site. Note: Most sites do not need this function, even those with frames. It does not target your search panel so that the search results page is shown in a particular frame or window (for that, go to the html page and use the "set the frame target" link). Instead, it controls the target of the links to the pages listed in the search results.

A simple example would be:
   http://mysite.com/* framename
This would cause the links of all mysite.com pages listed in the search results to have a link target of framename.

Another example is:
   http://mysite.com/archive/* aframe
   http://mysite.com/products/* pframe
This would cause search result links starting with mysite.com/archive/... to have a target of aframe, and links starting with mysite.com/products/... to have a target of pframe.
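The effect of these rules on a results page can be sketched in Python. This is a hypothetical illustration: link_target and render_result_link are not FreeFind functions, and because the text does not say which rule wins when several masks match a URL, the sketch assumes the last match wins, as it does for exclusions and sections. A URL matching no rule gets no target attribute.

```python
import html
import re
from typing import Optional


def mask_matches(mask: str, url: str) -> bool:
    """Wildcard URL-mask match: '*' = any run, '?' = one char, case-insensitive."""
    if not mask.lower().startswith("http://"):
        mask = "*" + mask  # masks without "http://" behave as if prefixed by "*"
    pattern = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c) for c in mask)
    return re.fullmatch(pattern, url, re.IGNORECASE) is not None


def link_target(url: str, rules: str) -> Optional[str]:
    """Return the target for a result link, or None if no rule matches.

    Each rule line is "URL-mask target"; assumption: the last match wins.
    """
    target = None
    for line in rules.splitlines():
        fields = line.split()
        if len(fields) == 2 and mask_matches(fields[0], url):
            target = fields[1]
    return target


def render_result_link(url: str, title: str, rules: str) -> str:
    """Build the HTML anchor for one search result (hypothetical helper)."""
    target = link_target(url, rules)
    attr = f' target="{html.escape(target)}"' if target else ""
    return f'<a href="{html.escape(url)}"{attr}>{html.escape(title)}</a>'
```

With the two example rules above, a result under mysite.com/archive/ renders with target="aframe", while a page such as mysite.com/about.html keeps a plain link.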

For more information on framed sites read How to Setup your Framed Site.

 

FreeFind and FreeFind.com are trademarks of FreeFind.com.
Copyright 1998 - 2017