Build Index Help

This page controls the search engine spider (indexer). The spider is an "automatic browser" which when instructed will visit your site, find all the pages linked into your site, and index them.

Contents
  • Index now
  • Schedule re-indexing
  • Set starting points
  • Exclude pages
  • PDF indexing
  • Indexing speed
  • Relevance options
  • Password protected areas
  • Define subsections
  • Result link target

Every time your site changes and you want your search engine index to reflect those changes you need to have your site re-spidered to update the index.

In general, when you click an option link in the Control Center, a "dialog" with controls will appear. After setting the control values, you can either save your changes or abandon them using the buttons in the dialog.

Index now
Use this option to tell the spider to re-index your site immediately. Normally your site will be automatically re-indexed on a schedule we determine. However, you can also use this option to tell us to re-index your site as soon as possible.
Schedule re-indexing
Use this option to control the automatic re-indexing schedule for your site. You can leave it on automatic, set it to a specific pre-determined schedule or turn it off entirely.
Set starting points
If you want to include additional sites in your index, or if the spider cannot locate all of the pages in your site, you can give it additional starting points. The spider uses these additional starting points to locate pages to index just like it does with the main address of your account.

The addresses need to be entered one per line (wrapping by the browser can be ignored), and should include the starting "https://..." or "http://...".

Note that these addresses are not treated as individual pages, but as entire sites. The spider will follow the links on each page listed in order to locate even more pages to include in the index.

If you want to add just a single page to the index, you can first prevent the spider from following the links on that page by using page exclusions like:

		example.com/justthispage.html index=yes follow=no
	
and then add it to the list of additional starting points:
		example.com/justthispage.html
	
See the next section for information on exclusions.
Exclude pages
Use this option to prevent the spider from indexing certain parts of your site and/or from following the links on specified pages. If you are looking for more information than this summary provides, see How to Exclude Pages from Search.

This dialog contains a simple list of file "exclusions", one per line (browser wrapping may be ignored). Each exclusion consists of a "URL mask" optionally followed by one or more exclusion modifiers.

The URL mask is simply a standard web address, but may contain the common wildcards "*" and "?" to make it match more than one web address. The "*" will match any number of any characters and the "?" will match any single character. Non-wildcard characters are matched without regard to case (case-insensitive). URL masks which do not begin with "https://" or "http://" are treated as if they begin with "*".
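To make the matching rules above concrete, here is a minimal Python sketch of how a URL mask could be compared against a web address. It is an illustration only, not FreeFind's actual implementation. The function names `mask_to_regex` and `mask_matches` are hypothetical, and the sketch assumes that a mask must match the whole address and that masks beginning with either "https://" or "http://" count as having an explicit start.

```python
import re


def mask_to_regex(mask):
    """Translate a URL mask into a compiled, case-insensitive regex.

    Assumption: the mask has to match the entire address.
    """
    # Masks without an explicit scheme behave as if they began with "*".
    if not mask.lower().startswith(("http://", "https://")):
        mask = "*" + mask
    parts = []
    for ch in mask:
        if ch == "*":
            parts.append(".*")           # any number of any characters
        elif ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def mask_matches(mask, url):
    """Return True if the address is matched by the mask."""
    return mask_to_regex(mask).fullmatch(url) is not None


print(mask_matches("example.com/cgi-bin/*", "https://example.com/cgi-bin/test.cgi"))
print(mask_matches("EXAMPLE.com/page?.html", "http://example.com/page1.html"))
print(mask_matches("http://example.com/a*", "https://example.com/abc"))
```

The first two calls report a match (implicit leading "*", case-insensitive comparison, "?" standing for one character). The third does not, because the mask names "http://" explicitly while the address uses "https://".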

The URL mask may be followed by exclusion modifiers. There are two:

		index=no/yes
		follow=no/yes
	
The "index" modifier specifies whether pages matching the mask will be included in the index. The "follow" modifier specifies whether pages matching the mask will have their links followed in order to locate other pages to index. The default values are:
		index=no follow=no
	

Important: only the last matching exclusion is used when determining which exclusion to apply.

When determining which exclusion to apply, the entire list of exclusions is considered and only the last matching exclusion is used. This allows convenient expression of "exclude everything but..." logic. For example, to prevent everything in your "https://example.com/cgi-bin/" directory from being indexed, except pages generated by the CGI "content.cgi", you can use the following:

		example.com/cgi-bin/*
		example.com/cgi-bin/content.cgi* index=yes follow=yes
	
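The "last matching exclusion wins" rule can be sketched in Python as follows. This is only an illustration of the logic described above, not the spider's actual code. The helper names (`mask_matches`, `parse_exclusion`, `decide`) are hypothetical, masks are assumed to match the whole address, and a page that matches no exclusion is assumed to be both indexed and followed.

```python
import re


def mask_matches(mask, url):
    """Case-insensitive wildcard match; masks without a scheme get a leading "*"."""
    if not mask.lower().startswith(("http://", "https://")):
        mask = "*" + mask
    pattern = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in mask
    )
    return re.fullmatch(pattern, url, re.IGNORECASE | re.DOTALL) is not None


def parse_exclusion(line):
    """Split one exclusion line into (mask, index, follow)."""
    mask, *modifiers = line.split()
    flags = {"index": False, "follow": False}  # defaults: index=no follow=no
    for modifier in modifiers:
        name, _, value = modifier.partition("=")
        if name not in flags or value not in ("yes", "no"):
            raise ValueError(f"unrecognized modifier: {modifier!r}")
        flags[name] = value == "yes"
    return mask, flags["index"], flags["follow"]


def decide(exclusions, url):
    """Return (index, follow) for an address using the last matching exclusion."""
    decision = (True, True)  # assumption: unmatched pages are indexed and followed
    for line in exclusions:
        if not line.strip():
            continue  # ignore blank lines
        mask, index, follow = parse_exclusion(line)
        if mask_matches(mask, url):
            decision = (index, follow)  # a later match replaces an earlier one
    return decision


rules = [
    "example.com/cgi-bin/*",
    "example.com/cgi-bin/content.cgi* index=yes follow=yes",
]
print(decide(rules, "https://example.com/cgi-bin/form.cgi"))
print(decide(rules, "https://example.com/cgi-bin/content.cgi?id=7"))
```

With the two example exclusions, the first address is neither indexed nor followed, while the "content.cgi" address is matched by both lines and takes its settings from the second one.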
Extended indexing
By default, FreeFind indexes HTML, text, PDF and common office document types. If you want only HTML and text indexed, use this option to disable extended indexing. The setting takes effect the next time your site is indexed.
Indexing speed
This controls how fast your site will be indexed and the load placed on your server. The indexing speed controls how long the indexer will pause between page reads as it scans your site. The available indexing speeds are:
  • slow: 3 seconds
  • standard: about a second
  • fast: 1/3 second (subscribers only)
In addition, subscribers with professional accounts can select the "simultaneous access" option to have their site indexed in parallel. This can make re-indexing go 3 times faster.
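As a rough guide to what these settings mean in practice, the following Python sketch estimates the time spent on pauses alone for a site of a given size. It is back-of-the-envelope arithmetic, not a statement of actual indexing times: it ignores the time needed to download each page, treats "about a second" as exactly one second, and treats the "3 times faster" figure for simultaneous access as a best case. The names `PAUSE_SECONDS` and `estimated_pause_time` are hypothetical.

```python
# Pause between page reads for each indexing speed, in seconds.
# Assumption: "about a second" is taken as exactly 1.0.
PAUSE_SECONDS = {"slow": 3.0, "standard": 1.0, "fast": 1.0 / 3.0}


def estimated_pause_time(pages, speed, simultaneous=False):
    """Approximate seconds spent pausing while reading the given number of pages."""
    total = pages * PAUSE_SECONDS[speed]
    if simultaneous:
        total /= 3  # best case for the "simultaneous access" option
    return total


for speed in ("slow", "standard", "fast"):
    minutes = estimated_pause_time(1200, speed) / 60
    print(f"{speed}: about {minutes:.0f} minutes of pauses for 1200 pages")
```

For example, a 1200-page site at the slow setting spends roughly 1200 × 3 = 3600 seconds (one hour) on pauses, compared with about 20 minutes at the standard setting.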
Relevance options
When the search engine returns the results of a user's query, they are typically ordered by "relevance score", with the document the search engine believes to be most relevant placed first.

The search engine automatically determines relevance score and, by default, it is configured to work well with a wide variety of websites.

You can also refine the relevance scoring for your website by using the relevance controls in the FreeFind Control Center.

Both page and text relevance scoring are detailed in the relevance scoring reference.

Password protected areas
Use this option to set up your account to index password-protected areas of your site. Both HTTP Basic Authentication and custom form-based authentication are supported. The settings page that appears when you click each option includes extensive documentation, and you can also refer to How to Index Password Protected Pages.
Define subsections
Use this to sub-divide your index into separately searchable sections. If you are looking for more information than this summary provides, see How to Use Sections.

This dialog contains a simple list of file "section specifications", one per line (browser wrapping may be ignored). Each section specification consists of a "URL mask" and a list of single-word section names each with an optional modifier. Here are a few quick examples:

		example.com/store/* products
		example.com/store/test/* products=exclude
		example.com/store/art/* products artsupplies
		example.com/articles/* howto
	

The URL mask is simply a standard web address, but may contain the common wildcards "*" and "?" to make it match more than one web address. The "*" will match any number of any characters and the "?" will match any single character. Non-wildcard characters are matched without regard to case (case-insensitive). URL masks which do not begin with "https://" or "http://" are treated as if they begin with "*".

The URL mask is followed by one or more single-word section names. Your visitors will never see these names; they are just used by the search engine to identify each section. Each section name may be followed by an equals sign ("=") and then one of the modifiers:

		include
		exclude
	
to control whether web addresses which match the URL mask are included or excluded from that section. The default is "include".

Note: The section name "web" is reserved. You cannot use it as your own section name.

Important: only the last matching line is used when determining which section specification to apply.

When determining which section specification to apply, only the last matching section specification is used. This allows convenient expression of "include everything but..." logic. For example, to include everything in your "http://example.com/store/" directory in a section except pages in the "/store/test/" subdirectory you can use the following:

		example.com/store/* products
		example.com/store/test/* products=exclude
	
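The way a page ends up in its sections can be sketched in Python as follows. This is an illustration of the "last matching line" rule only, not the search engine's actual code. The helper names (`mask_matches`, `parse_section_line`, `sections_for`) are hypothetical, masks are assumed to match the whole address, and a page that matches no line is assumed to belong to no section.

```python
import re


def mask_matches(mask, url):
    """Case-insensitive wildcard match; masks without a scheme get a leading "*"."""
    if not mask.lower().startswith(("http://", "https://")):
        mask = "*" + mask
    pattern = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in mask
    )
    return re.fullmatch(pattern, url, re.IGNORECASE | re.DOTALL) is not None


def parse_section_line(line):
    """Split one section specification into (mask, set of included section names)."""
    mask, *items = line.split()
    included = set()
    for item in items:
        name, _, modifier = item.partition("=")
        modifier = modifier or "include"  # the default modifier is "include"
        if modifier not in ("include", "exclude"):
            raise ValueError(f"unrecognized modifier: {item!r}")
        if name == "web":
            raise ValueError('the section name "web" is reserved')
        if modifier == "include":
            included.add(name)
    return mask, included


def sections_for(spec_lines, url):
    """Return the sections an address belongs to, using the last matching line."""
    result = set()  # assumption: unmatched pages belong to no section
    for line in spec_lines:
        if not line.strip():
            continue  # ignore blank lines
        mask, included = parse_section_line(line)
        if mask_matches(mask, url):
            result = included  # a later match replaces an earlier one
    return result


specs = [
    "example.com/store/* products",
    "example.com/store/test/* products=exclude",
    "example.com/store/art/* products artsupplies",
    "example.com/articles/* howto",
]
print(sections_for(specs, "https://example.com/store/shoes.html"))
print(sections_for(specs, "https://example.com/store/test/draft.html"))
print(sections_for(specs, "https://example.com/store/art/brushes.html"))
```

Here the "/store/test/" page matches the first two lines, so the second line decides and the page is left out of "products". The "/store/art/" page takes both of its sections from the third line.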

After you have specified all of your sections, your site will be re-indexed. The new sections become active only once that re-indexing is complete.

Now that you have an index with various sections you need to use an appropriate search panel to allow your visitors to use those sections. To do this just go to the html page and choose the panel with sections. Add it to your web site in the usual manner. (To review instructions for this see Adding your Panel to your Site.)

You will probably want to change the labels of the sections as they appear in the drop-down list. This is fine; just be sure to change the option text only, not the option value itself. To see an example of this, and more information on customizing any search panel to support sections, see Customizing Your Search Panel with Sections.

Result link target
Use this option to set the target of the links in the search results page that lead back to your site. A simple example is:
		example.com/* framename
	
This would cause the links of all example.com pages listed in the search results to have a link target of framename.

Another example is:

		example.com/archive/* aframe
		example.com/products/* pframe
	
This would cause search result links starting with example.com/archive/... to have a target of aframe, and links starting with example.com/products/... to have a target of pframe.
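The mapping from addresses to link targets can be sketched in Python like this. It is an illustration only, not the search engine's actual code. The helper names (`mask_matches`, `link_target`) are hypothetical. This page does not say which line applies when several match, so the sketch assumes the last matching line wins, as in the exclusion and section dialogs, and that addresses matching no line get no target.

```python
import re


def mask_matches(mask, url):
    """Case-insensitive wildcard match; masks without a scheme get a leading "*"."""
    if not mask.lower().startswith(("http://", "https://")):
        mask = "*" + mask
    pattern = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in mask
    )
    return re.fullmatch(pattern, url, re.IGNORECASE | re.DOTALL) is not None


def link_target(target_lines, url, default=None):
    """Return the link target for a search result address, or the default."""
    target = default
    for line in target_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        mask, frame = parts
        if mask_matches(mask, url):
            target = frame  # assumption: the last matching line wins
    return target


targets = [
    "example.com/archive/* aframe",
    "example.com/products/* pframe",
]
print(link_target(targets, "https://example.com/archive/2020/jan.html"))
print(link_target(targets, "https://example.com/products/widget.html"))
print(link_target(targets, "https://example.com/contact.html"))
```

The two masks in this example never match the same address, so the result does not depend on the assumed precedence: archive pages get "aframe", product pages get "pframe", and other pages get no target.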

For more information on framed sites read How to Setup your Framed Site.

Note: Most sites do not need to use this function, even those with frames.

Note: This option does not target your search panel, so it does not cause the search results page to be shown in a particular frame or window. For that, go to the html page and use the "set the frame target" link.

FreeFind and FreeFind.com are trademarks of FreeFind.com.
Copyright 1998 - 2024