Too Many Pages Indexed
If the search engine found more pages than you expected, look here for the most common reasons why.

Problem: You are indexing a calendar
Solution:
Time goes on forever, so most automatic calendar programs will generate an infinite number of pages: each month (or day) typically links to the next one, and the spider keeps following those links. The solution is to prevent the spider from indexing your calendar, or to limit the pages it does index. This can be done using the standard techniques for preventing parts of your site from being indexed. For more information on this, read How to Exclude Pages from Search.
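To see why a calendar never runs out of pages, here is a minimal Python sketch of a spider crawling a made-up site whose calendar pages each link to the following month. The site layout, the `/calendar/` prefix and the `max_pages` cap are all hypothetical; the point is that the crawl only stops because of the cap, until the calendar is excluded.

```python
from collections import deque

# Hypothetical site: two ordinary pages plus a calendar in which every
# month page links back home and on to the following month.
STATIC_LINKS = {
    "/index.html": ["/about.html", "/calendar/2016-01.html"],
    "/about.html": ["/index.html"],
}


def next_month(path):
    """Return the calendar page for the month after the one in `path`."""
    year, month = map(int, path[len("/calendar/"):-len(".html")].split("-"))
    year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return f"/calendar/{year:04d}-{month:02d}.html"


def links_on(path):
    """Links the spider would find on the page at `path`."""
    if path.startswith("/calendar/"):
        return ["/index.html", next_month(path)]
    return STATIC_LINKS.get(path, [])


def crawl(start, excluded_prefixes=(), max_pages=500):
    """Breadth-first crawl that skips excluded paths and stops at max_pages."""
    seen, queue, indexed = {start}, deque([start]), []
    while queue and len(indexed) < max_pages:
        path = queue.popleft()
        indexed.append(path)
        for link in links_on(path):
            if link in seen or link.startswith(tuple(excluded_prefixes)):
                continue
            seen.add(link)
            queue.append(link)
    return indexed


print(len(crawl("/index.html")))                                # 500 (the cap)
print(crawl("/index.html", excluded_prefixes=("/calendar/",)))  # ['/index.html', '/about.html']
```

Without the exclusion, raising `max_pages` only makes the crawl longer; with it, the spider sees just the two real pages.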

Problem: You are indexing a forum
Solution:
Forums can generate many more pages than you might expect. This is because most forum software not only displays user messages, but also has a rich set of supporting features, such as "reply to", "new thread", "show user", "edit user", "admin", "register", and so on. The net effect is that most of the pages indexed in a forum are junk pages - pages which you really don't want to show up in search results.

The solution here is to prevent the spider from indexing your forum, or to limit the pages it does index to only the messages. For tips on how to do this, including specific instructions for some popular forums, read Searching Forums.
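Limiting a forum to its messages comes down to a rule that tells message pages apart from support pages by their addresses. The Python sketch below shows one such rule. The script names (`showthread.php`, `showpost.php`) and the `action` parameter are hypothetical stand-ins; substitute whatever your forum software actually uses, and express the final rule with the techniques described in Searching Forums.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical forum layout: only these scripts display actual messages.
MESSAGE_PAGES = {"/forum/showthread.php", "/forum/showpost.php"}


def is_message_page(url):
    """Return True if `url` should be indexed under a "messages only" rule."""
    parts = urlsplit(url)
    if not parts.path.startswith("/forum/"):
        return True   # outside the forum: this rule leaves it alone
    if parts.path not in MESSAGE_PAGES:
        return False  # register, edit user, admin, new thread and so on
    # A message page opened with an action (e.g. action=reply) is a
    # support view of the message rather than the message itself.
    return "action" not in parse_qs(parts.query)


for url in (
    "https://www.example.com/forum/showthread.php?t=42",
    "https://www.example.com/forum/showthread.php?t=42&action=reply",
    "https://www.example.com/forum/register.php",
):
    print("index" if is_message_page(url) else "skip ", url)
```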

Problem: You are indexing other dynamically-generated pages
Solution:
FreeFind can index dynamically-generated pages without problem. Sometimes it is too good, though, and finds more of these pages than you expected. This is because dynamic-content generators often produce pages for support functions, such as program administration. The net effect is that some (possibly many) of the pages indexed are "junk" pages - pages which you really don't want to show up in search results.

The solution here is to limit the pages the spider indexes, excluding the junk pages that you don't want included. This can be done using the standard techniques for preventing parts of your site from being indexed. For more information on this, read How to Exclude Pages from Search.
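One widely used technique for keeping spiders out of part of a site is a robots.txt file. Assuming the spider honors robots.txt (check How to Exclude Pages from Search for the options FreeFind actually supports), this Python sketch uses the standard library's `urllib.robotparser` to test which addresses a set of rules would exclude. The `/cms/admin/` and `/cms/preview.php` paths are made-up examples of support pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep spiders out of the content system's
# administration area and preview script, but not its articles.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cms/admin/
Disallow: /cms/preview.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def should_index(url):
    """True if the rules above allow a spider to fetch `url`."""
    return parser.can_fetch("*", url)


for url in (
    "https://www.example.com/cms/article.php?id=7",
    "https://www.example.com/cms/admin/users.php",
    "https://www.example.com/cms/preview.php?id=7",
):
    print("index" if should_index(url) else "skip ", url)
```

`Disallow` rules match by path prefix, so `/cms/preview.php` also excludes every query-string variant of that script.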

Problem: You are indexing your directory listings in addition to your pages
Solution:
When given a web address that refers to a directory (and not a specific file), most web servers will first look in that directory for a "welcome" (or "default") file to display (typically index.html). If a welcome file cannot be found, some servers will then automatically generate a page that lists the files in that directory - a directory listing.

In general, directory listings are considered a bad idea, because they can expose files you never meant to publish and so pose a possible security problem.

If your site has a link to a directory listing, the FreeFind spider will find it and start indexing all the web directories on your server. Since most directory listings have links to sort each list a few different ways, this generally results in a large number of "junk" pages being indexed and potentially included in search results.

To determine whether you are indexing directory listings, look at the site map and try some searches. Most directory listing pages have a title like "Directory of ...", "Listing of ..." or "Index of ...". By searching for the first two words of these titles you can usually locate any directory listings you may have.
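If you have a list of page addresses and titles (for example, copied from the site map), the same title check can be automated. This Python sketch is a minimal illustration that assumes the three title patterns mentioned above; some servers word their listings differently, and the sample pages are made up.

```python
import re

# Title patterns from the text above; some servers may word them differently.
LISTING_TITLE = re.compile(r"^\s*(directory of|listing of|index of)\b", re.IGNORECASE)


def find_directory_listings(pages):
    """Return the URLs whose title looks like an automatic directory listing.

    `pages` is an iterable of (url, title) pairs, e.g. copied from a site map.
    """
    return [url for url, title in pages if LISTING_TITLE.match(title)]


pages = [
    ("https://www.example.com/", "Welcome to Example"),
    ("https://www.example.com/images/", "Index of /images"),
    ("https://www.example.com/docs/", "Directory of /docs"),
    ("https://www.example.com/help.html", "Indexing tips"),
]
print(find_directory_listings(pages))
```

The `\b` after each phrase and the required space before "of" keep ordinary titles such as "Indexing tips" from matching.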

If you are indexing directory listings, the solution is to track down the original link(s) to the directory listing in your website and fix those so they refer to pages. You will probably want to do this regardless of the search engine to improve the security of your site.

Problem: You are indexing both "www.mysite.com" and "mysite.com"
Solution:
Some folks use "www" links and "non-www" links throughout their site as if they were the same. Then, to get their site completely indexed, they have to add the other website address to the spider's list of additional starting points. Sometimes this works, but often it results in spidering twice as many pages as expected.

If this is happening to you, the fix is to standardize on one or the other; either use "www" in all your site's links or don't. Do not mix and match. As a bonus, fixing this problem may improve how well your site is indexed by the big web search engines.
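Standardizing means every link to your own site uses exactly one form of the host name. The Python sketch below rewrites a link to the chosen form; it is a minimal illustration for checking or bulk-editing links, assuming "mysite.com" is your domain. Links to other sites or to other subdomains are left untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def standardize_host(url, domain, use_www=True):
    """Rewrite a link to `domain` so it always (or never) uses "www.".

    Links to other sites or subdomains are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname not in (domain, "www." + domain):
        return url
    host = ("www." if use_www else "") + domain
    if parts.port:
        host += f":{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


links = ["http://mysite.com/a.html", "http://www.mysite.com/a.html"]
print({standardize_host(u, "mysite.com") for u in links})  # one address, not two
```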

Problem: You are indexing a SafeShopper site
Solution:
SafeShopper sites are unusual in that every link on them is unique! This results in a very high page count unless you set up your FreeFind account in a special way. For more information see the FAQ item: How do I index my Safeshopper site?

Problem: You are indexing a Homestead site
Solution:
Homestead automatically does a number of bizarre things to your site. For more information see the FAQ item: How do I index my Homestead site?

Problem: You have specified additional spider starting points
Solution:
This might be a problem for two reasons:
You've forgotten about them.
If you set up your search engine some time ago, you may have simply forgotten which sites you have instructed it to index.

You intend them to be single pages, not sites.
You may not realize it, but each starting point address refers to a site, not a single page. Just like your primary account address, the spider will index the entire site at each starting point. If you are expecting it to index just the starting point page, the spider will probably end up indexing a lot more pages than you anticipate. It is possible to set up an "exclusion" to make the spider treat a starting point like a page instead of a site. For information on this read How to Exclude Pages from Search.
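The difference between a site and a page can be pictured as two scope rules. The Python sketch below is a simplified, hypothetical model, not FreeFind's actual logic: it assumes a starting point covers every page on the same host, and that an exclusion narrows it to the starting page alone. The partner.example.org addresses are made up.

```python
from urllib.parse import urlsplit


def make_scope(start_url, single_page=False):
    """Return a function that says whether the spider would index a URL.

    Hypothetical model: a starting point covers every page on its host,
    unless an exclusion narrows it to the starting page alone.
    """
    start_host = urlsplit(start_url).netloc.lower()

    def in_scope(url):
        if urlsplit(url).netloc.lower() != start_host:
            return False  # a different site entirely
        return url == start_url if single_page else True

    return in_scope


site = make_scope("http://partner.example.org/news.html")
page = make_scope("http://partner.example.org/news.html", single_page=True)

print(site("http://partner.example.org/archive/2015.html"))  # True: whole site
print(page("http://partner.example.org/archive/2015.html"))  # False: page only
```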


Problem: You have bad links and your server is not returning the correct error code
Solution:
Although most servers accurately detect requests for non-existent pages, some servers either:

  • Return an error page but don't set the error code, or
  • Return a page from your site and don't set the error code

If your server is doing the former you may end up with a few extra pages corresponding to the bad links in your site. Usually searching for:
   page not found
will locate these types of pages. You can also try searching for:
   404
and see if any pages are listed.

If your server is doing the latter and one or more of your links contain a particular type of error, you can end up with an infinite number of pages being indexed. The type of bad link that can do this is one that ends with a slash but actually refers to a page, like:
   "oops.html/"

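Because the server answers such an address with an ordinary page instead of an error, relative links on that page resolve to ever-deeper addresses, and the spider keeps finding "new" pages. In both cases, fixing the links and then reindexing your site will fix the problem.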
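The Python sketch below flags links of the "oops.html/" kind and shows how the addresses grow. It assumes, for illustration only, that the server returns an ordinary page (no 404) for each bogus address and that this page contains the same relative link; `urljoin` then resolves it one level deeper every time.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def looks_like_file_with_slash(link):
    """Heuristic: flag links such as "oops.html/" (a file name plus a slash)."""
    path = urlsplit(link).path
    return path.endswith("/") and "." in posixpath.basename(path.rstrip("/"))


def addresses_followed(page_url, bad_link, steps):
    """URLs a spider reaches if every page it gets back contains `bad_link`.

    Assumes the server answers each bogus address with a normal page
    (no 404), and that this page contains the same relative link.
    """
    visited = []
    for _ in range(steps):
        page_url = urljoin(page_url, bad_link)
        visited.append(page_url)
    return visited


print(looks_like_file_with_slash("oops.html/"))  # True
print(looks_like_file_with_slash("docs/"))       # False
for url in addresses_followed("http://www.example.com/", "oops.html/", 3):
    print(url)
```

The check is only a heuristic: a real directory whose name contains a dot (such as "v1.2/") would also be flagged, so review its hits by hand.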

Problem: You really have more pages than you thought!
Solution:
This is not a problem. Now you know how big your site really is!

FreeFind and FreeFind.com are trademarks of FreeFind.com.
Copyright 1998 - 2016