Cached copies

Starting from version 3.2.2, mnoGoSearch can store compressed copies of indexed documents, so-called cached copies. Cached copies are stored in the same SQL database as the search index.

search.cgi uses cached copies for two purposes:

  1. To display smart excerpts from each found document, showing the search query words in their context.

  2. To display the entire original copy of the document, with the search words highlighted.

    Note: A cached copy is opened in the browser when the user clicks the Display cached copy link next to each document in the search results.

    Viewing a cached copy is especially useful when the original site is temporarily down or the document no longer exists.
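To illustrate what a smart excerpt is, here is a minimal Python sketch. It is not mnoGoSearch's own excerpt algorithm: it assumes the cached copy has already been decompressed to plain text, and the function name make_excerpt and its window parameter are hypothetical. It keeps a few words on each side of every query word hit, merges overlapping windows, and marks hits with brackets.

```python
import re


def make_excerpt(text, query_words, window=3):
    """Return ' ... '-joined snippets showing each query word in context.

    Hypothetical illustration: `window` is the number of words kept on each
    side of a hit; hits are wrapped in [brackets].
    """
    words = text.split()
    targets = {w.lower() for w in query_words}

    def is_hit(word):
        # Ignore case and leading/trailing punctuation when matching.
        return re.sub(r"^\W+|\W+$", "", word).lower() in targets

    # Build a [start, end) word range around every hit, merging overlaps.
    ranges = []
    for i, word in enumerate(words):
        if is_hit(word):
            start, end = max(0, i - window), min(len(words), i + window + 1)
            if ranges and start <= ranges[-1][1]:
                ranges[-1][1] = end
            else:
                ranges.append([start, end])

    snippets = []
    for start, end in ranges:
        shown = [f"[{w}]" if is_hit(w) else w for w in words[start:end]]
        snippets.append(" ".join(shown))
    return " ... ".join(snippets)


text = ("mnoGoSearch stores compressed copies of documents. "
        "The search engine uses these copies to build excerpts.")
print(make_excerpt(text, ["copies"], window=2))
# stores compressed [copies] of documents. ... uses these [copies] to build
```

A real excerpt generator would also have to deal with HTML markup, word forms and excerpt length limits, which this sketch leaves out.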

Cached copies are displayed by search.cgi itself, invoked with a special HTTP query string parameter. search.cgi fetches the cached copy of the document from the SQL database, decompresses it, and sends it to your web browser with the search keywords highlighted.

To enable cached copy support, compile mnoGoSearch with zlib support:

     ./configure --with-zlib <other arguments>

Configuring cached copies

Collecting cached copies is enabled in the default indexer.conf by this line:

      Section CachedCopy 0 64000

The number 64000 is the maximum allowed cached copy size. When crawling, indexer stores a cached copy only if its compressed size is smaller than this maximum. You can change this number according to your needs and the capabilities of your SQL database.
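The size rule can be sketched in Python as follows. This is an illustration under assumptions, not indexer's code: it assumes the cached copy is compressed with zlib's default settings (as zlib.compress does), and the names cached_copy_to_store and MAX_CACHED_COPY_SIZE are hypothetical; the limit value mirrors the 64000 in the Section CachedCopy line.

```python
import os
import zlib

# Hypothetical constant mirroring "Section CachedCopy 0 64000".
MAX_CACHED_COPY_SIZE = 64000


def cached_copy_to_store(document, max_size=MAX_CACHED_COPY_SIZE):
    """Return the compressed copy to store, or None if it is too large."""
    compressed = zlib.compress(document)
    # Only copies whose *compressed* size is below the limit are kept.
    if len(compressed) < max_size:
        return compressed
    return None


repetitive = b"hello world " * 20000   # 240000 bytes, compresses very well
random_data = os.urandom(100000)       # 100000 bytes, practically incompressible
print(cached_copy_to_store(repetitive) is not None)   # True
print(cached_copy_to_store(random_data) is None)      # True
```

Because the limit applies to the compressed size, a large but repetitive document can still get a cached copy, while a smaller but incompressible one may not.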

Note: Storing very large cached copies can degrade search performance.

To disable collecting cached copies, open indexer.conf in a text editor and delete the Section CachedCopy line. Disabling cached copies saves disk space; however, search results will not be presented as well as with cached copies enabled.

Using cached copies at search time

Displaying cached copies is enabled in the default search result template, search.htm-dist. To check whether your template enables displaying cached copies, open it in a text editor and make sure the <!--res--> section contains this HTML code:

<A HREF="$(stored_href)">Display cached copy</A>

With the default search template, search.cgi refers to itself: when you follow the Display cached copy link in your browser, search.cgi is opened again, just with special query string parameters that tell it to display a cached copy rather than search results.

Once cached copies have been configured, the following happens at search time:

  1. For each found document, a link to its cached copy is displayed.

  2. When the user clicks the link, search.cgi is executed. It sends a query to the SQL database and fetches the cached copy content.

  3. search.cgi decompresses the requested cached copy and sends it to the web browser, highlighting the search keywords with the markup given in the HlBeg and HlEnd commands.
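Steps 2 and 3 can be sketched in Python. The sketch uses an in-memory SQLite database with a hypothetical cached_copies table as a stand-in for mnoGoSearch's real schema, which is not described here, and passes hypothetical hl_beg and hl_end strings in place of the HlBeg and HlEnd values. For simplicity it highlights matches anywhere in the decompressed text, including inside HTML tags, which a real implementation would have to avoid.

```python
import re
import sqlite3
import zlib


def show_cached_copy(conn, url, keywords, hl_beg="<b>", hl_end="</b>"):
    """Fetch, decompress and highlight a cached copy (hypothetical schema)."""
    # Step 2: query the database for the stored, compressed copy.
    row = conn.execute(
        "SELECT content FROM cached_copies WHERE url = ?", (url,)
    ).fetchone()
    if row is None:
        return None
    # Step 3: decompress the stored copy.
    html = zlib.decompress(row[0]).decode("utf-8")
    if not keywords:
        return html
    # Wrap every whole-word, case-insensitive keyword match in hl_beg/hl_end.
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, keywords)) + r")\b", re.IGNORECASE
    )
    return pattern.sub(lambda m: hl_beg + m.group(0) + hl_end, html)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cached_copies (url TEXT PRIMARY KEY, content BLOB)")
doc = "<p>Cached copies are stored compressed.</p>"
conn.execute("INSERT INTO cached_copies VALUES (?, ?)",
             ("http://example.com/", zlib.compress(doc.encode("utf-8"))))
print(show_cached_copy(conn, "http://example.com/", ["cached", "compressed"]))
# <p><b>Cached</b> copies are stored <b>compressed</b>.</p>
```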

Moving cached copies to another machine

You can optionally specify an alternative URL for the Display cached copy links, so that cached copies are served from another location on the same server, or even from another physical server. For example:

<A HREF="http://site2/cgi-bin/search.cgi?$(stored_href)">Display cached copy</A>

Serving cached copies from another server can help distribute CPU load between machines.

Note: mnoGoSearch must be installed on the machine site2.

Using the original document as a cached copy source

Starting from version 3.3.8, mnoGoSearch supports the UseLocalCachedCopy command in search.htm. It forces documents to be downloaded from their original locations both when generating smart excerpts for search results and when generating the "Cached Copy" documents. This command is useful when you index documents residing on your local file system: it avoids storing cached copies in the database, which keeps the database smaller.
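The effect of UseLocalCachedCopy can be sketched as a choice of source for the document text. In this Python sketch the helper names load_original and cached_copy_source are hypothetical, the db_lookup callable stands in for reading a stored cached copy from SQL, and only file:// URLs are handled, as for documents indexed from the local file system; how search.cgi actually fetches originals is not shown here.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def load_original(url):
    """Read a document from its original file:// location (hypothetical)."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError("this sketch only handles file:// URLs")
    with open(url2pathname(parsed.path), "rb") as f:
        return f.read()


def cached_copy_source(url, use_local_cached_copy, db_lookup):
    """Return document bytes from the original location or from the database."""
    if use_local_cached_copy:
        return load_original(url)
    return db_lookup(url)


# Demo: a temporary local file plays the role of an indexed document.
with tempfile.NamedTemporaryFile(suffix=".html", delete=False) as tmp:
    tmp.write(b"<p>local document</p>")
url = Path(tmp.name).as_uri()
print(cached_copy_source(url, True, lambda u: b"from database"))
print(cached_copy_source(url, False, lambda u: b"from database"))
Path(tmp.name).unlink()  # clean up the demo file
```

The trade-off is the one described above: the database stays smaller, but excerpts and cached copies depend on the original files still being readable at search time.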