Displaying posts with tag: Sphinx insights
Debugging Sphinx indexes with indextool

Sometimes you need to debug your Sphinx indexes: to see what’s inside, whether the index is healthy, and whether the document you are trying to find is actually there. In that case the indextool utility can be very handy, as it reads information directly from the index files even when searchd is not running. Here are a few examples of indextool usage:

Checking index consistency

One of the most important functions of indextool is checking index consistency. You will need the Sphinx config file and the index files.
/path/to/indextool -c sphinx.conf --check my_sphinx_index

This checks my_sphinx_index for consistency between the document list, hit list, positions and other internal Sphinx index structures. Please note that indextool only checks disk indexes (starting from 2.0.2 it can also check the on-disk part of Real-Time indexes, but not the in-memory part). The usual output for a healthy index looks like …

Top 100 and top 500 stopwords for Sphinx Search

Back in 2006, when I was working on my first sphinxsearch project, I was playing with stopwords files. Stopwords are basically a small set of highly frequent words you usually don’t want to search for (like “I”, “Am”, “The”, etc). For most Sphinx instances they only waste index space and slow down your search queries by matching every occurrence of these unimportant words.

Say you are searching for “when is jane’s birthday”: you are actually looking for documents containing “jane’s birthday”, and you don’t really care about the many documents (blog posts, news articles, etc) that contain only “when” and “is”.
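To make the effect concrete, here is a minimal shell sketch of what dropping stopwords does to that query. It is only an illustration and not how Sphinx implements stopwords internally: the five-word list and the strip_stopwords helper are made up for this example.

```shell
#!/bin/sh
# Hypothetical stopword list, one word per line (a real list is much longer).
stopwords_file=$(mktemp)
printf '%s\n' when is the i am > "$stopwords_file"

# Print the query given as $1 with every stopword removed.
# Steps: split into one word per line, drop lines that exactly match
# a stopword (case-insensitive), then join the rest with spaces.
strip_stopwords() {
    printf '%s\n' "$1" |
        tr -s ' ' '\n' |
        grep -v -i -x -F -f "$stopwords_file" |
        paste -s -d ' ' -
}

strip_stopwords "when is jane's birthday"
```

Only the words that carry meaning for the search survive; “when” and “is” never reach the matching step.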

Removing those high-frequency words from the search index is usually a smart move, and ages ago I created two stopword file samples which I’m still using today.
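If you would rather derive a list from your own data, a rough sketch with standard Unix tools could look like the one below. It is not the method used to build the sample files mentioned above; the top_words helper and the sample text are assumptions for illustration, and splitting on non-letters also breaks words like “jane’s” into two tokens.

```shell
#!/bin/sh
# Print the N most frequent words of the text on stdin, one per line.
# $1 is N. Words are lowercased; anything that is not a letter separates words.
top_words() {
    tr -cs '[:alpha:]' '\n' |
        tr '[:upper:]' '[:lower:]' |
        grep -v '^$' |
        sort |
        uniq -c |
        sort -k1,1nr -k2 |
        head -n "$1" |
        awk '{ print $2 }'
}

# Tiny made-up corpus; in practice you would feed in an export of your documents.
printf '%s\n' 'The cat and the dog.' 'The dog is the best.' | top_words 2
```

Redirecting the output to a file gives you a one-word-per-line list that can be reviewed by hand before you point the stopwords setting in your Sphinx config at it.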
