It is no secret that Spil Games is a heavy user of Sphinx Search. We use it in many ways, including game search, profile search and, for the past few months, even to build our category and subcategory listings. In all cases we do not use it as an extension of MySQL but rather as a standalone […]
For those of you located near New York City, I am happy to announce two events you can attend related to Sphinx Search, one of the leading open source full-text search engines.
The first meeting is organized by the NYPHP meetup on Tuesday, September 25th at IBM, 590 Madison Avenue, New York. I’ll be speaking about search services in a cloud environment and distributed search tips and tricks. The event is free, please RSVP.
One week later, on October 1st, I’ll be giving a tutorial about MySQL and Sphinx, “Full-text based services with Sphinx and MySQL” …[Read more]
Sometimes you need to debug your Sphinx indexes to find out what is inside them: is the index okay, and does it contain the document you are trying to find? In this case the indextool utility can be very handy, as it gathers information directly from the index files even when searchd is not running. Here are a few examples of indextool usage:
Checking index consistency
One of the most important functions of indextool is checking index consistency. You will need the Sphinx config file and the index files.
/path/to/indextool -c sphinx.conf --check my_sphinx_index
This checks my_sphinx_index for consistency between the document list, hit list, positions and other internal Sphinx index structures. Please note that indextool only checks disk indexes (starting from 2.0.2 it can also check the on-disk part of Real-Time indexes, but not the in-memory part). The usual output for a healthy index looks like …
It’s very handy to have full-text search out of the box, but there are several drawbacks attached. The problem is that MyISAM full-text search is not designed to handle large amounts of text data. If you plan to index more than 1M documents, you will probably need to look at an external search system like Lucene or Sphinx. For the usual LAMP-based service I would personally prefer Sphinx, as it provides a simple transition from MySQL FT and is easy to integrate into any application (Sphinx can be queried via native APIs or via the MySQL protocol).
Say we have a table called <my_table> with `title` and `content` text fields. In MySQL you would run a query like this:
SELECT * FROM <my_table> WHERE MATCH(`title`,`content`) AGAINST ('I love Sphinx');
Let’s see how we could do the same …[Read more]
Back in 2006, when I was working on my first sphinxsearch project, I was playing with stopword files. Stopwords are basically a small set of highly frequent words you often don’t want to search for (like “I”, “Am”, “The”, etc). For most Sphinx instances they only waste index space and slow down your search queries by matching all occurrences of these unimportant words.
Say you are searching for “when is jane’s birthday”: you are actually looking for documents with “jane’s birthday”, and you don’t really care about the many documents (blog posts, news articles, etc) that contain only “when” and “is”.
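To make the idea concrete, here is a minimal Python sketch of what dropping stopwords from a query amounts to. It is only an illustration: the `STOPWORDS` set and the `strip_stopwords` helper are made up for this example, the splitting is plain whitespace splitting, and none of it reflects how Sphinx tokenizes text internally.

```python
# Hypothetical stopword list, for illustration only.
# A real stopword file would normally be built from word frequencies in your own data.
STOPWORDS = {"i", "am", "the", "when", "is"}


def strip_stopwords(query, stopwords=STOPWORDS):
    """Lowercase the query, split it on whitespace and drop any stopwords."""
    return [term for term in query.lower().split() if term not in stopwords]


# Only the meaningful terms of the query survive.
print(strip_stopwords("when is jane's birthday"))
```

With this toy list, only "jane's" and "birthday" remain, which are the terms the searcher actually cares about.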
Removing those high-frequency words from the search index is usually a smart move, and ages ago I created two sample stopword files which I still use today.