Searching with Sphinx

How to use Sphinx for full-text searching.

install sphinx

There is no Ubuntu package, so install it manually:

sudo apt-get install libmysqlclient15-dev
wget http://www.sphinxsearch.com/downloads/sphinx-0.9.8-rc2.tar.gz
tar zxvf sphinx-0.9.8-rc2.tar.gz
cd sphinx-0.9.8-rc2
./configure
make
sudo make install

configure rails application

codetitle. install thinking-sphinx plugin

git clone git://github.com/freelancing-god/thinking-sphinx.git vendor/plugins/thinking-sphinx
cd vendor/plugins/thinking-sphinx
git checkout v0.9.5

To get the rake tasks to work, I added a line to the config/database.yml to say

  username: root

Try running rake ts:index. This should not do anything, but if you have everything set up correctly it won’t show an error.

specify how to index models

Now we need to specify exactly what data to include in the sphinx index. Here is an example entry in a model called “Page”:

codetitle. app/models/page.rb

define_index do
  indexes title
  indexes discussion.posts.body
  indexes tags.name
  indexes user_participations.user_id
  indexes group_participations.group_id
  has created_at
  has updated_at
end  

running sphinx

rake ts:index
rake ts:start
rake ts:stop
rake ts:restart

using sphinx in your code

Here are some examples of using sphinx:

User.search :conditions => {:name => "Pat"}            # only search on the name field
User.search "Pat"                                      # looks at all fields
User.search "Pat", :page => (params[:page] || 1)       # any field for "Pat", limited to a particular page
User.search "Pat",                                     # any field for "Pat", load comments as well, sort by created_at.
    :include => :comments,
    :order => "created_at DESC"

You can also run a search that returns just a set of ActiveRecord id numbers:

Page.search_for_ids :conditions => { :title=>"magna", :group_id => 3 }
Page.search_for_ids :conditions => { :title=>"potenti", :group_id => 1 }, :order => "created_at DESC"

See ts.freelancing-gods.com for more information.

changing the defaults

These are the default configuration options for thinking-sphinx:

config file::      config/#{environment}.sphinx.conf
searchd log file:: log/searchd.log
query log file::   log/searchd.query.log
pid file::         log/searchd.#{environment}.pid
searchd files::    db/sphinx/#{environment}/
address::          0.0.0.0 (all)
port::             3312
allow star::       false
mem limit::        64M
max matches::      1000
morphology::       stem_en
charset type::     utf-8
charset table::    nil

If you want to change these settings, create a YAML file at
config/sphinx.yml with settings for each environment, in a similar
fashion to database.yml – using the following keys: config_file,
searchd_log_file, query_log_file, pid_file, searchd_file_path, port,
allow_star, mem_limit, max_matches, morphology, charset_type,
charset_table. I think you’ve got the idea.
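
As a rough illustration of the layout described above, this sketch parses a sample config/sphinx.yml from a string and merges one environment's settings over a few of the defaults. The sample values (port 3313, a 128M mem limit) and the helper name sphinx_settings_for are made up for the example, and this is not how thinking-sphinx itself loads the file.

```ruby
require "yaml"

# A few of the defaults listed above, keyed by the names used in sphinx.yml.
DEFAULTS = {
  "port"        => 3312,
  "allow_star"  => false,
  "mem_limit"   => "64M",
  "max_matches" => 1000
}.freeze

# Sample config/sphinx.yml contents: one section per environment, as in
# database.yml. The values are made up for the example.
SAMPLE_SPHINX_YML = <<~YAML
  development:
    port: 3313
  production:
    port: 3312
    mem_limit: 128M
    allow_star: true
YAML

# Hypothetical helper: the defaults, overridden by one environment's section.
def sphinx_settings_for(environment, yaml_text = SAMPLE_SPHINX_YML)
  overrides = YAML.load(yaml_text)[environment] || {}
  DEFAULTS.merge(overrides)
end

puts sphinx_settings_for("production").inspect
```

An environment with no section in the file (for example "test" here) simply ends up with the defaults.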

troubleshooting

make sure you have sequential id numbers!

Sphinx does its indexing by going through all your records in chunks. To do this, it takes the minimum id number and the maximum id number and assumes that the records are distributed between them. If you have random id numbers, sphinx might have to do thousands of queries to index two records. It can take minutes.
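
To see why scattered ids hurt, here is a back-of-the-envelope sketch of the chunking described above. The chunk size of 1024 ids per query is an assumption (the real step depends on your generated sphinx configuration), and range_query_count is a hypothetical helper, not part of sphinx or thinking-sphinx.

```ruby
# Hypothetical helper: how many id-range queries it takes to walk from the
# smallest id to the largest id in chunks of `step` ids.
def range_query_count(min_id, max_id, step = 1024)
  ((max_id - min_id) / step) + 1
end

sequential = [1, 2]            # two records with consecutive ids
scattered  = [7, 50_000_000]   # two records with widely spread ids

puts range_query_count(sequential.min, sequential.max)
puts range_query_count(scattered.min, scattered.max)
```

With consecutive ids a single range query covers both records. With the scattered pair, the same two records cost tens of thousands of range queries, nearly all of which return nothing.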

In Rails 2.0, the fixtures will be created with random id numbers unless you specify an id manually. You must do this!
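
One way to catch fixtures without an explicit id is a small check like the sketch below. It parses fixture YAML from a string and lists the entries that have no id key. The fixture names and the helper fixtures_missing_id are made up for illustration.

```ruby
require "yaml"

# Sample fixture file contents (made up for the example). The second entry
# has no explicit id, so Rails 2.0 would generate one for it.
SAMPLE_FIXTURES = <<~YAML
  page_one:
    id: 1
    title: First page
  page_two:
    title: Second page
YAML

# Hypothetical helper: names of the fixture entries that lack an explicit id.
def fixtures_missing_id(yaml_text)
  fixtures = YAML.load(yaml_text) || {}
  fixtures.reject { |_name, attrs| attrs.is_a?(Hash) && attrs.key?("id") }.keys
end

puts fixtures_missing_id(SAMPLE_FIXTURES).inspect
```

To check a real file, pass in its contents, for example with File.read on one of the files under test/fixtures.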

make sure no other sphinx processes are running

If you have a searchd running from another rails project, rake ts:start will not start another one. When you try to search, you will get no results back.
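
A quick way to check for this before starting searchd is to see whether anything is already listening on the port. The sketch below assumes the default port 3312 listed above and a searchd on the local machine. The helper port_in_use? is hypothetical, and it can only tell you that something accepts connections on the port, not that it is a searchd.

```ruby
require "socket"

# Hypothetical helper: true if something accepts TCP connections on host:port.
def port_in_use?(host, port)
  TCPSocket.new(host, port).close
  true
rescue Errno::ECONNREFUSED, Errno::EHOSTUNREACH, Errno::ETIMEDOUT
  false
end

# 3312 is the default searchd port listed above.
if port_in_use?("127.0.0.1", 3312)
  puts "Something is already listening on port 3312; it may be another project's searchd."
else
  puts "Port 3312 is free."
end
```

If the port is taken, either stop the other searchd with rake ts:stop in that project, or give this project its own port in config/sphinx.yml.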

   

Hi! I have set up crabgrass (git master branch as of 24/06/2011), and search does not work for Unicode text (it works fine for English search terms).

Any ideas/suggestions?

PS: I know that the above description isn’t helpful. :(