Advanced Toolbox for Web Developers


Setting up Sphinx
http://community.invisionpower.com/resources/documentation/index.html/_/tutorials/large-communities/setting-up-sphinx-r181 05.11.2012, 21:41
Warning: Some steps in this document are intended for advanced users and require root access to your web server, which is usually available only with dedicated or VPS hosting. Do not attempt them if you are not sure what you are doing.

Introduction
IP.Board 3.x has built-in support for using Sphinx for full-text searching of your site's content. You are still responsible for installing and configuring Sphinx itself, and this article walks you through doing that so you can use Sphinx to search content in IP.Board 3.x.

Please note that an application's content is searchable through Sphinx only if the application defines a Sphinx template. If you install a third-party application that does not properly define a Sphinx template file, its content will not be searchable through Sphinx.

Installing Sphinx
The first thing you must do is install Sphinx itself. Sphinx is a third-party search engine available at http://sphinxsearch.com. The documentation on their site explains how to install Sphinx; the general commands you need to run are given below.

Log in to your web server as root. If you do not have root access to your web server, contact your host for assistance.

Change to a temporary directory and download Sphinx. Then unpack the archive and move into the unpacked Sphinx directory.

NOTE: There may be a newer version available, so the commands below are only an example. Please refer to sphinxsearch.com for the latest stable version.

cd /tmp
wget http://sphinxsearch.com/files/sphinx-0.9.9.1.tar.gz
tar xzvf sphinx-0.9.9.1.tar.gz
cd sphinx-0.9.9.1

Next, configure, build and install the package:

./configure
make
make install

If any of these steps fails, stop and fix the error before continuing. For instance, if configure cannot find your MySQL installation, you can tell it where MySQL is by passing "--with-mysql=/path/to/mysql" to the ./configure command.

Once this is done, Sphinx is installed and ready to be used (though there is still more work to do).
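As an alternative to typing the steps one at a time, the download and build can be wrapped in a shell function. This is a sketch: build_sphinx is a hypothetical helper, the version defaults to the 0.9.9.1 example used above, and the optional second argument is passed to configure as --with-mysql. It returns as soon as any step fails, and it builds under your home directory instead of /tmp, because on servers where /tmp is mounted noexec, ./configure fails with "bad interpreter: Permission denied".

```shell
#!/usr/bin/env bash
# Sketch only: build_sphinx is a hypothetical helper, not part of Sphinx or IP.Board.
# It downloads, unpacks, configures, builds and installs one Sphinx release and
# returns non-zero as soon as any step fails.
build_sphinx() {
  local version="${1:-0.9.9.1}"  # example version only; check sphinxsearch.com
  local mysql_dir="${2:-}"       # optional MySQL location, for when configure cannot find it
  local workdir
  # Build under $HOME rather than /tmp, because /tmp is sometimes mounted noexec.
  workdir="$(mktemp -d "$HOME/sphinx-build.XXXXXX")" || return 1
  (
    # Subshell: the cd commands below do not change the caller's directory.
    cd "$workdir" || exit 1
    wget "http://sphinxsearch.com/files/sphinx-${version}.tar.gz" || exit 1
    tar xzvf "sphinx-${version}.tar.gz" || exit 1
    cd "sphinx-${version}" || exit 1
    if [ -n "$mysql_dir" ]; then
      ./configure --with-mysql="$mysql_dir" || exit 1
    else
      ./configure || exit 1
    fi
    make || exit 1
    make install || exit 1
  ) || return 1
  echo "Sphinx source tree (contains api/sphinxapi.php): $workdir/sphinx-${version}"
}
```

Run it as root, for example:

build_sphinx 0.9.9.1

The unpacked source tree stays in the printed directory, so you can copy api/sphinxapi.php from there in the next step.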

Copy the api/sphinxapi.php file from the Sphinx download to your forum's root directory:

cp api/sphinxapi.php /path/to/forums/here

Next, create the directories in which Sphinx will store its log files and index files. The suggested location is /var/sphinx, but you can create the directory anywhere you wish; just remember where you put it. The following command creates both /var/sphinx and its log subdirectory:

mkdir -p /var/sphinx/log
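If you chose a base other than /var/sphinx, or simply want to confirm the directories are usable, a small helper can create the same layout under any base path and check that it is writable. prepare_sphinx_dirs is a hypothetical name, and the writability test applies to the user running the function, so run it as the user that will run indexer and searchd (root in this guide).

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the Sphinx data directory (index files) and its
# log/ subdirectory under a base path, defaulting to /var/sphinx, then check
# that the current user can write to both.
prepare_sphinx_dirs() {
  local base="${1:-/var/sphinx}"
  mkdir -p "$base/log" || return 1
  if [ ! -w "$base" ] || [ ! -w "$base/log" ]; then
    echo "not writable by $(id -un): $base" >&2
    return 1
  fi
  echo "$base"
}
```

For example:

prepare_sphinx_dirs /var/sphinx

Whatever base you use is the directory to enter in the Sphinx settings in the next section.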

Configuring IPB
Now log in to your IPB Admin Control Panel (ACP) and go to System -> System Settings -> Search Set-Up. Change "Type of search" to "Sphinx" in the drop-down and review the Sphinx settings. In most cases you do not need to change them; however, if you created a directory other than /var/sphinx, or if you are installing Sphinx on your MySQL server in a multi-server setup, adjust them accordingly. Save the settings.

Next, go to System -> Manage Applications & Modules and click Build Sphinx Config. You will be offered a downloadable copy of sphinx.conf. Download this file and upload it to your server (the exact location does not matter, but remember where you put it).
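Before indexing, you can confirm that every directory the uploaded sphinx.conf points at actually exists, which catches a mismatch between the ACP settings and the directory you created. The sketch assumes the generated file uses the standard Sphinx directives path (index file prefix), log, query_log and pid_file, each written as "name = value" on one line; check_sphinx_conf_paths is a hypothetical helper, and it needs bash because it uses process substitution.

```shell
#!/usr/bin/env bash
# Hypothetical helper: collect the directories referenced by the path, log,
# query_log and pid_file directives in a sphinx.conf and report any that are
# missing. Returns non-zero if at least one directory does not exist.
check_sphinx_conf_paths() {
  local conf="$1" dir missing=0
  [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 1; }
  while IFS= read -r dir; do
    if [ ! -d "$dir" ]; then
      echo "missing: $dir"
      missing=1
    fi
  done < <(
    # Take the value after "=", then strip the file name to get its directory.
    sed -n -E 's/^[[:space:]]*(path|log|query_log|pid_file)[[:space:]]*=[[:space:]]*([^[:space:]]+).*/\2/p' "$conf" |
      while IFS= read -r file; do dirname "$file"; done | sort -u
  )
  return "$missing"
}
```

Usage:

check_sphinx_conf_paths /path/to/sphinx.conf

It prints one "missing:" line per absent directory and returns a non-zero status if any are missing.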

Creating the index and starting the search daemon
Back in the shell, you need to index your searchable content. Indexing is resource-intensive, but even with very large databases (4 million posts or more) it does not take very long.

Run the following command, replacing /path/to/sphinx.conf with the location where you uploaded the file:

/usr/local/bin/indexer --config /path/to/sphinx.conf --all

Once this is done, you need to start the search daemon.

/usr/local/bin/searchd --config /path/to/sphinx.conf
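Since searchd has nothing to serve until indexer has built the index files, the two commands can be chained so that a failed indexing run stops there instead of starting the daemon. This is a sketch: index_and_start is a hypothetical helper, and SPHINX_BIN is an assumed variable that defaults to /usr/local/bin, the location used in the commands above.

```shell
#!/usr/bin/env bash
# Assumed install location of the Sphinx binaries (the default used above).
SPHINX_BIN="${SPHINX_BIN:-/usr/local/bin}"

# Hypothetical helper: build all indexes, then start searchd only if
# indexing succeeded.
index_and_start() {
  local conf="$1"
  [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 1; }
  "$SPHINX_BIN/indexer" --config "$conf" --all || {
    echo "indexer failed; not starting searchd" >&2
    return 1
  }
  "$SPHINX_BIN/searchd" --config "$conf"
}
```

Usage:

index_and_start /path/to/sphinx.conf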

Once the search daemon is running, the search feature on your site should use Sphinx as its back end. Give your search function a quick test to make sure everything is working before going any further.
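Besides testing from your site, you can check from the shell that searchd is running. The sketch assumes the searchd section of your sphinx.conf sets pid_file, the standard Sphinx directive naming the file in which searchd records its process ID; searchd_is_running is a hypothetical helper. Run it as root, since kill -0 can fail for another user's process even when that process exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read pid_file from sphinx.conf and check that the
# process it names is alive. Returns 0 if searchd appears to be running.
searchd_is_running() {
  local conf="$1" pid_file pid
  pid_file="$(sed -n -E 's/^[[:space:]]*pid_file[[:space:]]*=[[:space:]]*([^[:space:]]+).*/\1/p' "$conf" | head -n 1)"
  [ -n "$pid_file" ] || { echo "no pid_file in $conf" >&2; return 1; }
  [ -r "$pid_file" ] || { echo "pid file $pid_file not found" >&2; return 1; }
  pid="$(tr -dc '0-9' < "$pid_file")"
  # kill -0 sends no signal; it only tests whether the process exists.
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}
```

Usage:

searchd_is_running /path/to/sphinx.conf && echo "searchd is running"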

Final "tweaks"
There are two more steps you need to do.

First, you need to create two cron jobs to rebuild the indexes at regular intervals. The first rebuilds the "delta" index, which contains only new content, every 15 minutes; because it only picks up new content, it is not very resource-heavy. The second rebuilds the entire index once a day so that edited posts and similar changes are re-indexed properly. Because it rebuilds everything, schedule it for the time your server is least busy (e.g., 4 AM).

You can get the cron job entries in the ACP: go to System -> Manage Applications & Modules, open the "Sphinx" drop-down and select "Build Cronjobs".

To add them, open your crontab by typing the command

crontab -e

and then paste in the entries you generated above.
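Cron normally e-mails any output a job produces to the crontab's owner, so a job that runs every 15 minutes can fill your inbox. The sketch below appends >/dev/null 2>&1 to each job line of the ACP-generated entries before you install them. It assumes you saved those entries to a file first; quiet_cron_lines is a hypothetical helper, and it leaves comments, blank lines, variable assignments such as MAILTO and lines that already redirect to /dev/null unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a crontab file with ">/dev/null 2>&1" appended to
# every job line, so cron has no output to e-mail. Comments, blank lines,
# variable assignments (e.g. MAILTO=...) and lines that already mention
# /dev/null are printed unchanged.
quiet_cron_lines() {
  awk '
    /^[ \t]*#/ || /^[ \t]*$/               { print; next }
    /^[ \t]*[A-Za-z_][A-Za-z0-9_]*[ \t]*=/ { print; next }
    /\/dev\/null/                          { print; next }
    { print $0 " >/dev/null 2>&1" }
  ' "$1"
}
```

For example, the following appends the quieted entries to your existing crontab instead of replacing it (ipb-sphinx.cron stands for wherever you saved the ACP output):

{ crontab -l 2>/dev/null; quiet_cron_lines ipb-sphinx.cron; } | crontab -

Alternatively, a MAILTO="" line at the top of the crontab turns off cron mail for every job in it.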


Finally, make sure that Sphinx is started again whenever the server restarts. The method varies from system to system, so contact your system administrator if you are unsure. On CentOS, we generally edit rc.local:

nano /etc/rc.d/rc.local

and add the following lines to the file:

rm -f /var/sphinx/*.spl
/usr/local/bin/searchd --config /path/to/sphinx.conf

This removes any lingering lock (.spl) files that may have been left behind and starts the search daemon. Adjust the paths as appropriate.
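If you would rather keep the boot commands in one script that rc.local calls, the sketch below performs the same two steps with the paths held in variables. start_searchd_at_boot is a hypothetical helper, and SPHINX_DATA and SPHINX_BIN are assumed variables that default to /var/sphinx and /usr/local/bin as used above.

```shell
#!/usr/bin/env bash
# Assumed locations; change them to match your setup.
SPHINX_DATA="${SPHINX_DATA:-/var/sphinx}"
SPHINX_BIN="${SPHINX_BIN:-/usr/local/bin}"

# Hypothetical helper for boot time: remove stale *.spl lock files left in the
# index directory by an unclean shutdown, then start searchd.
start_searchd_at_boot() {
  local conf="$1"
  [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 1; }
  rm -f "$SPHINX_DATA"/*.spl
  "$SPHINX_BIN/searchd" --config "$conf"
}
```

In rc.local you would then source the saved script and call the function; /usr/local/sbin/sphinx-boot.sh is only a hypothetical location:

. /usr/local/sbin/sphinx-boot.sh
start_searchd_at_boot /path/to/sphinx.conf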

International Character Sets

If you use non-Latin character sets, have customized the character set used for MySQL, or have made other similar character-set changes, you may find that Sphinx searching does not work correctly. In these cases you may need to modify your sphinx.conf configuration file to account for this. A common example: if you use the SQL character set option in conf_global.php when connecting to MySQL, you need to tell Sphinx to use the same character set.

An example of some changes required to support Greek UTF-8 with Sphinx can be found in this user-submitted topic.  More information on setting up character set conversion tables is available in the Sphinx documentation.
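If you do connect to MySQL with a specific character set, one common way to make Sphinx match is a "sql_query_pre = SET NAMES ..." line in each source section of sphinx.conf. The hypothetical helper below lists source sections that have no such line, as a quick check after you regenerate the file. It assumes sections are written as "source name" (optionally followed by ": parent") with a { ... } block; a child source that defines no sql_query_pre of its own inherits its parent's lines, so treat a listed child source as something to inspect rather than a definite error.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every "source" section in a
# sphinx.conf that contains no "sql_query_pre = SET NAMES ..." line.
sources_without_set_names() {
  awk '
    # Section header such as "source name" or "source child : parent";
    # "source = ..." lines inside index sections do not match.
    /^[ \t]*source[ \t]+[^= \t]/ {
      name = $2; sub(/[{:].*/, "", name)   # drop "{" or ":parent" glued to the name
      inside = 1; has = 0; next
    }
    inside && toupper($0) ~ /^[ \t]*SQL_QUERY_PRE[ \t]*=[ \t]*SET[ \t]+NAMES/ { has = 1 }
    inside && /^[ \t]*[}]/ { if (!has) print name; inside = 0 }
  ' "$1"
}
```

Usage:

sources_without_set_names /path/to/sphinx.conf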

Conclusion
Sphinx is an excellent search engine and, once set up, can reduce resource usage on your servers. Setting it up requires some command-line and Linux knowledge, but once it is running you should rarely need to change it (mainly when you install new applications and want them to be searchable). We hope this article provides the information you need to set up and use Sphinx with IP.Board 3.0.

Comments

  • Shouldn't put the file in /tmp as it gives an error:
    -bash: ./configure: /bin/sh: bad interpreter: Permission denied
  • Also a good idea to add the following to each line in the cron to avoid having emails sent every 15 minutes and when it's rebuilt.
    >/dev/null 2>&1
Added by: cMerlin | Tags: Sphinx