Introducing the Fulltext Search

06/21/2005

When I started the Movable Type Weblog, I disabled the search function. There weren't many entries yet, so a search dialog was not necessary.

Today, a search function would be a good improvement for the Movable Type Weblog, so I tried the default Movable Type search engine. However, every search was CPU-intensive. Reading the Perl code, I saw why: each search selects all entries from the database, then loops over them and finds the matching entries with string operations.

This is not a good way to do a fulltext search. The work grows with the number of entries, so large weblogs in particular will probably see poor performance. The architecture does not scale well.

Modern databases such as MySQL Server have built-in fulltext capabilities. So why not combine the two worlds and let Movable Type search with the help of MySQL's fulltext search?

Stop explaining, show me!

If you want to try fulltext search now, use the search box in the sidebar.

Why is MySQL better at fulltext search?

As described above, when Movable Type does a fulltext search, it has to retrieve all rows from the database. Then it loops over them, trying to match the search pattern against each and every row. With small weblogs this is fine. As the weblog grows, however, this becomes a very time-consuming operation.
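
The following minimal Python sketch makes the cost of that approach concrete. It is not Movable Type's actual Perl code; the sample entries and the function name `naive_search` are hypothetical. The point is that every entry is read and tested, whether or not it matches.

```python
# Hypothetical in-memory stand-in for the entry table; in Movable Type the
# rows would come from a SELECT over all entries.
ENTRIES = {
    "A": "How to use MTInclude in a category template",
    "B": "Category archives, category listings, and category pages",
    "C": "Writing a Perl plugin",
}


def naive_search(entries, pattern):
    """Return the ids of entries whose text contains pattern.

    Every entry is fetched and inspected, so the work grows linearly
    with the number of entries, whether or not they match.
    """
    pattern = pattern.lower()
    matches = []
    for entry_id, text in entries.items():  # full scan: touches every row
        if pattern in text.lower():         # plain string operation per row
            matches.append(entry_id)
    return matches


print(naive_search(ENTRIES, "category"))  # ['A', 'B']
```

Doubling the number of entries doubles the work for every single search, which is why this approach does not scale.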

When a table has a fulltext index, MySQL builds a special index structure. When a row is inserted, MySQL looks at the text, extracts all words, and stores each word together with a reference back to the row. If you looked at that index, you might see something like this:

  • The word "category" appears in entry A once, in entry B three times, and in entry C twice.
  • The word "MTInclude" appears in entry A four times, and in entry D twice.

This kind of structure allows very fast access to the matching rows. When searching, the database engine does not need to look at each entry to find a certain keyword. It simply looks up the keyword in the index and immediately knows which entries contain it.
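
The index structure described above can be sketched in a few lines of Python. This is a simplified model, not MySQL's on-disk format: the tokenizer is a plain regular expression (MySQL also applies a minimum word length and a stopword list), and the entry texts are made up so that the counts match the example list.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words (a simplification of MySQL's parser)."""
    return re.findall(r"\w+", text.lower())


def build_index(entries):
    """Map each word to {entry_id: occurrences}, i.e. an inverted index."""
    index = defaultdict(dict)
    for entry_id, text in entries.items():
        for word in tokenize(text):
            index[word][entry_id] = index[word].get(entry_id, 0) + 1
    return index


def lookup(index, word):
    """A single dictionary access replaces the scan over all entries."""
    return index.get(word.lower(), {})


# Made-up entry texts chosen to reproduce the counts in the list above
entries = {
    "A": "category MTInclude MTInclude MTInclude MTInclude",
    "B": "category category category",
    "C": "category category",
    "D": "MTInclude MTInclude",
}
index = build_index(entries)
print(lookup(index, "category"))   # {'A': 1, 'B': 3, 'C': 2}
print(lookup(index, "MTInclude"))  # {'A': 4, 'D': 2}
```

Building the index costs some work at insert time; afterwards, finding every entry that contains a word is one lookup instead of a pass over all entries.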

Faster lookup is one thing. In addition, MySQL ranks the results. If you search for several keywords at once, for example "plugin perl MTInclude", MySQL finds all entries containing at least one of the keywords. However, it ranks entries higher if they contain several of the keywords, or one keyword several times. As the result list can be ordered by rank, the most relevant entries can be listed first.
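
The ranking idea can be illustrated on top of such an index. The sketch below uses a deliberately simple score, the total number of query-word occurrences per entry; MySQL's actual relevance formula is different and, among other things, gives less weight to words that occur in many rows. The index contents and the name `ranked_search` are hypothetical.

```python
from collections import defaultdict

# Hypothetical inverted index (word -> {entry_id: occurrences}), extending
# the "category"/"MTInclude" example with two more words.
INDEX = {
    "category": {"A": 1, "B": 3, "C": 2},
    "mtinclude": {"A": 4, "D": 2},
    "perl": {"B": 1, "D": 2},
    "plugin": {"D": 1},
}


def ranked_search(index, query):
    """OR-search: an entry matches if it contains at least one query word.

    Each entry's score is the total number of query-word occurrences, so
    entries with several keywords, or one keyword repeated, rank higher.
    """
    scores = defaultdict(int)
    for word in set(query.lower().split()):
        for entry_id, count in index.get(word, {}).items():
            scores[entry_id] += count
    # Highest score first; ties broken by entry id for a stable order
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


print(ranked_search(INDEX, "plugin perl MTInclude"))
# [('D', 5), ('A', 4), ('B', 1)]
```

Entry D contains all three keywords and comes first; entry A contains only one keyword, but four times, and still ranks above B.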

Problems

Having decided to use MySQL fulltext search in Movable Type, I started to develop the function. However, some problems arose.

One problem was that some Perl code had to be written. I am a developer who uses the full range of languages supported by Microsoft, and I work with MySQL, MS SQL Server and Oracle on a daily basis. Perl, however, was unknown to me.

So I jumped in at the deep end and tried. Perl is a rather idiosyncratic language and not that easy to learn. Any Perl guru would probably smile at my Perl code. I know that I have to improve it before thinking about releasing the code to the public. However, it does work.

Moreover, there were the usual small problems that arise in such a development. For example, fulltext search is only supported by the MyISAM storage engine, whereas it is reasonable to have Movable Type store its data with the InnoDB engine. These were "standard problems", and I know how to handle them.
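
One common way to combine the two engines, shown here as an assumption rather than as the solution used on this weblog, is to keep Movable Type's tables in InnoDB and copy the searchable text into a separate MyISAM table that carries the FULLTEXT index. The Python sketch below only assembles the SQL. The table name `mt_entry_fulltext` and the helper `search_query` are hypothetical; the column names are modelled on Movable Type's `mt_entry` table. (MySQL 5.6 and later also support FULLTEXT indexes on InnoDB tables.)

```python
# Hypothetical shadow table: MyISAM, so it can carry a FULLTEXT index
# while Movable Type's own tables stay in InnoDB.
CREATE_SHADOW_TABLE = """
CREATE TABLE mt_entry_fulltext (
    entry_id    INT NOT NULL PRIMARY KEY,
    entry_title VARCHAR(255),
    entry_text  MEDIUMTEXT,
    FULLTEXT (entry_title, entry_text)
) ENGINE=MyISAM
"""

# Copy (or refresh) the searchable text from the InnoDB entry table.
REFRESH_SHADOW_TABLE = """
REPLACE INTO mt_entry_fulltext (entry_id, entry_title, entry_text)
SELECT entry_id, entry_title, entry_text FROM mt_entry
"""


def search_query(limit=20):
    """Return a ranked fulltext query with two %s placeholders.

    MATCH ... AGAINST in the WHERE clause uses the FULLTEXT index; the
    same expression in the SELECT list yields the relevance score. The
    column list must match the FULLTEXT index definition exactly.
    """
    return (
        "SELECT entry_id, entry_title, "
        "MATCH (entry_title, entry_text) AGAINST (%s) AS score "
        "FROM mt_entry_fulltext "
        "WHERE MATCH (entry_title, entry_text) AGAINST (%s) "
        "ORDER BY score DESC "
        f"LIMIT {int(limit)}"
    )
```

With a DB-API driver such as MySQLdb, which uses `%s` placeholders, the query would be run as `cursor.execute(search_query(), (terms, terms))`, because the search terms are bound twice: once for the score and once for the filter. The shadow table has to be refreshed whenever entries change.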

Solution

The solution consists of some SQL, some Perl, a CGI, and some HTML templates.

As this is a new feature, it may not always work perfectly. If you see any problems, please let me know.

If this is something new to the Movable Type community and you would like me to publish it, let me know. I have to polish it first, but then anybody with a MySQL database could install it.

Update

In the meantime I have created the MTLookup website, a site for finding tutorials and information about Movable Type by searching across many websites. If you use the search box in the sidebar, MTLookup will be used.

If you are interested in this subject, please also have a look at the MTLookup archive, which holds all MTLookup-related articles. You can reach it by clicking MTLookup in the sidebar.

Want to read on?

If you are interested in this subject, please select the category »MTLookup« from the sidebar. It will show all related articles in chronological order.

mgs | 06/21/2005

Feedback is welcome!

What do you think about this entry? Was it interesting or boring? I would like to hear your comments. If the text was helpful, please consider linking to http://www.movable-type-weblog.com/.

No spam please!

To protect this weblog, I have installed the MT-Approval Plugin. A new comment has to be viewed in preview mode before it is saved on the server. In addition, I review each comment manually before it is published. You can find more information on the subject in the entry Weblog Spamming Basics.

With an active TypeKey session, your comment will be published immediately.

Comment

Max Khokhlov | June 21, 2005 11:15 PM

As you have mentioned, a small personal blog with a small number of entries can be searched with the MovableType search engine. However, more and more people are getting to know MovableType and like it more and more, so there is a great number of regular non-blog sites using MovableType as their CMS. Such sites may become pretty big, so I'm sure there are plenty of people who would like to implement a faster, more accurate and more convenient search engine on their websites. I'm one of them. Thanks for the info. Looking forward to getting new details about the technique, as well as files, instructions and examples. Good luck!

Comment

Christian Watson | June 22, 2005 07:40 AM

I would love you to publish this! We use MT to power our intranet at work and have several thousand pages in 50+ blogs. The default search is "okay" but something faster would be very nice.

Comment

Mark Carey | July 6, 2005 01:54 PM

Sounds very interesting. I have some very large MT blogs and would be very interested in a more efficient search method. Please publish! ;)

Comment

inaki | October 29, 2005 11:49 PM

Hi, I just found this entry, and it looks really, really interesting. We find that MT's search engine is kind of slow, and I would love to have full-text search; it would be much, much better.

Any chance to have a look at your scripts? That would be a great help!

Comment

Michael G. Schneider | October 30, 2005 06:40 AM

Inaki, the fulltext search as described in this entry is no longer used. If you search my Movable Type Weblog, the Fulltext Search from MTLookup is used.

In the beginning, the Fulltext Search was an internal function within a Movable Type project. It read the database and built structures within MySQL.

With MTLookup, I no longer read the entries' text from the database. MTLookup works like the big search engines, which read websites via HTTP. Even when my own Movable Type Weblog is indexed, the text is not read from the database; it is fetched with HTTP GET.

Nevertheless, I will release the code for using a Fulltext Search within one's own weblog. It will be based on my original Fulltext Search, so it will read the database directly rather than using a bot to spider the website.

Because I have concentrated on improving MTLookup, the original Fulltext Search has not been touched for some time. I have to do some work on it before it can be released.
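
The difference between the two approaches lies in where the text comes from. A minimal Python sketch of the crawler side, fetching a page with HTTP GET and reducing it to plain text for indexing, might look like this. It only illustrates the general technique and is not MTLookup's code; `fetch_text`, `page_text` and `TextExtractor` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def page_text(html):
    """Reduce an HTML document to space-separated visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def fetch_text(url):
    """Fetch a page with HTTP GET, as a crawler would, and return its text."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return page_text(response.read().decode(charset, errors="replace"))
```

For example, `page_text("<h1>Title</h1><p>Hello <b>world</b></p>")` yields `"Title Hello world"`; the resulting text would then be fed into the same kind of word index as before.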

Comment

Inaki | October 30, 2005 09:02 PM

Hi, Michael, thanks for your comment. If you need any help with the code before releasing it, feel free to contact me, I'll be glad to help with that.

Comment

Diskusne forum | September 5, 2006 05:29 AM

Interesting reading. Can you tell me how big the database is? How many rows?