So you've finished two-thirds of an application when your project manager comes to you with a great idea (soon to be requirement), "I was thinking, it would be nice if we could do Boolean fulltext searches... what do you think?" Now, if the content data is stored in a MySQL database and accessed from a PHP framework, here are a few canned responses to choose from:
Unfortunately, most of us have settled on like variations of the second response; this is functionality that MySQL has promised with its next full version release. The fact remains, the more web savvy users have come to expect a few options when searching:
Support for such options are doubly important when we are dealing with large datasets like knowledge bases and news sites. Fortunately, these are two areas in which MySQL already excels, just not both at the same time. However, there is at least one PHP hack for those of us that can't wait for the 4.0 release.
The functions (will open in a new window):
funcs.mysql.boolean.php
An example implementation complete with sample data:
example.mysql.boolean.php
I've also setup a script that lets you test drive this functionality on a small sample database: form.mysql.boolean.php
The provided functions are mainly used to generate SQL elements of the SELECT and WHERE clauses as follows. First, we must retrieve the FULLTEXT KEY column names from our table, $table_name, via our database connection, $db_connect:
$fulltext_key = get_fulltext_key($table_name,$db_connect);
And then we build the SQL statement using $fulltext_key and the user input, $search_string:
$sql = "SELECT id, author, content, \n"
.boolean_sql_select(
boolean_inclusive_atoms($search_string),
$fulltext_key)." \n"
."as relevance \n"
."FROM $table_name \n"
."WHERE \n"
.boolean_sql_where($search_string,$fulltext_key)." \n"
."HAVING relevance>0 \n"
."ORDER BY relevance DESC \n";
ANDORNOT()ANDORNOTAND operator has precedence over the OR operator.myisam/ftdefs.h file:#define MIN_WORD_LEN 4SELECT clause will switch to a simple scoring algorithm that does a case insensitive count of the subject word in all of the FULLTEXT KEY columns; the SQL element in the WHERE clause will switch to a LIKE string comparison of the subject word with all of the FULLTEXT KEY columns. Note that enabling this additional functionality may create some performance issues with larger datasets as it runs significantly slower than the fulltext system used for words of 4+ characters, perhaps because it does not use a compiled algorithm or the fulltext index.
atom1 AND atom2atom1 OR atom2atom1 NOT atom2atom1 (atom2,atom3) -atom4(atom1,atom2) -(atom1 atom2) [construct for atom1 XOR atom2]atom1 (atom2,(atom3 -atom4)) -atom5WHERE Clausetitle,content indexed on the FULLTEXT KEY, the typical syntax for implementing a fulltext search on said table might be:MATCH (title,content) AGAINST ('george')george, the above will return a floating point number, typically between 0.0 (exclusive) and 5.0 for any given record. If the fulltext index does not contain the word george, the above will return 0.0 for any given record. Thus, we can make the above statement evaluate to TRUE or FALSE with the following syntax:MATCH (title,content) AGAINST ('george') > 0george AND dubyaWHERE clause will be
MATCH (title,content) AGAINST ('george') > 0
AND
MATCH (title,content) AGAINST ('dubya') > 0
SELECT ClauseSELECT clause can be used to calculate the total relevance as declared by the relevance alias for the above example like so:
MATCH (title,content) AGAINST ('george')
+
MATCH (title,content) AGAINST ('dubya')
as relevance
but the code utilizes a simplification which produces a numerical equivalent:
MATCH (title,content) AGAINST ('george dubya')
as relevance
HAVING ClauseHAVING clause which will ensure that the query only returns records with a non-zero relevance:relevance > 0WHERE clause which already deals which the inclusion and exclusion of records, this allows the developer to impose a minimum relevance restriction on the result set.relevance > 0.2ORDER BY Clauserelevance DESCKeep in mind that some of the parameters in these functions are arbitrary or may be specific to the context in which they are used. However, I do encourage you to email me with any general comments or improvements.
-David Altherr altherda@email.uc.edu davidaltherr.net