You are here:  » Full Text Index Trial


Full Text Index Trial

Submitted by support on Sat, 2006-02-25 09:49 in

Before I merge this into the current distribution (and therefore mandate MySQL version 3.23.23 or better for Price Tapestry); it would be helpful if those using a large dataset and are running MySQL 3.23.23 or better would be able to try the following changes in order to add a full text index and see how much search speed is improved.

Firstly, you need to build the full text index against your products table. To do this; paste the following code into a file named "dbindex.php" in the top level directory of your Price Tapestry installation. Run the script to build the index, which will take about 1 second per 1000 records...

dbindex.php:

<?php
  set_time_limit
(0);
  
ignore_user_abort();
  require(
"includes/common.php");
  require(
"includes/database.php");
  require(
"includes/header.php");
  print 
"<p>Building FULLTEXT index on `products`...</p>";
  
$sql "ALTER TABLE `".$config_databaseTablePrefix."products` ADD FULLTEXT INDEX (name)";          
  
database_queryModify($sql,$rows);
  print 
"<p>Done.</p>";
  require(
"includes/footer.php");
?>

Next, you can use an updated version of the dbtest.php script from the Search Speed thread to test the full text index search speed:

dbtest.php:

<?php
  
require("includes/common.php");
  require(
"includes/database.php");
  require(
"includes/tapestry.php");
  require(
"includes/header.php");
  function 
timerBegin()
  {
    global 
$timeBegin;
    list(
$usec$sec) = explode(" "microtime());
    
$timeBegin = ((float)$usec + (float)$sec);
  }
  function 
timerEnd()
  {
    global 
$timeBegin;
    list(
$usec$sec) = explode(" "microtime());
    
$timeEnd = ((float)$usec + (float)$sec);
    return (
$timeEnd $timeBegin);
  }
  
$sql "SELECT COUNT(*) AS product_count FROM `".$config_databaseTablePrefix."products`";
  
database_querySelect($sql,$rows);
  
$product_count $rows[0]["product_count"];
  
$sql "SELECT COUNT(*) AS merchant_count FROM `".$config_databaseTablePrefix."feeds`";
  
database_querySelect($sql,$rows);
  
$merchant_count $rows[0]["merchant_count"];
  
$dbVersion mysql_get_server_info();
  print 
"<p>MySQL Server Version: ".$dbVersion."</p>";
  print 
"<p>Merchants: ".$merchant_count."</p>";
  print 
"<p>Products: ".$product_count."</p>";
  print 
"<p>Testing SELECT on FULL TEXT index:<br />";
  
$sql "SELECT * FROM `".$config_databaseTablePrefix."products` WHERE MATCH name AGAINST ('FOOBAR')";
  
timerBegin();
  
database_querySelect($sql,$rows);
  print 
timerEnd()."</p>";
  require(
"includes/footer.php");
?>

Please copy and paste the output from the test into this thread. Note that just building the full text index won't make any difference to the speed of Price Tapestry just yet as it requires a different SQL syntax! I'll post the necessary changes to search.php once we've got some feedback on the results of this test script.

Submitted by support on Sat, 2006-02-25 10:10

MySQL Server Version: 3.23.58
Merchants: 6
Products: 19473
Testing SELECT on FULL TEXT index:
0.0013010501861572

(from my dev server)

Submitted by madstock on Sat, 2006-02-25 11:15

Average sized site, fairly few products per merchant
MySQL Server Version: 4.1.12
Merchants: 26
Products: 51570
Testing SELECT on FULL TEXT index:
0.00039792060852051

Smaller, more products per merchant
MySQL Server Version: 4.1.12
Merchants: 11
Products: 46161
Testing SELECT on FULL TEXT index:
0.00035881996154785

Small site, small number of products
MySQL Server Version: 4.1.12
Merchants: 7
Products: 7518
Testing SELECT on FULL TEXT index:
0.00060319900512695

Medium site, more products per merchant
MySQL Server Version: 4.1.12
Merchants: 8
Products: 58222
Testing SELECT on FULL TEXT index:
0.00034284591674805

Large number of merchants, fairly few products per merchant
MySQL Server Version: 4.1.12
Merchants: 38
Products: 16897
Testing SELECT on FULL TEXT index:
0.016276121139526

Submitted by searley on Sat, 2006-02-25 13:25

mine is

MySQL Server Version: 3.23.56
Merchants: 70
Products: 145123
Testing SELECT on FULL TEXT index:
0.00082492828369141

Submitted by IG on Sat, 2006-02-25 14:48

And mine is

MySQL Server Version: 3.23.58
Merchants: 26
Products: 114201
Testing SELECT on FULL TEXT index:
0.12344288825989

Question: is this a good or bad result?

Submitted by support on Sat, 2006-02-25 18:01

^ that's good - less than 2/10th's of a second on > 100,000 products. It is the difference of course between a full index scan, and a direct look-up into an index, so by rights it should be quick!

There are some caveats with full text indexing vs. the "old" method - i'll explain the situation later, but in the mean time i'll work now on getting these changes checked into the current distrubution so that search.php can use the full text index. I'll aim to have this available online first thing tomorrow (Sunday), and make MySQL version 3.23.23 the baseline for Price Tapestry.

Thanks for everyone's help!
David.

Submitted by support on Sun, 2006-02-26 13:22

Full text indexing is now in the Current Distribution

Files modified: database.sql,search.php.

If you wish to upgrade to this version without re-importing all products; run the dbindex.php script (above) to build the full text index, and then extract search.php from the current distribution into your existing site. This change now mandates MySQL version 3.23.23 as the minimum required version for Price Tapestry.

Please note that full text indexing operates only on words of at least 4 characters. I have coded a strlen() check into the search.php that uses the full text index if the query is > 4 characters, or falls back on the original SQL for short queries.

Submitted by searley on Sun, 2006-02-26 14:01

just want to confirm that the Full Text index auto creates itself when items are added and removes, or do we have to do something to update the index?

Submitted by support on Sun, 2006-02-26 14:03

There's no ongoing maintenance required - MySQL keeps the full text index up to date as records are added and removed just as it does for any "normal" index.

Submitted by searley on Sun, 2006-02-26 14:05

found a small bug

goto

http://www.shoppingchanneluk.com/search.php?q=t00

no record count, or next buttons

http://www.shoppingchanneluk.com/search.php?q=t008

^the addition of an 8 now there is a count and next options

Submitted by support on Sun, 2006-02-26 14:09

oops..! Fixed that.

Please note that the full text index is only used if the query string is at least 4 characters. This is the default setting for creating the full text index in MySQL. If the query is shorter than 4 characters, the original SQL is used.

The bug that searley just picked up was because there is a test in search.php that decides which SQL to use based on the length of the query. In the section for less than 4 characters, the result count SQL was still using the full text WHERE clause; which wouldn't work!

In theory, it is possible to use a shorter minimum word length for the full text index, but this requires reconfiguration or (in some cases) re-complilation of MySQL. For MySQL >= 4.0.0, it can be configured within my.cnf. For more information, see the following page:

http://dev.mysql.com/doc/refman/4.1/en/fulltext-fine-tuning.html

If you do this; make the appropriate changes to search.php so that the correct SQL is used. Note that for merchant:, brand: and category: searches the original SQL is used because it is only possible for a database to use one index per query.

Submitted by searley on Sun, 2006-02-26 14:10

is it in the distribution to re-download?

Submitted by support on Sun, 2006-02-26 14:15

Yes - sorry, the fix is merged into the current distribution.

Submitted by searley on Sun, 2006-02-26 14:20

There is a side effect i dont like..

If i type t008 i get 25 reults

normally if i wanted less results i would try t008 cartridge, expecting anything with t008 AND cartridge to show

but this now show t008 OR cartridge which increases the results shown

Submitted by support on Sun, 2006-02-26 14:25

Hi Searley,

It should be ordered by relevance; so anything with "t008 AND cartridge" will come up first; "t008" only next; and finally results with just "cartridge".

Do you not feel that this would provide more benefit overall; particularly where people may be incorrectly qualifying something or making typos?

For example, if somebody entered "SONY PLASMA TN0005", and you don't have the TN0005 in your database but you do have the TN0006 model it would still match "SONY PLASMA TN0006".

What do you think?

Submitted by searley on Sun, 2006-02-26 14:36

its not showing by relevance though

an example would be if i type 'LG Plasma'

item 1 'panasonic plasma'
item 2 'plasma ball'
item 3 'LG 42PX3RVA Plasma TV'

i would normally expect to see, the more description provided in the search, then the less results shown

you could have a fallback that if the search has 0 results returned then use the broader search as currently implemented!!

Submitted by support on Sun, 2006-02-26 14:40

Ah - that's the word length restriction there unfortunately - full text does not index anything less than 4 characters (for performance reasons). So in your example, "LG" is not included in the search, otherwise the LG Plasma would have come up first on relevance.

It sounds like you want to force a boolean "AND", which is possible in MySQL > 4.0.1...

http://dev.mysql.com/doc/refman/4.1/en/fulltext-boolean.html

I think we need to work towards a "happy" medium that takes into account:

i) a full index scan is slow (unusable into the 100,000's products)

and

ii) full text indexing has limitations with word length.

There is another limitation with full text indexing. Using the old method; "TEST 1234" would match "TEST1234" and "TEST 1234", whereas this is not the case with the full text index, as they are treated as separate words. I think this is of particular relevance to product searching as it is not always known how manufacturers represent product model numbers...

Submitted by searley on Sun, 2006-02-26 14:52

OK i tested panasonic plasma, and yes there was the relevance sort..

I guess upon thought, that this is a better search than before, because if i type 'panasonic plasma' it will show the results with both those words in regardless of the being seperated by other text, where as before, it would have only shown them if the words were in the exact order searched..

now i understand what it is doing, i guess i can live with it :-)

but i think it is an improvement, although not perfect!

Submitted by support on Sun, 2006-02-26 15:00

I agree - on balance it is definitely better - I like the way it works on WebPriceCheck.co.uk in terms of returning "similar" products if there is no exact match.

Unfortunately text search is a hugely complex area, and there's only so much you can do in terms of a script like Price Tapestry considering the limitations described above.

Submitted by searley on Sun, 2006-02-26 15:08

the main aim, is not to use PT really as a Affiliate Site, more as a shopping comparison, only really want affiliate feeds to 'pad' the results which is why i am keen to see development...

Submitted by support on Sun, 2006-02-26 15:12

I think there's more we can do in terms of deciding what SQL to use, and perhaps performing a second query using the old method if there are no results returned by the full text query. This might help catch product model numbers like "LG1234" which might be in the database as either "LG 1234", "LG-1234" or "LG1234".

Submitted by Eddie on Tue, 2006-03-07 11:33

Been away for a bit so only just got on to this.

Anyway I get ....

MySQL Server Version: 4.1.14-standard

Merchants: 55

Products: 382480

Testing SELECT on FULL TEXT index:

Warning: mysql_fetch_array(): supplied argument is not a valid MySQL result resource in /home/ukmp3s/public_html/shopsort/Shop/includes/database.php on line 21

Warning: mysql_num_rows(): supplied argument is not a valid MySQL result resource in /home/ukmp3s/public_html/shopsort/Shop/includes/database.php on line 26
0.071100950241089

Submitted by support on Tue, 2006-03-07 11:46

Hi Eddie,

Sounds like the full text index hasn't been built.

How long did the first script take to run when it was building the index?
(should have been at least 6 minutes on 300K+ products)

Submitted by Eddie on Sun, 2006-03-12 11:03

Seem to take for ages.

I have emailed you the urls so you can have a look

Submitted by support on Sun, 2006-03-12 12:02

Hi Eddie,

Seems to be a more fundamental issue at the moment with the product count query returning no results.

Do you have command line access to mysql?

Submitted by Eddie on Tue, 2006-03-14 18:11

sue do but wouldnt know how to use it

give me a clue

Submitted by support on Tue, 2006-03-14 18:42

Hi Eddie,

I want to see the current index structure on your products table. To do this; do the following at the command prompt ($) of your server:

$mysql --user=username --password=password
mysql>use databasename
mysql>show index from products;

This should produce an output similar to this:

+----------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+----------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Comment |
+----------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+----------+
| products | 0 | PRIMARY | 1 | id | A | 22461 | NULL | NULL | |
| products | 0 | dupe_filter | 1 | dupe_hash | A | 22461 | NULL | NULL | |
| products | 1 | merchant | 1 | merchant | A | NULL | NULL | NULL | |
| products | 1 | name | 1 | name | A | NULL | NULL | NULL | |
| products | 1 | search_name | 1 | search_name | A | NULL | NULL | NULL | |
| products | 1 | category | 1 | category | A | NULL | NULL | NULL | |
| products | 1 | brand | 1 | brand | A | NULL | NULL | NULL | |
| products | 1 | name_2 | 1 | name | A | NULL | NULL | NULL | FULLTEXT |
+----------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+----------+
8 rows in set (0.00 sec)

Submitted by atman on Wed, 2006-09-20 05:26

hi david,

is full text search currently included in the current distro?

full text search meaning the search includes the title and description?

thank you.

regards,

atman

Submitted by support on Wed, 2006-09-20 07:01

Hi atman,

Full Text indexing has been in the distribution for some time now; however it is just on the product name (titles).

If you want to include the description you need to create another combined full text index on the name and description fields. You can manually in the MySQL command line tool as follows:

$mysql -u <username> -p
Enter Password: <password>
mysql> use <price-tapestry-database>
mysql> create fulltext index ft_namedescription on products (name,description);
mysql> quit
Bye.
$

If you cannot access your server to do this, you can write a script to do it instead:

makeIndex.php:

<?php
  set_time_limit
(0);
  
ignore_user_abort();
  require(
"includes/common.php");
  require(
"includes/database.php");
  require(
"includes/header.php");
  
$sql "create fulltext index ft_namedescription on products (name,description)";
  
database_queryModify($sql,$rows);
  print 
"<p>Done.</p>";
  require(
"includes/footer.php");
?>

(may takes some time if you already have hundreds of products in your database).

Then, to make search.php use the combined index you need search for the following string:

MATCH name

...and replace it with:

MATCH (name,description)

(it occurs in the SQL construction on lines 65 and 67)

Cheers,
David.

Submitted by atman on Wed, 2006-09-20 09:35

david,

i tried the makeindex.php script btu it gave me errors.

i edited the script and added

require("includes/database.php");
my final code:

<?php
  set_time_limit(0);
  ignore_user_abort();
  require("includes/common.php");
  require("includes/database.php");
  require("includes/header.php");
  $sql = "create fulltext index ft_namedescription on products (name,description)";
  database_queryModify($sql,$rows);
  print "<p>Done.</p>";
  require("includes/footer.php");
?>

works wonders!

search gives more results. :)

regards,

atman.

Submitted by atman on Wed, 2006-09-20 09:44

david,

is it possible to make search results show first the name(title) then the (description)?

the search keywords found on name(title) is ranked higher and thus given more priority and the keywords found on the (description) will be on the bottom of the list.

am i making any sense?

regards,

atman

Submitted by support on Wed, 2006-09-20 16:48

Hi atman,

Yes - that does make sense. I'm not immediately sure how easily it could be done as the relevancy is determined by MySQL but i'll have a look into it for you.

Cheers,
David.

Submitted by badger on Thu, 2006-10-05 22:02

i get an error when running the dbindex.php file

Fatal error: Cannot redeclare database_queryselect() (previously declared in /websites/LinuxPackage02/ko/ol/ab/koolabula.co.uk/public_html/includes/database.php:3) in /websites/LinuxPackage02/ko/ol/ab/koolabula.co.uk/public_html/includes/database.php on line 27

Submitted by support on Fri, 2006-10-06 09:47

Hi,

That script above is slightly incorrect, try this:

<?php
  set_time_limit
(0);
  
ignore_user_abort();
  require(
"includes/common.php");
  require(
"includes/header.php");
  
$sql "create fulltext index ft_namedescription on products (name,description)";
  
database_queryModify($sql,$rows);
  print 
"<p>Done.</p>";
  require(
"includes/footer.php");
?>

(it doesn't need to have require("includes/database.php")!!!)

Cheers,
David.

Submitted by badger on Sat, 2006-10-07 08:01

i dont have a header.php or footer.php file in includes. Should this be html/ instead?

cheers

Submitted by support on Sat, 2006-10-07 08:18

It should... yes!

<?php
  set_time_limit
(0);
  
ignore_user_abort();
  require(
"includes/common.php");
  require(
"html/header.php");
  
$sql "create fulltext index ft_namedescription on products (name,description)";
  
database_queryModify($sql,$rows);
  print 
"<p>Done.</p>";
  require(
"html/footer.php");
?>

This is to add the description search - full text indexing has been part of the Price Tapestry distribtion for some time now...

Cheers,
David.