Before I merge this into the current distribution (and therefore mandate MySQL version 3.23.23 or better for Price Tapestry); it would be helpful if those using a large dataset and are running MySQL 3.23.23 or better would be able to try the following changes in order to add a full text index and see how much search speed is improved.
Firstly, you need to build the full text index against your products table. To do this; paste the following code into a file named "dbindex.php" in the top level directory of your Price Tapestry installation. Run the script to build the index, which will take about 1 second per 1000 records...
dbindex.php:
<?php
set_time_limit(0);
ignore_user_abort();
require("includes/common.php");
require("includes/database.php");
require("includes/header.php");
print "<p>Building FULLTEXT index on `products`...</p>";
$sql = "ALTER TABLE `".$config_databaseTablePrefix."products` ADD FULLTEXT INDEX (name)";
database_queryModify($sql,$rows);
print "<p>Done.</p>";
require("includes/footer.php");
?>
Next, you can use an updated version of the dbtest.php script from the Search Speed thread to test the full text index search speed:
dbtest.php:
<?php
require("includes/common.php");
require("includes/database.php");
require("includes/tapestry.php");
require("includes/header.php");
function timerBegin()
{
global $timeBegin;
list($usec, $sec) = explode(" ", microtime());
$timeBegin = ((float)$usec + (float)$sec);
}
function timerEnd()
{
global $timeBegin;
list($usec, $sec) = explode(" ", microtime());
$timeEnd = ((float)$usec + (float)$sec);
return ($timeEnd - $timeBegin);
}
$sql = "SELECT COUNT(*) AS product_count FROM `".$config_databaseTablePrefix."products`";
database_querySelect($sql,$rows);
$product_count = $rows[0]["product_count"];
$sql = "SELECT COUNT(*) AS merchant_count FROM `".$config_databaseTablePrefix."feeds`";
database_querySelect($sql,$rows);
$merchant_count = $rows[0]["merchant_count"];
$dbVersion = mysql_get_server_info();
print "<p>MySQL Server Version: ".$dbVersion."</p>";
print "<p>Merchants: ".$merchant_count."</p>";
print "<p>Products: ".$product_count."</p>";
print "<p>Testing SELECT on FULL TEXT index:<br />";
$sql = "SELECT * FROM `".$config_databaseTablePrefix."products` WHERE MATCH name AGAINST ('FOOBAR')";
timerBegin();
database_querySelect($sql,$rows);
print timerEnd()."</p>";
require("includes/footer.php");
?>
Please copy and paste the output from the test into this thread. Note that just building the full text index won't make any difference to the speed of Price Tapestry just yet as it requires a different SQL syntax! I'll post the necessary changes to search.php once we've got some feedback on the results of this test script.
Average sized site, fairly few products per merchant
MySQL Server Version: 4.1.12
Merchants: 26
Products: 51570
Testing SELECT on FULL TEXT index:
0.00039792060852051
Smaller, more products per merchant
MySQL Server Version: 4.1.12
Merchants: 11
Products: 46161
Testing SELECT on FULL TEXT index:
0.00035881996154785
Small site, small number of products
MySQL Server Version: 4.1.12
Merchants: 7
Products: 7518
Testing SELECT on FULL TEXT index:
0.00060319900512695
Medium site, more products per merchant
MySQL Server Version: 4.1.12
Merchants: 8
Products: 58222
Testing SELECT on FULL TEXT index:
0.00034284591674805
Large number of merchants, fairly few products per merchant
MySQL Server Version: 4.1.12
Merchants: 38
Products: 16897
Testing SELECT on FULL TEXT index:
0.016276121139526
mine is
MySQL Server Version: 3.23.56
Merchants: 70
Products: 145123
Testing SELECT on FULL TEXT index:
0.00082492828369141
And mine is
MySQL Server Version: 3.23.58
Merchants: 26
Products: 114201
Testing SELECT on FULL TEXT index:
0.12344288825989
Question: is this a good or bad result?
^ that's good - less than 2/10th's of a second on > 100,000 products. It is the difference of course between a full index scan, and a direct look-up into an index, so by rights it should be quick!
There are some caveats with full text indexing vs. the "old" method - i'll explain the situation later, but in the mean time i'll work now on getting these changes checked into the current distrubution so that search.php can use the full text index. I'll aim to have this available online first thing tomorrow (Sunday), and make MySQL version 3.23.23 the baseline for Price Tapestry.
Thanks for everyone's help!
David.
Full text indexing is now in the Current Distribution
Files modified: database.sql,search.php.
If you wish to upgrade to this version without re-importing all products; run the dbindex.php script (above) to build the full text index, and then extract search.php from the current distribution into your existing site. This change now mandates MySQL version 3.23.23 as the minimum required version for Price Tapestry.
Please note that full text indexing operates only on words of at least 4 characters. I have coded a strlen() check into the search.php that uses the full text index if the query is > 4 characters, or falls back on the original SQL for short queries.
just want to confirm that the Full Text index auto creates itself when items are added and removes, or do we have to do something to update the index?
There's no ongoing maintenance required - MySQL keeps the full text index up to date as records are added and removed just as it does for any "normal" index.
found a small bug
goto
http://www.shoppingchanneluk.com/search.php?q=t00
no record count, or next buttons
http://www.shoppingchanneluk.com/search.php?q=t008
^the addition of an 8 now there is a count and next options
oops..! Fixed that.
Please note that the full text index is only used if the query string is at least 4 characters. This is the default setting for creating the full text index in MySQL. If the query is shorter than 4 characters, the original SQL is used.
The bug that searley just picked up was because there is a test in search.php that decides which SQL to use based on the length of the query. In the section for less than 4 characters, the result count SQL was still using the full text WHERE clause; which wouldn't work!
In theory, it is possible to use a shorter minimum word length for the full text index, but this requires reconfiguration or (in some cases) re-complilation of MySQL. For MySQL >= 4.0.0, it can be configured within my.cnf. For more information, see the following page:
http://dev.mysql.com/doc/refman/4.1/en/fulltext-fine-tuning.html
If you do this; make the appropriate changes to search.php so that the correct SQL is used. Note that for merchant:, brand: and category: searches the original SQL is used because it is only possible for a database to use one index per query.
Yes - sorry, the fix is merged into the current distribution.
There is a side effect i dont like..
If i type t008 i get 25 reults
normally if i wanted less results i would try t008 cartridge, expecting anything with t008 AND cartridge to show
but this now show t008 OR cartridge which increases the results shown
Hi Searley,
It should be ordered by relevance; so anything with "t008 AND cartridge" will come up first; "t008" only next; and finally results with just "cartridge".
Do you not feel that this would provide more benefit overall; particularly where people may be incorrectly qualifying something or making typos?
For example, if somebody entered "SONY PLASMA TN0005", and you don't have the TN0005 in your database but you do have the TN0006 model it would still match "SONY PLASMA TN0006".
What do you think?
its not showing by relevance though
an example would be if i type 'LG Plasma'
item 1 'panasonic plasma'
item 2 'plasma ball'
item 3 'LG 42PX3RVA Plasma TV'
i would normally expect to see, the more description provided in the search, then the less results shown
you could have a fallback that if the search has 0 results returned then use the broader search as currently implemented!!
Ah - that's the word length restriction there unfortunately - full text does not index anything less than 4 characters (for performance reasons). So in your example, "LG" is not included in the search, otherwise the LG Plasma would have come up first on relevance.
It sounds like you want to force a boolean "AND", which is possible in MySQL > 4.0.1...
http://dev.mysql.com/doc/refman/4.1/en/fulltext-boolean.html
I think we need to work towards a "happy" medium that takes into account:
i) a full index scan is slow (unusable into the 100,000's products)
and
ii) full text indexing has limitations with word length.
There is another limitation with full text indexing. Using the old method; "TEST 1234" would match "TEST1234" and "TEST 1234", whereas this is not the case with the full text index, as they are treated as separate words. I think this is of particular relevance to product searching as it is not always known how manufacturers represent product model numbers...
OK i tested panasonic plasma, and yes there was the relevance sort..
I guess upon thought, that this is a better search than before, because if i type 'panasonic plasma' it will show the results with both those words in regardless of the being seperated by other text, where as before, it would have only shown them if the words were in the exact order searched..
now i understand what it is doing, i guess i can live with it :-)
but i think it is an improvement, although not perfect!
I agree - on balance it is definitely better - I like the way it works on WebPriceCheck.co.uk in terms of returning "similar" products if there is no exact match.
Unfortunately text search is a hugely complex area, and there's only so much you can do in terms of a script like Price Tapestry considering the limitations described above.
the main aim, is not to use PT really as a Affiliate Site, more as a shopping comparison, only really want affiliate feeds to 'pad' the results which is why i am keen to see development...
I think there's more we can do in terms of deciding what SQL to use, and perhaps performing a second query using the old method if there are no results returned by the full text query. This might help catch product model numbers like "LG1234" which might be in the database as either "LG 1234", "LG-1234" or "LG1234".
Been away for a bit so only just got on to this.
Anyway I get ....
MySQL Server Version: 4.1.14-standard
Merchants: 55
Products: 382480
Testing SELECT on FULL TEXT index:
Warning: mysql_fetch_array(): supplied argument is not a valid MySQL result resource in /home/ukmp3s/public_html/shopsort/Shop/includes/database.php on line 21
Warning: mysql_num_rows(): supplied argument is not a valid MySQL result resource in /home/ukmp3s/public_html/shopsort/Shop/includes/database.php on line 26
0.071100950241089
Hi Eddie,
Sounds like the full text index hasn't been built.
How long did the first script take to run when it was building the index?
(should have been at least 6 minutes on 300K+ products)
Seem to take for ages.
I have emailed you the urls so you can have a look
Hi Eddie,
Seems to be a more fundamental issue at the moment with the product count query returning no results.
Do you have command line access to mysql?
sue do but wouldnt know how to use it
give me a clue
Hi Eddie,
I want to see the current index structure on your products table. To do this; do the following at the command prompt ($) of your server:
$mysql --user=username --password=password
mysql>use databasename
mysql>show index from products;
This should produce an output similar to this:
+----------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+----------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Comment |
+----------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+----------+
| products | 0 | PRIMARY | 1 | id | A | 22461 | NULL | NULL | |
| products | 0 | dupe_filter | 1 | dupe_hash | A | 22461 | NULL | NULL | |
| products | 1 | merchant | 1 | merchant | A | NULL | NULL | NULL | |
| products | 1 | name | 1 | name | A | NULL | NULL | NULL | |
| products | 1 | search_name | 1 | search_name | A | NULL | NULL | NULL | |
| products | 1 | category | 1 | category | A | NULL | NULL | NULL | |
| products | 1 | brand | 1 | brand | A | NULL | NULL | NULL | |
| products | 1 | name_2 | 1 | name | A | NULL | NULL | NULL | FULLTEXT |
+----------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+----------+
8 rows in set (0.00 sec)
hi david,
is full text search currently included in the current distro?
full text search meaning the search includes the title and description?
thank you.
regards,
atman
Hi atman,
Full Text indexing has been in the distribution for some time now; however it is just on the product name (titles).
If you want to include the description you need to create another combined full text index on the name and description fields. You can manually in the MySQL command line tool as follows:
$mysql -u <username> -p
Enter Password: <password>
mysql> use <price-tapestry-database>
mysql> create fulltext index ft_namedescription on products (name,description);
mysql> quit
Bye.
$
If you cannot access your server to do this, you can write a script to do it instead:
makeIndex.php:
<?php
set_time_limit(0);
ignore_user_abort();
require("includes/common.php");
require("includes/database.php");
require("includes/header.php");
$sql = "create fulltext index ft_namedescription on products (name,description)";
database_queryModify($sql,$rows);
print "<p>Done.</p>";
require("includes/footer.php");
?>
Then, to make search.php use the combined index you need search for the following string:
MATCH name
...and replace it with:
MATCH (name,description)
(it occurs in the SQL construction on lines 65 and 67)
Cheers,
David.
david,
i tried the makeindex.php script btu it gave me errors.
i edited the script and added
require("includes/database.php");
my final code:
<?php
set_time_limit(0);
ignore_user_abort();
require("includes/common.php");
require("includes/database.php");
require("includes/header.php");
$sql = "create fulltext index ft_namedescription on products (name,description)";
database_queryModify($sql,$rows);
print "<p>Done.</p>";
require("includes/footer.php");
?>
works wonders!
search gives more results. :)
regards,
atman.
david,
is it possible to make search results show first the name(title) then the (description)?
the search keywords found on name(title) is ranked higher and thus given more priority and the keywords found on the (description) will be on the bottom of the list.
am i making any sense?
regards,
atman
Hi atman,
Yes - that does make sense. I'm not immediately sure how easily it could be done as the relevancy is determined by MySQL but i'll have a look into it for you.
Cheers,
David.
i get an error when running the dbindex.php file
Fatal error: Cannot redeclare database_queryselect() (previously declared in /websites/LinuxPackage02/ko/ol/ab/koolabula.co.uk/public_html/includes/database.php:3) in /websites/LinuxPackage02/ko/ol/ab/koolabula.co.uk/public_html/includes/database.php on line 27
Hi,
That script above is slightly incorrect, try this:
<?php
set_time_limit(0);
ignore_user_abort();
require("includes/common.php");
require("includes/header.php");
$sql = "create fulltext index ft_namedescription on products (name,description)";
database_queryModify($sql,$rows);
print "<p>Done.</p>";
require("includes/footer.php");
?>
(it doesn't need to have require("includes/database.php")!!!)
Cheers,
David.
i dont have a header.php or footer.php file in includes. Should this be html/ instead?
cheers
It should... yes!
<?php
set_time_limit(0);
ignore_user_abort();
require("includes/common.php");
require("html/header.php");
$sql = "create fulltext index ft_namedescription on products (name,description)";
database_queryModify($sql,$rows);
print "<p>Done.</p>";
require("html/footer.php");
?>
This is to add the description search - full text indexing has been part of the Price Tapestry distribtion for some time now...
Cheers,
David.
MySQL Server Version: 3.23.58
Merchants: 6
Products: 19473
Testing SELECT on FULL TEXT index:
0.0013010501861572
(from my dev server)