Hello,
I was updating a site today by running the import command on the command line of the server:
/usr/bin/php /var/www/vhosts/xxx.com/httpdocs/scripts/import.php @MODIFIED
I had also downloaded a load of new feeds, so one by one I was registering them as the above process was running. After a while all the websites on my server stopped responding and I had to call the hosting company to ask them to stop httpd and restart apache to get them all working again.
The feedback from the hosting is below. I offer this for comment as it's beyond my technical knowledge but perhaps it could be used to improve the import function, or do you think it's not relevant?
Regards,
Ben
P.S. It's been a while since I've been in the forum so maybe I've missed other posts that have already covered this.
-----------------feedback begins------------------------
Hi Ben,
As per conversation, your server was running very slow.
You had run TOP, and saw that Apache was taking a lot of CPU, which I directed you to stop it.
Once that was stopped, some of the load freed up, so I was able to log in, and saw that MySQL was taking alot of CPU. I discoverd alot of the following running at the same time:
| 1301447 | dbname | localhost | WWOPTDB | Query | 695 | Copying to tmp table | SELECT * , MIN( price ) AS minPrice, MAX( price ) AS maxPrice, COUNT( id ) AS numMerchants, MATCH name AGAINST ('John Rocha Large Single Heart Pendant set with 4 Diamonds') AS relevance FROM `products` WHERE category = 'Womens Jewellery' AND MATCH name AGAINST ('John Rocha Large Single Heart Pendant set with 4 Diamonds') AND name <> 'John Rocha Large Single Heart Pendant set with 4 Diamonds' GROUP BY name ORDER BY relevance LIMIT 3
I used the explain statement on this:
mysql> explain SELECT * , MIN( price ) AS minPrice, MAX( price ) AS maxPrice, COUNT( id ) AS numMerchants, MATCH name AGAINST ('John Rocha Large Single Heart Pendant set with 4 Diamonds') AS relevance FROM `products` WHERE category = 'Womens Jewellery' AND MATCH name AGAINST ('John Rocha Large Single Heart Pendant set with 4 Diamonds') AND name <> 'John Rocha Large Single Heart Pendant set with 4 Diamonds' GROUP BY name ORDER BY relevance LIMIT 3;
+----+-------------+----------+----------+----------------------+--------+---------+------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+----------+----------------------+--------+---------+------+------+----------------------------------------------+
| 1 | SIMPLE | products | fulltext | name,category,name_2 | name_2 | 0 | | 1 | Using where; Using temporary; Using filesort |
+----+-------------+----------+----------+----------------------+--------+---------+------+------+----------------------------------------------+
1 row in set (0.00 sec)
AS you can see, this query is a FULLTEXT type, and is using the slowest sort methods, and forced to use a disk based temporary table.
I checked the table this query is running against:
mysql> desc products;
+-------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+----------------+
| id | int(11) | | PRI | NULL | auto_increment |
| merchant | varchar(255) | | MUL | | |
| name | varchar(255) | | MUL | | |
| description | text | | | | |
| image_url | varchar(255) | | | | |
| buy_url | text | | | | |
| price | decimal(10,2) | | | 0.00 | |
| search_name | varchar(255) | | MUL | | |
| category | varchar(255) | | MUL | | |
| brand | varchar(255) | | MUL | | |
| dupe_hash | varchar(32) | | UNI | | |
| rating | int(11) | | | 0 | |
| reviews | int(11) | | | 0 | |
+-------------+---------------+------+-----+---------+----------------+
13 rows in set (0.00 sec)
Both description and buy_url are text datatype, and if you do any sorting with them, will always result in a disk based temporary table.
The reccomended way is to use a seperate table with foriegn key or similar method, where you dont have to sort by the TEXT fields, and just pull them out with a seperate query.
-------------------------feedback ends------------------------------------
Hello David,
Thanks for the prompt response.
If I've understood you correctly then, the irony is that it's best to import all the feeds rather than the modified ones?
Presumably it'll run for longer but won't take up as much CPU?
Regards,
Ben
Hi Ben,
That's correct.
It's strange because it's an indexed field, and there's no sorting involved; but it's just the shear number of rows that need to be deleted, meaning that lots of pointers need to be updated that makes it take so long.
I think there's a side-benefit in that using TRUNCATE cleans the table completely, so the resuling full import should be more efficient as there will be no wasted space in the table.
Cheers,
David.
Hello Ben,
For a very large site; what can be an issue when using @MODIFIED is the selective delete
as this issues a DELETE FROM products WHERE merchant='Merchant' query, which can take a
long time (and is most likely the cause of 100% CPU usage) on very large databases.
Much faster, if the majority of feeds have been updated; would be to use @ALL as this
call TRUNCATEs the product table resulting in a much faster query.
I agree with the comments made by your host regarding the slow sort method, and at
no point does the script attempt to sort by description or buy_url (the TEXT fields),
so I don't believe that this would be the cause of the excessive usage - although it
would certainly contribute if you were experiencing a lot of traffic...
Cheers,
David.