
Support Forum



import speed problems

Submitted by davidn on Tue, 2009-11-24 11:10

Hi,
I was curious whether anyone else here has import speed problems, and how you solve them?

I run a website with approximately 1,200,000 products from 300 shops at the moment, and still growing (which is why I am asking). It runs on a dedicated server (used only for this site): Xeon X3360 (quad-core, 2.83 GHz), 8 GB RAM, 2x 750 GB SATA disks (RAID mirroring). There are about 5,000 product mapping rules (dynamic, keyword based) and about 500 target categories (with mapping set up), and the @ALL import takes about 6 hours to complete (so about 200,000 products per hour). During the import the server load rises to approx. 1.0-1.5, but that seems to be OK.

So I was wondering whether that speed is adequate, or how (and where) I can make it faster. I don't think it is a hardware problem, but it could be a PHP/MySQL/Apache settings problem. I have read all the topics in this forum and tried to optimize MySQL (my.cnf from http://www.pricetapestry.com/node/732), but the speed is still the same.

I was thinking, for example, of creating an SQL file from the feeds instead of importing them immediately, and then importing that file into the DB more quickly, or something like that.
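
To make the SQL-file idea concrete, here is a minimal sketch in Python, using SQLite only so that it runs anywhere. The table layout, the column names and the `feed_to_sql` helper are made up for illustration and are not Price Tapestry's import code. It writes batched multi-row INSERT statements wrapped in one transaction and then loads the result in a single pass. MySQL would need its own string escaping rules.

```python
import sqlite3

def sql_quote(value):
    # Standard SQL string literal: wrap in single quotes and double any
    # embedded single quote. (MySQL escaping rules differ; assumption.)
    return "'" + str(value).replace("'", "''") + "'"

def feed_to_sql(rows, batch_size=500):
    """Build SQL text with one multi-row INSERT per batch of feed rows."""
    statements = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        values = ",\n".join(
            "(" + ", ".join([sql_quote(m), sql_quote(n), repr(float(p))]) + ")"
            for m, n, p in batch
        )
        statements.append(
            "INSERT INTO products (merchant, name, price) VALUES\n" + values + ";"
        )
    # One surrounding transaction, so the load commits once rather than per row.
    return "BEGIN;\n" + "\n".join(statements) + "\nCOMMIT;\n"

# Hypothetical feed rows: (merchant, product name, price).
rows = [
    ("Shop A", "O'Neill Jacket", 9.99),
    ("Shop B", "Gadget", 19.5),
    ("Shop B", "Gizmo", 4.0),
]
sql_text = feed_to_sql(rows, batch_size=2)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (merchant TEXT, name TEXT, price REAL)")
conn.executescript(sql_text)  # load the generated SQL in one go
loaded = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
```

Whether this is actually faster than row-by-row importing depends on where the time goes. If most of it is spent on feed parsing and mapping rules rather than on the inserts, the gain would be small.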

I am afraid that if the site keeps growing, at this import speed there is a practical limit of approx. 3-4M products.

Thanks for any advice; I'll post any ideas of my own that I come up with.

Submitted by support on Tue, 2009-11-24 11:18

Hi David,

For a site with that many products, I would recommend having a look at the import to temporary table modification described here:

http://www.pricetapestry.com/node/2554#comment-10312

With this method, when you import with @ALL, an empty copy of the products table is created and the feeds are imported into that table, which means that your site is not offline for the duration of the main import process. When the import has completed, the original products table is dropped (very quick) and the temporary table is then renamed to `products` (also very quick).
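
As a rough illustration of that swap (not the actual modification from the linked thread), here is a minimal sketch in Python using SQLite. The `products_import` table name, the column layout and the `import_all` helper are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Live table that visitors keep querying while the import runs.
conn.execute("CREATE TABLE products (merchant TEXT, name TEXT, price REAL)")
conn.execute("INSERT INTO products VALUES ('Shop A', 'Old Widget', 1.0)")
conn.commit()

def import_all(conn, feed_rows):
    # 1. Create an empty copy of the products table (hypothetical name).
    conn.execute("DROP TABLE IF EXISTS products_import")
    conn.execute(
        "CREATE TABLE products_import (merchant TEXT, name TEXT, price REAL)"
    )
    # 2. The slow part: fill the copy while `products` stays untouched.
    conn.executemany("INSERT INTO products_import VALUES (?, ?, ?)", feed_rows)
    conn.commit()
    # 3. The quick part: drop the old table and rename the copy into place.
    conn.execute("DROP TABLE products")
    conn.execute("ALTER TABLE products_import RENAME TO products")
    conn.commit()

import_all(conn, [("Shop A", "New Widget", 2.0), ("Shop B", "Gadget", 3.0)])
count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
```

Only step 3 touches the live table, which is why the site stays usable during the long fill in step 2. The total import time is not reduced by this.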

If you're not sure about any of the changes described, if you could email me your includes/admin.php and scripts/import.php I'll make the changes for you.

Cheers,
David.

Submitted by davidn on Tue, 2009-11-24 11:39

Hi, I am already using the temporary table modification, but it still takes about 6 hours to fill the temporary table (and of course to drop the old one and rename the new one).

So it is good that visitors can still browse all the products during the import; I am just worried about the future, you know... 20-hour imports with just about 3M products :-/
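
As a quick sanity check on those numbers, here is a small Python sketch of a purely linear projection from the 200,000 products per hour measured above. The `projected_hours` helper is made up for illustration, and constant throughput is an assumption.

```python
def projected_hours(products, products_per_hour=200_000):
    # Linear projection: assumes throughput stays constant as the table grows.
    return products / products_per_hour

current = projected_hours(1_200_000)  # today's @ALL import
at_3m = projected_hours(3_000_000)
at_4m = projected_hours(4_000_000)
```

At a constant rate this gives 6 hours today, 15 hours at 3M products and 20 hours at 4M. It would take longer if the per-product cost grows with the size of the table.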

Submitted by support on Tue, 2009-11-24 11:43

Hi David,

The only process that may have an impact (other than raw processing power; it is always going to take some time for that many products to be imported) would be the duplicate processing. During import, an MD5 hash is created from (merchant name + product name), and this is a unique key, so as the number of products increases this may have an impact. I'll do some tests to see how significant that is, and it may be possible to work this some other way for such a large site...
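
To show roughly what that duplicate check involves, here is a minimal sketch in Python with SQLite. It is not Price Tapestry's code, and the `dupe_hash` column name, the table layout and the helper functions are assumptions made up for the example.

```python
import hashlib
import sqlite3

def dupe_hash(merchant, name):
    # MD5 of merchant name + product name, as a 32-character hex string.
    return hashlib.md5((merchant + name).encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint creates an index that every insert has to check.
conn.execute(
    "CREATE TABLE products (merchant TEXT, name TEXT, dupe_hash TEXT UNIQUE)"
)

def import_row(conn, merchant, name):
    """Insert a product; return False if it duplicates an existing row."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO products VALUES (?, ?, ?)",
        (merchant, name, dupe_hash(merchant, name)),
    )
    return cur.rowcount == 1

first = import_row(conn, "Shop A", "Widget")   # new product
second = import_row(conn, "Shop A", "Widget")  # same merchant + name
third = import_row(conn, "Shop B", "Widget")   # same name, other merchant
```

Every insert has to look up the unique index, and that index grows with the table. That lookup is the cost that may become noticeable at millions of rows.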

Cheers,
David.

Submitted by Andrewc on Fri, 2009-11-27 06:38

Excellent to hear you're working on improvements for large databases.
I'm headed in that direction...

Submitted by davidn on Tue, 2010-03-02 22:20

Hi David, any news on this? Thanks.

Submitted by support on Wed, 2010-03-03 09:52

Hi David,

Had you come across the following thread regarding MySQL server configuration for very large sites?

http://www.pricetapestry.com/node/732

Ultimately, whilst Price Tapestry doesn't impose any actual limit on database size, there will be an upper boundary set by the capabilities of the hosting environment (processor speed, disk space etc.), so I'm not sure I'd recommend trying to approach 3M+ products in a single installation. Instead, I would strongly suggest going for a multi-installation approach, such as that described in the following thread, using sub-directories for master-level categories, for example. This has the added benefit of creating a sort of category hierarchy, where the category field within each installation effectively becomes a second-tier category...

http://www.pricetapestry.com/node/205

Cheers,
David.