Server load & Importing

Submitted by clare on Mon, 2011-04-04 16:26

I was wondering whether the file size of a datafeed affects the load on the server when doing an import.

Does the script read the whole file, keep it in memory and process it, so that the bigger the file, the bigger the load on the CPU?

I was just wondering if cutting large feeds up into smaller files would reduce the load?

Another question I had concerns importing @ALL: if files are imported individually, with the import script being called once for each file, would that keep the load on the server lower than calling the script once and importing everything at the same time?

Submitted by support on Mon, 2011-04-04 16:43

Hello Clare,

Provided that feeds are well formatted, the size of the feed should not affect server load, as the parser loads only one record at a time, processes it and then discards it from memory. Of course, if a feed is broken (for example, the closing tag of a product record in an XML feed is missing) then it could result in an "out of memory" error.
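To illustrate the record-at-a-time behaviour described above, here is a minimal sketch in Python. It is not the Price Tapestry parser; the `iter_records` helper, the `product` tag and the sample feed are assumptions made purely for the example.

```python
import io
import xml.etree.ElementTree as ET

def iter_records(source, record_tag="product"):
    """Yield each record as a dict, one at a time (hypothetical helper)."""
    # iterparse reads the feed incrementally instead of loading it whole.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            yield {child.tag: (child.text or "").strip() for child in elem}
            # Release the record's text and child elements once it is handled.
            elem.clear()

# Tiny in-memory feed standing in for a real datafeed file.
feed = io.BytesIO(
    b"<products>"
    b"<product><name>Kettle</name><price>19.99</price></product>"
    b"<product><name>Toaster</name><price>24.50</price></product>"
    b"</products>"
)
records = list(iter_records(feed))
```

Because only the current record's fields are held at any point, the working set depends on the size of a record rather than the size of the file.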

The lowest server load with regard to importing comes from importing @ALL, as using @MODIFIED requires a DELETE against a large index (for a large site). The best of both worlds is the zero down-time mod as described in this thread, which imports to a temporary table and then swaps the temporary table over the live table on completion.
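The import-then-swap idea can be sketched as follows. This is only an illustration, not the mod's actual code: SQLite stands in for the site's real database so the example is self-contained, and the table names `products` and `products_temp`, the columns and the `import_with_swap` function are all assumptions.

```python
import sqlite3

def import_with_swap(conn, rows):
    """Load rows into a temporary table, then swap it over the live table."""
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS products_temp")
    cur.execute("CREATE TABLE products_temp (name TEXT, price REAL, filename TEXT)")
    cur.executemany("INSERT INTO products_temp VALUES (?, ?, ?)", rows)
    # Up to here the live table is untouched; the swap is the only
    # step that affects it.
    cur.execute("DROP TABLE IF EXISTS products")
    cur.execute("ALTER TABLE products_temp RENAME TO products")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL, filename TEXT)")
conn.execute("INSERT INTO products VALUES ('Old item', 1.0, 'old.xml')")

import_with_swap(conn, [
    ("Kettle", 19.99, "feed_a.xml"),
    ("Toaster", 24.50, "feed_b.xml"),
])
names = conn.execute("SELECT name FROM products ORDER BY name").fetchall()
```

The slow part (the inserts) happens against a table nobody is querying, so visitors keep seeing the old data until the quick rename at the end.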

Cheers,
David.
--
PriceTapestry.com

Submitted by clare on Mon, 2011-04-04 17:42

Thanks for explaining that. I had found the temporary table mod, and that was what had brought these questions to mind. At first I had all sorts of problems, but that was only my mistake, as I was using the old code instead of the updated code for the latest version of PT.

Now that I have put that code, plus a 1 second sleep every 100 products, into the admin, it is importing very nicely, with the server load staying very reasonable.
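For anyone wanting to try the same throttling, the idea of a 1 second sleep every 100 products looks roughly like this. It is a Python sketch rather than the actual change made in the admin; `import_throttled` and its parameters are names invented for the example, and the `sleep` argument exists only so the pause can be replaced when testing.

```python
import time

def import_throttled(records, handle, batch_size=100, pause_seconds=1.0,
                     sleep=time.sleep):
    """Handle each record, pausing after every batch_size records."""
    count = 0
    for record in records:
        handle(record)
        count += 1
        if count % batch_size == 0:
            sleep(pause_seconds)  # give the server a breather
    return count

# Demonstration with a recording stand-in for time.sleep.
pauses = []
imported = []
total = import_throttled(range(250), imported.append, sleep=pauses.append)
```

The trade-off is a longer import: every 100 products add one second, so a feed of 100,000 products would take about 1,000 seconds longer.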

I was also wondering which would use fewer resources: copying products from one table to another, or importing a feed.

The reason I ask is that I was thinking I could just import the modified feeds into a temp table, then copy across all the products from feeds that have not been modified into the temp table and rename, like your mod. But I am not sure if it would be of any benefit, as the copying might use just as many resources as importing?

Submitted by support on Mon, 2011-04-04 19:17

Hi Clare,

Your thinking is correct - an import then copy is load equivalent to importing @MODIFIED, as the copy process would have to DELETE anyway, and for large product tables it is that DELETE process that can take a significant proportion of the import time - so I don't think it would gain anything, I'm afraid!

Cheers,
David.
--
PriceTapestry.com

Submitted by marco@flapper on Mon, 2011-04-04 22:03

OK, I understand that @ALL is better than @MODIFIED with large sites. But what is a large site? From how many products is it better to use @ALL instead of @MODIFIED?

Submitted by support on Tue, 2011-04-05 06:44

Hi Marco,

I would actually recommend @ALL for all sites regardless of size, in particular in conjunction with the zero down-time mod mentioned above. The performance issues associated with using @MODIFIED are a result of the database having to DELETE all products for the filename being imported, resulting in a fragmented products table if you can imagine that (just like a hard disk becomes fragmented if you delete files).

Using @ALL empties the product table and imports all feeds from scratch, so the product table will be completely contiguous, resulting in faster search times etc.
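The difference between the two modes can be sketched as the statements each one issues. This is an illustration under assumptions, not the script's own code: SQLite is used so the example runs on its own, the `products` table layout is invented, and the sketch shows only which rows each mode removes, not the fragmentation effect itself.

```python
import sqlite3

def reimport_modified(conn, filename, names):
    """@MODIFIED-style: delete one feed's rows from the shared table, then re-insert."""
    conn.execute("DELETE FROM products WHERE filename = ?", (filename,))
    conn.executemany("INSERT INTO products VALUES (?, ?)",
                     [(name, filename) for name in names])
    conn.commit()

def reimport_all(conn, feeds):
    """@ALL-style: empty the table, then load every feed from scratch."""
    conn.execute("DELETE FROM products")
    for filename, names in feeds.items():
        conn.executemany("INSERT INTO products VALUES (?, ?)",
                         [(name, filename) for name in names])
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, filename TEXT)")

reimport_all(conn, {"a.xml": ["Kettle", "Toaster"], "b.xml": ["Lamp"]})
reimport_modified(conn, "a.xml", ["Kettle", "Blender"])
rows = conn.execute(
    "SELECT name, filename FROM products ORDER BY filename, name"
).fetchall()
```

In the @MODIFIED-style path the delete has to find one feed's rows among all the others, which is the step described above as leaving gaps in a large table.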

Cheers,
David.
--
PriceTapestry.com

Submitted by clare on Tue, 2011-04-05 06:54

Thanks David