Hi,
I have a pricetapestry installation with some 70 datafeeds (it's my central server).
I implemented the temporary table mod:
http://www.pricetapestry.com/node/3355
My questions are:
1. Should I separate the fetch and import parts into separate cron jobs? Does it matter? I guess having the fetch and import in one cron job ensures that they run sequentially with a minimum time lag between them, so what is the benefit of separating them?
2. Should I use import.php @all or import.php @modified? I guess @modified is better (faster) without the temporary table mod, but which is better with the temporary table mod?
Regarding import.php @all: I still had it on @modified.
I'm loading the datafeeds in several batches at different times,
e.g. one batch has 40 feeds, another has 70 feeds and another has 3 feeds.
Is it then still better to use @all for the large batches and use the merchant specific form (php import.php ExampleMerchant.xml) for the small batch of feeds?
Hello Marco,
Yes that would make sense - use @ALL for large batches, and @MODIFIED or merchant specific after the smaller jobs.
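As a rough sketch of that split, the job for a small batch could loop over its own filenames, while the large batches simply call `php import.php @ALL`. The function name `import_small_batch`, the `PHP_BIN` variable and the second example filename are hypothetical; only `import.php`, `@ALL` and the per-filename form come from this thread.

```shell
#!/bin/sh
# Hypothetical helper: import a small batch of feeds one filename at a time.
# PHP_BIN is an assumed override so the PHP binary can be swapped out.
PHP_BIN="${PHP_BIN:-php}"

import_small_batch() {
    # Each argument is a registered feed filename, e.g. ExampleMerchant.xml
    for feed in "$@"; do
        # Stop at the first failing import so the problem feed is easy to spot
        "$PHP_BIN" import.php "$feed" || return 1
    done
}

# Example (run from the scripts folder, after the small batch has been fetched):
#   import_small_batch ExampleMerchant.xml AnotherMerchant.xml
# For the large batches, call instead:
#   php import.php @ALL
```

Stopping at the first failure is a design choice for the sketch; you may prefer to carry on with the remaining feeds.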
Cheers,
David.
--
PriceTapestry.com
Hi,
I'm wondering how import.php works. Does it look up all the datafeeds in the feeds folder?
I have a master website for central download of datafeeds and want to load some big datafeeds (4 feeds that together hold 1.4GB of data). They are not registered on that website but are fetched there so that I can import them on another website.
I'm using import.php @all but would that imply that it 'goes over' the 'not-registered' feeds too?
Hi Marco,
That's correct - import.php reads the `feeds` table in the database so it only applies to registered feeds.
Cheers,
David.
Hi,
I wanted to check an import scenario with you.
I suspect that some big datafeeds are upsetting my server (I got out of memory messages).
I implemented the temporary table mod and I'm importing some 24 datafeeds, about 115K products in total.
Five of the datafeeds are rather big. They total 1.3GB and one feed is almost 500MB.
I use import.php @all, but on several days I noticed that my server went down about 10 minutes after this cron job.
a. Could it be that the import is causing the out of memory problems on my server?
b. Should I use the modified import.php (temporary table mod) or not?
c. Should I use @modified instead as the feeds are not refreshed daily?
Hi Marco,
The script is designed such that memory usage is only really proportional to the size of a single product record - there shouldn't be any correlation with feed size.
Further, because of the way the parser works, building up an associative array of field names, it is very unlikely that even a corrupted feed could cause the parser to consume an excessive amount of memory.
In general, I would always recommend @ALL over @MODIFIED as @MODIFIED requires a DELETE FROM operation WHERE filename='...', which can be very slow against a very large table.
So, I would recommend continuing with @ALL against the zero down-time mod, and also double check that disk space is not a factor of course...
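A minimal sketch of such a disk space check, run before the import, might look like the following. The function names `free_kb` and `has_space`, the path and the 3 GB threshold are all assumptions for illustration; which path to check (the feeds folder, the MySQL data directory, or both) depends on your setup.

```shell
#!/bin/sh
# Hypothetical pre-import check: is there enough free disk space?

# free_kb PATH prints the available kilobytes on the filesystem holding PATH.
free_kb() {
    # -P forces the portable one-line-per-filesystem format, -k reports in KB;
    # column 4 of the second output line is the available space
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# has_space PATH MIN_KB succeeds when at least MIN_KB kilobytes are free.
has_space() {
    [ "$(free_kb "$1")" -ge "$2" ]
}

# Example: require roughly 3 GB free (an assumed threshold) before importing
#   has_space /path/to/feeds 3145728 && php import.php @ALL
```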
Cheers,
David.
Hi Marco,
Re: 1;
Given that the total download time is not known, it is best to have fetch / import in the _same_ script, for the reason you mentioned: that gives the shortest delay between fetch and import. Otherwise you would have to allow a suitable window to ensure that the download had finished, and even then you can never really be sure. So best together!
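To illustrate, a single cron script along these lines would run the two steps back to back. This is only a sketch: `fetch.sh` is a hypothetical name for whatever script downloads your feeds, and `FETCH_CMD`, `IMPORT_CMD` and `fetch_then_import` are assumed names, not part of Price Tapestry.

```shell
#!/bin/sh
# Sketch of a single cron script: fetch first, then import.
# FETCH_CMD and IMPORT_CMD are assumed placeholders; fetch.sh is a hypothetical
# name for your own download script.
FETCH_CMD="${FETCH_CMD:-./fetch.sh}"
IMPORT_CMD="${IMPORT_CMD:-php import.php @ALL}"

fetch_then_import() {
    # The import starts as soon as the fetch finishes;
    # && skips the import if the fetch step reported a failure.
    # The variables are left unquoted on purpose so that the command
    # and its arguments are split into separate words.
    $FETCH_CMD && $IMPORT_CMD
}

# Example: call this from the script that cron runs
#   fetch_then_import
```

If you would rather import whatever was downloaded even when one download fails, replace `&&` with `;` in the function.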
Re: 2;
The temporary table mod only applies to import.php @ALL, so again best to use that (as I believe you are) - and especially with a large site and many feeds this will ensure that your site does not appear to be offline at all during the whole fetch / import process.
Cheers,
David.