

big import

Submitted by shams995 on Wed, 2013-05-15 18:23

Hi,

I have a question regarding really big datafeeds. Some of the feeds I am getting are 70MB (crazy! I know) and, quite understandably, the import errors out. My question is how can I work around this? This can get pretty difficult once the imports are automated. I have been told by my hosting guys that the timeout cannot be changed as it is a shared environment.

Thanks
Shams

Submitted by support on Thu, 2013-05-16 09:02

Hello Shams;

If you have command line access (SSH) to your account, the best way to import very large feeds is with scripts/import.php. Change directory to the scripts folder, e.g.

cd public_html/scripts/

...and then to import a single feed:

php import.php <filename>

...or to import all feeds:

php import.php @ALL
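
If you later want to automate the command line import, one approach (just a sketch - the PHP binary path and install directory below are examples for a typical setup, so adjust for your account) is a crontab entry that changes into the scripts folder and runs the import overnight:

# example crontab entry: import all feeds at 2am each day
0 2 * * * cd /home/youruser/public_html/scripts/ && /usr/bin/php import.php @ALL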

If you don't, you can use the Slow Import Tool from the /admin/ area, which works around any time limitations set by your host by importing in blocks and refreshing the browser window to continue with the next chunk.

The latest version, which I think you will be running, already has a block size of 500 and a sleep period of zero, so it's not significantly slower than a normal import at all; but you can increase the block size if you want to speed things up a little more at line 37 of config.advanced.php, e.g.

  $config_slowImportBlock = 1000;

In general, I wouldn't suggest using a value that takes any longer than approximately half your execution time limit per block, just to be on the safe side!
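
For example (figures purely for illustration), if your host's execution time limit were 30 seconds and a block of 500 products took around 5 seconds to import, a block size of around 1500 would still keep each chunk comfortably inside half the limit.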

Cheers,
David.
--
PriceTapestry.com

Submitted by shams995 on Thu, 2013-05-16 14:10

Hi David,

Thanks for the info. Would it work out (especially the slow import block) if the whole fetch/import process is automated?

Thanks
shams

Submitted by support on Thu, 2013-05-16 15:09

Hi shams,

There is a scriptable version of the Slow Import Tool which can be scheduled as a CRON job; however, this does require that an individual CRON process is not itself time limited, and because of the way slow import works it has to be requested as if it were a web page (i.e. over HTTP), for example:

/usr/bin/wget --max-redirect=9999 -O /dev/null "http://www.yoursite.com/scripts/import_slow.php?filename=@ALL&password=[password]"

- wget only follows a limited number of redirects by default, and the slow import redirects once per block, so --max-redirect is required

- the URL is quoted so that the shell does not treat the & as the end of the command

- where [password] is your /admin/ password as configured in config.advanced.php
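
To schedule it, assuming your host gives you crontab access and that wget is at /usr/bin/wget, an entry along the following lines would run the slow import once a day (the time of day, site URL and [password] are just placeholders to replace with your own values):

# run the scripted slow import of all feeds at 3am each day (example schedule)
0 3 * * * /usr/bin/wget --max-redirect=9999 -O /dev/null "http://www.yoursite.com/scripts/import_slow.php?filename=@ALL&password=[password]"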

Hope this helps!

Cheers,
David.
--
PriceTapestry.com

Submitted by shams995 on Thu, 2013-05-16 15:48

Thanks mate, I'll have a look at it.

cheers
shams