

How to handle a large number of feeds

Submitted by Convergence on Mon, 2013-06-24 22:39

Greetings,

Here's our problem.

One of our niches has 150 or so feeds from a half dozen or so networks.

Within that niche we have multiple installs: one for Men, one for Women, and one for Children.

For the networks that send us feeds (CJ and GAN), we have the feeds delivered to a common feed repository on the server. As we do not know when a merchant will send an updated feed, we use the Automation Tool in one of the installs to "fetch" any feeds, and we do this every hour. Then a few minutes later we run another CRON to unzip those fetched feeds.

Works great, it's smooth, and when import time comes the feeds are ready for import.
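
To give a picture of the scheduling, the two jobs look along these lines in the crontab (the paths, times, and fetch URL here are placeholders rather than our real ones):

# top of every hour: trigger the feed fetch
0 * * * * /usr/bin/wget -O /dev/null "http://www.example.com/scripts/fetch.php"
# a few minutes later: unzip anything that arrived
15 * * * * cd /home/xxxxx/public_html/feeds/;unzip -o \*.zip;gzip -f -d *.gz;find -name \*.zip -delete;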

Our big problem is that now that GAN is closing, more and more feeds are moving to other networks that require us to fetch them. The fetching is not a problem. The problem is the unzipping: 80 feeds at once and our dedicated server starts to groan.

We considered creating additional copies of automation_tool.php and related files (one for each network) and staggering the fetches and unzips - but realized that there were other files in /admin/ as well as /includes/, and that there was only one table in the db for feeds.

That all being said - do you have any suggestions other than going to a new monster server?

Thanks!

Submitted by support on Tue, 2013-06-25 08:06

Hi!

The 1 second sleep every 1000 products during import works really well at reducing load, so what about doing something similar when fetching, say a 2 second sleep between each job? If you want to try this, look for the following code around line 45 of fetch.php

    foreach($jobs as $job)
    {
      fetch();
    }

...and REPLACE with:

    foreach($jobs as $job)
    {
      fetch();
      sleep(2); // pause 2 seconds between fetch jobs to spread the load
    }

I think you're running separate feed management / import processes, but for completeness, for anyone finding this thread with similar issues while using cron.php, the equivalent code is at line 71.

Cheers,
David.
--
PriceTapestry.com

Submitted by Convergence on Tue, 2013-06-25 18:10

Hi David,

"Fetching" is not a problem. We fetch feeds in a separate CRON. The issue is the unzipping/decompressing a mass amount of feeds at one time...

Submitted by support on Wed, 2013-06-26 08:12

Hi Convergence,

Are you currently unzipping with a single command, e.g.

gzip -d *.gz

Could you perhaps post the command that you're using and let me know how it's being invoked? It should be possible to do something similar to the above...

Cheers,
David.
--
PriceTapestry.com

Submitted by Convergence on Wed, 2013-06-26 16:46

Hi David,

For the feeds we "fetch" in the automation tool via CRON, we unzip right before we import via another CRON. The first part of the CRON is:

cd /home/xxxxx/public_html/feeds/;unzip -o \*.zip;gzip -f -d *.gz;find -name \*.zip -delete;

then the import part of the CRON

/usr/bin/wget --max-redirect=9999 -O /dev/null "http://www.path/to//scripts/import_slow.php?password=PASSWORD$&filename=@ALL"

Thanks!

Submitted by support on Thu, 2013-06-27 08:07

Hi Convergence,

This should help reduce server load considerably, in the same way the 1 second sleep during import makes a big difference. Create a new script file scripts/extract.sh (named generically so as not to be unzip or gzip specific!) containing the following code:

cd /home/xxxxx/public_html/feeds/
for zip in *.zip
do
  unzip -o "$zip"   # extract, overwriting any existing files
  rm "$zip"         # delete the archive once extracted
  sleep 1           # 1 second pause between files to keep the load down
done
for gz in *.gz
do
  gzip -f -d "$gz"  # decompress, overwriting any existing file
  sleep 1
done

Edit line 1 to contain the correct path to /feeds/, and after uploading, use it as your unzip CRON process with simply

/bin/bash /home/xxxxx/public_html/scripts/extract.sh

(or it can be chained with any other commands separated by ";" as usual of course)

Hope this helps!

Cheers,
David.
--
PriceTapestry.com

Submitted by Convergence on Thu, 2013-06-27 22:35

Hi David,

Thanks!

Based on our original CRON above, what would the new string look like? Keep in mind we have a delete for the .zip files following our unzip command.

Thanks, again!

Submitted by support on Fri, 2013-06-28 08:37

Hi,

After uploading the file to the /scripts/ folder, wherever you are currently using:

cd /home/xxxxx/public_html/feeds/;unzip -o \*.zip;gzip -f -d *.gz;find -name \*.zip -delete;

...REPLACE with:

/bin/bash /home/xxxxx/public_html/scripts/extract.sh;

(the extract.sh includes deleting the .zip files after unzipping!)
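
So with your import step appended, the complete CRON command would end up along the lines of the following (with your real path and password in place of the placeholders, of course):

/bin/bash /home/xxxxx/public_html/scripts/extract.sh;/usr/bin/wget --max-redirect=9999 -O /dev/null "http://www.path/to//scripts/import_slow.php?password=PASSWORD&filename=@ALL"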

Cheers,
David.
--
PriceTapestry.com

Submitted by Convergence on Fri, 2013-06-28 18:33

Thanks, David!

Testing out on our afternoon CRON.

Submitted by Convergence on Fri, 2013-06-28 22:17

Hi David,

WOW did that ever make a difference.

Thank you, again!

Submitted by Convergence on Fri, 2015-05-08 15:29

Hi David - Happy Friday!

We have set up a test install and are trying to use extract.sh, without any success. The CRON email when we run extract.sh says:

/home/username/public_html/xxx/xx/scripts/extract.sh: line 1: cd: /home/username/public_html/feeds/
: No such file or directory
/home/username/public_html/xxx/xx/scripts/extract.sh: line 3: syntax error near unexpected token `do
'
/home/username/public_html/xxx/xx/scripts/extract.sh: line 3: `do
'

We're stumped, any suggestions/ideas?

Thanks!

Submitted by support on Fri, 2015-05-08 16:38

Hello Convergence,

I just checked the original version on my test server and it seems to work fine (Ubuntu Linux), but I notice in your log the "No such file or directory" error refers to the change directory (cd) command, so it looks like the path may not be quite correct...

I notice that the error messages imply a sub-directory Price Tapestry installation (xxx/xx), so if you could double check that the path to the feeds folder points to the same location as the scripts/ folder referenced in the error messages, e.g. edit line 1 as follows:

cd /home/username/public_html/xxx/xx/feeds/

...where "/home/username/public_html/xxx/xx/" is identical to that part of the path shown in the error messages, and that should be all it is. If still no joy, of course let me know and I'll check it out further with you...

Cheers,
David.
--
PriceTapestry.com

Submitted by Convergence on Fri, 2015-05-08 16:44

Hi David,

Yeah, we've checked that path time and time again, even verified that there were no issues / conflicts with .htaccess.

Weird because all our other installs have no problem.

Will play around some more, maybe we'll get lucky. If no joy, we'll take you up on your offer.

Enjoy your weekend.

Submitted by support on Fri, 2015-05-08 16:51

It might be worth quoting the path if that's all that's different from other installs, e.g.

cd "/home/username/public_html/xxx/xx/feeds/"

This would be necessary if the path contains spaces, for example...
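
One more thing worth ruling out: the way your error output breaks mid-line (the stray line break after /feeds/ and after `do`) is the classic symptom of the script having been edited or uploaded with Windows-style CRLF line endings, which bash treats as part of each command. If quoting the path doesn't help, it may be worth stripping any carriage returns in place, e.g.:

sed -i 's/\r$//' /home/username/public_html/xxx/xx/scripts/extract.sh

(dos2unix does the same job where it's installed)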

Cheers,
David.
--
PriceTapestry.com