

Multiple Automation Tools in one install

Submitted by Convergence on Thu, 2014-05-15 21:35

Greetings,

BIG thinking cap on this one, David :)

Due to some networks taking a longer time to fetch their feeds, is it possible / plausible to create multiple Automation Tools with a separate fetch.php for each network?

This would allow us to run five or six network "fetches" within a single install - thereby speeding up the fetch process.

Keeping in mind there will be one common /feeds folder :)

Thoughts?

Submitted by support on Fri, 2014-05-16 08:59

Hi Convergence,

As you know, fetch.php accepts a parameter, which can be either the specific feed to fetch (by way of its filename), or @ALL to fetch all feeds. So how about simply extending the SQL used when the parameter is not @ALL to also look at the URL of the job - you will then be able to specify the host name from which affiliate network A serves their feeds, and run jobs for that network only! To try this, edit scripts/fetch.php and look for the following code at line 38:

$sql = "SELECT * FROM `".$config_databaseTablePrefix."jobs` WHERE filename='".database_safe($filename)."'";

...and REPLACE with:

$sql = "SELECT * FROM `".$config_databaseTablePrefix."jobs` WHERE filename='".database_safe($filename)."' OR url LIKE '%".database_safe($filename)."%'";

With that in place, you will then be able to invoke fetch.php with a host name instead of a feed filename. For example:

/usr/bin/php fetch.php example.com

...will run jobs only where "example.com" appears in the URL, and:

/usr/bin/php fetch.php example.org

...will run jobs only where "example.org" appears in the URL.

Hope this helps!

Cheers,
David.
--
PriceTapestry.com

Submitted by Convergence on Fri, 2014-05-16 13:34

Interesting.

The hope was to be able to run all networks simultaneously - thereby speeding up the total fetch process by having multiple download streams at once instead of fetching feed after feed until completed...

Submitted by support on Fri, 2014-05-16 14:17

Hi Convergence,

Ah - that was the intention, but it would have required scheduling simultaneous fetch.php commands - and of course you would then need to know when they had all completed, so let's look at ways to combine everything into a single process...
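For illustration only (the schedule, paths and host names here are just placeholders), scheduling the fetches separately would have meant something like one crontab entry per network host, all starting at the same time:

# hypothetical crontab entries - one fetch.php per network host, started in parallel
0 3 * * * cd /path/to/scripts && /usr/bin/php fetch.php linkshare.com
0 3 * * * cd /path/to/scripts && /usr/bin/php fetch.php cj.com

Each entry runs independently, so there is no single point at which you know they have all finished - hence combining everything into one script as described below.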

First of all however, I just realised an additional modification that would need to be made to avoid the temporary file used during the download from being overwritten.

In includes/automation.php look for the following code at line 42:

  $tmp = $config_feedDirectory."tmp";

...and REPLACE with:

  $tmp = $config_feedDirectory."tmp".$job_id;

This will mean that a unique temporary filename is used for each job, so no problem running multiple instances of fetch.php simultaneously.

I've just been having a play with PHP's process control functions, and it looks like creating a single script that runs multiple instances of fetch.php simultaneously and waits for them all to complete should be straightforward; and further, we can automatically derive the commands by reading the `pt_jobs` table and compiling a unique list of host names from the URL field.

Create the new file, scripts/fetchsimul.php as follows:

<?php
  set_time_limit(0);
  require("../includes/common.php");

  // build one fetch.php command per unique host name found in the job URLs
  $commands = array();
  $sql = "SELECT * FROM `".$config_databaseTablePrefix."jobs`";
  if (database_querySelect($sql,$jobs))
  {
    foreach($jobs as $job)
    {
      $parts = parse_url($job["url"]);
      $commands[$parts["host"]]["cmd"] = "/usr/bin/php fetch.php ".$parts["host"];
    }
  }

  // spawn all commands simultaneously, sending their output to /dev/null
  foreach($commands as $host => $command)
  {
    $commands[$host]["process"] = proc_open($command["cmd"],array(1 => fopen("/dev/null","w")),$pipes);
  }

  // poll every 5 seconds until every child process has finished
  do
  {
    sleep(5);
    $running = 0;
    foreach($commands as $command)
    {
      $status = proc_get_status($command["process"]);
      if ($status["running"]) $running++;
    }
  }
  while($running);

  exit();
?>

Then in your CRON job, simply replace fetch.php with fetchsimul.php and stand back!
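For example, assuming the script is run via the PHP command line interpreter (the schedule, directory and PHP binary location below are only placeholders - match them to your existing fetch.php entry), the CRON entry might look something like:

0 3 * * * cd /path/to/scripts && /usr/bin/php fetchsimul.php

No @ALL parameter is needed, since fetchsimul.php derives the list of host names from the jobs table itself.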

Cheers,
David.
--
PriceTapestry.com

Submitted by Convergence on Fri, 2014-05-16 14:56

Thanks, David!

I have to wrap my head around this and make sure I'm editing all the correct files and only the files that need editing.

Also, I still don't see how to set up multiple "Automation Tools", one for each network - would that just be renaming the Automation Tool to something like Automation Tools/LinkShare, Automation Tools/CJ, etc.?

Or have I not had enough coffee, yet?

Submitted by support on Fri, 2014-05-16 15:23

Hi Convergence,

No need to create multiple instances of the Automation Tool at all - with the above in place, the new fetchsimul.php script will simultaneously fetch from each different network.

What it does is read the list of jobs, compile a list of unique host names (e.g. linkshare.com, cj.com) and then spawn a simultaneous instance of fetch.php for each unique host name (network). The script itself won't exit until all of the child processes have completed.
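As a quick illustration (the feed URL below is made up), the host name that ends up keying each command is simply whatever PHP's parse_url() returns for the job URL, which you can check from the command line:

php -r '$p = parse_url("http://feeds.example-network.com/datafeed.xml.gz"); echo $p["host"];'
# prints: feeds.example-network.com

All jobs whose URLs share that host end up behind a single fetch.php invocation, and the earlier LIKE modification means that invocation then runs every job for that network.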

Cheers,
David.
--
PriceTapestry.com

Submitted by Convergence on Sun, 2014-05-18 20:30

This is the CRON email sent:

--2014-05-18 14:10:01-- http://www./scripts/fetchsimul.php?password=$&filename=@ALL
Resolving www....
Connecting to www.||:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 106170 (104K) [text/html]
Saving to: “/dev/null”

0K .......... .......... .......... .......... .......... 48% 431M 0s
50K .......... .......... .......... .......... .......... 96% 1.14G 0s
100K ... 100% 7022G=0s

2014-05-18 14:10:21 (653 MB/s) - “/dev/null” saved [106170/106170]

Submitted by support on Mon, 2014-05-19 07:32

Hi Convergence,

Hmmm - 20 seconds seems a little quick - do the Modified (or Job run) date/times indicate that nothing was actually executed using the fetchsimul.php method?

Cheers,
David.
--
PriceTapestry.com

Submitted by Convergence on Mon, 2014-05-19 13:29

Correction: Nothing was "FETCHED", good sir...

Submitted by support on Mon, 2014-05-19 18:05

Hi Dennis,

The path to php may be incorrect - in your new file fetchsimul.php where you have this code:

      $commands[$parts["host"]]["cmd"] = "/usr/bin/php fetch.php ".$parts["host"];

...REPLACE with:

      $commands[$parts["host"]]["cmd"] = "/usr/local/bin/php fetch.php ".$parts["host"];
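If you're not sure which path is correct on your server (and you have shell access), the PHP command line binary can usually be located with either of:

which php
command -v php

...and the reported path used in place of /usr/bin/php above.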

If still no joy, drop me an email and I'll check this out, together with the import_slow.php issue, directly on your server for you, no problem...

Cheers,
David.
--
PriceTapestry.com

Submitted by Convergence on Sun, 2014-05-25 23:16

Hi David,

Just a quick note to express our appreciation for your extra effort in assisting us.

We're back and running full steam, again!