Greetings,
We currently "fetch" hundreds of feeds, across multiple networks, 18 hours a day, from networks that do not "send" updated feeds. Only one network is left that sends us updated feeds: Commission Junction (GAN/Google Affiliate Network is shutting down).
We run separate Automation Tools on multiple installs. One processes feeds sent (by networks) to our central feed repository on the server, and the others fetch feeds directly from a half-dozen networks. The problem is that it is a waste of time and resources to "fetch" feeds that have not been updated. There is no need to download and unzip an outdated feed when the same version is still in the feeds directory.
Is there a way we can determine whether a feed has been updated? I know this is a stretch because each network is different. We just want to eliminate unnecessary bandwidth and processing.
Thanks!
Hi David,
It returned one 304 from each network we have in the Automation Tool (out of several dozen feeds)!
Now what do we do?
Hi,
That sounds promising, but before we get too excited, it would be worth performing the converse test to make sure that the feed is returned as expected when a historic date is sent. In your 304test.php script, look for this line:
curl_setopt($ch, CURLOPT_TIMEVALUE, time());
...and REPLACE with:
curl_setopt($ch, CURLOPT_TIMEVALUE, 0);
This will request each feed with an "If-Modified-Since" header of the UNIX epoch (1st Jan 1970), so every test case (it will test one feed from each network as before) should return the full feed and an HTTP 200 (OK) response...
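For reference, here is a minimal sketch of what that header looks like on the wire. cURL builds the header itself from the time value, so this is purely illustrative, and the function name ifModifiedSinceHeader is hypothetical rather than part of Price Tapestry or cURL:

```php
<?php
// Hypothetical helper (not part of Price Tapestry or cURL): format a
// UNIX timestamp as an HTTP If-Modified-Since request header line.
function ifModifiedSinceHeader(int $timestamp): string
{
    // HTTP dates are always expressed in GMT, hence gmdate() and not date()
    return "If-Modified-Since: ".gmdate("D, d M Y H:i:s", $timestamp)." GMT";
}

// A time value of 0 is the UNIX epoch
print ifModifiedSinceHeader(0)."\n";
// prints "If-Modified-Since: Thu, 01 Jan 1970 00:00:00 GMT"
```

Any feed that exists at all has been modified since that date, which is why a server that honours the header should answer 200 in this test.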
Cheers,
David.
--
PriceTapestry.com
Hi David,
There is no difference in the results - identical to the first.
Hi,
This would actually be relatively straightforward if a significant number of the networks that you are working with support the HTTP 304 (Not Modified) response! Before diving into the code changes, I've created a test script that fetches one feed per unique host that has a job configured in the Automation Tool, and reports the HTTP response code returned when the request includes an If-Modified-Since header.
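<?php
  // Intended to be run from the admin/ folder of an installation
  require("../includes/common.php");
  header("Content-Type: text/plain");
  // Scratch file that each response body is written to
  $tmp = $config_feedDirectory."304testtmp";
  $sql = "SELECT * FROM `".$config_databaseTablePrefix."jobs` WHERE status='OK' ORDER BY filename";
  if (database_querySelect($sql,$jobs))
  {
    $tested = array();
    foreach($jobs as $job)
    {
      // Test only one feed per unique host
      $parts = parse_url($job["url"]);
      if (in_array($parts["host"],$tested)) continue;
      $tested[] = $parts["host"];
      print $job["url"];
      $fp = fopen($tmp,"w");
      $ch = curl_init($job["url"]);
      // Request the feed only if modified since the time value below;
      // the condition is set explicitly rather than relying on a default
      curl_setopt($ch, CURLOPT_TIMECONDITION, CURL_TIMECOND_IFMODSINCE);
      curl_setopt($ch, CURLOPT_TIMEVALUE, time());
      curl_setopt($ch,CURLOPT_HEADER,0);
      curl_setopt($ch,CURLOPT_FILE,$fp);
      $retval = curl_exec($ch);
      fclose($fp);
      $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
      print ":".$code;
      curl_close($ch);
      print "\n";
    }
  }
  // The scratch file only exists if at least one feed was tested
  if (file_exists($tmp)) unlink($tmp);
?>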
Create and upload the script as admin/304test.php to an installation with a number of jobs configured in the Automation Tool, and browse to admin/304test.php. The output will be a list of <url>:<HTTP response code> lines, e.g.
http://www.example.com/feed.asp?id=1234:200
http://www.example.net/getfeed.php:304
If you see anything return 304, we're in business!
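To give an idea of where this could lead, here is a rough sketch of a conditional fetch. It is not the Automation Tool's actual code: the function names feedAction and fetchIfModified are hypothetical, and it assumes that the modification time of the local copy is a reasonable value to send as If-Modified-Since.

```php
<?php
// Hypothetical helper: map an HTTP response code to what should
// happen to the local copy of the feed.
function feedAction(int $httpCode): string
{
    if ($httpCode === 304) return "skip";    // not modified, keep local copy
    if ($httpCode === 200) return "replace"; // new version downloaded
    return "error";                          // anything else, keep local copy
}

// Hypothetical sketch: download $url to $localFile only if the server
// reports it as modified since the local copy was last written.
function fetchIfModified(string $url, string $localFile): string
{
    $tmp = $localFile.".tmp";
    $fp = fopen($tmp, "w");
    if ($fp === false) return "error";
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);
    if (file_exists($localFile))
    {
        // Assumption: the local file's mtime is the time of the last fetch
        curl_setopt($ch, CURLOPT_TIMECONDITION, CURL_TIMECOND_IFMODSINCE);
        curl_setopt($ch, CURLOPT_TIMEVALUE, filemtime($localFile));
    }
    curl_exec($ch);
    fclose($fp);
    $code = (int)curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    $action = feedAction($code);
    if ($action === "replace")
    {
        // Swap the new download into place
        rename($tmp, $localFile);
    }
    else
    {
        // Discard the empty or failed download and keep the existing feed
        unlink($tmp);
    }
    return $action;
}
```

Downloading to a temporary file first means a 304 or a failed request never overwrites the feed that is already in the feeds directory, so unzipping and importing can be skipped whenever the result is anything other than "replace".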
Cheers,
David.
--
PriceTapestry.com