Using Curl to download feeds

Submitted by paddyman on Wed, 2010-12-22 08:02

Hi David,

I've moved a site to new shared server space which doesn't run sh commands in cron, so I've had to change my file from .sh to .php, and as such wget will no longer work.

Here's what I had

wget -O "/home/...../public_html/mysite/feeds/asda.csv.gz" "{link saved}"
gzip -c -d -S "" /home/...../public_html/mysite/feeds/asda.csv.gz > /home/...../public_html/mysite/feeds/asda.csv
rm /home/...../public_html/mysite/feeds/asda.csv.gz

I'm trying to download using curl, so I changed wget to curl in the above, but it doesn't download the feed.

Any idea on how to get this working?

Cheers

Adrian

Submitted by support on Wed, 2010-12-22 10:34

Hi Adrian,

I've just been looking at a CURL-based PHP automation script as it happens. Have a go with something like this:

<?php
  set_time_limit(0);
  require("../includes/common.php");
  $admin_checkPassword = FALSE;
  require("../includes/admin.php");
  require("../includes/filter.php");
  require("../includes/MagicParser.php");
  $feeds_dir = "/home/...../public_html/mysite/feeds/";
  chdir($feeds_dir);
  $urls = array();
  $urls["asda.csv.gz"] = "{link saved}";
  $urls["snowleader.csv"] = "";
  foreach($urls as $filename => $url)
  {
    $fp = fopen($feed_dir.$filename,"w");
    if (!$fp) die("Could not create ".$feed_dir.$filename." - check permissions");
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_exec($ch);
    curl_close($ch);
    fclose($fp);
    // gzip decompress
    $cmd = "/usr/bin/gzip -d \"".escapeshellarg($filename)."\"";
    // to import as well as fetch uncomment next line
    // $filename = str_replace(".gz","",$filename);
    // admin_import($filename,0);
  }
?>

Don't forget to replace {link saved} with your ASDA feed URL in the above...

Hope this helps!

Cheers,
David.
--
PriceTapestry.com

Submitted by paddyman on Fri, 2010-12-24 08:48

Hi David,

Many thanks for your help with this.

Have tried the above script but getting the following error...

Call to undefined function shellescapearg() in /home/public_html/mysite/admin/fetch.php on line 23

Line 23 is $cmd = "/usr/bin/gzip -d \"".shellescapearg($filename)."\"";

Any ideas?

Thanks

Adrian

Submitted by support on Fri, 2010-12-24 09:57

Sorry, Adrian - I got the function name the wrong way round - it should be:

$cmd = "/usr/bin/gzip -d \"".escapeshellarg($filename)."\"";

Cheers,
David.
--
PriceTapestry.com

Submitted by sbedigital on Sat, 2010-12-25 15:08

I have tried to use the script above; it downloads the feeds from AWIN fine, but it does not decompress the files, nor does it show any errors. Do you know what the problem could be?

Submitted by support on Tue, 2010-12-28 13:01

Hi,

Sorry about that - I've just realised that there's a line missing after the $cmd variable is created to actually execute it! Where you have the following code in your script:

    $cmd = "/usr/bin/gzip -d \"".escapeshellarg($filename)."\"";

...REPLACE that with...

    $cmd = "/usr/bin/gzip -d \"".escapeshellarg($filename)."\"";
    exec($cmd);

Cheers,
David.
--
PriceTapestry.com

Submitted by paddyman on Wed, 2010-12-29 11:07

Hi David

Hope you enjoyed the break.

Tried the above script, but I can't run escapeshellarg on my new shared server space - the host has blocked this function for security reasons.

Do you know if there is anything else that can be used instead?

Thanks

Adrian

Submitted by support on Wed, 2010-12-29 11:14

Hi Adrian,

Could be a challenge if shell functions (I presume including exec() etc.) are disabled for security reasons. ZLib might be an option - to find out whether it's enabled on your installation, create a phpinfo() script on your new server:

pi.php

<?php
  phpinfo();
?>

...browse to the script and look through the output for a ZLib section...
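
Alternatively, a quick check from code (just a sketch) will also tell you whether the zlib extension and its gzopen() function are available:

<?php
  // sketch only: report whether zlib / gzopen() are available on this server
  echo "zlib extension: ".(extension_loaded("zlib") ? "enabled" : "not enabled")."\n";
  echo "gzopen(): ".(function_exists("gzopen") ? "available" : "not available")."\n";
?>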

Cheers,
David.
--
PriceTapestry.com

Submitted by paddyman on Thu, 2010-12-30 07:38

Hi David,

Thanks for sticking with this.

Zlib is enabled as follows

ZLib Support enabled
Stream Wrapper support compress.zlib://
Stream Filter support zlib.inflate, zlib.deflate
Compiled Version 1.2.3
Linked Version 1.2.3

Directive Local Value Master Value
zlib.output_compression Off Off
zlib.output_compression_level -1 -1
zlib.output_handler no value no value

Had a quick search on Google for how to implement it, but I'm totally lost!

Cheers

Adrian

Submitted by support on Fri, 2010-12-31 11:18

Hi Adrian,

OK, this is worth a go! At the position of the following comment in the original script:

    // gzip decompress

...have a go with:

    $fz = gzopen($feed_dir.$filename,"r");
    $fp = fopen($feed_dir.str_replace(".gz","",$filename),"w");
    while(!gzeof($fz)) fwrite($fp,gzread($fz,1024));
    fclose($fp);
    gzclose($fz);
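
The compress.zlib:// stream wrapper shown in your phpinfo() output would also do the job in one line if you prefer (just a sketch, using the same variables as above):

    // alternative sketch: let PHP's compress.zlib:// wrapper handle the decompression
    copy("compress.zlib://".$feed_dir.$filename,
         $feed_dir.str_replace(".gz","",$filename));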

Hope this helps!

Cheers,
David.
--
PriceTapestry.com

Submitted by paddyman on Tue, 2011-01-04 13:44

Hi David,

Trying to run the above as a cron job and getting the following error.

/home/public_html/..../..../fetch.php: line 1: ?php: No such file or directory
/home/public_html/..../..../fetch.php: line 2: syntax error near unexpected token `0'
/home/public_html/..../..../fetch.php: line 2: ` set_time_limit(0);'

Any ideas?

Thanks

Adrian

Submitted by support on Tue, 2011-01-04 14:11

Hi Adrian,

Try adding the following as the first line, with the opening PHP tag moved down to line 2:

#!/usr/bin/php

This tells the cron process which binary to use to execute the script...
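
In other words, the top of fetch.php would then begin:

#!/usr/bin/php
<?php
  set_time_limit(0);
  require("../includes/common.php");
  // ...rest of the script as above...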

If that doesn't work, instead try the following as your cron command:

/usr/bin/php /home/public_html/..../..../fetch.php

Cheers,
David.
--
PriceTapestry.com

Submitted by Vince on Wed, 2011-01-05 01:52

Hi,

I'm trying this as well; using the script plus the modifications above, I'm getting this error:

Warning: exec() has been disabled for security reasons in ...

Is there a workaround for this? I'm not running it as a cron job right now, but I guess that shouldn't matter.

Vince

Submitted by support on Wed, 2011-01-05 10:40

Hi Vince,

That sounds like you're still using the exec() method to uncompress the downloaded feeds, so if exec() is disabled, have a go with the zlib version from the post above, which should do the trick...

Cheers,
David.
--
PriceTapestry.com

Submitted by Vince on Wed, 2011-01-05 11:19

Hi David,

You are right, I made a mistake, and I've fixed it now. It does upload part of the file to my FTP now, but only a part - it breaks the XML feed after about 200 lines.

(It isn't the feed itself that's broken.)

Edit: I tried it with another feed and got the same problem. Or is this because I'm not running it as a cron job? I don't think so.

Vince

Submitted by Vince on Wed, 2011-01-05 15:18

Here is the code I have right now.

#!/usr/bin/php
<?php
  set_time_limit(0);
  require("../includes/common.php");
  $admin_checkPassword = FALSE;
  require("../includes/admin.php");
  require("../includes/filter.php");
  require("../includes/MagicParser.php");
  $feeds_dir = "../feeds/";
  chdir($feeds_dir);
  $urls = array();
  $urls["test.xml"] = "{link saved}";
  foreach($urls as $filename => $url)
  {
    $fp = fopen($feed_dir.$filename,"w");
    if (!$fp) die("Could not create ".$feed_dir.$filename." - check permissions");
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_exec($ch);
    curl_close($ch);
    fclose($fp);
    // gzip decompress
    $fz = gzopen($feed_dir.$filename,"r");
    $fp = fopen($feed_dir.str_replace(".gz","",$filename),"w");
    while(!gzeof($fz)) fwrite($fp,gzread($fz,1024));
    fclose($fp);
    gzclose($fz);
    // to import as well as fetch uncomment next line
    // $filename = str_replace(".gz","",$filename);
    // admin_import($filename,0);
  }
?>

Submitted by support on Wed, 2011-01-05 16:39

Hi Vince,

When you say the feed is being broken after about 200 lines, is it just cut off before completion, or does it appear to be corrupted?

The script doesn't delete the .gz version, so one thing I would suggest trying would be to download the .gz version from your /feeds/ folder to your local computer, and then attempt to unzip it using WinZip, which supports Gzip compression...

If you're not sure at what stage it's breaking, let me know the location of the installation (I'll remove the URL before posting your reply) and I'll download the files to my test server and take a look...

Cheers,
David.
--
PriceTapestry.com

Submitted by Vince on Wed, 2011-01-05 17:36

Hi,

It just ends like this:

<field name="brand">Bosch</field>
<field name="

The file can be found here: {link saved}

Vince

Submitted by support on Wed, 2011-01-05 20:30

Hi Vince,

I tried to access agradi.xml.gz from the same location but it's not found - are you manually deleting the .gz version after running the script? If so, could you let me know when it's back online so that I can run the same code against it...

Cheers,
David.
--
PriceTapestry.com

Submitted by Vince on Wed, 2011-01-05 21:14

Hi David,

No, I'm just running the script I posted above.

Greetings,
Vince

Submitted by paddyman on Wed, 2011-01-05 22:08

Hi David,

Last mod worked great, so I'm almost there with getting all feeds downloaded.

Can you advise on what to add below to remove the compressed .gz files after extraction?

$fz = gzopen($feed_dir.$filename,"r");
$fp = fopen($feed_dir.str_replace(".gz","",$filename),"w");
while(!gzeof($fz)) fwrite($fp,gzread($fz,1024));
fclose($fp);
gzclose($fz);

Also, my site has multiple installations of Price Tapestry across separate folders, all of which access feeds from one main feeds folder. I'd like to continue using this script to download the feeds, and have separate scripts to import them into each database.

Any ideas? Hope this makes sense!

Thanks

Adrian

Submitted by Vince on Wed, 2011-01-05 22:23

Hi David,

I have been playing with the script for a while and I've made some progress, I think. Maybe you can help me.

I changed this line:

$fp = fopen($feed_dir.str_replace(".gz","",$filename),"w");

to

$fp = fopen($feed_dir.str_replace(".xml","",$filename),"w");

This results in 2 files being uploaded: the XML, which is cut off at around line 200, and a file without an extension. That file is correct - it's complete, but it just doesn't have an extension.

Maybe you can help me with a solution now?

I only want one correct file, with the .xml extension.

I hope you understand it. ;)

Cheers,
Vince

Submitted by Vince on Wed, 2011-01-05 22:36

Sorry for posting again, haha. You can remove this post.

But changing it to this:
$fp = fopen($feed_dir.str_replace(".xml","",$filename),"w");

Results in this:
- agradi.xml (CUT OFF, INCOMPLETE! NOT NEEDED!)
- agradi (CORRECT, I NEED THIS ONE, BUT WITH EXTENSION.)

Hope that's easier to understand.

Sorry for spamming, David.

Vince.

Submitted by support on Thu, 2011-01-06 10:06

Hi Vince,

I think I see what's happening now - your feed is .gz compressed, but the .gz extension (which the script is expecting) is missing from the download filename in the $urls array, so instead of:

  $urls["test.xml"] = "{link saved}";

use:

  $urls["test.xml.gz"] = "{link saved}";

(but with your actual feed name instead of test, and your URL in place of {link saved})

Then, revert the line you have been experimenting with back to:

    $fp = fopen($feed_dir.str_replace(".gz","",$filename),"w");

...then I think you should be in business!

Cheers,
David.
--
PriceTapestry.com

Submitted by Vince on Thu, 2011-01-06 12:43

Hi David,

This results in:

agradi.xml
agradi.xml.gz

The XML file is complete, but not the .xml.gz. The problem is I keep getting 2 files for some reason. I tried 2 feeds and the same thing happens, I get 4 files then. 2x XML, 2x XML.GZ.

Vince

Submitted by support on Thu, 2011-01-06 12:51

Hi Vince,

Sounds like that's nearly there actually - all that's needed now is to delete the .gz file after the unzip! In your script, where you have:

  gzclose($fz);

...REPLACE that with:

  gzclose($fz);
  unlink($filename);

That should then leave you with just the .xml files, all the .gz (compressed) versions having been removed by the script!

Cheers,
David.
--
PriceTapestry.com

Submitted by Vince on Thu, 2011-01-06 13:37

Hi,

Thanks David, this does the job! :)

I'm trying to make the feeds auto-import now. This is working, but I always have a strange problem when importing.

The website is often down just before the actual import; once the products start importing it's online again and I can see the number of products update on the merchants page.

It looks like removing the old products creates a lot of load, which is very strange.

Do you know what it is?

Vince.

Submitted by Vince on Thu, 2011-01-06 13:43

Hi,

I checked the import function; I think it is because I have many filters added. (?)

Is there a way to reduce the load during this process?

Vince.

Submitted by support on Thu, 2011-01-06 13:56

Hello Vince,

For a large site, importing by individual merchant can create a small period of downtime because the DELETE FROM query has to lock the table whilst product records for the merchant being imported are removed.

To avoid this, there is a great modification described in this thread that imports into a temporary table and then switches it to the live table after import. This results in almost no downtime even when importing tens of thousands of products.
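
The general idea of that modification, as a rough sketch only (the thread has the actual code; the table names below are just illustrative, and an open MySQL connection is assumed), is to import everything into a working copy of the products table and then swap it for the live table in a single atomic RENAME TABLE:

  // rough sketch - illustrative table names, not the actual modification
  mysql_query("CREATE TABLE products_import LIKE products"); // empty working copy
  // ...import all feeds into products_import instead of products...
  mysql_query("RENAME TABLE products TO products_old, products_import TO products"); // atomic swap
  mysql_query("DROP TABLE products_old");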

With the above modification in place, you could schedule "import.php @ALL" to run after your fetch script has completed - don't forget that you can combine multiple commands in a single CRON process so that they happen sequentially and you can be sure that the first has completed before the second begins, e.g.

/usr/bin/php /path/to/scripts/fetch.php;/usr/bin/php /path/to/scripts/import.php @ALL

Of course this depends on exactly how you are triggering your fetch.php script at the moment - if you're not sure, let me know and I'll show you how to invoke import.php @ALL in the same way...

Cheers,
David.
--
PriceTapestry.com

Submitted by Vince on Thu, 2011-01-06 14:23

Hi David,

I have added the modification to my files. This is what my fetch.php looks like at the moment. Can you help me include import.php @ALL in it, and show me how to set up a double cron job? (I've never done two in one.)

I want the files to be removed after everything is done; I've added that to the file below.

<?php
  set_time_limit(0);
  require("../includes/common.php");
  $admin_checkPassword = FALSE;
  require("../includes/admin.php");
  require("../includes/filter.php");
  require("../includes/MagicParser.php");
  $feeds_dir = "../feeds/";
  chdir($feeds_dir);
  $urls = array();
  $urls["kijkshop.xml.gz"] = "{removed}";
  $urls["bcc.xml.gz"] = "{removed}";
  foreach($urls as $filename => $url)
  {
    $fp = fopen($feed_dir.$filename,"w");
    if (!$fp) die("Could not create ".$feed_dir.$filename." - check permissions");
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_exec($ch);
    curl_close($ch);
    fclose($fp);
    // gzip decompress
    $fz = gzopen($feed_dir.$filename,"r");
    $fp = fopen($feed_dir.str_replace(".gz","",$filename),"w");
    while(!gzeof($fz)) fwrite($fp,gzread($fz,1024));
    fclose($fp);
    gzclose($fz);
    unlink($filename);
    // to import as well as fetch uncomment next line
    $filename = str_replace(".gz","",$filename);
    admin_import($filename,0);
    // delete after import
    unlink($filename);
  }
?>

Cheers,
Vince

Submitted by support on Thu, 2011-01-06 14:30

Hi Vince,

Can you let me know how you currently have the above script set up as a CRON job (e.g. what command you have entered), and then I can work out from that how to double them up...

Cheers,
David.
--
PriceTapestry.com

Submitted by Vince on Thu, 2011-01-06 14:32

Hi David,

I haven't set up a cron job for it yet, but I think this is what you want:

/usr/local/bin/php -q -f /home/path/to/scripts/fetch.php

Cheers,
Vince.

Submitted by support on Thu, 2011-01-06 14:38

Hi Vince,

Yes - that's it. With the zero down-time mod in place, here's what I'd add as the cron job - all on one line:

cd /home/path/to/scripts/;/usr/local/bin/php -q -f fetch.php;/usr/local/bin/php -q -f import.php @ALL

Using a single cd command to change to your scripts folder first (don't forget to replace /path/to/ with the actual path on your server) means that even if PHP isn't configured to change directory internally to the folder containing the script, it will still work (import.php needs to know where ../feeds is relative to itself).

Cheers,
David.
--
PriceTapestry.com

Submitted by Vince on Thu, 2011-01-06 14:50

Hi David,

I added this in my DA:

42 (minute) * * * * /home/path/to/scripts/;/usr/local/bin/php -q -f fetch.php;/usr/local/bin/php -q -f import.php @ALL

Tried it with cd in front of it too, but it doesn't work for some reason. Is there a way to test it without cron? Maybe something else is wrong.

Cheers,
Vince.

Submitted by support on Thu, 2011-01-06 14:55

Hi Vince,

First thing I'd do is confirm that CRON is working for a single command, e.g.

42 (minute) * * * * /usr/local/bin/php -q -f /home/path/to/scripts/fetch.php

At the prescribed time that should run your fetch script (after it's had time to run, you can check via the Modified time shown in /admin/).

Cheers,
David.
--
PriceTapestry.com

Submitted by Vince on Thu, 2011-01-06 15:50

Hi David,

Sorry, that took a while. :)

This is the problem - I couldn't find my cron job logs at first, but I've found them now.

Warning: require(../includes/common.php): failed to open stream: No such file or directory in /home/-/domains/-/public_html/scripts/fetch.php on line 3

Warning: require(../includes/common.php): failed to open stream: No such file or directory in /home/-/domains/-/public_html/scripts/fetch.php on line 3

Fatal error: require(): Failed opening required '../includes/common.php' (include_path='.:/usr/local/lib/php') in /home/-/domains/-/public_html/scripts/fetch.php on line 3

Pretty weird, eh.

Cheers,
Vince.

Submitted by support on Thu, 2011-01-06 16:01

Hi Vince,

Ah - OK, try this as the first line of your fetch.php script (after the opening PHP tag):

  chdir("/home/-/domains/-/public_html/scripts/");

(replace the - with the appropriate values on your server)
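
If you'd rather not hard-code the path, an alternative (just a sketch) is to let the script work out its own folder at run time:

  // sketch: change into the folder this script lives in, wherever cron starts from
  chdir(dirname(__FILE__));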

Cheers,
David.
--
PriceTapestry.com

Submitted by Vince on Thu, 2011-01-06 16:17

Hi David,

Yes - that solved it. I ran the following cron job:

cd /home/path/to/scripts/;/usr/local/bin/php -q -f fetch.php;/usr/local/bin/php -q -f import.php @ALL

It deleted all my products, all of them.

Would it help if I sent you my fetch.php, import.php and includes/admin.php? I really want to fix this. :)

Also, import.php didn't create any tables in the database, so it doesn't seem to be working properly.

Cheers,
Vince.

Submitted by support on Thu, 2011-01-06 16:21

Hi Vince,

Sounds like you're very close - sure, email me all the files over and I'll check the changes for you...

Cheers,
David.
--
PriceTapestry.com

Submitted by Vince on Thu, 2011-01-06 16:55

Hi David,

I emailed the files to you. :)

Thanks again, for helping me out.

Cheers,
Vince.

Submitted by Vince on Thu, 2011-01-06 18:10

Hi David,

Okay, the last version of the fetch.php file you sent me is working perfectly. The feeds are being uploaded.

When I run the cron job, all it does is the fetch.php part - the cron job uploads the feeds to my FTP server.

Import.php doesn't seem to do its job. The tables aren't being created, so it looks like it doesn't even try to import.

I think we are almost there. :)

Cheers,
Vince.

Submitted by support on Thu, 2011-01-06 18:26

Hi Vince,

Have followed up by email...

Cheers,
David.
--
PriceTapestry.com