Hi David,
Have moved a site to new shared server space which doesn't run sh commands in cron. Have had to change my file from .sh to .php and as such wget will no longer work.
Here's what I had
wget -O "/home/...../public_html/mysite/feeds/asda.csv.gz" "{link saved}"
gzip -c -d -S "" /home/...../public_html/mysite/feeds/asda.csv.gz > /home/...../public_html/mysite/feeds/asda.csv
rm /home/...../public_html/mysite/feeds/asda.csv.gz
I'm trying to download using curl, so I changed wget to curl above, but it doesn't download the feed.
Any idea on how to get this working ?
Cheers
Adrian
Hi David,
Many thanks for your help with this.
Have tried the above script but getting the following error...
Call to undefined function shellescapearg() in /home/public_html/mysite/admin/fetch.php on line 23
Line 23 is $cmd = "/usr/bin/gzip -d \"".shellescapearg($filename)."\"";
Any ideas ?
Thanks
Adrian
Sorry, Adrian - I got the function name the wrong way round - it should be:
$cmd = "/usr/bin/gzip -d \"".escapeshellarg($filename)."\"";
Cheers,
David.
--
PriceTapestry.com
I have tried to use the script above; it downloads the feeds from AWIN fine but does not decompress the files, nor does it show any errors. Do you know what could be the problem?
Hi,
Sorry about that - I just realised that there is a line missing after the $cmd variable is created, to actually execute it! Where you have the following code in your script:
$cmd = "/usr/bin/gzip -d \"".escapeshellarg($filename)."\"";
...REPLACE that with...
$cmd = "/usr/bin/gzip -d \"".escapeshellarg($filename)."\"";
exec($cmd);
Cheers,
David.
--
PriceTapestry.com
Hi David
Hope you enjoyed the break.
Tried the above script but can't run escapeshellarg on my new shared server space; the host has blocked this function for security reasons.
Do you know if there is anything else that can be used instead ?
Thanks
Adrian
Hi Adrian,
Could be a challenge if shell functions (I presume including exec() etc.) are disabled for security reasons. ZLib might be an option - to find out if it's enabled on your installation, create a phpinfo() script on your new server:
pi.php
<?php
phpinfo();
?>
...browse to the script and look through the output for a ZLib section...
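Alternatively, a quick programmatic check is possible if you'd rather not wade through the whole phpinfo() output - just a sketch using PHP's built-in extension_loaded() function:
<?php
// prints "zlib enabled" if the zlib extension is available to this PHP installation
print (extension_loaded("zlib")?"zlib enabled":"zlib not available");
?>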
Cheers,
David.
--
PriceTapestry.com
Hi David,
Thanks for sticking with this.
Zlib is enabled as follows
ZLib Support: enabled
Stream Wrapper support: compress.zlib://
Stream Filter support: zlib.inflate, zlib.deflate
Compiled Version: 1.2.3
Linked Version: 1.2.3

Directive                        Local Value    Master Value
zlib.output_compression          Off            Off
zlib.output_compression_level    -1             -1
zlib.output_handler              no value       no value
Had a quick search on Google on how to implement it, but I'm totally lost!
Cheers
Adrian
Hi Adrian,
Ok this is worth a go! At the position of the following comment in the original script:
// gzip decompress
...have a go with:
$fz = gzopen($feed_dir.$filename,"r");
$fp = fopen($feed_dir.str_replace(".gz","",$filename),"w");
while(!gzeof($fz)) fwrite($fp,gzread($fz,1024));
fclose($fp);
gzclose($fz);
Hope this helps!
Cheers,
David.
--
PriceTapestry.com
Hi David,
Trying to run the above as a cron job and getting the following error.
/home/public_html/..../..../fetch.php: line 1: ?php: No such file or directory
/home/public_html/..../..../fetch.php: line 2: syntax error near unexpected token `0'
/home/public_html/..../..../fetch.php: line 2: ` set_time_limit(0);'
Any ideas ?
Thanks
Adrian
Hi Adrian,
Try adding the following as the first line, with the opening PHP tag moved down to line 2:
#!/usr/bin/php
This tells the cron process which binary to use to execute the script...
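With that in place, the top of the script would look something like this (just for illustration - the require path is from the example script in this thread):
#!/usr/bin/php
<?php
set_time_limit(0);
require("../includes/common.php");
// ...rest of the script unchanged...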
If that doesn't work, instead try the following as your cron command:
/usr/bin/php /home/public_html/..../..../fetch.php
Cheers,
David.
--
PriceTapestry.com
Hi,
I'm trying this as well; using the script + modifications above I'm getting this error.
Warning: exec() has been disabled for security reasons in ...
Is there a workaround for this? I'm not using it as a cron job right now, but that shouldn't matter, I guess.
Vince
Hi Vince,
That sounds like you're still using the exec() method to uncompress the downloaded feeds, so if exec() is disabled, have a go with the zlib version above from this post which should do the trick...
Cheers,
David.
--
PriceTapestry.com
Hi David,
You are right, I made a mistake, and I've fixed it now. It does upload part of the file to my FTP now, but only part: it breaks the XML feed after about 200 lines.
(It isn't the feed itself that's buggy.)
Edit: tried it with another feed, same problem. Or is this because I'm not running it as a cron job? I don't think so.
Vince
Here is the code I have right now.
#!/usr/bin/php
<?php
set_time_limit(0);
require("../includes/common.php");
$admin_checkPassword = FALSE;
require("../includes/admin.php");
require("../includes/filter.php");
require("../includes/MagicParser.php");
$feeds_dir = "../feeds/";
chdir($feeds_dir);
$urls = array();
$urls["test.xml"] = "{link saved}";
foreach($urls as $filename => $url)
{
$fp = fopen($feed_dir.$filename,"w");
if (!$fp) die("Could not create ".$feed_dir.$filename." - check permissions");
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_exec($ch);
curl_close($ch);
fclose($fp);
// gzip decompress
$fz = gzopen($feed_dir.$filename,"r");
$fp = fopen($feed_dir.str_replace(".gz","",$filename),"w");
while(!gzeof($fz)) fwrite($fp,gzread($fz,1024));
fclose($fp);
gzclose($fz);
// to import as well as fetch uncomment next line
// $filename = str_replace(".gz","",$filename);
// admin_import($filename,0);
}
?>
Hi Vince,
When you say the feed is being broken after about 200 lines, is it just cut off before completion; or does it appear to be corrupted?
The script doesn't delete the .gz version, so one thing I would suggest trying would be to download the .gz version from your /feeds/ folder to your local computer and then attempt to unzip it using WinZip, which supports Gzip compression...
If you're not sure at what stage it's breaking, let me know the location of the installation (I'll remove the URL before posting your reply) and I'll download the files to my test server and take a look...
Cheers,
David.
--
PriceTapestry.com
Hi,
It just ends like this:
<field name="brand">Bosch</field>
<field name="
The file can be found here: {link saved}
Vince
Hi Vince,
I tried to access agradi.xml.gz from the same location but it's not found - are you manually deleting the .gz version after running the script? If so, could you let me know when it's back online so that I can run the same code against it...
Cheers,
David.
--
PriceTapestry.com
Hi David,
No, I'm just running the script I posted above.
Greetings,
Vince
Hi David,
The last mod worked great, so I'm almost there with getting all feeds downloaded.
Can you advise on what to add below to remove the compressed gz files after extraction.
$fz = gzopen($feed_dir.$filename,"r");
$fp = fopen($feed_dir.str_replace(".gz","",$filename),"w");
while(!gzeof($fz)) fwrite($fp,gzread($fz,1024));
fclose($fp);
gzclose($fz);
Also, my site has multiple installations of Price Tapestry across separate folders. All of these folders access feeds from one main feeds folder. I'd like to continue to use this script to download the feeds and have separate scripts to import them into each database.
Any ideas ? Hope this makes sense !!
Thanks
Adrian
Hi David,
I have been playing with the script for a while and I've made some progress, I think. Maybe you can help me.
I changed this line:
$fp = fopen($feed_dir.str_replace(".gz","",$filename),"w");
to
$fp = fopen($feed_dir.str_replace(".xml","",$filename),"w");
This results in 2 files being uploaded: the XML, which is cut off at around line 200, and a file without an extension. That second file is correct; it's complete, it just doesn't have an extension.
Maybe you can help me with a solution now?
I only want one correct file with the .xml extension.
I hope you understand it. ;)
Cheers,
Vince
Sorry for posting again, haha. You can remove this post.
But changing it to this:
$fp = fopen($feed_dir.str_replace(".xml","",$filename),"w");
Results in this:
- agradi.xml (CUT OFF, INCOMPLETE! NOT NEEDED!)
- agradi (CORRECT, I NEED THIS ONE, BUT WITH EXTENSION.)
Hope that's easier to understand.
Sorry for spamming, David.
Vince.
Hi Vince,
I think I see what's happening now - your feed is .gz compressed, but the .gz extension (which the script is expecting) is missing from the download filename in the $urls array, so instead of:
$urls["test.xml"] = "{link saved}";
use:
$urls["test.xml.gz"] = "{link saved}";
(but with your actual name instead of test and your URL in place of {link saved})
Then revert the line you have been experimenting with back to:
$fp = fopen($feed_dir.str_replace(".gz","",$filename),"w");
...then I think you should be in business!
Cheers,
David.
--
PriceTapestry.com
Hi David,
This results in:
agradi.xml
agradi.xml.gz
The XML file is complete, but not the .xml.gz. The problem is I keep getting 2 files for some reason. I tried 2 feeds and the same thing happens, I get 4 files then. 2x XML, 2x XML.GZ.
Vince
Hi Vince,
Sounds like that's nearly there actually - all that's needed now is to delete the .gz file after the unzip! In your script, where you have:
gzclose($fz);
...REPLACE that with:
gzclose($fz);
unlink($filename);
That should then leave you with just the .xml files, all the .gz (compressed) versions having been removed by the script!
Cheers,
David.
--
PriceTapestry.com
Hi,
Thanks David, this does the job! :)
I'm trying to make the feeds auto-import now. This is working, but I always have a strange problem when importing.
The website is often down just before the actual import; when the products start importing it's online again and I can see the number of products update on the merchants page.
It looks like removing the old products creates a lot of load, very strange.
Do you know what it is?
Vince.
Hi,
I checked the import function; I think it is because I have many filters added. (?)
Is there a solution to reduce the load during this process?
Vince.
Hello Vince,
For a large site, importing by individual merchant can create a small period of downtime because the DELETE FROM query has to lock the table whilst product records for the merchant being imported are removed.
However, there is a great modification described in this thread that imports into a temporary table and then switches it to the live table after import. This results in almost no downtime even when importing tens of thousands of products.
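In outline, the idea behind that modification is along these lines (just a sketch rather than the actual code from that thread - the table names and the bare mysql_query() calls are illustrative, and assume the database connection has already been opened by the Price Tapestry includes):
// build the new data in a working copy of the live products table
mysql_query("DROP TABLE IF EXISTS products_import");
mysql_query("CREATE TABLE products_import LIKE products");
// ...run the normal import into products_import instead of products...
// then swap the working copy in and discard the old data
mysql_query("RENAME TABLE products TO products_old, products_import TO products");
mysql_query("DROP TABLE products_old");
Because RENAME TABLE swaps both tables in one step, the live site never sees an empty or half-imported products table - hence almost no downtime.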
With the above modification in place, you could schedule "import.php @ALL" to run after your fetch script has completed - don't forget that you can combine multiple commands in a single CRON process so that they happen sequentially and you can be sure that the first has completed before the second begins, e.g.
/usr/bin/php /path/to/scripts/fetch.php;/usr/bin/php /path/to/scripts/import.php @ALL
Of course this depends exactly how you are triggering your fetch.php script at the moment - if you're not sure let me know and I'll show you how to invoke import.php @ALL in the same way...
Cheers,
David.
--
PriceTapestry.com
Hi David,
I have added the modification to my files. This is what my fetch.php looks like at the moment. Can you help me include import.php @ALL in it, and with setting up a double cron job? (I've never done two in one.)
I want the files to be removed after everything is done; I've added that to the file below.
<?php
set_time_limit(0);
require("../includes/common.php");
$admin_checkPassword = FALSE;
require("../includes/admin.php");
require("../includes/filter.php");
require("../includes/MagicParser.php");
$feeds_dir = "../feeds/";
chdir($feeds_dir);
$urls = array();
$urls["kijkshop.xml.gz"] = "{removed}";
$urls["bcc.xml.gz"] = "{removed}";
foreach($urls as $filename => $url)
{
$fp = fopen($feed_dir.$filename,"w");
if (!$fp) die("Could not create ".$feed_dir.$filename." - check permissions");
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_exec($ch);
curl_close($ch);
fclose($fp);
// gzip decompress
$fz = gzopen($feed_dir.$filename,"r");
$fp = fopen($feed_dir.str_replace(".gz","",$filename),"w");
while(!gzeof($fz)) fwrite($fp,gzread($fz,1024));
fclose($fp);
gzclose($fz);
unlink($filename);
// to import as well as fetch uncomment next line
$filename = str_replace(".gz","",$filename);
admin_import($filename,0);
// delete after import
unlink($filename);
}
?>
Cheers,
Vince
Hi Vince,
Can you let me know how you currently have the above script set up as a CRON job (e.g. what command you have entered)? Then I can work out from that how to double them up...
Cheers,
David.
--
PriceTapestry.com
Hi David,
I didn't set up a cronjob for it yet, but I think this is what you want:
/usr/local/bin/php -q -f /home/path/to/scripts/fetch.php
Cheers,
Vince.
Hi Vince,
Yes - that's it. With the zero down-time mod in place, here's what I'd add as the cron job - all on one line:
cd /home/path/to/scripts/;/usr/local/bin/php -q -f fetch.php;/usr/local/bin/php -q -f import.php @ALL
Using a single cd command to change to your scripts folder (don't forget to replace /path/to/ with the actual path on your server) means that even if PHP isn't configured to internally change directory to the same folder as the script, it will still work (as import.php needs to know where ../feeds is relative to itself).
Cheers,
David.
--
PriceTapestry.com
Hi David,
I added this in my DA:
42 (minute) * * * * /home/path/to/scripts/;/usr/local/bin/php -q -f fetch.php;/usr/local/bin/php -q -f import.php @ALL
Tried it with cd in front of it too; it doesn't work for some reason. Is there a way to test it without cron? Maybe something else is wrong.
Cheers,
Vince.
Hi Vince,
First thing I'd do is confirm that CRON is working for a single command, e.g.
42 (minute) * * * * /usr/local/bin/php -q -f /home/path/to/scripts/fetch.php
At the prescribed time that should run your fetch script (after it's had time to run, you can check via the Modified time in /admin/)...
Cheers,
David.
--
PriceTapestry.com
Hi David,
Sorry, that took a while. :)
Here is the problem; I couldn't find my cron job logs before, but I've found them now.
Warning: require(../includes/common.php): failed to open stream: No such file or directory in /home/-/domains/-/public_html/scripts/fetch.php on line 3
Fatal error: require(): Failed opening required '../includes/common.php' (include_path='.:/usr/local/lib/php') in /home/-/domains/-/public_html/scripts/fetch.php on line 3
Pretty weird, eh.
Cheers,
Vince.
Hi Vince,
Ah - OK, try this as the first line of your fetch.php script (after the opening PHP tag):
chdir("/home/-/domains/-/public_html/scripts/");
(replace the - with the appropriate values on your server)
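With that in place, the start of the script would look something like this (just a sketch - the - placeholders stand in for your actual values, and the rest of the file is unchanged):
<?php
chdir("/home/-/domains/-/public_html/scripts/");
set_time_limit(0);
require("../includes/common.php");
// ...rest of fetch.php as before...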
Cheers,
David.
--
PriceTapestry.com
Hi David,
Yes - that solved it. I ran the following cron job:
cd /home/path/to/scripts/;/usr/local/bin/php -q -f fetch.php;/usr/local/bin/php -q -f import.php @ALL
It deleted all my products, all of them.
Would it help if I send you my fetch.php, import.php and includes/admin.php? I really want to fix this. :)
Also, import.php didn't create any tables in the database. So it doesn't seem to work very well.
Cheers,
Vince.
Hi Vince,
Sounds like you're very close - sure, email me all the files over and I'll check the changes for you...
Cheers,
David.
--
PriceTapestry.com
Hi David,
I emailed the files to you. :)
Thanks again, for helping me out.
Cheers,
Vince.
Hi David,
Okay, the last version of the fetch.php file you sent me is working perfectly. The feeds are being uploaded.
When I run the cron job, all it does is the fetch.php part; the cron job uploads the feeds to my FTP server.
Import.php doesn't seem to do its job. The tables aren't being created, so it looks like it doesn't even try to import.
I think we are almost there. :)
Cheers,
Vince.
Hi Vince,
Have followed up by email...
Cheers,
David.
--
PriceTapestry.com
Hi Adrian,
I've just been looking at a CURL-based PHP automation script as it happens. Have a go with something like this:
<?php
set_time_limit(0);
require("../includes/common.php");
$admin_checkPassword = FALSE;
require("../includes/admin.php");
require("../includes/filter.php");
require("../includes/MagicParser.php");
$feeds_dir = "/home/...../public_html/mysite/feeds/";
chdir($feeds_dir);
$urls = array();
$urls["asda.csv.gz"] = "{link saved}";
$urls["snowleader.csv"] = "";
foreach($urls as $filename => $url)
{
$fp = fopen($feed_dir.$filename,"w");
if (!$fp) die("Could not create ".$feed_dir.$filename." - check permissions");
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_exec($ch);
curl_close($ch);
fclose($fp);
// gzip decompress
$cmd = "/usr/bin/gzip -d \"".escapeshellarg($filename)."\"";
// to import as well as fetch uncomment next line
// $filename = str_replace(".gz","",$filename);
// admin_import($filename,0);
}
?>
Don't forget to replace {link saved} with your ASDA feed URL in the above...
Hope this helps!
Cheers,
David.
--
PriceTapestry.com