script to download products
This has saved me a few hours' work, so I thought I'd pass it on:
I uploaded this in a php file to the feeds directory to import the
AW feeds directly onto my server.
It should work for all files that can be downloaded from a URL.
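<?php
// Copies $file_source (a URL or local path) to $file_target.
// Returns true on error and false on success.
function download($file_source, $file_target) {
    $rh = fopen($file_source, 'rb');
    $wh = fopen($file_target, 'wb');
    if ($rh === false || $wh === false) {
        // error reading or opening file; close whichever handle did open
        if ($rh !== false) fclose($rh);
        if ($wh !== false) fclose($wh);
        return true;
    }
    while (!feof($rh)) {
        $chunk = fread($rh, 1024);
        if ($chunk === false || fwrite($wh, $chunk) === false) {
            // download error: cannot read from source or write to $file_target
            fclose($rh);
            fclose($wh);
            return true;
        }
    }
    fclose($rh);
    fclose($wh);
    // No error
    return false;
}
//AW feeds
download('http://www.example.com/feed.xml','feed.xml');
?>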
Repeat the last line, changing the URL and the file name you want it saved as, for each feed.
Then just browse to the PHP file and off you go.
For some reason it broke when there were spaces in the name of the XML file?
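One possible cause, assuming the space was in the URL itself: a raw space is not valid in a URL and normally has to be sent as %20. The sketch below percent-encodes each segment of the URL path before the URL is handed to download(). The helper name encode_url_path() is made up for this example and is not part of the original script.

```php
<?php
// Hypothetical helper: percent-encode each segment of the URL path
// (spaces become %20) and leave scheme, host and query untouched.
function encode_url_path($url) {
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // not a URL this sketch understands; return unchanged
    }
    $auth = '';
    if (isset($parts['user'])) {
        $auth = $parts['user'];
        if (isset($parts['pass'])) $auth .= ':' . $parts['pass'];
        $auth .= '@';
    }
    $path = isset($parts['path']) ? $parts['path'] : '';
    $segments = array();
    foreach (explode('/', $path) as $segment) {
        // decode first so an already-encoded segment is not encoded twice
        $segments[] = rawurlencode(rawurldecode($segment));
    }
    $result = $parts['scheme'] . '://' . $auth . $parts['host'];
    if (isset($parts['port'])) $result .= ':' . $parts['port'];
    $result .= implode('/', $segments);
    if (isset($parts['query'])) $result .= '?' . $parts['query'];
    return $result;
}
```

It could then be used as download(encode_url_path('http://www.example.com/my feed.xml'), 'my feed.xml'); the local file name can keep its space.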
Now, if someone could show me how to register all these in one go I can go to bed!
I know about the register.php script but when I try to call it from a command line it says the script doesn't exist?!
Tried that and got permission denied, so I used chmod to change the permissions to 777 and now get no such file or directory.
It may also be that the location of the PHP CLI version specified on the first line of the scripts is wrong...
#!/usr/bin/php
To test this, enter that path to PHP at the command line:
$ /usr/bin/php
If that returns no such file or directory, locate the PHP executable on your server using locate:
$ locate -r /php$
If this brings up something other than /usr/bin/php, edit the automation scripts to point to PHP on your server and see if that helps...
I couldn't get this to do anything - just a blank page!
Eddie,
That script doesn't output anything so you won't see anything; unless you mean that what it downloaded was blank. You also need to execute that script in a directory that PHP has write access to; otherwise it won't be able to create the file...
Sorry - that's what I was saying - no file was downloaded - no error - folder is 777!
The thing to do is add some echo statements to the script to print out where it's got to; or any error messages.
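As a rough illustration of that suggestion, here is the same copy loop with echo statements added at each step. This is only a sketch: download_verbose() is a made-up name, and unlike the original download() it returns true on success and false on failure.

```php
<?php
// Sketch: copy loop with progress and error messages printed as it goes.
// Returns true on success, false on failure (the reverse of download()).
function download_verbose($file_source, $file_target) {
    echo "Opening source: $file_source\n";
    $rh = @fopen($file_source, 'rb');
    if ($rh === false) {
        echo "ERROR: cannot open source\n";
        return false;
    }
    // Printing the working directory shows where a relative target ends up
    echo "Opening target: $file_target (working directory is " . getcwd() . ")\n";
    $wh = @fopen($file_target, 'wb');
    if ($wh === false) {
        echo "ERROR: cannot open target for writing\n";
        fclose($rh);
        return false;
    }
    $bytes = 0;
    while (!feof($rh)) {
        $chunk = fread($rh, 8192);
        if ($chunk === false || fwrite($wh, $chunk) === false) {
            echo "ERROR: read or write failed after $bytes bytes\n";
            fclose($rh);
            fclose($wh);
            return false;
        }
        $bytes += strlen($chunk);
    }
    fclose($rh);
    fclose($wh);
    echo "Done: $bytes bytes written\n";
    return true;
}
```

When run through a browser the messages appear on the page, so a blank page would no longer be the only feedback.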
Oh my god - I am such a prat (must be Friday)!!!!
Script downloads to the dir in which it is running - I am looking in /feeds/
OMG - sorry
What do I need to add to get it into the "other" dir ?
You can use a relative path for the destination directory; something like "../feeds/filename.xml".
Any chance of an example (you know i am thick!)
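A minimal sketch of the relative-path idea, assuming the script lives in a directory that sits next to the feeds directory. The helper feed_target_path() is hypothetical; it anchors the path to the script's own directory rather than to whatever the current working directory happens to be.

```php
<?php
// Hypothetical helper: build "<script dir>/../feeds/<filename>".
// basename() strips any directory part from the supplied name.
function feed_target_path($filename, $scriptDir = __DIR__) {
    return $scriptDir . '/../feeds/' . basename($filename);
}
```

The call would then look like download('http://www.example.com/feed.xml', feed_target_path('feed.xml')); a plain '../feeds/feed.xml' also works as long as the script is run from its own directory.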
will it overwrite feeds?? or would I need to delete them before running the script?
The script opens the destination file in mode "wb" (write binary) so yes; it will overwrite the file if it exists...
Hi all
would it be possible to merge the automation script with the gunzip script so that tradedoubler feeds not only are gunzipped but also can be uploaded in groups in xml or csv format??
P Stone
Is there an equivalent script that will work with direct ftp downloads??
Shareasale don't allow anonymous download, which means that the download url is:
ftp://USERNAME:PASSWORD@datafeeds.shareasale.com/MERCHID/MERCHID.txt
However, when adding this url to the script above, it creates a 0kb file and returns:
Warning: fopen(ftp://...@datafeeds.shareasale.com/MERCHID/MERCHID.txt):
failed to open stream: FTP server reports :
220 ShareASale Datafeed FTP Server (v. 1.0) in /home/sites/SITEURL/public_html/SUBDOMAIN/feeds/download.php
on line 3
Is there an alternative method for accessing ftp servers??
Cheers,
Duncan
madstock.com
Hi Duncan,
Have you looked at the other web fetching script in this thread:
http://www.pricetapestry.com/node/71
That uses the wget program to download files, and should support authenticated FTP...
That script works marvellously, thanks!
To incorporate that script into my CRON would the line in the sh file look like this?
php{SPACE}/home/sites/SITE/public_html/PATH/TO/fetch.php?\
url=ftp://USERNAME:PASSWORD@datafeeds.shareasale.com/\
MERCHANTID/MERCHANTID.txt&filename=MERCHANTID.txt
for each merchant, or would it be better to create a php script of some sort containing all of the merchant details and a load of exec commands??
Apologies if that sounds like a dim question, but I would be looking to automate the process.
madstock.com
That would sort of work if you modified the script to accept GET instead of POST variables; however, it can become complicated as a result of URL encoding (i.e. when you try to specify a URL on the command line that itself contains ? and & characters)!
Instead, what I'd do is modify the script as below so that it uses a function to fetch each file; call the function for each feed you want to download, and then finally you can fire off import.php.
<?php
// destination directory for downloaded files - must be writable by PHP
$targetDir = "../feeds/";
// path to wget program for retrieval
$wgetProgram = "/usr/bin/wget";
// path to various unzip programs
$unzipPrograms["zip"] = "/usr/bin/unzip";
$unzipPrograms["rar"] = "/usr/bin/unrar";
$unzipPrograms["gzip"] = "/usr/bin/gzip";
// check that target directory is writable, bail otherwise
if (!is_writable($targetDir))
{
print "<p>target directory ($targetDir) not writable - exiting</p>";
exit();
}
// check that wget binary exists, fall back to the PHP method otherwise
$usePHP = false;
if (!file_exists($wgetProgram))
{
print "<p>wget program ($wgetProgram) not found - using PHP method</p>";
$usePHP = true;
}
// check for and disable any unzip methods that do not exist
foreach($unzipPrograms as $name => $program)
{
if (!file_exists($program))
{
print '<p>unzip program for '.$name.' ('.$program.') not found - disabled</p>';
unset($unzipPrograms[$name]);
}
}
function fetch_url($url,$filename)
{
$source = fopen($url,"r");
$destination = fopen($filename,"w");
if (!$source || !$destination) return;
while(!feof($source))
{
fwrite($destination,fread($source,2048));
}
fclose($source);
fclose($destination);
}
function unzip_zip($header,$filename)
{
global $unzipPrograms;
// check if zip format
if ($header <> "PK".chr(0x03).chr(0x04)) return false;
$command = $unzipPrograms["zip"]." -p ".$filename." > ".$filename.".unzipped";
exec($command);
unlink($filename);
rename($filename.".unzipped",$filename);
return true;
}
function unzip_rar($header,$filename)
{
global $unzipPrograms;
// check if rar format
if ($header <> "Rar!") return false;
$command = $unzipPrograms["rar"]." p -inul ".$filename." > ".$filename.".unrarred";
exec($command);
unlink($filename);
rename($filename.".unrarred",$filename);
return true;
}
function unzip_gzip($header,$filename)
{
global $unzipPrograms;
// gzip only way to tell is to try
$command = $unzipPrograms["gzip"]." -c -d -S \"\" ".$filename." > ".$filename.".ungzipped";
exec($command);
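// the output file may be missing altogether if the command could not run
$ungzipped = $filename.".ungzipped";
if (file_exists($ungzipped) && filesize($ungzipped))
{
unlink($filename);
rename($ungzipped,$filename);
return true;
}
else
{
if (file_exists($ungzipped)) unlink($ungzipped);
return false;
}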
}
function fetch($url,$filename)
{
global $usePHP;
global $targetDir;
global $wgetProgram;
global $unzipPrograms;
$temporaryFilename = $targetDir.uniqid("");
// fetch the file
if ($usePHP)
{
fetch_url($url,$temporaryFilename);
}
else
{
$command = $wgetProgram." --ignore-length -O ".$temporaryFilename." \"".$url."\"";
exec($command);
}
// bail if download has failed
if (!file_exists($temporaryFilename))
{
print "<p>download failed - exiting</p>";
exit();
}
// read the first 4 bytes to pass to unzip functions
$filePointer = fopen($temporaryFilename,"r");
$fileHeader = fread($filePointer,4);
fclose($filePointer);
// try and unzip the file by calling each unzip function
foreach($unzipPrograms as $name => $program)
{
$unzip_function = "unzip_".$name;
if ($unzip_function($fileHeader,$temporaryFilename)) break;
}
// finally rename to required target (delete existing if necessary)
$targetFilename = $targetDir.$filename;
if (file_exists($targetFilename))
{
unlink($targetFilename);
}
rename($temporaryFilename,$targetFilename);
// all done!
}
fetch("http://www.example.com/feed.xml","feed1.xml");
fetch("ftp://username:password@www.example.org/merchant/123.txt","123.txt");
exec("php /path/to/scripts/import.php @MODIFIED");
?>
Then, simply cron fetch.php on its own and let it do all the work...
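A side note on the comment in unzip_gzip() that the only way to tell is to try: gzip files begin with the two bytes 0x1f 0x8b, so the 4-byte header that fetch() already reads could be checked in the same way as the zip and rar headers. The sketch below shows such a check; is_gzip_header() is a hypothetical name and is not part of the script above.

```php
<?php
// Sketch: test the first bytes of a file for the gzip magic number.
function is_gzip_header($header) {
    return strlen($header) >= 2
        && $header[0] === "\x1f"
        && $header[1] === "\x8b";
}
```

Used at the top of unzip_gzip() as "if (!is_gzip_header($header)) return false;", it would avoid running gzip against files that are plainly not gzipped.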
That would seem to be ideal - however it runs all the way through, and bails at the end, leaving the error "download failed - exiting".
I have tested this with a multitude of feeds from several networks, and it is the same for all of them - really bizarre, as they all download with no problems when using the form, so all of the necessary bits and pieces (unzip, untar, wget etc.) would appear to be there, and the folder is writable.
Sorry to be a pain!
Duncan
Righteo - I have had a look at this again from a different perspective, and came up with the following code, which works (i.e. downloads and gunzips the files via ftp), however are there any major issues with resources etc. that are glaringly obvious to people who know what they are on about??
<?php
set_time_limit(0);
ignore_user_abort(true);
print "<p>Getting</p>";
exec("wget ftp://USERNAME:PASSWORD@datafeeds.shareasale.com/MERCH/MERCH.txt.gz");
exec("wget ftp://USERNAME:PASSWORD@datafeeds.shareasale.com/MERCH2/MERCH2.txt.gz");
print "<p>Gunzipping</p>";
exec("gunzip -f *.gz");
print "<p>Done.</p>";
?>
Thanks in advance for pointing out if I have done something really stupid that is likely to cause problems on the server...
madstock.com
Oooops... I forgot to global in the global variables in my modification of the original script - that's why it wouldn't have worked. I've fixed it in the code above.
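For anyone wondering why the missing global lines mattered: in PHP a function does not see script-level variables unless they are declared global inside it. A small sketch of the difference, with made-up function names:

```php
<?php
$targetDir = "../feeds/";

// Without "global", $targetDir inside the function is a separate,
// unset local variable, so this returns null.
function target_without_global() {
    return isset($targetDir) ? $targetDir : null;
}

// With "global", the function sees the script-level variable.
function target_with_global() {
    global $targetDir;
    return isset($targetDir) ? $targetDir : null;
}
```

In fetch() the effect of the missing declarations would have been an empty $targetDir and an empty $wgetProgram, which fits the "download failed" message reported above.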
Otherwise, your alternative script looks fine... not a lot can go wrong with that and tie up the server.
One tip when calling wget with a URL parameter: always put the URL in escaped quotation marks to avoid any problems with the shell interpreter when the URL contains ? and & characters. For example:
<?php
exec("wget \"http://www.example.com/feed.xml?merchantID=123&affiliateID=456\"");
?>
The other "hot tip" when it comes to using wget is to use the -O parameter in order to control the output file; otherwise you are sometimes left to the devices of whatever filename wget uses, which can be something horrible! What I do is this:
<?php
exec("wget -O \"feed.xml\" \"http://www.example.com/feed.xml?merchantID=123&affiliateID=456\"");
?>
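An alternative to hand-escaped quotation marks is PHP's escapeshellarg(), which quotes a single argument for the shell. The sketch below builds the same kind of command; build_wget_command() is a hypothetical helper, and the exact quoting characters it produces depend on the platform.

```php
<?php
// Sketch: build a wget command line with each argument quoted for the shell.
// -O names the output file, as in the example above.
function build_wget_command($url, $outputFile, $wget = "wget") {
    return $wget . " -O " . escapeshellarg($outputFile) . " " . escapeshellarg($url);
}
```

The result can be passed straight to exec(), for example exec(build_wget_command("http://www.example.com/feed.xml?merchantID=123&affiliateID=456", "feed.xml"));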
Thanks for the tip there - the only real reason that I took to downloading the .txt.gz file was that when using the ".txt" version wget wouldn't overwrite the existing feed (leading to 4567.txt , 4567.txt.1 , 4567.txt.2 etc..)- despite looking through the help documentation and my usual google trawl I was unable to find the -O command, so that will be most useful, thanks.
Duncan
madstock.com
I have had problems running the download.php file. All the programs were in the right locations, and I eventually opened a ticket with my host.
When I ran the download.php script it was giving me this error
wget program (/usr/bin/wget) not found - using PHP method
unzip program for zip (/usr/bin/unzip) not found - disabled
unzip program for rar (/usr/bin/unrar) not found - disabled
unzip program for gzip (/usr/bin/gzip) not found - disabled
My hosts came back with the following:
Plesk, while a great product, locks you into a sandbox and you cannot touch things outside of it. The Plesk-generated Apache configuration uses a variable called php open_basedir, which tells PHP to only execute / source scripts from within the website directory or /tmp. The only way to resolve this is to add a per-website Apache configuration that Plesk will read.
Hope this helps someone else!
Hi David,
I wonder if you could give me a hand with the above post please? I'm still struggling to get the download script to work because of the "open_basedir" setting.
My Host said to create a vhost.conf and put the following: -
php_admin_value open_basedir "/home/httpd/vhosts/gynogapod.co.uk:/tmp:/bin:/usr/bin"
I am trying to run the "download.php" on the subdomain http://pricecompare.gynogapod.co.uk
The above hasn't worked and can't think what my next step should be!
Thanks in advance.
Computer Hardware
I'd be tempted to remove open_basedir altogether; but I think you can only do this if you can access conf/httpd.include
Typically, a Plesk generated httpd.include contains the following:
<IfModule sapi_apache2.c>
php_admin_flag engine on
php_admin_value open_basedir "/home/httpd/vhosts/hatrick.net/httpdocs:/tmp"
</IfModule>
I would start by removing that entire section - make a backup of your original httpd.include first of course.
The other point to note is that you must restart Apache after making any changes - you can do this from within the Plesk control panel.
Is it your own root server, or do you have a reseller account?
Hi David,
I have a dedicated box with Rackspace.
What would the security issues if any be if I remove the above section?
If you resold hosting on your server; that setting would prevent your customers from uploading a PHP script that had access to the entire file system of your server.
If the server is used for your own purposes; and you trust every PHP script that you are running then in my opinion it is safe to remove that setting, on the grounds that if anybody were able to compromise your box in order to upload their own PHP script they would equally be able to alter the configuration accordingly.
However, it is possible that the setting provides protection against scripts that you trust but that have security holes enabling an attacker to manipulate a form into giving access to any file in your file system; so at the end of the day it has to be your own assessment of any risks you may face.
It is probably safe to remove the setting as a quick test anyway; just to prove whether that is what is preventing the fetching script from accessing wget and unzip etc. on your server. You can always switch it back afterwards...
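To see whether open_basedir could explain the "not found" messages, it may help to compare the program paths against the configured list: with the restriction active, file_exists() on a path outside the list fails even when the file is there. The sketch below does a simple prefix comparison. path_within_basedir() is a hypothetical helper, and it is only an approximation, since PHP's own check also resolves symlinks.

```php
<?php
// Sketch: does $path fall under one of the directories in an
// open_basedir-style list (entries separated by PATH_SEPARATOR)?
function path_within_basedir($path, $openBasedir) {
    if ($openBasedir === '' || $openBasedir === false) {
        return true; // no restriction configured
    }
    $candidate = rtrim($path, '/') . '/';
    foreach (explode(PATH_SEPARATOR, $openBasedir) as $allowed) {
        if ($allowed === '') continue;
        $prefix = rtrim($allowed, '/') . '/';
        if (strpos($candidate, $prefix) === 0) return true;
    }
    return false;
}
```

In the fetch script it could be called as path_within_basedir($wgetProgram, ini_get('open_basedir')); a false result for /usr/bin/wget would point at the Plesk setting rather than at a missing program.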
Cheers,
David.
Something fishy going on here...
I am able to run this code
set_time_limit(0);
ignore_user_abort();
print "<p>Getting</p>";
exec("wget http://pf.tradedoubler.com/unlimited/unlimxxxxxxxxxxx3.xml.gz");
print "<p>Gunzipping</p>";
exec("gunzip -f *.gz");
print "<p>Done.</p>";
but not this one
// destination directory for downloaded files - must be writable by PHP
$targetDir = "../feeds/";
// path to wget program for retrieval
$wgetProgram = "wget";
// path to various unzip programs
$unzipPrograms["zip"] = "unzip";
$unzipPrograms["rar"] = "unrar";
$unzipPrograms["gzip"] = "gzip";
more code.....
Don't think it's to do with the "php_admin_value open_basedir" anymore as one script works and not the other. Any suggestions?
Based on the location from which you run that script, have you confirmed that "../feeds/" exists and is writable? That is the only immediate difference I can pick up from the 2 snippets, as they are both doing exactly the same thing...
Hi David,
Yes, the /feeds/ folder is definitely writable.
Since my last post I have, as per your recommendation, removed the section
<IfModule sapi_apache2.c>
php_admin_flag engine on
php_admin_value open_basedir "/home/httpd/vhosts/hatrick.net/httpdocs:/tmp"
</IfModule>
Restarted Apache.... still no joy.
I did in fact notice that with the first script that worked, although it downloaded the xml it didn't unzip it and ended up all scrambled.
I am going to read through the entire selection of posts on this subject just to make sure I'm not doing anything wrong. If you have any more suggestions in the meantime I would be grateful.
Hiya,
It would also be worth checking over the info in this thread:
Product Feed Download Automation Guide (Linux Servers)
In particular; how it describes the best way to construct the wget and gzip commands using specific parameters which can help nail a number of issues.
Hi
Did you ever get the Plesk issue sorted out, and what did you do? I currently have the same issue on a Plesk VPS and could do with some help sorting it.
I think the issue is that you can't execute the command, but I'm not too sure whether you can use a shell execute command to unzip the files from PHP.
I have the script to get the file in as .csv.gz then need to run a routine to unzip and delete them on a Linux VPS Server.
Does anyone know if this could work in a script and how to formulate it? I want to read all the .gz files, convert them to csv and then delete all the .gz files from a PHP script.
<?php
shell_exec("gunzip filename.tar.gz");
shell_exec("tar -xvf filename.tar");
?>
Thanks
Roy
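One possible way to do what Roy describes without shelling out at all, assuming PHP's zlib extension is loaded on the server: read each .gz file with gzopen() and gzread(), write the uncompressed data next to it, then delete the .gz original. gunzip_directory() is a made-up name for this sketch. It handles plain .gz files such as .csv.gz; a .tar.gz would come out as a .tar and still need unpacking.

```php
<?php
// Sketch: gunzip every *.gz file in $dir using zlib, then remove the
// .gz files. Returns the list of files that were written.
function gunzip_directory($dir) {
    $done = array();
    $files = glob(rtrim($dir, '/') . '/*.gz');
    if ($files === false) return $done;
    foreach ($files as $gzFile) {
        $outFile = substr($gzFile, 0, -3); // strip ".gz"
        $in = gzopen($gzFile, 'rb');
        $out = fopen($outFile, 'wb');
        if ($in === false || $out === false) {
            if ($in !== false) gzclose($in);
            if ($out !== false) fclose($out);
            continue; // skip files that cannot be opened
        }
        while (!gzeof($in)) {
            $chunk = gzread($in, 8192);
            if ($chunk === false) break; // stop on a read error
            fwrite($out, $chunk);
        }
        gzclose($in);
        fclose($out);
        unlink($gzFile);
        $done[] = $outFile;
    }
    return $done;
}
```

Because nothing is executed through the shell, this route is not affected by restrictions on shell_exec or by the PATH the web server runs with, though the directory still has to be writable by PHP.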
Hi Roy,
Some of this info in this thread might help:
http://www.pricetapestry.com/node/198
I agree with you that Plesk (via its enforced Apache configuration) should not prevent shell_exec from doing whatever you want; although I'm not sure how this relates to "Safe Mode", which may have something to say about it. The key points to think about:
a) When this command is executed; what directory will it be running from. This is important so that you know whether you need to specify absolute path names for filenames etc.
b) What permissions will the process that is running the script have against the files / directories that you are trying to read
c) What permissions will the process that is running the script have against the commands that you are trying to execute
d) Within the system environment in which the shell_exec function will be used, will the OS be able to locate the commands, or will you need to use full paths (e.g. "/path/to/gunzip")? Something that works on the command line won't necessarily work in shell_exec because the environment is different; most notably the PATH variable used to find programs.
e) Will the script have write permissions into the directory in which the command will generate output
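Points (a) to (e) can be checked from inside PHP itself, which matters because the web server's environment is the one that counts. The sketch below gathers the relevant facts for one command and one output directory. environment_report() is a hypothetical helper, and any paths passed to it are whatever applies on your own server.

```php
<?php
// Sketch: collect the facts behind points (a) to (e).
function environment_report($command, $outputDir) {
    return array(
        'cwd'                => getcwd(),                // (a) directory the script runs from
        'dir_exists'         => is_dir($outputDir),      // (b) directory is visible to PHP
        'command_executable' => is_executable($command), // (c) process may run the command
        'command_exists'     => file_exists($command),   // (d) full path resolves
        'path'               => getenv('PATH'),          // (d) PATH as seen by PHP
        'dir_writable'       => is_writable($outputDir), // (e) output can be created
    );
}
```

Printing the result with print_r(environment_report('/path/to/gunzip', '.')) from the same URL as the real script shows the values under the same user and configuration.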
One way to check everything is just to shell out to the touch command, which will try and create an empty file:
shell_exec("touch testing.txt");
Hi
I tried entering the touch command from the console but it returned this: -bash: line 34: syntax error near unexpected token "touch testing.txt"
I did this through winscp3 and the console, and from both my VPS login and the master VPS root login.
1) I want to run the PHP script from the following http://www.domain.co.uk/phones/feeds/fetchTD.php
2) That location will be made 777
3) I have used your locate commands and found the following files:-
wget = /usr/bin/wget
unzip = /usr/bin/unzip
Gzip = /bin/gzip
php = /usr/bin/php
4) The absolute path to the directory on the server is as follows, if this helps.
/var/www/vhosts/domain.co.uk/httpdocs/phones/feeds
5) When I have run the fetch routine you designed i get the following errors:-
wget program (/usr/bin/wget) not found - using PHP method
unzip program for zip (/usr/bin/unzip) not found - disabled
unzip program for rar (/usr/bin/unrar) not found - disabled
unzip program for gzip (/bin/gzip) not found - disabled
As I have mentioned before, the files are there, which is great, and they are automated in, but I need to automate the unzipping of them.
Both CJ and TradeDoubler feeds are zipped files, so I can't use these yet as the fetch does not seem to work on the VPS server either.
Thanks again.
Roy
Hi Roy,
I still haven't come up with a solution yet. I have a post on the official Plesk site and also on the forum at Rackspace, my host. Maybe someone will have an answer. I'll post should anything come up!
Regards,
Simon
Don't know if this is already done, but here is the solution if you have a shared Plesk without ssh etc.
Automatic download and unzip function:
<?php
// destination directory for downloaded files - must be writable by PHP
$targetDir = "feeds/";
// path to wget program for retrieval
$wgetProgram = "/usr/local/bin/wget";
// path to various unzip programs
$unzipPrograms["zip"] = "/usr/bin/unzip";
//$unzipPrograms["rar"] = "/usr/bin/unrar";
$unzipPrograms["gzip"] = "/usr/bin/gzip";
// check that target directory is writable, bail otherwise
if (!is_writable($targetDir))
{
print "<p>target directory ($targetDir) not writable - exiting</p>";
exit();
}
// check that wget binary exists, bail otherwise
if (!file_exists($wgetProgram))
{
print "<p>wget program ($wgetProgram) not found - using PHP method</p>";
$usePHP = true;
}
// check for and disable any unzip methods that do not exist
/*foreach($unzipPrograms as $name => $program)
{
if (!file_exists($program))
{
print '<p>unzip program for '.$name.' ('.$program.') not found - disabled</p>';
unset($unzipPrograms[$name]);
}
}
*/
function fetch_url($url,$filename)
{
$source = fopen($url,"r");
$destination = fopen($filename,"w");
if (!$source || !$destination) return;
while(!feof($source))
{
fwrite($destination,fread($source,2048));
}
fclose($source);
fclose($destination);
}
function unzip_zip($header,$filename)
{
global $unzipPrograms;
// check if zip format
if ($header <> "PK".chr(0x03).chr(0x04)) return false;
$command = $unzipPrograms["zip"]." -p ".$filename." > ".$filename.".unzipped";
exec($command);
unlink($filename);
rename($filename.".unzipped",$filename);
return true;
}
function unzip_rar($header,$filename)
{
global $unzipPrograms;
// check if rar format
if ($header <> "Rar!") return false;
$command = $unzipPrograms["rar"]." p -inul ".$filename." > ".$filename.".unrarred";
exec($command);
unlink($filename);
rename($filename.".unrarred",$filename);
return true;
}
function unzip_gzip($header,$filename)
{
global $unzipPrograms;
// gzip only way to tell is to try
$command = $unzipPrograms["gzip"]." -c -d -S \"\" ".$filename." > ".$filename.".ungzipped";
exec($command);
if (filesize($filename.".ungzipped"))
{
unlink($filename);
rename($filename.".ungzipped",$filename);
return true;
}
else
{
unlink($filename.".ungzipped");
return false;
}
}
function fetch($url,$filename)
{
global $usePHP;
global $targetDir;
global $wgetProgram;
global $unzipPrograms;
$temporaryFilename = $targetDir.uniqid("");
// fetch the file
if ($usePHP)
{
fetch_url($url,$temporaryFilename);
}
else
{
$command = $wgetProgram." --ignore-length -O ".$temporaryFilename." \"".$url."\"";
exec($command);
}
// bail if download has failed
if (!file_exists($temporaryFilename))
{
print "<p>download failed - exiting</p>";
exit();
}
// read the first 4 bytes to pass to unzip functions
$filePointer = fopen($temporaryFilename,"r");
$fileHeader = fread($filePointer,4);
fclose($filePointer);
// try and unzip the file by calling each unzip function
foreach($unzipPrograms as $name => $program)
{
$unzip_function = "unzip_".$name;
if ($unzip_function($fileHeader,$temporaryFilename)) break;
}
// finally rename to required target (delete existing if necessary)
$targetFilename = $targetDir.$filename;
if (file_exists($targetFilename))
{
unlink($targetFilename);
}
rename($temporaryFilename,$targetFilename);
// all done!
}
fetch("http:// affiliate url.com ","yourfile.xml");
exec("php ../scripts/import.php @MODIFIED");
?>
Henk
All credits to David!!!
Tried Henk's file; I keep getting:
target directory (feeds/) not writable - exiting
feeds/ is fully writable (777)
suggestions?
Hi Duncan,
Have you looked at the other web fetching script in this thread:
http://www.pricetapestry.com/node/71
That uses the wget program to download files, and should support authenticated FTP...
david,
the page above gives "Page not found".
I think I saw one CJ automation script but I can't seem to find it now.
also thanks to sgpratley and Henk for sharing the scripts. :) very much appreciated.
cheers,
atman
Not sure what happened to the original!
However, the "wget" version is in the modification posted above (which comes from the original - it just has the unzip section commented out) so if you have wget on your server it will use it.
Don't forget the main thread on automation also..
http://www.pricetapestry.com/node/198
Cheers,
David.
MikeyC,
Sounds a bit strange - make sure that you are running the script in the root directory of your Price Tapestry installation as "feeds/" is a relative path name....
Cheers,
David.
david,
Not sure what happened to the original!
perhaps it was deleted by the thread starter :)
will test the code above.
I can't seem to download large feeds if I try to download them by selecting the download file through my browser (298794 products).
Will this also be the case if I put it in a cron job?
Hi Mikey,
It might be your browser timing out. It's certainly worth trying through a cron job - wget is a good transfer agent and far more suited to downloading large files.
Otherwise, it would indicate a problem at the other end (wherever the feed is coming from) rather than a browser specific issue.
Cheers,
David.
Thanks David, I will try the cronjob and check if it downloads all the feeds.
Hi all,
I wonder whether you can help - I get the following message when trying to use the automated download:
target directory (feeds/) not writable - exiting
Any ideas?
I'm using UK2.net services and slightly unsure where to go with this?
Any help would be greatly appreciated,
Have a good weekend
Michael @ www.ThePriceSite.co.uk
Hi Michael,
It's the PHP process that must be able to write to the /feeds/ directory when using this script. On a normal Linux / Apache / PHP installation, the "PHP user" is normally whatever user Apache is running as. This might be "apache" or "httpd". However, the easiest thing to do is to make the directory writable by _anybody_.
If you can access your site by FTP, try right-clicking on the /feeds/ directory in the remote window, and see if there is an option to set properties or permissions - something like that.
Then, you might be given the option to set the read / write permission at 3 different levels: user - group - world. If you set write access to the folder for "world", that will definitely include the user that PHP is running as.
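To confirm what the permissions look like from PHP's side after changing them, something along these lines can be dropped into a test script. It is only a sketch: describe_permissions() is a hypothetical helper that reports the user / group / world bits and whether PHP itself can write to the directory.

```php
<?php
// Sketch: describe a directory's permission bits as PHP sees them.
function describe_permissions($dir) {
    clearstatcache(); // make sure a recent chmod is reflected
    $mode = fileperms($dir) & 0777; // keep only the user/group/world bits
    return array(
        'octal'          => sprintf('%03o', $mode), // permission bits in octal
        'world_writable' => ($mode & 0002) !== 0,   // write bit for "world"
        'php_can_write'  => is_writable($dir),      // what matters to the script
    );
}
```

If php_can_write comes back false even though the octal value looks right, the path PHP is checking is probably not the directory that was changed over FTP.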
Cheers,
David.
Hi David hope all is well.
I tried the permissions and i seem to be getting further.
I now have a new error message below :-
wget program (/usr/local/bin/wget) not found - using PHP method
Warning: filesize(): Stat failed for feeds/451e63d39626e.ungzipped (errno=2 - No such file or directory) in /home/t/h/thepricesite_co_uk/mike.php on line 71
Warning: unlink(feeds/451e63d39626e.ungzipped): No such file or directory in /home/t/h/thepricesite_co_uk/mike.php on line 79
I also appear to have a list of bizarre files named like 451e63d39626e etc. with file extensions. I presume it's downloaded part of the file and bombed out?
Again any help would be great,
Michael @ ThePriceSite
Hi Mike,
That sounds like you don't have gzip installed on your server (which is strange); however you can disable that function in the script by deleting this line at the top:
$unzipPrograms["gzip"] = "/usr/bin/gzip";
The "funny" filenames are the temporary filenames, so it does appear that the script is exiting before completing... or is the download still working and it's just leaving the temporary file lying around?
David - i took out the line in the code as you said.
I now get a message saying
wget program (/usr/local/bin/wget) not found - using PHP method
Although i'm trying to download a feed from AffiliateWindow named - BangCDProductFeed - the file already exists and i'm trying to over-write it - the file stays at 90bytes and doesn't appear to get any bigger.
I don't appear to get those 'funny' filenames anymore though.
Michael @ ThePriceSite
Hi Michael,
You can get rid of the wget warning (it's just telling you that wget wasn't found so it's trying to use the PHP method) by removing the following line:
print "<p>wget program ($wgetProgram) not found - using PHP method</p>";
Now, because your script is having to use PHP to fopen() the remote file, it is possible that you do not have URL wrappers enabled. To find out, create a phpinfo() script like this:
test.php
<?php
phpinfo();
?>
...then look down the page for:
allow_url_fopen
...and check that it is "On" in both columns.
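The same setting can also be read from inside a script with ini_get(), which saves hunting through the phpinfo() page. A minimal sketch, with url_fopen_enabled() as a made-up helper name:

```php
<?php
// Sketch: report whether allow_url_fopen is switched on for this script.
function url_fopen_enabled() {
    // ini_get() returns the value as a string such as "1", "On" or ""
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}
```

For example: print url_fopen_enabled() ? "URL wrappers enabled" : "URL wrappers disabled";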
Cheers,
David.
David - i have removed the line as you said
print "wget program ($wgetProgram) not found - using PHP method";
I have also put the test.php on the server and tested that - it says On for local and master.
I still get the following message and my feed, xml file is still 90bytes?
wget program (/usr/local/bin/wget) not found - using PHP method
Michael @ ThePriceSite
Hi Michael,
I'm not sure why removing the line that prints the warning message didn't help. Did you remove it completely or just comment it out? Check that the function hasn't been copied twice into your script as that could explain why it's still printing.
Is it a Linux hosting account that you have? I presume you do not have command line access to your server. There is a possibility that your server is firewalled from making outbound HTTP connections, so one thing that it is worth trying is a simple script just to download and display the content of a file from another server.
Try the script below:
<?php
header("Content-Type: text/xml");
$url = "http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml";
$file = fopen($url,"r");
if ($file)
{
while(!feof($file))
{
$xml = fread($file,1024);
print $xml;
}
}
else
{
print "Could not open ".$url;
}
?>
It might say something like "Error loading stylesheet:" but that means it's worked - it's just the BBC don't use an absolute URL to their XSL document (like a .css file but for XML).
If you're still having no joy perhaps you could post the script that you are using as it stands so I can have a look at what else to suggest.
Cheers,
David.
Morning David,
I got the error message about the style sheet, yes. Below is my exact code, obviously with the username/password removed.
<?php
// destination directory for downloaded files - must be writable by PHP
$targetDir = "feeds/";
// path to wget program for retrieval
$wgetProgram = "/usr/local/bin/wget";
// path to various unzip programs
$unzipPrograms["zip"] = "/usr/bin/unzip";
//$unzipPrograms["rar"] = "/usr/bin/unrar";
// check that target directory is writable, bail otherwise
if (!is_writable($targetDir))
{
print "<p>target directory ($targetDir) not writable - exiting</p>";
exit();
}
// check that wget binary exists, bail otherwise
if (!file_exists($wgetProgram))
{
print "<p>wget program ($wgetProgram) not found - using PHP method</p>";
$usePHP = true;
}
// check for and disable any unzip methods that do not exist
/*foreach($unzipPrograms as $name => $program)
{
if (!file_exists($program))
{
unset($unzipPrograms[$name]);
}
}
*/
function fetch_url($url,$filename)
{
$source = fopen($url,"r");
$destination = fopen($filename,"w");
if (!$source || !$destination) return;
while(!feof($source))
{
fwrite($destination,fread($source,2048));
}
fclose($source);
fclose($destination);
}
function unzip_zip($header,$filename)
{
global $unzipPrograms;
// check if zip format
if ($header <> "PK".chr(0x03).chr(0x04)) return false;
$command = $unzipPrograms["zip"]." -p ".$filename." > ".$filename.".unzipped";
exec($command);
unlink($filename);
rename($filename.".unzipped",$filename);
return true;
}
function unzip_rar($header,$filename)
{
global $unzipPrograms;
// check if rar format
if ($header <> "Rar!") return false;
$command = $unzipPrograms["rar"]." p -inul ".$filename." > ".$filename.".unrarred";
exec($command);
unlink($filename);
rename($filename.".unrarred",$filename);
return true;
}
function unzip_gzip($header,$filename)
{
global $unzipPrograms;
// gzip only way to tell is to try
$command = $unzipPrograms["gzip"]." -c -d -S \"\" ".$filename." > ".$filename.".ungzipped";
exec($command);
if (filesize($filename.".ungzipped"))
{
unlink($filename);
rename($filename.".ungzipped",$filename);
return true;
}
else
{
unlink($filename.".ungzipped");
return false;
}
}
function fetch($url,$filename)
{
global $usePHP;
global $targetDir;
global $wgetProgram;
global $unzipPrograms;
$temporaryFilename = $targetDir.uniqid("");
// fetch the file
if ($usePHP)
{
fetch_url($url,$temporaryFilename);
}
else
{
$command = $wgetProgram." --ignore-length -O ".$temporaryFilename." \"".$url."\"";
exec($command);
}
// bail if download has failed
if (!file_exists($temporaryFilename))
{
print "<p>download failed - exiting</p>";
exit();
}
// read the first 4 bytes to pass to unzip functions
$filePointer = fopen($temporaryFilename,"r");
$fileHeader = fread($filePointer,4);
fclose($filePointer);
// try and unzip the file by calling each unzip function
foreach($unzipPrograms as $name => $program)
{
$unzip_function = "unzip_".$name;
if ($unzip_function($fileHeader,$temporaryFilename)) break;
}
// finally rename to required target (delete existing if necessary)
$targetFilename = $targetDir.$filename;
if (file_exists($targetFilename))
{
unlink($targetFilename);
}
rename($temporaryFilename,$targetFilename);
// all done!
}
fetch("http://shopwindow.affiliatewindow.com/datafeed_products.php?user=XXXXX&password=XXXXX&mid=1344&format=XML&nozip=1'","BangCDProductFeed.xml");
exec("php ../scripts/import.php @MODIFIED");
?>
Michael @ ThePriceSite
Hi Michael,
It seemed to work OK on my server - but I've redone the script with all the unnecessary code removed so it's worth having a go with this new version first:
<?php
// destination directory for downloaded files - must be writable by PHP
$targetDir = "feeds/";
// path to wget program for retrieval
$wgetProgram = "/usr/local/bin/wget";
// path to various unzip programs
$unzipPrograms["zip"] = "/usr/bin/unzip";
// check that target directory is writable, bail otherwise
if (!is_writable($targetDir))
{
print "<p>target directory ($targetDir) not writable - exiting</p>";
exit();
}
function fetch_url($url,$filename)
{
$source = fopen($url,"r");
$destination = fopen($filename,"w");
if (!$source || !$destination) return;
while(!feof($source))
{
fwrite($destination,fread($source,2048));
}
fclose($source);
fclose($destination);
}
function unzip_zip($header,$filename)
{
global $unzipPrograms;
// check if zip format
if ($header <> "PK".chr(0x03).chr(0x04)) return false;
$command = $unzipPrograms["zip"]." -p ".$filename." > "<
Hi,
> when I try to call it from a command line it says the script doesn't exist?!
Are you including the preceding ./ when you call it? On a Linux server the current directory isn't generally in the search path, and there is no default to search the current directory as there is on Windows.
If that doesn't work, check back shortly, as I'm going to re-write the automation scripts so that they can be invoked via HTTP as well as from the command line...