
©2006-2008 IAAI Software


Product Feed Download Automation Guide (Linux Servers)

Submitted by dmorison on Sat, 2006-04-15 09:17.

Hi everyone;

As promised; here are some hints / tips / guidelines for automating the download, uncompression and importing of affiliate product feeds from network servers to your Price Tapestry installation. Please note that this level of automation is not a necessary requirement of running the script; you can keep your site up to date simply by downloading feeds manually, and then uploading them to the /feeds/ directory of your Price Tapestry installation via FTP just as you would any other file.

Issues To Be Addressed

Firstly, to appreciate what is required of an automation strategy, it is important to understand the main problem areas that must be handled:

1) Very few affiliate networks provide clean URLs for their product feeds. This results in 2 related issues:

- i) The format of the URL may have undesired effects when scripted
- ii) The download script (wget) cannot derive a sensible filename for the downloaded output

2) The output may be compressed, although you may be able to control this as part of the feed URL (e.g. on Affiliate Window you can use "&nozip=1" on the end - but it's best to leave downloads compressed if possible as it saves everyone bandwidth!)

3) If the output is compressed; the embedded filename within the archive may be incompatible with your file system; or with your current permissions at the current working directory within the file system (i.e. if subdirectories are implied as part of the filename).

System Requirements for Automation

In order to setup an automation strategy using these guidelines, the following will be required:

1) To test that you have access to the necessary programs you will require shell access to your server / web hosting account, either via Telnet, or more commonly these days via SSH. A popular SSH client for Windows is "Putty", which you can get from here: http://www.chiark.greenend.org.uk/~sgtatham/putty/

2) The ability to execute PHP from the "command line". You can test this having logged in simply by typing "php -v", which will display the PHP version:

$php -v
4.2.2
$

(note: In this and all other examples on this page the $ symbol indicates the command prompt, and should not be entered as part of the command)

3) You also need to know the path to php; as when the automation commands are executed via cron the directory containing the php binary may not be in the PATH environment variable and will need to be provided in full. The first thing to do is to "guess" where it might be, and 9 out of 10 times the following will work:

$/usr/bin/php -v
4.2.2
$

If that works, it means that php is installed in the /usr/bin/ directory. If not, you might be able to locate it using the following command:

$locate -r /php$
/path/to/php
$

4) For automating feed retrieval, you will need access to the wget command, and again you will need to know where it is installed on your server:

$locate -r /wget$
/path/to/wget
$

5) For uncompression of feeds using unzip or gzip you will need access to these commands and again you will need to know where they are installed on your server:

$locate -r /unzip$
/path/to/unzip
$

$locate -r /gzip$
/path/to/gzip
$

If you don't have access to the "locate" command, or locate indicates that it has no database to work with and will not be able to find any files, and you are not able to locate the commands by guessing, your host should be able to provide the necessary information.

It is possible that you do not need to specify a path; as the process that executes your cron job may be able to find the commands without a path - so it's worth trying anyway.

6) A mechanism to execute multiple commands via cron. There are 2 options:

- i) Create a single "shell script" containing all the commands and then create a cron entry to execute that script

- ii) Create a single cron entry to call each required command with semi-colon separation between each command

In theory it would be possible to create individual cron entries for each command; but they would have to be timed sufficiently far apart as to be sure that the commands were processed in the correct sequence.... so not recommended!

My preference is to use option i); as this lets you edit and work on your automation script on your local computer; and then simply upload it to your server whenever you have made changes.

7) Having created and uploaded the script, you will need to know the full path to that script in order to setup your cron job. For example, if your automation script is called "fetch.sh", the full path may be something like:

/home/webusers/example/fetch.sh

8) Finally, you will need to know the full path to your Price Tapestry installation directory (which may be your web root directory). This might be something like:

/home/webusers/example/public_html/
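
If you go with option ii) from step 6, it may help to know how the shell treats the separator. The sketch below is only an illustration (the "false" command stands in for a download that failed, and the variable names are made up): with ";" the next command runs regardless, whereas "&&" only runs the next command if the previous one succeeded.

```shell
#!/bin/sh
# Illustration only: "false" stands in for a command (e.g. a download) that failed.

# With ";" the second command runs even though the first one failed.
semi_result=$(false; echo "import ran")

# With "&&" the second command is skipped when the first one fails.
and_result=$(false && echo "import ran")

echo "semicolon: [$semi_result]"
echo "and-and:   [$and_result]"
```

Which behaviour you want is a judgment call: ";" matches the semi-colon approach described above, while "&&" would stop the import from running against a failed download.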

Summary Of Programs and Paths That You Need To Know

PHP:
/path/to/php (e.g. /usr/bin/php)

WGET:
/path/to/wget (e.g. /usr/bin/wget)

UNZIP (optional, if required):
/path/to/unzip (e.g. /usr/bin/unzip)

GZIP (optional if required):
/path/to/gzip (e.g. /usr/bin/gzip)

Path to the directory where you are planning to upload fetch.sh:
/path/to/fetch.sh (e.g. /home/webusers/example/fetch.sh)

Path to your Price Tapestry installation directory:
/path/to/pt/ (e.g. /home/webusers/example/public_html)

Note: The path to Price Tapestry will be used in the following example script to locate the feeds directory (/path/to/pt/feeds) and the scripts directory (/path/to/pt/scripts).
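
If "locate" is not available, one possible alternative (not part of the steps above) is the shell's own "command -v", which searches the directories in your current PATH. The following is a rough sketch; "find_tool" is just a helper name chosen for this example. Bear in mind that the PATH seen by cron may differ from the one in your login shell, which is exactly why the full paths are worth writing down.

```shell
#!/bin/sh
# Print the full path of each program the automation script relies on.
# "command -v" is built into POSIX shells, so it does not need the locate database.
find_tool() {
    path=$(command -v "$1" 2>/dev/null)
    if [ -n "$path" ]; then
        echo "$1: $path"
    else
        echo "$1: not found in PATH"
    fi
}

for tool in php wget unzip gzip; do
    find_tool "$tool"
done
```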

Creating fetch.sh

So, armed with the above, you can now create an automation script.

There are 4 sections to the script:

1) The first line, which indicates which command interpreter to use, this is normally:

#!/bin/sh

2) Calls to wget to fetch each feed

The key components in constructing a call to "wget" to fetch a feed are:

i) Use the -O parameter to set the output file. I would recommend a simple filename that is the merchant name, and either .xml (for XML feeds) or .txt (for text files). If you know that the download will be compressed, use the same base filename, but append .zip (for zip compressions) or ".gz" for GZIP compression.

ii) Put the feed URL in quotes

The following entries would download 3 feeds: one that is not compressed, one compressed using zip, and one compressed using gzip.

/path/to/wget -O "/path/to/pt/feeds/merchantA.xml" "http://www.example.com/feeds/getfeed.php?id=1234"
/path/to/wget -O "/path/to/pt/feeds/merchantB.xml.zip" "http://www.example.org/download.asp?id=5678"
/path/to/wget -O "/path/to/pt/feeds/merchantC.txt.gz" "http://feeds.example.net/get.php?merchantID=98765"

3) [optional] Calls to decompression programs if required

In this example, 2 of our feeds are compressed, so we must decompress them. Because you do not know (and do not care) what embedded filename the feed has, the easiest thing to do is decompress to "stdout" and redirect the uncompressed output into the required filename:

/path/to/unzip -p /path/to/pt/feeds/merchantB.xml.zip > /path/to/pt/feeds/merchantB.xml
rm /path/to/pt/feeds/merchantB.xml.zip
/path/to/gzip -c -d -S "" /path/to/pt/feeds/merchantC.txt.gz > /path/to/pt/feeds/merchantC.txt
rm /path/to/pt/feeds/merchantC.txt.gz
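
To see the decompress-to-stdout idea in isolation, here is a small self-contained sketch that can be run in a scratch directory. The file names and sample contents are invented for the demo, and it assumes gzip and mktemp are on your PATH. The -S "" option from the commands above is left out here because the demo archive keeps its .gz suffix.

```shell
#!/bin/sh
# Self-contained demo in a temporary directory; file names and contents are made up.
workdir=$(mktemp -d)
printf 'name,price\nWidget,9.99\n' > "$workdir/original-name.txt"

# Compress, then rename the archive the way a download saved with -O would be named.
gzip "$workdir/original-name.txt"
mv "$workdir/original-name.txt.gz" "$workdir/merchantC.txt.gz"

# -d decompresses and -c writes to stdout; ">" sends that output to a filename we choose.
gzip -c -d "$workdir/merchantC.txt.gz" > "$workdir/merchantC.txt"
rm "$workdir/merchantC.txt.gz"

cat "$workdir/merchantC.txt"
```

The point is that the output filename is decided entirely by the redirect, not by whatever name was stored when the feed was compressed.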

4) Call to the Price Tapestry import.php script:

Finally, having downloaded, unzipped and deleted the zipped versions, we can call Price Tapestry's import automation script:

/path/to/php /path/to/pt/scripts/import.php @MODIFIED

Example fetch.sh

The following complete script shows the above fragments combined together into the complete script, just remember to replace "/path/to" with appropriate real-life values for your server:

fetch.sh

#!/bin/sh
#########
# FETCH #
#########
/path/to/wget -O "/path/to/pt/feeds/merchantA.xml" "http://www.example.com/feeds/getfeed.php?id=1234"
/path/to/wget -O "/path/to/pt/feeds/merchantB.xml.zip" "http://www.example.org/download.asp?id=5678"
/path/to/wget -O "/path/to/pt/feeds/merchantC.txt.gz" "http://feeds.example.net/get.php?merchantID=98765"
#########################
# DECOMPRESS AND DELETE #
#########################
/path/to/unzip -p /path/to/pt/feeds/merchantB.xml.zip > /path/to/pt/feeds/merchantB.xml
rm /path/to/pt/feeds/merchantB.xml.zip
/path/to/gzip -c -d -S "" /path/to/pt/feeds/merchantC.txt.gz > /path/to/pt/feeds/merchantC.txt
rm /path/to/pt/feeds/merchantC.txt.gz
##########
# IMPORT #
##########
/path/to/php /path/to/pt/scripts/import.php @MODIFIED
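
The script above overwrites each feed even if the download fails. If that worries you, a more defensive variation is sketched below. This is not part of the original script: "fetch_feed", "WGET" and "FEEDS" are names invented for this example, and the idea is simply to download to a temporary file and only replace the existing feed when wget reports success and the file is not empty.

```shell
#!/bin/sh
# Hypothetical settings; replace with the real paths for your server.
WGET="${WGET:-/path/to/wget}"
FEEDS="${FEEDS:-/path/to/pt/feeds}"

fetch_feed() {
    # $1 = filename to save within the feeds directory, $2 = feed URL
    tmp="$FEEDS/$1.tmp"
    if "$WGET" -q -O "$tmp" "$2" && [ -s "$tmp" ]; then
        # Download succeeded and is non-empty: replace the previous copy.
        mv "$tmp" "$FEEDS/$1"
    else
        echo "download failed for $1, keeping previous copy" >&2
        rm -f "$tmp"
        return 1
    fi
}

# Example call (commented out so the sketch does nothing until you edit it):
# fetch_feed "merchantA.xml" "http://www.example.com/feeds/getfeed.php?id=1234"
```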

To finalise;

1) Upload your automation script to your server as /path/to/fetch.sh

2) You may also need to mark the script as executable. You can either do this via SSH as follows:

$cd /path/to/
$chmod +x fetch.sh

Alternatively, you should be able to mark the file as executable via FTP. Most FTP clients have a mechanism to view properties or set permissions for files (on the server). On the properties / permission panel you should see a checkbox or similar mechanism to mark the file as executable.

3) All that's left is to setup fetch.sh to be executed via cron. Most web hosting accounts let you setup cron jobs via your control panel. Simply create a new job, to be executed at the required frequency (e.g daily) and choose an appropriate time. The command to execute is simple:

/path/to/fetch.sh
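
Because cron runs the script without a terminal, any messages from wget, gzip or php may be emailed to you or simply discarded, depending on how your host has cron configured. One way to keep a record is to send the output to a log file. The sketch below is only a suggestion; "log_run" and the default log location are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical log location; choose somewhere outside your web root.
LOG="${LOG:-/tmp/fetch.log}"

log_run() {
    # Run the given command, appending its stdout and stderr to $LOG
    # between a timestamped header and a line recording the exit status.
    {
        echo "=== $(date) : $*"
        "$@"
        echo "=== exit status: $?"
    } >> "$LOG" 2>&1
}

# Example call (commented out; adjust the paths first):
# log_run /path/to/wget -O "/path/to/pt/feeds/merchantA.xml" "http://www.example.com/feeds/getfeed.php?id=1234"
```

Looking at the log after the first scheduled run is usually the quickest way to see whether a command behaved differently under cron than it did at the command prompt.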

I know it looks like quite a lot to take in, but it's really mostly about knowing the directories on your server and where the programs are; after that it's just a case of creating fetch.sh, uploading it, marking it as executable, and scheduling it via cron. Hope this helps!

Submitted by PriceSpin on Sat, 2006-04-15 16:22.

David I get this when trying to test it at command line...

: bad interpreter: No such file or directory/bin/sh

Robert

Submitted by dmorison on Sat, 2006-04-15 16:24.

The first thing is to try without any interpreter line. If the script is marked as executable it may work; but I'm not sure if the same applies when invoked via cron.

You could also try to locate the "sh" binary as per the other commands:

$locate -r /sh$
/bin/sh
$

...although it won't show /bin/sh if that is not valid on your server, but if it does show up then replace the first line of your script with the appropriate path.

Submitted by PriceSpin on Sat, 2006-04-15 16:32.

it appears to have worked at command line without the first line... will attempt a cron...

Dont think tradedoubler likes to work with this method...

Submitted by dmorison on Sat, 2006-04-15 16:42.

What specific problem are you having with TD feeds? - I've had them working OK with wget before. They're gzipped by default I think...

Submitted by PriceSpin on Sat, 2006-04-15 16:44.

yeah, they are xml.gz files but every time it tries to download it in command line it says error 404 file not found even though I pasted the exact URL given by TD into the fetch.sh file.

** also the fetch.sh without the first line works on cron as well **

Submitted by dmorison on Sat, 2006-04-15 16:47.

Excellent!

Re: Tradedoubler; I just had another go to check, and it worked OK. I used this command:

wget -O "currys.xml.gz" "http://pf.tradedoubler.com/unlimited/unlimited_pf_xxx_yyyyyyy.xml.gz"

Send me an email with the actual URL you're trying to use if you like and i'll give it a go...

Submitted by dmorison on Sat, 2006-04-15 16:58.

Hmmm - your TD URL worked fine on my server... let me have a think about this, i'll try and come up with some ideas...!

Submitted by PriceSpin on Sat, 2006-04-15 16:59.

ok, it works when running wget url but not when cron/command line is doing it, it says:
Resolving pf.tradedoubler.com... 217.212.240.174
Connecting to pf.tradedoubler.com|217.212.240.174|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
18:01:18 ERROR 404: Not Found.

also, anyone have any ideas about buy.at??? 8000 product feed which at around 40% downloaded manually the size is 4.5mb but when downloading through the automated download is only 44k fully downloaded...

Submitted by dmorison on Sat, 2006-04-15 17:09.

It must be the same issue - something is upsetting wget when it is being run from within another script, but I can't think what that might be - especially seeing as the TradeDoubler URLs are clean - they should work even without the quotes.

Can you try a controlled experiment to try and find out what is going on. There will be some feeds on your own server in your Price Tapestry /feeds/ directory. Try and automate the downloading of one of those as a test file, so the command would be:

wget -O "test.xml" "http://www.yoursite.com/feeds/somefeed.xml"

Try that in both modes (i.e. wget on its own, and then from inside another script); and see if you get similar problems...

Submitted by PriceSpin on Sat, 2006-04-15 17:20.

ok, wget works but doesn't inside the fetch.sh... although blueshoots.xml from webgains does work inside fetch.sh

Submitted by dmorison on Sat, 2006-04-15 17:37.

Can you post the 3 URLs you are using, masking out any IDs etc:

TD (not working)
buy.at (not working)
Webgains (working)

It might just give an insight into what might be happening...

Submitted by PriceSpin on Mon, 2006-04-17 03:00.

ok, before I post urls I had a more indepth look... and it appears that buy.at needs some things added to the URLs so I will look at that.

As for TD... it appears to be adding %0D to the end of the url even tho the full line appears as:
/usr/bin/wget -O /home/pricespi/public_html/feeds/unlimited_pf_xxx_xxxxxxx.xml.gz http://pf.tradedoubler.com/unlimited/unlimited_pf_xxx_xxxxxxx.xml.gz

---

--03:54:54-- http://pf.tradedoubler.com/unlimited/unlimited_pf_xxx_xxxxxxx.xml.gz%0D
=> `/home/pricespi/public_html/feeds/unlimited_pf_xxx_xxxxxxx.xml.gz'
Resolving pf.tradedoubler.com... 217.212.240.174
Connecting to pf.tradedoubler.com|217.212.240.174|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
03:54:54 ERROR 404: Not Found.

Submitted by dmorison on Mon, 2006-04-17 05:36.

Hiya,

The URL (and your local file) must be in quotes for wget to work reliably in a shell script.

wget -O "/path/to/file.xml.gz" "http://pf.tradedoubler.com/foo/pf.xml.gz"

What's happening is that the carriage return from the Windows style end of line marker is being passed to wget, which assumes that it is part of the URL. You could save the file using Unix style end of lines (if your text editor supports it), but as quotes are an essential component to making this work it's probably not worth looking at doing that.
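
Whether a script has been saved with Windows style line endings can be checked on the server itself. The following is a minimal sketch; "has_cr" is a helper name made up for this example, and it simply searches the file for carriage return characters using grep.

```shell
#!/bin/sh
# Report whether a file contains carriage-return characters (Windows line endings).
has_cr() {
    cr=$(printf '\r')
    if grep -q "$cr" "$1"; then
        echo "$1: contains carriage returns (Windows-style line endings)"
        return 0
    fi
    echo "$1: no carriage returns found"
    return 1
}

# Example call (commented out; run it from the directory containing your script):
# has_cr fetch.sh
```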

Submitted by PriceSpin on Mon, 2006-04-17 11:34.

Hello David,
I only tried removing the quotes for the last attempt... if it helps, Im using Zend Development Editor to create the .sh file... would it be adding the extra characters??

Submitted by dmorison on Mon, 2006-04-17 17:00.

I wonder if it's a character encoding issue?

Can you control the character-set that the ZDE saves the file in? If there is an option for basic 8-bit ASCII (perhaps US-ASCII) you might try that...

Submitted by PriceSpin on Mon, 2006-04-17 18:30.

tried changing the encoding but it doesnt appear to have helped...

Submitted by dmorison on Mon, 2006-04-17 18:40.

Would you be able to email me the URLs that you're trying to use, and i'll create and test a script on my server then send you the script back as an attachment....

Cheers,
David.

Submitted by crounauer on Mon, 2006-05-08 12:07.

I would just like to add that the current stable version of Putty (0.58), which can be found at http://www.chiark.greenend.org.uk/~sgtatham/putty/ will not run if you don't have service pack 2 installed and are running windows XP.

The developers know about the problem and are working on a fix as we speak!

The work around is to install the "current development snapshot". This is not always recommended as it is not considered stable and may still contain bugs - Use at your own risk!

Regards,
Simon.

Computer Hardware

Submitted by crounauer on Tue, 2006-05-09 11:44.

Hi David,

Is it ok to leave the /bin/ folder writeable for placing the Shell script in?

Computer Hardware

Submitted by crounauer on Tue, 2006-05-09 11:49.

Sorry David,

Please ignore the last post. Just figured out that the fetch.sh can be put in any directory and doesn't go in the /bin/ folder!

Simon

Computer Hardware

Submitted by crounauer on Tue, 2006-05-09 12:10.

Hi David,

In your sample fetch.sh script above, are the two double quotes in the right place in this snippet of code?

/path/to/gzip -c -d -S "" /path/to/pt/feeds/merchantC.txt.gz > /path/to/pt/feeds/merchantC.txt

Computer Hardware

Submitted by ROYW1000 on Tue, 2006-05-09 12:51.

David

I think the best option for me is to leave safe mode on rather than off due to other users on the server.

Getting the files in for both trade doubler and Affiliate window works fine so I am now looking for an unzip script to unzip these files from the cron. Do you have a script that does this?

Also i tried to Import ALL and Import MODIFIED and some of the larger feeds fell over due to no response for more than 30 seconds and errors on the magic parser error. Have you experienced this before?

I will post the error codes tonight as I am not online at the moment.

Roy

Submitted by crounauer on Tue, 2006-05-09 13:11.

Hi David / Pricespin,

Did you ever resolve the problem with error 404?

I am having the same issues with TD and also when trying to download a feed from my own site.

`/home/httpd/vhosts/gynogapod.co.uk/subdomains/lingerie/httpdocs/feeds/contessa-test.xml'
Resolving lingerie.gynogapod.co.uk... 212.100.243.205
Connecting to lingerie.gynogapod.co.uk|212.100.243.205|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
14:04:00 ERROR 404: Not Found.

Computer Hardware

Submitted by PriceSpin on Tue, 2006-05-09 23:40.

I didnt get it resolved yet... tried everything my end with no success.

Robert
PriceSpin - Price Comparison
DJHangout - DJ Equipment Comparison

Submitted by dmorison on Wed, 2006-05-10 08:34.

Hi Simon,

The -S "" is correct - it means that the .gz extension is not assumed.

In your script that is getting a 404; firstly, make sure that you have quotes around your URL as this is mega-critical in all of this. Secondly, as I notice that you are having the same error on your server; could you perhaps have a look at your log and see what was actually requested Vs what you think should have been requested. You should be able to see the 404 in the log and this might give some clues...

Submitted by crounauer on Wed, 2006-05-10 09:20.

Hi David,

This is from my log file...

212.100.243.205 - - [09/May/2006:14:24:00 +0100] "GET /feeds/contessa.xml%0D HTTP/1.0" 404 4203 "-" "Wget/1.10.2 (Red Hat modified)"
212.100.243.205 - - [09/May/2006:14:25:00 +0100] "GET /feeds/contessa.xml%0D HTTP/1.0" 404 4203 "-" "Wget/1.10.2 (Red Hat modified)"

Im just going to double check my fetch.sh code as it seems to be adding the %0D on at the end.

Computer Hardware

Submitted by crounauer on Wed, 2006-05-10 09:38.

and this is my fetch.sh code...

#!/bin/sh
#########
# FETCH #
#########
/usr/bin/wget -O "/home/httpd/vhosts/xxxxxxxxxx/httpdocs/feeds/contessa-test.xml" "http://lingerie.gynogapod.co.uk/feeds/contessa.xml"

Computer Hardware

Submitted by dmorison on Wed, 2006-05-10 09:53.

Well you learn something every day!

What seems to be happening is that the quotes are not all-encompassing. Characters after the closing quote are still treated as part of the URL, up until the Unix end of line or a space character.

Now, because the carriage return (which wget displays as %0D) is part of the Windows (MS-DOS) end of line marker, when the script is run on a Unix box it is treated as part of the URL. The solution, I think, is therefore to add a SPACE character after the closing quotes, like this:

#!/bin/sh
#########
# FETCH #
#########
/usr/bin/wget -O "/home/httpd/vhosts/xxxxxxxxxx/httpdocs/feeds/contessa-test.xml" "http://lingerie.gynogapod.co.uk/feeds/contessa.xml"[SPACE]

(where [SPACE] is just a space character)

Now, the worst that should happen is that wget tries to retrieve %0D on its own, but hopefully that error can just be ignored...
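
The effect of the space can be reproduced without wget at all. In the sketch below, "count_args" is a made-up stand-in for wget that just reports how many arguments the shell handed to it, and the URL is a placeholder. A carriage return directly after the closing quote is glued onto the URL as part of the same argument, whereas a space in between makes it a separate argument.

```shell
#!/bin/sh
# count_args is a hypothetical stand-in for wget: it reports how many arguments it received.
count_args() { echo "$#"; }

cr=$(printf '\r')

# Line ends "URL"<CR>: the CR is glued onto the URL, giving one (corrupted) argument.
glued=$(eval "count_args \"http://www.example.com/feed.xml\"$cr")

# Line ends "URL" <CR>: the space separates them, giving the clean URL plus a stray CR argument.
split=$(eval "count_args \"http://www.example.com/feed.xml\" $cr")

echo "without space: $glued argument(s)"
echo "with space:    $split argument(s)"
```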

Let me know if this works....!

Cheers,
David.

Submitted by crounauer on Wed, 2006-05-10 09:58.

I did a debug using sh -x fetch.sh and got this result.

It still seems to be adding %0D at the end.

Also checked for syntax errors using sh -n fetch.sh and this came out clean.

+ /usr/bin/wget -O /home/httpd/vhosts/gynogapod.co.uk/subdomains/lingerie/httpdocs/feeds/contessa-test.xml http://lingerie.gynogapod.co.uk/feeds/contessa.xml
--10:55:19-- http://lingerie.gynogapod.co.uk/feeds/contessa.xml%0D
           => `/home/httpd/vhosts/gynogapod.co.uk/subdomains/lingerie/httpdocs/feeds/contessa-test.xml'
Resolving lingerie.gynogapod.co.uk... 212.100.243.205
Connecting to lingerie.gynogapod.co.uk|212.100.243.205|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
10:55:19 ERROR 404: Not Found.

Computer Hardware

Submitted by dmorison on Wed, 2006-05-10 10:00.

Is that even after adding the space? (we might have cross-posted)

Is your text editor stripping end of line white space perhaps?

Submitted by crounauer on Wed, 2006-05-10 10:14.

Hi David,

You are quite right, you do learn every day!

I have had success by adding in the space as recommended.

Thanks again for your fanatical support!

+ /usr/bin/wget -O /home/httpd/xxxx/httpdocs/feeds/contessa-test.xml http://lingerie.gynogapod.co.uk/feeds/contessa.xml
--11:08:26-- http://lingerie.gynogapod.co.uk/feeds/contessa.xml
           => `/home/httpd/vhosts/xxx/httpdocs/feeds/contessa-test.xml'
Resolving lingerie.gynogapod.co.uk... 212.100.243.205
Connecting to lingerie.gynogapod.co.uk|212.100.243.205|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 350,537 (342K) [text/xml]
100%[====================================>] 350,537 --.--K/s
11:08:26 (82.42 MB/s) - `/home/httpd/vhosts/xxxxxxxxxx/lingerie/httpdocs/feeds/contessa-test.xml' saved [350537/350537]

Computer Hardware

Submitted by crounauer on Wed, 2006-05-10 10:34.

Is this a similar problem with spaces?

11:30:00 (1015.00 KB/s) - `/home/httpd/vhosts/xxxxxxx/lingerie/httpdocs/feeds/figleaves.xml.gz' saved [757993/757993]
--11:30:00-- http://%0D/
           => `/home/httpd/vhosts/xxxxxxxxxxxxxx/lingerie/httpdocs/feeds/figleaves.xml.gz'
Resolving \015... failed: Name or service not known.

Computer Hardware

Submitted by dmorison on Wed, 2006-05-10 10:37.

Sort of - that's what I suspected might happen; because the wget command can fetch a list of URLs all separated by spaces, so what you had previously was:

wget URL[funny character][new line]

you now have...

wget URL [funny character][new line]

...so once it's fetched URL, it fetches "funny character" on its own.

Hopefully it's not too much of an issue and can be ignored...

Alternatively, you might want to see if your text editor can change the end of line mode to "UNIX" which should help...
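
If changing the editor setting is not an option, the carriage returns can also be stripped on the server after uploading. This is a rough sketch using the standard "tr" command; "to_unix_eol" and the file names in the example call are just illustrative.

```shell
#!/bin/sh
# Write a copy of a script with every carriage return removed.
to_unix_eol() {
    # $1 = input file (possibly with CRLF line endings), $2 = output file (LF only)
    tr -d '\r' < "$1" > "$2"
}

# Example call (commented out; adjust the file names first):
# to_unix_eol fetch.sh fetch.unix.sh && mv fetch.unix.sh fetch.sh && chmod +x fetch.sh
```

The converted copy is a new file, so it will probably need to be marked as executable again, hence the chmod in the example call.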

Submitted by crounauer on Wed, 2006-05-10 11:46.

Hi David,

Just an update and for others that might be struggling...

I downloaded a text editor from http://www.flos-freeware.ch/, which as you suggested allowed me to change the line end mode to "UNIX". After removing absolutely all foreign characters (the only one present should be a "LF - line feed" character at the end of each line), uploaded it via ftp and everything ran smoothly.

Without being able to change the line end to "Unix" it was throwing up errors when gzip was being called and when the gzip file tried to be deleted too.

Hope this is of some use to others...

Computer Hardware

Submitted by crounauer on Wed, 2006-05-10 12:05.

I also had to change the following file paths to absolute locations in import.php

require("/home/xxxxx/httpdocs/includes/common.php");
require("/home/xxxxx/httpdocs/includes/admin.php");
require("/home/httpd/xxxxxxx/httpdocs/includes/filter.php");
require("/home/httpd/xxxxxxxxxxxx/httpdocs/includes/MagicParser.php");

and these two in includes/common.php

require("/home/httpd/xxxxxxx/httpdocs/config.php");
$common_path = "/home/httpd/xxxxxxxxxxxxxx/httpdocs/includes/";

This I think is because the commands are being called from root, so it needs absolute locations.

Computer Hardware

Submitted by dmorison on Wed, 2006-05-10 19:37.

Ah yes. An alternative to hard coding the paths is to get your script to change directory into the same directory as import.php before calling it:

cd /path/to/pt/scripts
/path/to/php /path/to/pt/scripts/import.php @MODIFIED

Submitted by PriceSpin on Tue, 2006-05-16 20:59.

David,
Can I send you my fetch.sh to look at? I used the one you sent me and added the space at the end and it seems to work fine except Im getting alot of:
Resolving pf.tradedoubler.com... 217.212.240.174
Connecting to pf.tradedoubler.com|217.212.240.174|:80... connected.
HTTP request sent, awaiting response... 503 Service Temporarily Unavailable
21:53:28 ERROR 503: Service Temporarily Unavailable.

but it is only on some and then I also get:
gzip: incorrect suffix '(null)'
rm: cannot lstat `/home/path/to/feeds/whsmith.xml.gz\r': No such file or directory

Dont know if its something to do with my fetch.sh

Robert
PriceSpin - Price Comparison
DJHangout - DJ Equipment Comparison

Submitted by dmorison on Wed, 2006-05-17 04:01.

Sure, i'll take a look.

It does sound like it's still to do with the file mode (end of line representation) as "\r" implies carriage-return...

Submitted by PHILDARV on Sun, 2006-07-09 14:37.

David - Firstly I want to say what a great script this is and secondly following this thread i was having similar problems to others but found that creating the fetch.sh with a program called edit pad pro it resolved the issues with line breaks etc. Now got a live site Designer Clothes ;-)

Submitted by multiz on Fri, 2006-07-14 18:18.

I was having problems with my fetch.sh page. So, here are the steps I had to take to make it executable:

1. You may have to click "Change Permissions" after selecting "fetch.sh".
2. Then chmod or change the permissions to "0777". (Another way to do this would be to check all the check boxes, indicating the permissions are set to "0777".)

Hope this helps everyone. It helped me.
PS: This really just depends on your webserver's setup. For example, I'm using Linux/Unix and CPanel X.

Software Price Comparison

Submitted by Scutter on Mon, 2006-10-09 10:01.

Hi,
Having problems getting this working :(

The error I get is....

importing cdwow.xml...[0/6340]PHP Warning: mysql_fetch_array(): supplied argument is not a valid MySQL result resource in /home/default/thebargainsforum.com/user/htdocs/pricesearch/includes/database.php on line 22
PHP Warning: mysql_num_rows(): supplied argument is not a valid MySQL result resource in /home/default/thebargainsforum.com/user/htdocs/pricesearch/includes/database.php on line 27
PHP Warning: mysql_fetch_array(): supplied argument is not a valid MySQL result resource in /home/default/thebargainsforum.com/user/htdocs/pricesearch/includes/database.php on line 22
PHP Warning: mysql_num_rows(): supplied argument is not a valid MySQL result resource in /home/default/thebargainsforum.com/user/htdocs/pricesearch/includes/database.php on line 27

Any suggestions?

Thank in advance.

Submitted by dmorison on Mon, 2006-10-09 11:12.

Hi,

Is this only happening when import.php is called from within your automation script? If you just run it on its own; called from the command prompt ($) do you see the same warnings:

$php import.php cdwow.xml <return>

I assume that import is working fine from within the administration interface?

Cheers,
David.

Submitted by Scutter on Mon, 2006-10-09 11:18.

Works fine from the admin interface

[root@localhost scripts]# php import.php ../feeds/cdwow.xml
Content-type: text/html
X-Powered-By: PHP/4.3.11

backfilling reviews... PHP Warning: mysql_fetch_array(): supplied argument is not a valid MySQL result resource in /home/default/thebargainsforum.com/user/htdocs/pricesearch/includes/database.php on line 22
PHP Warning: mysql_num_rows(): supplied argument is not a valid MySQL result resource in /home/default/thebargainsforum.com/user/htdocs/pricesearch/includes/database.php on line 27
backfilling reviews...[done]

Submitted by dmorison on Mon, 2006-10-09 11:32.

That's interesting. Is there any problem with the moderate reviews feature within the admin interface or does that work OK?

Submitted by dmorison on Mon, 2006-10-09 11:34.

Ah - another thing to point out - import.php expects the filename to be in the feed directory, so you don't need to specify the relative path - it should be just:

php import.php cdwow.xml

The automation scripts do have to be run from the scripts/ directory however (but that shouldn't make any difference in this case)

Submitted by Scutter on Mon, 2006-10-09 12:30.

I can confirm that the admin interface works as it should with me manually able to register and import the cdwow feed.

But from the command line, I get the errors mentioned.

Submitted by dmorison on Mon, 2006-10-09 12:59.

Hi,

I've just sent you a "debug" version of database.php that will print out the query that is failing, that should help find out what's going on...

Cheers,
David.

Submitted by Scutter on Tue, 2006-10-10 09:09.

Right, thanks for the file, I have uploaded it to try it.
I also downloaded a fresh copy of the feed this time in csv format for testing.

backfilling reviews... [UPDATE `price_products` SET rating='0',reviews='0']
[SELECT product_name,AVG(rating) as rating,COUNT(id) as reviews FROM `price_reviews` WHERE approved <> '0' GROUP BY product_name]
PHP Warning: mysql_fetch_array(): supplied argument is not a valid MySQL result resource in /home/default/thebargainsforum.com/user/htdocs/pricesearch/includes/database.php on line 27
PHP Warning: mysql_num_rows(): supplied argument is not a valid MySQL result resource in /home/default/thebargainsforum.com/user/htdocs/pricesearch/includes/database.php on line 32

Submitted by dmorison on Tue, 2006-10-10 09:24.

Hi,

That's really strange - there's no reason why those queries should fail. I'm wondering if there is a MySQL permissions problem when running from the command line instead of as an Apache module. Can you try the following from the command prompt ($) in the Price Tapestry installation directory:

$php setup.php

It will print out all the HTML that the script normally generates, but the crucial thing is to see whether any of the DB tests fail, which would indicate that your CLI version of PHP does not have database access..... If you could post the output from the above that would help...

Cheers,
David.

Submitted by Scutter on Tue, 2006-10-10 10:32.

I had to strip it down as the comment box thought it was suspicious :)

 <tr align="left">
          <td> <p>Checking database connection...PASS</p><p>Checking database selection...PASS</p><p>Checking database tables...PASS</p><p><small>Setup Completed!</small></p><p><small>Using PHP Version 4.3.11</small></p><p><small>Using MySQL Version 4.1.15-standard</small></p> </tr>
            </table>
          </td>
        </tr>

Submitted by murph on Wed, 2006-10-11 09:43.

I have fetch.sh working via command line (sh -x fetch.sh), but the cron isn't. I have cpanel and i have literally just set one up using the path to fetch.sh. Is there something else i should try?

Submitted by Eddie on Tue, 2006-11-14 09:15.

David,

I have no cpanel etc as such but an account area where I can set the cron job. However my host (having read this post - or so they say) tell me that:

"Hello Eddie,
You cannot call a .sh script from the browser. Please rename it to .cgi.
Please run the cronjob url in the browser to test it.
I hope it helps."

Is there any reason why fetch.cgi wouldnt work ?

Eddie

Submitted by dmorison on Tue, 2006-11-14 10:29.

Hi Eddie,

That is unlikely to work i'm afraid - even if your server is configured to run a shell script when saved as .cgi you will have all sorts of timeout issues.

What happens if you just upload fetch.sh into your site via FTP; and then set the path to the file (you will have to figure out where on the server your web tree is located, you might be able to tell from your FTP client) as the cron job in cPanel... for example:

/home/users/eddie/vhosts/www.example.com/httpdocs/fetch.sh

That is the standard way of doing it; but it does require that you have permission to execute shell scripts on your server...

Cheers,
David.

Submitted by Eddie on Tue, 2006-11-14 11:06.

David,

Thanks for this (and all other replies - nice result at the weekend for reading!)

Right... my hosts (which so far have been very reliable) are servage.net.

They 'span' the sites over many boxes, blah blah... as a result the features are limited on the back end. For instance I have their cPanel set up (which they designed) and not the standard cPanel that you come to expect with your hosting account!

The only way to set a cron job is shown in the example on www.shopsort.co.uk/cron.jpg

I have set the permissions to both 755 and 777 when testing for fetch.sh via FTP.

So far the hosts have said:

You cannot call a .sh script from the browser. Please rename it to .cgi.
Please run the cronjob url in the browser to test it.

And

As we do not support in scripting, so I am afraid that we can help you much regarding this. So, please check the code of your cronjob script with any experienced script writer. As they are the best person, who can guide you. And reagarding any issue with the server, I am forwarding this ticket to our admin and he will look into your request as the first thing tomorrow morning (Tuesday).
Thank you in advance for your patience.

So I am guessing this isn’t going to work. What do you think ?

Submitted by Eddie on Tue, 2006-11-14 13:26.

David,

Still not getting my head round above!

Any ideas ?

Eddie

Submitted by dmorison on Tue, 2006-11-14 13:55.

Hi Eddie,

It looks from that screen shot as if you are only allowed to setup cron jobs for PHP or other web based scripts; not something like fetch.sh which is a Unix command line shell script - nothing to do with your web server.

This means that in order to automate your Price Tapestry setup you are going to need to develop PHP methods of downloading files and running the import. Do you still have the code you setup previously...

http://www.pricetapestry.com/node/24

Basically you want to use the same trick...

Hope this helps,
David.

Submitted by Eddie on Tue, 2006-11-14 14:09.

David,

Thanks for that. It is all coming back to me.

I haven't got the code from before... I will let you know how I get on.

Cheers

Eddie

Submitted by Eddie on Tue, 2006-11-14 15:58.

David,

My host just got back to me with:

"CGI script can be very hard to debug, so i would recommend the php way.

You can wrap the system commands up in the PHP system() function.

http://www.php.net/manual/en/function.system.php

Example:

system('/path/to/wget -O "/path/to/pt/feeds/merchantA.xml" "http://www.example.com/feeds/getfeed.php?id=1234"');

And for php scripts:
include("/path/to/pt/scripts/import.php");

Please change the paths in the following script to the correct values.

/shopsort.co.uk/cron/fetch.php

I hope it helps."

Will that help or should I just go for your last advice ?

Eddie

Submitted by dmorison on Tue, 2006-11-14 16:03.

Worth trying Eddie...

Submitted by Eddie on Tue, 2006-11-14 18:18.

David,

Using

added " round the MODIFIED to get it working and it prints out the following:

Warning: set_time_limit(): Cannot set time limit in safe mode in /xxxx/shopsort.co.uk/scripts/import.php on line 3 importing awProductFeed.xml...[0/930] importing awProductFeed.xml...[100/930] importing awProductFeed.xml...[200/930] importing awProductFeed.xml...[300/930] importing awProductFeed.xml...[400/930] importing awProductFeed.xml...[500/930] importing awProductFeed.xml...[600/930] importing awProductFeed.xml...[700/930] importing awProductFeed.xml...[800/930] importing awProductFeed.xml...[900/930] importing awProductFeed.xml...[done] backfilling reviews... backfilling reviews...[done]

why the error and what now ?

Thanks

Eddie

Submitted by dmorison on Tue, 2006-11-14 22:48.

Hi Eddie,

It's a PHP warning rather than an error - and is part of PHP's "Safe Mode" operation.

As it happens, you didn't actually reach the time limit, so everything worked despite the warning. In order to not see this warning (and to prevent a longer import from failing in the future) you should see if you can get Safe Mode turned off on your account. Your host should be able to do this for you - if you explain that you are running a script that can take more than 30 seconds, and for this reason uses the set_time_limit() function to disable the timeout, they should be prepared to disable safe mode on your account.

For more information on Safe Mode...
http://uk.php.net/features.safe-mode

Cheers,
David.

Submitted by bwhelan1 on Tue, 2007-03-20 21:51.

What changes should be made to fetch.sh to get feeds from the Shareasale FTP which does require authentication?

Bill Whelan

Computer Store

Submitted by dmorison on Wed, 2007-03-21 05:39.

Hello Bill,

wget supports FTP with authentication. There is a standard method for providing your username and password on the URL, which is as follows:

ftp://username:password@ftp.example.com/path/to/file.xml

Therefore, a line similar to the following in your fetch.sh file should do the trick:

/path/to/wget -O "/path/to/pt/feeds/merchantA.xml" "ftp://username:password@ftp.example.com/feeds/merchantA.xml"

Hope this helps,
Cheers,
David.

Submitted by bwhelan1 on Wed, 2007-03-21 17:38.

David,

Using the following code for fetch.sh:

#!/bin/sh
#########
# FETCH #
#########
/usr/bin/wget -O "/home/xxxxxx/public_html/feeds/10851.zip" "ftp://username:password@datafeeds.shareasale.com/10851/10851.zip"
/usr/bin/wget -O "/home/xxxxxx/public_html/feeds/11733.zip" "ftp://username:password@datafeeds.shareasale.com/11733/11733.zip"
#########################
# DECOMPRESS AND DELETE #
#########################
/bin/gunzip -p /home/xxxxxx/public_html/feeds/10851.zip > /home/xxxxxx/public_html/feeds/10851.txt
rm /home/xxxxxx/public_html/feeds/10851.zip
/bin/gunzip -p /home/xxxxxx/public_html/feeds/11733.zip > /home/xxxxxx/public_html/feeds/11733.txt
rm /home/xxxxxx/public_html/feeds/11733.zip
##########
# IMPORT #
##########
/usr/local/bin/php /home/xxxxxx/public_html/scripts/import.php @MODIFIED

I get the following results after running it with cron:

--12:17:01-- ftp://username:*password*@datafeeds.shareasale.com/10851/10851.zip
          => `/home/xxxxxx/public_html/feeds/10851.zip'
Resolving datafeeds.shareasale.com... done.
Connecting to datafeeds.shareasale.com[198.65.110.138]:21... connected.
Logging in as username ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD /10851 ... done.
==> PASV ... done. ==> RETR 10851.zip ... done.
   0K .......... .......... .......... .......... .......... 215.52 KB/s
  50K .......... .......... .......... .......... .......... 694.44 KB/s
 100K .......... .......... .......... .......... .......... 510.20 KB/s
 150K .......... .......... .......... .......... .......... 515.46 KB/s
 200K .......... .......... .......... .......... .......... 632.91 KB/s
 250K .......... .......... .......... .......... .......... 364.96 KB/s
 300K .......... .......... .......... .......... .......... 657.89 KB/s
 350K .......... .......... .......... .......... .......... 462.96 KB/s
 400K .......... .......... .......... .......... .......... 384.62 KB/s
 450K .......... .......... .......... .......... .......... 666.67 KB/s
 500K .......... .......... .......... .......... .......... 446.43 KB/s
 550K .......... .......... .......... .......... .......... 495.05 KB/s
 600K .......... .......... .......... .......... .......... 537.63 KB/s
 650K .......... .......... .......... ...... 497.98 KB/s
12:17:03 (462.84 KB/s) - `/home/xxxxxx/public_html/feeds/10851.zip' saved [703335]
--12:17:03-- ftp://username:*password*@datafeeds.shareasale.com/11733/11733.zip
          => `/home/xxxxxx/public_html/feeds/11733.zip'
Resolving datafeeds.shareasale.com... done.
Connecting to datafeeds.shareasale.com[198.65.110.138]:21... connected.
Logging in as username ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD /11733 ... done.
==> PASV ... done. ==> RETR 11733.zip ...
No such file `11733.zip'.
/bin/gunzip: invalid option -- p
/bin/gunzip: invalid option -- p
Warning: main(../includes/common.php): failed to open stream: No such file or directory in /home/xxxxxx/public_html/scripts/import.php on line 5
Warning: main(../includes/common.php): failed to open stream: No such file or directory in /home/xxxxxx/public_html/scripts/import.php on line 5
Warning: main(../includes/common.php): failed to open stream: No such file or directory in /home/xxxxxx/public_html/scripts/import.php on line 5
Fatal error: main(): Failed opening required '../includes/common.php' (include_path='.:/usr/lib/php:/usr/local/lib/php') in /home/xxxxxx/public_html/scripts/import.php on line 5

Any ideas? Can I pay you to get this working?

Bill Whelan

Submitted by dmorison on Wed, 2007-03-21 18:11.

Hi Bill,

I notice that you are using gunzip, which doesn't have a -p option. The equivalent option (unzip to console) is -c, so try changing this section as follows:

/bin/gunzip -c /home/xxxxxx/public_html/feeds/10851.zip > /home/xxxxxx/public_html/feeds/10851.txt
rm /home/xxxxxx/public_html/feeds/10851.zip
/bin/gunzip -c /home/xxxxxx/public_html/feeds/11733.zip > /home/xxxxxx/public_html/feeds/11733.txt
rm /home/xxxxxx/public_html/feeds/11733.zip

Hope this helps!
Cheers,
David.

Submitted by bwhelan1 on Wed, 2007-03-21 19:25.

David,

Would it be necessary to force an overwrite of the existing files with a -f since the files from the previous upload would still exist in the feeds folder?

Bill Whelan

Computer Store

Submitted by dmorison on Wed, 2007-03-21 19:29.

Hi Bill,

That shouldn't be necessary with -c because the unzip is to standard output, which is then redirected to the appropriate filename with ">" (a redirect rather than a pipe) - and by default the redirect replaces any existing file of that name...
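
To illustrate why no -f is needed, here is a minimal sketch. The function name decompress_to and the paths in the usage comment are assumptions made for illustration and are not part of fetch.sh as posted. The shell empties the target file before gunzip writes anything, so an older copy of the feed is simply replaced.

```shell
#!/bin/sh
# decompress_to ARCHIVE DEST
# Writes the decompressed contents of ARCHIVE to DEST via standard output.
# The ">" redirection truncates DEST first, so a stale copy is overwritten.
decompress_to() {
    gunzip -c "$1" > "$2"
}

# Example (hypothetical paths):
# decompress_to /path/to/pt/feeds/merchantA.xml.gz /path/to/pt/feeds/merchantA.xml
```

Note that if gunzip fails, the target will already have been emptied, so it can be worth checking the result before importing.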

Cheers,
David.

Submitted by bwhelan1 on Wed, 2007-03-21 21:13.

David,

I realize that I included a file that was not in my SAS FTP folder. After deleting the lines for that file I get this as an output for the cron job:

--15:45:00-- ftp://username:*password*@datafeeds.shareasale.com/10851/10851.zip
          => `/home/xxxxxxxx/public_html/feeds/10851.zip'
Resolving datafeeds.shareasale.com... done.
Connecting to datafeeds.shareasale.com[198.65.110.138]:21... connected.
Logging in as username ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD /10851 ... done.
==> PASV ... done. ==> RETR 10851.zip ... done.
   0K .......... .......... .......... .......... .......... 218.34 KB/s
  50K .......... .......... .......... .......... .......... 657.89 KB/s
 100K .......... .......... .......... .......... .......... 531.91 KB/s
 150K .......... .......... .......... .......... .......... 400.00 KB/s
 200K .......... .......... .......... .......... .......... 632.91 KB/s
 250K .......... .......... .......... .......... .......... 500.00 KB/s
 300K .......... .......... .......... .......... .......... 549.45 KB/s
 350K .......... .......... .......... .......... .......... 413.22 KB/s
 400K .......... .......... .......... .......... .......... 537.63 KB/s
 450K .......... .......... .......... .......... .......... 632.91 KB/s
 500K .......... .......... .......... .......... .......... 520.83 KB/s
 550K .......... .......... .......... .......... .......... 500.00 KB/s
 600K .......... .......... .......... .......... .......... 362.32 KB/s
 650K .......... .......... .......... ...... 646.50 KB/s
15:45:02 (464.72 KB/s) - `/home/xxxxxxxx/public_html/feeds/10851.zip' saved [703335]
Warning: main(../includes/common.php): failed to open stream: No such file or directory in /home/xxxxxxxx/public_html/scripts/import.php on line 5
Warning: main(../includes/common.php): failed to open stream: No such file or directory in /home/xxxxxxxx/public_html/scripts/import.php on line 5
Warning: main(../includes/common.php): failed to open stream: No such file or directory in /home/xxxxxxxx/public_html/scripts/import.php on line 5
Fatal error: main(): Failed opening required '../includes/common.php' (include_path='.:/usr/lib/php:/usr/local/lib/php') in /home/xxxxxxxx/public_html/scripts/import.php on line 5

It seems to download the feed, unzip it then delete the zip file just fine but the import fails. If I take the import line out of the script and run it as a separate cron job it works fine. I should probably just live with that but do you have any ideas why it fails to work when included in fetch.sh?

Bill Whelan

Computer Store

Submitted by bwhelan1 on Thu, 2007-03-22 01:48.

David,

It appears to be working with the following:

#!/bin/sh
#########
# FETCH #
#########
/usr/bin/wget -O "/home/xxxxxxxx/public_html/feeds/10851.zip" "ftp://username:password@datafeeds.shareasale.com/10851/10851.zip"
#########################
# DECOMPRESS AND DELETE #
#########################
/bin/gunzip -c /home/xxxxxxxx/public_html/feeds/10851.zip > /home/xxxxxxxx/public_html/feeds/10851.txt
rm /home/xxxxxxxx/public_html/feeds/10851.zip
##########
# IMPORT #
##########
php /home/xxxxxxxx/public_html/scripts/import.php @MODIFIED

I just changed this:

/usr/local/bin/php /home/xxxxxxxx/public_html/scripts/import.php @MODIFIED

To this:

php /home/xxxxxxxx/public_html/scripts/import.php @MODIFIED

and now it works. Thanks for all your help.

Bill Whelan

Computer Store

Submitted by SteveC on Fri, 2007-07-06 10:40.

Hi David,
I'm getting similar problems to the last post - although the reason given by Bill makes no sense to me - he doesn't appear to have changed anything. Anyway here's what I'm getting:

Warning: main(../includes/common.php): failed to open stream: No such file or directory in /pathToPricetapestry/scripts/import.php on line 5
Fatal error: main(): Failed opening required '../includes/common.php' (include_path='.:') in /pathToPricetapestry/scripts/import.php on line 5

It looks like it's looking for /includes/common.php in /scripts/import.php !?

Thanks, Steve.

Submitted by dmorison on Fri, 2007-07-06 10:43.

Hi Steve,

This sounds like PHP on your platform is not configured to behave as if it is running from the script directory. This is not a usual configuration, but it can happen.

The solution is to include a change directory command (cd) to the scripts directory in your automation script before calling import.php. For example:

cd /path/to/pricetapestry/scripts
php import.php @MODIFIED

That should help...
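
If the automation script goes on to do other things after the import, a subshell keeps the directory change local to that one command. This is only a sketch: run_in_dir is a hypothetical helper name and the path in the usage comment is a placeholder.

```shell
#!/bin/sh
# run_in_dir DIR COMMAND [ARGS...]
# Runs COMMAND with DIR as the working directory. The parentheses start a
# subshell, so the caller's own working directory is left unchanged.
# If DIR cannot be entered, COMMAND is not run and the function fails.
run_in_dir() {
    dir=$1
    shift
    ( cd "$dir" && "$@" )
}

# Example (hypothetical path):
# run_in_dir /path/to/pricetapestry/scripts php import.php @MODIFIED
```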
Cheers,
David.

Submitted by SteveC on Fri, 2007-07-06 11:25.

Thanks David,
The additional cd command worked.

Steve.

Submitted by tbbd2007 on Sun, 2007-08-05 11:40.

David

I have done all as directed above, but am experiencing a couple of problems.

They are:
1) Manually running fetch.sh via Putty shows the following 'importing 1st4phonecards.xml...[done]
importing avon.xml...[done]
importing 24electric.xml...[done]
importing 247electrical.xml...[done]
importing 7dayshop.xml...[done]
importing 911_experience.xml...[done]
importing advanced_mp3_players.xml...[done]
importing a_quarter_of.xml...[done]
importing a1_gifts.xml...[done]
importing activity_gifts.xml...[done]
importing abm_opticians.xml...[done]
importing activity_superstore.xml...[done]
importing alaskan_air_conditioners.xml...[done]
importing amalgam_shop.xml...[done]
backfilling reviews...[done]'. When I check the feeds they are all a fraction of what their size should be.
2) PT hasn't imported them at all, the merchant names are still there and registered, but there are no products for them.
3) The directory where fetch.sh resides is filling up with files with names from the download URLs, i.e. there is a file named 'datafeed_products.php?user=xxxxx&password=xxxxx&mid=xxxx&format=xml&dtd=1.2' from the downloaded feed of 'http://datafeeds.productserve.com/datafeed_products.php?user=xxxxx&password=xxxxx&mid=xxxx&format=xml&dtd=1.2'.

Initially the shell script wouldn't run, but now having downloaded a trial of a bash shell script editor it does. I have checked and it is uploading as Ascii.

Some help with this would be brilliant as this is far beyond me.

Kind regards

Stephen

Submitted by dmorison on Sun, 2007-08-05 11:59.

Hello Stephen,

It sounds like the wget part of the automation is not working correctly, so the .xml files are not the feeds themselves - and the files with the long names taken from the feed URL probably contain the actual feed content.

Can you post an example of one of your wget lines (don't forget to remove any sensitive information like your password etc.) and i'll take a look for you...

Cheers,
David.

Submitted by tbbd2007 on Sun, 2007-08-05 12:11.

David

One of the lines is '/usr/bin/wget -o "/homepages/x/xxxxx/htdocs/xxxxx/xxxxx/feeds/xxxxx.xml" "http://datafeeds.productserve.com/datafeed_products.php?user=xxxxx&password=xxxxx&mid=xxxx&format=xml&dtd=1.2"'.

Yours

Stephen

Submitted by dmorison on Sun, 2007-08-05 12:16.

Hi Stephen,

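I think you need to use -O (upper case) instead of -o. Lower case -o tells wget to write its log messages to the named file rather than the downloaded document, which is why your .xml files are so small and the feed itself ends up in a file named after the URL.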

It should then work fine!

Cheers,
David.

Submitted by tbbd2007 on Sun, 2007-08-05 12:27.

David

Thanks for that, I had just spotted that myself when I was getting the list of wget commands, 'O' and 'o' are separate commands, silly mistake!

They have all been checked, but most have permission errors.

Thanks for your help.

Stephen

Submitted by dmorison on Sun, 2007-08-05 16:13.

Hi Stephen,

Is everything working now, or do you still need help with the permission errors? The main thing of course when setting up an automation script is that whatever user the CRON job is running as has write access to feeds folder... This can sometimes catch people out because it works when they are testing their shell script, but when running via cron you get permission errors...

Cheers,
David.

Submitted by tbbd2007 on Sun, 2007-08-05 16:48.

David

I'm still getting the same problem, I'm on a shared hosting package and I have tested my password, but it isn't the same as the root. How do I either get round this or find out the root password?

Stephen

Submitted by dmorison on Sun, 2007-08-05 17:38.

Hi Stephen,

The usual solution is to make the feeds folder "world writable". This will give any user on your system, including whatever user your cron job is running as, full write access to the folder. Don't worry - it doesn't have any bearing on HTTP access to the folder.

To do this, login to your account, and try using the following commands:

cd path/to/pricetapestry
chmod a+w feeds

This will make the feeds/ directory world writable, and should solve any permission problems.
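
To confirm whether the user your cron job runs as really can write to the folder, a check along these lines could be added near the top of the automation script. It is a sketch only: can_write_dir is a made-up helper name and the path in the usage comment is a placeholder.

```shell
#!/bin/sh
# can_write_dir DIR
# Succeeds only if DIR exists, is a directory, and the user running this
# script is allowed to create files in it.
can_write_dir() {
    [ -d "$1" ] && [ -w "$1" ]
}

# Example (hypothetical path): report who the job runs as and the result.
# if can_write_dir /path/to/pricetapestry/feeds; then
#     echo "$(id -un) can write to feeds"
# else
#     echo "$(id -un) cannot write to feeds" >&2
# fi
```

Run from cron, the message shows which user the job is actually running as, which is often not the user you log in with.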

Hope this helps,
Cheers,
David.

Submitted by tbbd2007 on Sun, 2007-08-05 19:03.

David

That does essentially work, apart from receiving error messages from AW, but the messages state I need to apply for permission from them, which I have now done.

This is going to be good I hope!

Thanks again for all your help!

Stephen

Submitted by tbbd2007 on Mon, 2007-08-06 11:04.

David

I have got permission from AW now and successfully downloaded their feeds, however every code line in the XML file is filled with 'CDATA' before and after the data and PT won't parse this information. How do I get around that?

Stephen

Submitted by dmorison on Mon, 2007-08-06 11:22.

Hi Stephen,

There shouldn't be any problem with Affiliate Window feeds - I use plenty of them on my sites, and CDATA is a standard XML feature that should be handled correctly (it allows fields to contain things like HTML that would otherwise break the XML).

What are the actual symptoms of the problem you are having? Are the feeds auto-detecting correctly? Can you see the first product record when registering the feed etc.?

If you're not sure where the problem lies, if you could email me a link to one of the feeds directly from your site (i.e. the name of the feed in your /feeds directory and the URL of your site) I will download it and check it out on my test server. Reply to your reg code or forum registration email is the easiest way to get me...

Cheers,
David.

Submitted by SteveC on Mon, 2007-08-20 10:41.

David,
could you suggest a suitable mod that ensures the import does NOT occur if something goes wrong - I've recently had a situation where it looks like the feed download part failed but it still overwrote the existing feed with a file size of zero and then tried to import it, resulting in all products being lost.
I'd probably like to extend this to write the results of any failure to a log and auto email it to myself.

cheers, Steve.

Submitted by dmorison on Mon, 2007-08-20 10:44.

Hi Steve,

Yes - i'll look into this, shouldn't be too hard. Are you using a standard Linux shell script as described above, using WGET to download the feeds?

Cheers,
David.

Submitted by SteveC on Mon, 2007-08-20 11:09.

David,
yes, script above - the only difference is that I have to change to the scripts directory before calling import.php on my system. I'm also using one script per affiliate network and these are set via cron to run on different days so I don't get the problem of an empty database if anything goes wrong! Affiliate Window are currently moving their feeds to a new server so I presume there are others having this same problem!

thanks, Steve.

Submitted by dmorison on Mon, 2007-08-20 11:14.

Hi Steve,

The safest, and quickest way to implement this I think is to modify the import function in includes/admin.php to abandon the import if the file size is zero. This will mean you only have to make one change, and your automation scripts can stay as they are.

To add this check, look for the following code on line 312:

global $admin_importCallback;

..and ADD the following block of code immediately AFTER that line:

if (!filesize($filename))
{
  return $filename." empty, aborting";
}

That should do the trick!
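
If you would rather stop the empty file from ever replacing the good feed, a guard can also go in the shell script itself: download to a temporary name and only move it into place when it is not empty. The sketch below assumes a hypothetical helper called install_if_nonempty and placeholder paths and URL, and it assumes that @MODIFIED skips any feed whose file has not changed.

```shell
#!/bin/sh
# install_if_nonempty TMPFILE DEST
# Moves TMPFILE over DEST only when TMPFILE exists and is larger than zero
# bytes. Otherwise DEST is left untouched, TMPFILE is removed, a message is
# written to standard error and the function returns 1.
install_if_nonempty() {
    if [ -s "$1" ]; then
        mv -f "$1" "$2"
    else
        rm -f "$1"
        echo "$(date) $1 is empty or missing, keeping existing $2" >&2
        return 1
    fi
}

# Example (hypothetical paths and URL):
# /usr/bin/wget -O /path/to/pt/feeds/merchantA.xml.tmp "http://www.example.com/feeds/getfeed.php?id=1234"
# install_if_nonempty /path/to/pt/feeds/merchantA.xml.tmp /path/to/pt/feeds/merchantA.xml
# php /path/to/pt/scripts/import.php @MODIFIED
```

The message goes to standard error, so it can be redirected to a log file, and many cron setups will email any output from the job to the account owner.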

Cheers,
David.

Submitted by SteveC on Mon, 2007-08-20 13:58.

David, Perhaps this will work fine on my rented linux box but I develop on windows and get the following error:

Warning: filesize() [function.filesize]: stat failed for feed3.xml in C:\Program Files\Apache Software Foundation\Apache2.2\htdocs\includes\admin.php

If this is a platform issue can you suggest the best way to enable the code to safely run on any platform?

Steve.

Submitted by dmorison on Mon, 2007-08-20 14:02.

Hi Steve,

Ah - i've come across filesize() issues on Windows before. As a quick fix, if you're not likely to have 0 byte feed problems on your dev server, you can suppress that warning by changing the code as follows:

if (!@filesize($filename))
{
  return $filename." empty, aborting";
}

(the @ in front of any PHP internal function suppresses any warnings generated by that function)

It also occurred to me that you could use PHP's mail() function to send your alert email, changing the code further still as follows:

if (!@filesize($filename))
{
  mail("your@email.com","Empty File Alert!",$filename);
  return $filename." empty, aborting";
}

This is assuming that mail() is capable of sending email from your server(s)...

Cheers,
David.

Submitted by SteveC on Mon, 2007-08-20 15:05.

David,
the warning no longer displays in browser but the entire import still aborts unless I comment out this bit of code (still on my dev server). yes - using the mail function here is ideal!

Steve.

Submitted by dmorison on Mon, 2007-08-20 15:12.

Hi Steve,

Ooops - yes that would be a problem. I've looked at the docs for filesize() and it returns FALSE on error, so the following version should trap this...

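// call filesize() once, with warnings suppressed; FALSE means the size could not be read
$size = @filesize($filename);
// abort only when the size is known to be exactly zero
if ($size === 0)
{
  mail("your@email.com","Empty File Alert!",$filename);
  return $filename." empty, aborting";
}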

Hope this helps,
Cheers,
David.

Submitted by redspan on Thu, 2007-11-01 11:32.

Hello,

I've made some progress with this and I have a Cron job on a linux server that runs the fetch file and it fetches the first test file. In each test I've done the result has been a new file in the right directory with the right name, but the file size is always 0. These tests were done with feeds from Webgains. I've tested all these files manually so I know they work if downloaded in xml format (no compression).

Here are some of the responses the server has sent me. First I tried a gzip version:

08:43:02 (11.85 MB/s) - `/var/www/vhosts/mydomain.co.uk/httpdocs/feeds/BaggaMenswear.txt.gz' saved [87/87]
gzip: /var/www/vhosts/mydomain.co.uk/httpdocs/feeds/BaggaMenswear.txt.gz: not in gzip format
Content-type: text/html X-Powered-By: PHP/4.3.9
importing BaggaMenswear.xml...[0/155]
importing BaggaMenswear.xml...[done]
backfilling reviews...
backfilling reviews...[done]

I tried a zip version too, and that didn't work (empty file), so I tried an uncompressed xml file:

11:22:01 (13.83 MB/s) - `/var/www/vhosts/mydomain.co.uk/httpdocs/feeds/BaggaMenswear.xml' saved [87/87]
Content-type: text/html X-Powered-By: PHP/4.3.9
importing BaggaMenswear.xml...[0/0]
importing BaggaMenswear.xml...[done]
backfilling reviews...
backfilling reviews...[done]

Same result - download works but I get an empty file, but the manual method works.

Any suggestions?

Thanks,
Ben

Submitted by dmorison on Thu, 2007-11-01 11:39.

Hi Ben,

Could you post the shell script you have written...(remove any passwords etc., it's the structure I'd need to have a look at)

Cheers,
David.

Submitted by redspan on Thu, 2007-11-01 12:31.

Hi David,

Here is the one for the uncompressed XML feed:

#!/bin/sh
#########
# FETCH #
#########
/usr/bin/wget -O "/var/www/vhosts/mydomain.co.uk/httpdocs/feeds/BaggaMenswear.xml" "http://content.webgains.com/affiliates/datafeed.html?action=download&campaign=22473&username=xxxx@mydomain&password=xxxxxxx&format=xml&fields=standard&allowedtags=all&programs=1054&categories=all"
#########################
# DECOMPRESS AND DELETE #
#########################
##########
# IMPORT #
##########
/usr/bin/php /var/www/vhosts/wardrobe-workout.co.uk/httpdocs/scripts/import.php @MODIFIED

For the gzip and zip versions changed the extensions and added the appropriate lines under Decompress, for example:

/usr/bin/unzip -p /var/www/vhosts/mydomain.co.uk/httpdocs/feeds/BaggaMenswear.zip > /var/www/vhosts/mydomain.co.uk/httpdocs/feeds/BaggaMenswear.xml
rm /var/www/vhosts/mydomain.co.uk/httpdocs/feeds/BaggaMenswear.zip

Thanks,
Ben

Submitted by redspan on Thu, 2007-11-01 12:39.

Update: Just tried with a URL from Paidonresults and it worked fine, so I guess it's something to do with Webgains' URL

Submitted by dmorison on Thu, 2007-11-01 13:03.

Hi,

When using the Webgains URL, could you try adding a SPACE on the end of the line after the closing quotes of the URL, as we have come across an issue with this before (search this thread for "space"). What is sometimes happening is that wget is interpreting the line-feed character as part of the URL (even though it occurs outside the closing quotes), so try this:

/usr/bin/wget -O "/var/www/vhosts/mydomain.co.uk/httpdocs/feeds/BaggaMenswear.xml" "http://content.webgains.com/affiliates/datafeed.html?action=download&campaign=22473&username=xxxx@mydomain&password=xxxxxxx&format=xml&fields=standard&allowedtags=all&programs=1054&categories=all"
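
One possible cause of a stray end-of-line character, and this is an assumption rather than something confirmed in this thread, is a script saved with Windows-style line endings, where each line ends in a carriage return before the line feed. The sketch below checks for that and writes a cleaned copy; has_cr and strip_cr are hypothetical helper names and the file names in the usage comment are placeholders.

```shell
#!/bin/sh
# has_cr FILE
# Succeeds if FILE contains at least one carriage-return (CR) character.
has_cr() {
    [ "$(( $(tr -cd '\r' < "$1" | wc -c) ))" -gt 0 ]
}

# strip_cr SRC DEST
# Writes a copy of SRC to DEST with every carriage return removed.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}

# Example (hypothetical file names):
# if has_cr fetch.sh; then
#     strip_cr fetch.sh fetch.unix.sh
# fi
```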

Cheers,
David.

Submitted by redspan on Fri, 2007-11-02 17:11.

Hi David,

I tried the space at the end of the URL but that didn't work.

Paid on Results feeds continue to work fine. I'm going to try some other providers next.

Regards,
Ben

Submitted by AD_Mega on Sat, 2007-11-03 18:43.

I've got the file to download and unzip but I have a problem. The file I download changes each week so when I download the file I have to register the feed again. How do I rename the file right after unzipping it so I don't have to keep registering the datafeed?

Submitted by dmorison on Sat, 2007-11-03 18:47.

Hi,

If you're downloading and unzipping using the commands above, you can control the downloaded filename yourself without relying on the remote filename.

For the wget command, the -O parameter controls the output filename, for example:

wget -O merchant.xml "http://www.example.com/feed.asp?aid=1234"

Where you are unzipping, use the "unzip to console" option which most unzip programs have (for unzip it is a lower case -p), and then redirect the output to the filename of your choice, for example:

unzip -p merchant-2007-10-10.zip > merchant.xml

If you're not sure what to do in your case, if you could post the section of your automation that is downloading the feed i'll take a look for you...

Cheers,
David.

Submitted by AD_Mega on Sat, 2007-11-03 19:09.

Here is the code I use to download the datafeed.

echo ''.date("H:i:s").'';
echo '1-800contacts... 30 seconds';
chdir('/home/megashoppingonline/www/feeds/');
$file='1629676_8006_'.date("Ymd").'.xml.gz';
$url='http://username:password@datatransfer.cj.com/datatransfer/files/1629676/outgoing/productcatalog/8006/';

echo "Copying $file..."; flush();
copy ($url.$file,$file);
echo "Unziping $file...";
echo exec("gunzip -f $file");

Submitted by dmorison on Sat, 2007-11-03 19:12.

Hi,

In that code, it's actually your end that is changing the filename every time, so all you need to do is change it to remove the part that adds the date, for example:

echo ''.date("H:i:s").'';
echo '1-800contacts... 30 seconds';
chdir('/home/megashoppingonline/www/feeds/');
$file='1629676_8006.xml.gz';
$url='http://username:password@datatransfer.cj.com/datatransfer/files/1629676/outgoing/productcatalog/8006/';
echo "Copying $file..."; flush();
copy ($url.$file,$file);
echo "Unziping $file...";
echo exec("gunzip -f $file");

Hope this helps,
Cheers,
David.

Submitted by AD_Mega on Sat, 2007-11-03 19:17.

CJ changes the file name to the current date, so to download the file today I have to log in using http://username:password@datatransfer.cj.com/datatransfer/files/1629676/outgoing/productcatalog/8006/1629676_8006.20071103.xml.gz

Submitted by dmorison on Sat, 2007-11-03 19:21.

Ah, sorry - I didn't realise it was also added to the URL.

In that case, all you need to do is have a different filename variable for the local file, so i've added $dest to the code - for example:

echo ''.date("H:i:s").'';
echo '1-800contacts... 30 seconds';
chdir('/home/megashoppingonline/www/feeds/');
$file='1629676_8006_'.date("Ymd").'.xml.gz';
$dest='1629676_8006.xml.gz';
$url='http://username:password@datatransfer.cj.com/datatransfer/files/1629676/outgoing/productcatalog/8006/';
echo "Copying $file..."; flush();
copy ($url.$file,$dest);
echo "Unziping $dest...";
echo exec("gunzip -f $dest");
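
For anyone doing the same thing from a shell script rather than PHP, the idea carries over directly: build the dated remote name, but always save to the same local name. This is a rough sketch only. dated_name is a made-up helper, and the host, path and credentials in the usage comment are placeholders.

```shell
#!/bin/sh
# dated_name PREFIX SUFFIX
# Prints PREFIX, then today's date as YYYYMMDD, then SUFFIX.
dated_name() {
    printf '%s%s%s\n' "$1" "$(date +%Y%m%d)" "$2"
}

# Example (hypothetical host, path and credentials):
# remote=$(dated_name "1629676_8006_" ".xml.gz")
# /usr/bin/wget -O /path/to/pt/feeds/1629676_8006.xml.gz \
#     "http://username:password@www.example.com/path/to/$remote"
# /bin/gunzip -f /path/to/pt/feeds/1629676_8006.xml.gz
```

Because the local name never changes, the feed only has to be registered once.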

Hope this helps,
Cheers,
David.

Submitted by AD_Mega on Sat, 2007-11-03 19:36.

Thanks David, that worked!

Submitted by atman on Fri, 2008-01-04 20:05.

Hi David, :)

It's been quite a few months since I last visited the forum; I've been busy with other projects. :)

I have a CJ datafeed account with multiple merchants, all in one massive zip file.
I normally download the file using the format below:
ftp://username:password@datatransfer.cj.com/outgoing/productcatalog/2767/mycollectionofdatafeeds_for_today.zip

Is there a modification to the code above that would automatically download the datafeed directly to my server,
extract only the datafeeds whose file names match a keyword I predefine,
and import all the resulting datafeeds into that site's MySQL database?

An example of one datafeed inside the compressed file is:

10421362-NGC_Golf___Fishing_Worldwide.txt

If I predefine the script to import only datafeeds with "golf" in the merchant file name, it would pick the one above, along with any other feeds containing the "golf" keyword, and automatically import them into the site.

All the feeds use the same header, so it should be easy to predefine.

The format is:
PROGRAMNAME,PROGRAMURL,LASTUPDATED,NAME,KEYWORDS,DESCRIPTION,SKU,MANUFACTURER,MANUFACTURERID,UPC,ISBN,CURRENCY,SALEPRICE,PRICE,RETAILPRICE,FROMPRICE,BUYURL,IMPRESSIONURL,IMAGEURL,ADVERTISERCATEGORY,THIRDPARTYID,THIRDPARTYCATEGORY,AUTHOR,ARTIST,TITLE,PUBLISHER,LABEL,FORMAT,SPECIAL,GIFT,PROMOTIONALTEXT,STARTDATE,ENDDATE,OFFLINE,ONLINE

The important columns would be:
PROGRAMNAME=merchant name
NAME=product name
KEYWORDS=product keywords (can be useful for meta tags, etc.)
DESCRIPTION=product description
SKU=unique product code
MANUFACTURER=product manufacturer or brand
PRICE=price
IMAGEURL=image URL
ADVERTISERCATEGORY=main category
THIRDPARTYID=subcategory (mostly blank)
THIRDPARTYCATEGORY=sub-subcategory (mostly blank)
PROMOTIONALTEXT=promotional information, if any

I understand that the process might be too complicated. :)

Thanks in advance.

Submitted by dmorison on Sat, 2008-01-05 11:29.

Hello atman,

I just did an experiment, and the unzip command supports wildcards in
the (optional) name of files to extract. So if you wanted just the
golf feeds, you could include the following command in your
automation script instead of unzipping all files:

unzip mycollectionofdatafeeds_for_today.zip "*Golf*"

If this is followed by import.php @MODIFIED, this should do what you want!

The files would have to be registered as normal before you do this, just as
with the normal automation process.
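
As a rough sketch of how that might sit in an automation script: the pattern is quoted so that the shell passes it to unzip untouched, and unzip's -C option makes the match case-insensitive, so "golf" also matches "Golf". The function names, the -o (overwrite) choice, the placeholder paths and the way import.php is invoked are all assumptions to adapt to your own setup.

```shell
#!/bin/sh
# Sketch only: extract just the feeds whose names contain a keyword.

# Turn a keyword into the wildcard pattern handed to unzip.
feed_pattern() {
  printf '*%s*\n' "$1"
}

# $1 = zip archive, $2 = keyword, $3 = directory to extract into
extract_matching() {
  # -o overwrites the previous copy without prompting,
  # -C matches the pattern case-insensitively.
  unzip -o -C "$1" "$(feed_pattern "$2")" -d "$3"
}

# Example (paths are placeholders, not real locations):
# extract_matching mycollectionofdatafeeds_for_today.zip golf /path/to/feeds &&
#   php /path/to/scripts/import.php @MODIFIED
```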

Cheers,
David.

Submitted by Paulzie on Sun, 2008-01-13 11:29.

This thread has been a great help to me. Here's some code I use:-

Affiliate Window Feeds:-

Feedid=317
Filename=ToffsLtd
wget -O "/home/xxxx/xxxx/feeds/Zip/$Filename.xml.zip" "http://datafeeds.productserve.com/datafeed_products.php?user=xxxx&password=xxxx&mid=$Feedid&format=XML&compression=zip&dtd=1.0"
unzip -p "/home/xxxxxxx/xxxx/feeds/Zip/$Filename.xml.zip" > "/home/xxxxx/xxxx/feeds/$Filename.xml"
(4 lines of code)

I repeat this code, replacing the Feedid and filename for each feed I need. Each file is downloaded, then extracted to a named file.
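
Rather than repeating the block for each feed, one option is to keep the Feedid/Filename pairs in a list and loop over them. The sketch below assumes the same placeholder paths and credentials as above; FEEDS, feed_dir, aw_url and fetch_all are made-up names, and only the 317/ToffsLtd pair comes from the example.

```shell
#!/bin/sh
# Sketch only: loop over Affiliate Window feeds instead of repeating the block.

# One "Feedid Filename" pair per line; add further pairs on new lines.
FEEDS='317 ToffsLtd'

feed_dir='/home/xxxx/xxxx/feeds'   # placeholder path, as in the post

# Build the download URL for a given merchant id (mid).
aw_url() {
  printf 'http://datafeeds.productserve.com/datafeed_products.php?user=xxxx&password=xxxx&mid=%s&format=XML&compression=zip&dtd=1.0\n' "$1"
}

fetch_all() {
  printf '%s\n' "$FEEDS" | while read -r feedid filename; do
    [ -n "$feedid" ] || continue          # skip blank lines
    zip="$feed_dir/Zip/$filename.xml.zip"
    out="$feed_dir/$filename.xml"
    # Extract to a temporary file first, so a failed download or
    # a corrupt archive does not clobber the existing feed.
    wget -O "$zip" "$(aw_url "$feedid")" &&
      unzip -p "$zip" > "$out.tmp" &&
      mv "$out.tmp" "$out"
  done
}
```

Calling fetch_all at the end of the script would then process every pair in FEEDS in turn.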

For Tradedoubler:-

Filename=24-7Electrical
wget -O "/home/xxxx/xxxx/feeds/Zip/$Filename.xml.zip" "http://pf.tradedoubler.com/export/export?myFeed=xxxx&myFormat=xxxx"
unzip -p "/home/xxxx/xxxx/feeds/Zip/$Filename.xml.zip" > "/home/xxxx/xxxx/feeds/$Filename.xml"
(3 lines of code)

I repeat this code, replacing the filename for each feed I need. Wi