Product Feed Download Automation Guide (Linux Servers)

Submitted by support on Sat, 2006-04-15 09:17
Note: This thread is several years old and was created before Price Tapestry had built-in automated
fetch and unzip functionality which is now included in the form of the Automation Tool.

Hi everyone;

As promised, here are some hints / tips / guidelines for automating the download, uncompression and importing of affiliate product feeds from network servers to your Price Tapestry installation. Please note that this level of automation is not a necessary requirement of running the script; you can keep your site up to date simply by downloading feeds manually and then uploading them to the /feeds/ directory of your Price Tapestry installation via FTP, just as you would any other file.

Issues To Be Addressed

Firstly, to appreciate what is required of an automation strategy, it is important to understand the main problem areas that must be handled:

1) Very few affiliate networks provide clean URLs for their product feeds. This results in 2 related issues:

- i) The format of the URL may have undesired effects when scripted
- ii) The download script (wget) cannot derive a sensible filename for the downloaded output

2) The output may be compressed, although you may be able to control this as part of the feed URL (e.g. on Affiliate Window you can use "&nozip=1" on the end - but it's best to leave downloads compressed if possible as it saves everyone bandwidth!)

3) If the output is compressed, the embedded filename within the archive may be incompatible with your file system, or with your current permissions at the current working directory within the file system (i.e. if subdirectories are implied as part of the filename).

System Requirements for Automation

In order to set up an automation strategy using these guidelines, the following will be required:

1) To test that you have access to the necessary programs you will require shell access to your server / web hosting account, either via Telnet, or more commonly these days via SSH. A popular SSH client for Windows is "Putty", which you can get from here: http://www.chiark.greenend.org.uk/~sgtatham/putty/

2) The ability to execute PHP from the "command line". You can test this having logged in simply by typing "php -v", which will display the PHP version:


$php -v
4.2.2
$

(note: In this and all other examples on this page the $ symbol indicates the command prompt, and should not be entered as part of the command)

3) You also need to know the path to php, because when the automation commands are executed via cron the directory containing the php binary may not be in the PATH environment variable and will need to be provided in full. The first thing to do is to "guess" where it might be, and 9 times out of 10 the following will work:


$/usr/bin/php -v
4.2.2
$

If that works, it means that php is installed in the /usr/bin/ directory. If not, you might be able to locate it using the following command:


$locate -r /php$
/path/to/php
$

4) For automating feed retrieval, you will need access to the wget command, and again you will need to know where it is installed on your server:


$locate -r /wget$
/path/to/wget
$

5) For uncompression of feeds using unzip or gzip you will need access to these commands and again you will need to know where they are installed on your server:


$locate -r /unzip$
/path/to/unzip
$


$locate -r /gzip$
/path/to/gzip
$

If you don't have access to the "locate" command, or locate indicates that it has no database to work with and will not be able to find any files, and you are not able to find the commands by guessing, your host should be able to provide the necessary information.
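
Another option worth trying is the "which" command, which reports the location of a program that is already on your current PATH (note that the PATH seen by cron may be different, so treat the result as a starting point rather than a guarantee):


$which php
/usr/bin/php
$which wget
/usr/bin/wget
$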

It is possible that you do not need to specify a path; as the process that executes your cron job may be able to find the commands without a path - so it's worth trying anyway.

6) A mechanism to execute multiple commands via cron. There are 2 options:

- i) Create a single "shell script" containing all the commands and then create a cron entry to execute that script

- ii) Create a single cron entry to call each required command with semi-colon separation between each command
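
For example, a single cron entry using option ii) might chain a fetch and the import with semi-colons, using the same placeholder paths as the rest of this guide:


/path/to/wget -O "/path/to/pt/feeds/merchantA.xml" "http://www.example.com/feeds/getfeed.php?id=1234" ; /path/to/php /path/to/pt/scripts/import.php @MODIFIED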

In theory it would be possible to create individual cron entries for each command, but they would have to be timed sufficiently far apart to be sure that the commands were processed in the correct sequence... so not recommended!

My preference is to use option i), as this lets you edit and work on your automation script on your local computer, and then simply upload it to your server whenever you have made changes.

7) Having created and uploaded the script, you will need to know the full path to that script in order to set up your cron job. For example, if your automation script is called "fetch.sh", the full path may be something like:


/home/webusers/example/fetch.sh

8) Finally, you will need to know the full path to your Price Tapestry installation directory (which may be your web root directory). This might be something like:


/home/webusers/example/public_html/

Summary Of Programs and Paths That You Need To Know

PHP:
/path/to/php (e.g. /usr/bin/php)

WGET:
/path/to/wget (e.g. /usr/bin/wget)

UNZIP (optional, if required):
/path/to/unzip (e.g. /usr/bin/unzip)

GZIP (optional, if required):
/path/to/gzip (e.g. /usr/bin/gzip)

Path to the directory where you are planning to upload fetch.sh:
/path/to/fetch.sh (e.g. /home/webusers/example/fetch.sh)

Path to your Price Tapestry installation directory:
/path/to/pt/ (e.g. /home/webusers/example/public_html)

Note: The path to Price Tapestry will be used in the following example script to locate the feeds directory (/path/to/pt/feeds) and the scripts directory (/path/to/pt/scripts).

Creating fetch.sh

So, armed with the above, you can now create an automation script.

There are 4 sections to the script:

1) The first line, which indicates which command interpreter to use. This is normally:


#!/bin/sh

2) Calls to wget to fetch each feed

The key components in constructing a call to "wget" to fetch a feed are:

i) Use the -O parameter to set the output file. I would recommend a simple filename that is the merchant name, with either .xml (for XML feeds) or .txt (for text files). If you know that the download will be compressed, use the same base filename, but append ".zip" (for zip compression) or ".gz" (for GZIP compression).

ii) Put the feed URL in quotes

The following entries would download 3 feeds: one that is not compressed, one compressed using zip, and a third using gzip compression.


/path/to/wget -O "/path/to/pt/feeds/merchantA.xml" "http://www.example.com/feeds/getfeed.php?id=1234"
/path/to/wget -O "/path/to/pt/feeds/merchantB.xml.zip" "http://www.example.org/download.asp?id=5678"
/path/to/wget -O "/path/to/pt/feeds/merchantC.txt.gz" "http://feeds.example.net/get.php?merchantID=98765"

3) [optional] Calls to decompression programs if required

In this example, 2 of our feeds are compressed, so we must decompress them. Because you do not know (and do not care) what embedded filename the feed has, the easiest thing to do is decompress to "stdout" and redirect the uncompressed output into the required filename:


/path/to/unzip -p /path/to/pt/feeds/merchantB.xml.zip > /path/to/pt/feeds/merchantB.xml
rm /path/to/pt/feeds/merchantB.xml.zip
/path/to/gzip -c -d -S "" /path/to/pt/feeds/merchantC.txt.gz > /path/to/pt/feeds/merchantC.txt
rm /path/to/pt/feeds/merchantC.txt.gz

4) Call to the Price Tapestry import.php script:

Finally, having downloaded, unzipped and deleted the zipped versions, we can call Price Tapestry's import automation script:


/path/to/php /path/to/pt/scripts/import.php @MODIFIED
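
For testing, and as comes up later in this thread, import.php can also be called with a single feed filename (relative to the feeds directory); depending on your PHP configuration you may need to change into the scripts directory first:


cd /path/to/pt/scripts
/path/to/php import.php merchantA.xml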

Example fetch.sh

The following shows the above fragments combined into one complete script; just remember to replace "/path/to" with appropriate real-life values for your server:

fetch.sh

#!/bin/sh

#########
# FETCH #
#########
/path/to/wget -O "/path/to/pt/feeds/merchantA.xml" "http://www.example.com/feeds/getfeed.php?id=1234"
/path/to/wget -O "/path/to/pt/feeds/merchantB.xml.zip" "http://www.example.org/download.asp?id=5678"
/path/to/wget -O "/path/to/pt/feeds/merchantC.txt.gz" "http://feeds.example.com/get.php?merchantID=98765"

#########################
# DECOMPRESS AND DELETE #
#########################
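# unzip -p and gzip -c -d both write the decompressed data to standard output, which is then redirected into the filename we want
# gzip's -S "" means that a particular suffix is not assumed on the input file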
/path/to/unzip -p /path/to/pt/feeds/merchantB.xml.zip > /path/to/pt/feeds/merchantB.xml
rm /path/to/pt/feeds/merchantB.xml.zip

/path/to/gzip -c -d -S "" /path/to/pt/feeds/merchantC.txt.gz > /path/to/pt/feeds/merchantC.txt
rm /path/to/pt/feeds/merchantC.txt.gz

##########
# IMPORT #
##########
/path/to/php /path/to/pt/scripts/import.php @MODIFIED

To finalise:

1) Upload your automation script to your server as /path/to/fetch.sh

2) You may also need to mark the script as executable. You can either do this via SSH as follows:


$cd /path/to/
$chmod +x fetch.sh

Alternatively, you should be able to mark the file as executable via FTP. Most FTP clients have a mechanism to view properties or set permissions for files (on the server). On the properties / permission panel you should see a checkbox or similar mechanism to mark the file as executable.

3) All that's left is to set up fetch.sh to be executed via cron. Most web hosting accounts let you set up cron jobs via your control panel. Simply create a new job, to be executed at the required frequency (e.g. daily), and choose an appropriate time. The command to execute is simple:


/path/to/fetch.sh
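
If you manage cron directly with "crontab -e" rather than via a control panel, a typical entry to run the script once a day (the time here is just an example) would look something like this:


0 4 * * * /path/to/fetch.sh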

I know it looks like quite a lot to take in, but it's really mostly about the knowledge of directories on your server and where the programs are; after that it's just a case of creating fetch.sh, uploading it, marking it as executable, and scheduling it via cron. Hope this helps!

Submitted by PriceSpin on Sat, 2006-04-15 16:22

David I get this when trying to test it at command line...

: bad interpreter: No such file or directory/bin/sh

Robert

Submitted by support on Sat, 2006-04-15 16:24

First thing is to try without any interpreter. If the script is marked as executable it may work; but i'm not sure if the same applies when invoked via cron.

You could also try to locate the "sh" binary as per the other commands:

$locate -r /sh$
/bin/sh
$

...although it won't show /bin/sh if that is not valid on your server; if it does show up, then replace the first line of your script with the appropriate path.

Submitted by PriceSpin on Sat, 2006-04-15 16:32

it appears to have worked at command line without the first line... will attempt a cron...

Don't think TradeDoubler likes to work with this method...

Submitted by support on Sat, 2006-04-15 16:42

What specific problem are you having with TD feeds? - I've had them working OK with wget before. They're gzipped by default I think...

Submitted by PriceSpin on Sat, 2006-04-15 16:44

yeah, they are xml.gz files but every time it tries to download it in command line it says error 404 file not found, even though I pasted the exact URL given by TD into the fetch.sh file.

** also the fetch.sh without the first line works on cron as well **

Submitted by support on Sat, 2006-04-15 16:47

Excellent!

Re: Tradedoubler; I just had another go to check, and it worked OK. I used this command:

wget -O "currys.xml.gz" "http://pf.tradedoubler.com/unlimited/unlimited_pf_xxx_yyyyyyy.xml.gz"

Send me an email with the actual URL you're trying to use if you like and i'll give it a go...

Submitted by support on Sat, 2006-04-15 16:58

Hmmm - your TD URL worked fine on my server... let me have a think about this, i'll try and come up with some ideas...!

Submitted by PriceSpin on Sat, 2006-04-15 16:59

ok, it works when running wget url but not when cron/command line is doing it, it says:
Resolving pf.tradedoubler.com... 217.212.240.174
Connecting to pf.tradedoubler.com|217.212.240.174|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
18:01:18 ERROR 404: Not Found.

also, anyone have any ideas about buy.at??? It's an 8000 product feed which is already 4.5MB at around 40% when downloaded manually, but when downloaded through the automated download it is only 44k fully downloaded...

Submitted by support on Sat, 2006-04-15 17:09

It must be the same issue - something is upsetting wget when it is being run from within another script, but I can't think what that might be - especially seeing as the TradeDoubler URLs are clean - they should work even without the quotes.

Can you try a controlled experiment to find out what is going on? There will be some feeds on your own server in your Price Tapestry /feeds/ directory. Try automating the download of one of those as a test file, so the command would be:

wget -O "test.xml" "http://www.yoursite.com/feeds/somefeed.xml"

Try that in both modes (i.e. wget on its own, and then from inside another script); and see if you get similar problems...

Submitted by PriceSpin on Sat, 2006-04-15 17:20

ok, wget works but doesn't inside the fetch.sh... although blueshoots.xml from webgains does work inside fetch.sh

Submitted by support on Sat, 2006-04-15 17:37

Can you post the 3 URLs you are using, masking out any IDs etc:

TD (not working)
buy.at (not working)
Webgains (working)

It might just give an insight into what might be happening...

Submitted by PriceSpin on Mon, 2006-04-17 03:00

ok, before I post urls I had a more in-depth look... and it appears that buy.at needs some things added to the URLs so I will look at that.

As for TD... it appears to be adding %0D to the end of the url even though the full line appears as:
/usr/bin/wget -O /home/pricespi/public_html/feeds/unlimited_pf_xxx_xxxxxxx.xml.gz http://pf.tradedoubler.com/unlimited/unlimited_pf_xxx_xxxxxxx.xml.gz

---

--03:54:54-- http://pf.tradedoubler.com/unlimited/unlimited_pf_xxx_xxxxxxx.xml.gz%0D
=> `/home/pricespi/public_html/feeds/unlimited_pf_xxx_xxxxxxx.xml.gz'
Resolving pf.tradedoubler.com... 217.212.240.174
Connecting to pf.tradedoubler.com|217.212.240.174|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
03:54:54 ERROR 404: Not Found.

Submitted by support on Mon, 2006-04-17 05:36

Hiya,

The URL (and your local file) must be in quotes for wget to work reliably in a shell script.

wget -O "/path/to/file.xml.gz" "http://pf.tradedoubler.com/foo/pf.xml.gz"

What's happening is the carriage return from the Windows style end of line marker is being passed to wget, which assumes that it is part of the URL. You could save the file using Unix style end of lines (if your text editor supports it), but as quotes are an essential component to making this work it's probably not worth doing that.
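
If you want to check a script for stray carriage returns directly on the server, one option (assuming GNU cat, as found on most Linux hosts) is to display non-printing characters - Windows line endings then show up as ^M at the end of each line:

$cat -A fetch.sh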

Submitted by PriceSpin on Mon, 2006-04-17 11:34

Hello David,
I only tried removing the quotes for the last attempt... if it helps, I'm using Zend Development Editor to create the .sh file... would it be adding the extra characters??

Submitted by support on Mon, 2006-04-17 17:00

I wonder if it's a character encoding issue?

Can you control the character-set that the ZDE saves the file in? If there is an option for basic 8-bit ASCII (perhaps US-ASCII) you might try that...

Submitted by PriceSpin on Mon, 2006-04-17 18:30

tried changing the encoding but it doesnt appear to have helped...

Submitted by support on Mon, 2006-04-17 18:40

Would you be able to email me the URLs that you're trying to use, and i'll create and test a script on my server then send you the script back as an attachment....

Cheers,
David.

Submitted by crounauer on Mon, 2006-05-08 12:07

I would just like to add that the current stable version of Putty (0.58), which can be found at http://www.chiark.greenend.org.uk/~sgtatham/putty/ will not run on Windows XP unless you have Service Pack 2 installed.

The developers know about the problem and are working on a fix as we speak!

The work around is to install the "current development snapshot". This is not always recommended as it is not considered stable and may still contain bugs - Use at your own risk!

Regards,
Simon.

Computer Hardware

Submitted by crounauer on Tue, 2006-05-09 11:44

Hi David,

Is it ok to leave the /bin/ folder writeable for placing the Shell script in?

Computer Hardware

Submitted by crounauer on Tue, 2006-05-09 11:49

Sorry David,

Please ignore the last post. Just figured out that the fetch.sh can be put in any directory and doesn't go in the /bin/ folder!

Simon

Computer Hardware

Submitted by crounauer on Tue, 2006-05-09 12:10

Hi David,

In your sample fetch.sh script above, are the two double quotes in the right place in this snippet of code?

/path/to/gzip -c -d -S "" /path/to/pt/feeds/merchantC.txt.gz > /path/to/pt/feeds/merchantC.txt

Computer Hardware

Submitted by ROYW1000 on Tue, 2006-05-09 12:51

David

I think the best option for me is to leave safe mode on rather than off due to other users on the server.

Getting the files in for both TradeDoubler and Affiliate Window works fine so I am now looking for an unzip script to unzip these files from the cron. Do you have a script that does this?

Also I tried to Import ALL and Import MODIFIED and some of the larger feeds fell over due to no response for more than 30 seconds, and Magic Parser errors. Have you experienced this before?

I will post the error codes tonight as I am not online at the moment.

Roy

Submitted by crounauer on Tue, 2006-05-09 13:11

Hi David / Pricespin,

Did you ever resolve the problem with error 404?

I am having the same issues with TD and also when trying to download a feed from my own site.

`/home/httpd/vhosts/gynogapod.co.uk/subdomains/lingerie/httpdocs/feeds/contessa-test.xml'
Resolving lingerie.gynogapod.co.uk... 212.100.243.205 Connecting to lingerie.gynogapod.co.uk|212.100.243.205|:80... connected. HTTP request sent, awaiting response... 404 Not Found 14:04:00 ERROR 404: Not Found.

Computer Hardware

Submitted by PriceSpin on Tue, 2006-05-09 23:40

I didn't get it resolved yet... tried everything my end with no success.

Robert
PriceSpin - Price Comparison
DJHangout - DJ Equipment Comparison

Submitted by support on Wed, 2006-05-10 08:34

Hi Simon,

The -S "" is correct - it means that the .gz extension is not assumed.

In your script that is getting a 404: firstly, make sure that you have quotes around your URL as this is mega-critical in all of this. Secondly, as I notice that you are having the same error on your own server, could you perhaps have a look at your log and see what was actually requested vs. what you think should have been requested? You should be able to see the 404 in the log and this might give some clues...

Submitted by crounauer on Wed, 2006-05-10 09:20

Hi David,

This is from my log file...

212.100.243.205 - - [09/May/2006:14:24:00 +0100] "GET /feeds/contessa.xml%0D HTTP/1.0" 404 4203 "-" "Wget/1.10.2 (Red Hat modified)"
212.100.243.205 - - [09/May/2006:14:25:00 +0100] "GET /feeds/contessa.xml%0D HTTP/1.0" 404 4203 "-" "Wget/1.10.2 (Red Hat modified)"

I'm just going to double check my fetch.sh code as it seems to be adding the %0D on at the end.

Computer Hardware

Submitted by crounauer on Wed, 2006-05-10 09:38

and this is my fetch.sh code...

#!/bin/sh
#########
# FETCH #
#########
/usr/bin/wget -O "/home/httpd/vhosts/xxxxxxxxxx/httpdocs/feeds/contessa-test.xml" "http://lingerie.gynogapod.co.uk/feeds/contessa.xml"

Computer Hardware

Submitted by support on Wed, 2006-05-10 09:53

Well you learn something every day!

What seems to be happening is that the quotes are not all-encompassing. Characters after the closing quote are still included in the URL, up until the Unix end of line, or the space character.

Now, because the %0D character is part of the Windows (MS-DOS) style end of line, when the script is run on a Unix box it is treated as part of the URL. The solution, I think, is therefore to add a SPACE character after the closing quotes, like this:

#!/bin/sh
#########
# FETCH #
#########
/usr/bin/wget -O "/home/httpd/vhosts/xxxxxxxxxx/httpdocs/feeds/contessa-test.xml" "http://lingerie.gynogapod.co.uk/feeds/contessa.xml"[SPACE]

(where [SPACE] is just a space character)

Now, the worst that should happen is that wget tries to retrieve %0D on its own, but hopefully that error can just be ignored...

Let me know if this works....!

Cheers,
David.

Submitted by crounauer on Wed, 2006-05-10 09:58

I did a debug using sh -x fetch.sh and got this result.

It still seems to be the stray %0D that it is adding at the end.

Also checked for syntax errors using sh -n fetch.sh and this came out clean.

+ /usr/bin/wget -O /home/httpd/vhosts/gynogapod.co.uk/subdomains/lingerie/httpdocs/feeds/contessa-test.xml http://lingerie.gynogapod.co.uk/feeds/contessa.xml
--10:55:19-- http://lingerie.gynogapod.co.uk/feeds/contessa.xml%0D
           => `/home/httpd/vhosts/gynogapod.co.uk/subdomains/lingerie/httpdocs/feeds/contessa-test.xml'
Resolving lingerie.gynogapod.co.uk... 212.100.243.205
Connecting to lingerie.gynogapod.co.uk|212.100.243.205|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
10:55:19 ERROR 404: Not Found.

Computer Hardware

Submitted by support on Wed, 2006-05-10 10:00

Is that even after adding the space? (we might have cross-posted)

Is your text editor stripping end of line white space perhaps?

Submitted by crounauer on Wed, 2006-05-10 10:14

Hi David,

You are quite right, you do learn every day!

I have had success by adding in the space as recommended.

Thanks again for your fanatical support!

+ /usr/bin/wget -O /home/httpd/xxxx/httpdocs/feeds/contessa-test.xml http://lingerie.gynogapod.co.uk/feeds/contessa.xml
--11:08:26-- http://lingerie.gynogapod.co.uk/feeds/contessa.xml
           => `/home/httpd/vhosts/xxx/httpdocs/feeds/contessa-test.xml'
Resolving lingerie.gynogapod.co.uk... 212.100.243.205
Connecting to lingerie.gynogapod.co.uk|212.100.243.205|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 350,537 (342K) [text/xml]
100%[====================================>] 350,537 --.--K/s
11:08:26 (82.42 MB/s) - `/home/httpd/vhosts/xxxxxxxxxx/lingerie/httpdocs/feeds/contessa-test.xml' saved [350537/350537]

Computer Hardware

Submitted by crounauer on Wed, 2006-05-10 10:34

Is this a similar problem with spaces?

11:30:00 (1015.00 KB/s) - `/home/httpd/vhosts/xxxxxxx/lingerie/httpdocs/feeds/figleaves.xml.gz' saved [757993/757993]
--11:30:00-- http://%0D/
           => `/home/httpd/vhosts/xxxxxxxxxxxxxx/lingerie/httpdocs/feeds/figleaves.xml.gz'
Resolving \015... failed: Name or service not known.

Computer Hardware

Submitted by support on Wed, 2006-05-10 10:37

Sort of - that's what I suspected might happen, because the wget command can fetch a list of URLs all separated by the space, so what you had previously was:

wget URL[funny character][new line]

you now have...

wget URL [funny character][new line]

...so once it has fetched the URL, it fetches "funny character" on its own.

Hopefully it's not too much of an issue and can be ignored...

Alternatively, you might want to see if your text editor can change the end of line mode to "UNIX", which should help...
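
If your editor can't do that, the conversion can usually be done on the server itself - a quick sketch using the standard tr command (the temporary filename is just an example; dos2unix does the same job where it's installed):

$tr -d '\r' < fetch.sh > fetch-unix.sh
$mv fetch-unix.sh fetch.sh
$chmod +x fetch.sh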

Submitted by crounauer on Wed, 2006-05-10 11:46

Hi David,

Just an update and for others that might be struggling...

I downloaded a text editor from http://www.flos-freeware.ch/, which as you suggested allowed me to change the line end mode to "UNIX". After removing absolutely all foreign characters (the only one present should be an "LF - line feed" character at the end of each line), I uploaded it via FTP and everything ran smoothly.

Without being able to change the line end to "Unix" it was throwing up errors when gzip was being called, and again when the script tried to delete the gzip file.

Hope this is of some use to others...

Computer Hardware

Submitted by crounauer on Wed, 2006-05-10 12:05

I also had to change the following file paths to absolute locations in import.php

require("/home/xxxxx/httpdocs/includes/common.php");
require("/home/xxxxx/httpdocs/includes/admin.php");
require("/home/httpd/xxxxxxx/httpdocs/includes/filter.php");
require("/home/httpd/xxxxxxxxxxxx/httpdocs/includes/MagicParser.php");

and these two in includes/common.php

require("/home/httpd/xxxxxxx/httpdocs/config.php");
$common_path = "/home/httpd/xxxxxxxxxxxxxx/httpdocs/includes/";

This I think is because the commands are being called from root, so it needs absolute locations.

Computer Hardware

Submitted by support on Wed, 2006-05-10 19:37

Ah yes. An alternative to hard coding the paths is to get your script to change directory into the same directory as import.php before calling it:

cd /path/to/pt/scripts
/path/to/php /path/to/pt/scripts/import.php @MODIFIED

Submitted by PriceSpin on Tue, 2006-05-16 20:59

David,
Can I send you my fetch.sh to look at? I used the one you sent me and added the space at the end and it seems to work fine, except I'm getting a lot of:
Resolving pf.tradedoubler.com... 217.212.240.174
Connecting to pf.tradedoubler.com|217.212.240.174|:80... connected.
HTTP request sent, awaiting response... 503 Service Temporarily Unavailable
21:53:28 ERROR 503: Service Temporarily Unavailable.

but it is only on some and then I also get:
gzip: incorrect suffix '(null)'
rm: cannot lstat `/home/path/to/feeds/whsmith.xml.gz\r': No such file or directory

Don't know if it's something to do with my fetch.sh

Robert
PriceSpin - Price Comparison
DJHangout - DJ Equipment Comparison

Submitted by support on Wed, 2006-05-17 04:01

Sure, i'll take a look.

It does sound like it's still to do with the file mode (end of line representation) as "\r" implies carriage-return...

Submitted by PHILDARV on Sun, 2006-07-09 14:37

David - firstly I want to say what a great script this is, and secondly, following this thread I was having similar problems to others but found that creating fetch.sh with a program called EditPad Pro resolved the issues with line breaks etc. Now got a live site Designer Clothes ;-)

Submitted by multiz on Fri, 2006-07-14 18:18

I was having problems with my fetch.sh page. So, here are the steps I had to take to make it executable:

1. You may have to click "Change Permissions" after selecting "fetch.sh".
2. Then chmod or change the permissions to "0777". (Another way to do this would be to check all the check boxes, indicating the permissions are set to "0777".)

Hope this helps everyone. It helped me.
PS: This really just depends on your webserver's setup. For example, I'm using Linux/Unix and CPanel X.

Software Price Comparison

Submitted by Scutter on Mon, 2006-10-09 10:01

Hi,
Having problems getting this working :(

The error I get is....

importing cdwow.xml...[0/6340]PHP Warning: mysql_fetch_array(): supplied argument is not a valid MySQL result resource in /home/default/thebargainsforum.com/user/htdocs/pricesearch/includes/database.php on line 22
PHP Warning: mysql_num_rows(): supplied argument is not a valid MySQL result resource in /home/default/thebargainsforum.com/user/htdocs/pricesearch/includes/database.php on line 27
PHP Warning: mysql_fetch_array(): supplied argument is not a valid MySQL result resource in /home/default/thebargainsforum.com/user/htdocs/pricesearch/includes/database.php on line 22
PHP Warning: mysql_num_rows(): supplied argument is not a valid MySQL result resource in /home/default/thebargainsforum.com/user/htdocs/pricesearch/includes/database.php on line 27

Any suggestions?

Thank in advance.

Submitted by support on Mon, 2006-10-09 11:12

Hi,

Is this only happening when import.php is called from within your automation script? If you just run it on its own, called from the command prompt ($), do you see the same warnings:

$php import.php cdwow.xml <return>

I assume that import is working fine from within the administration interface?

Cheers,
David.

Submitted by Scutter on Mon, 2006-10-09 11:18

Works fine from the admin interface

[root@localhost scripts]# php import.php ../feeds/cdwow.xml
Content-type: text/html
X-Powered-By: PHP/4.3.11

backfilling reviews... PHP Warning: mysql_fetch_array(): supplied argument is not a valid MySQL result resource in /home/default/thebargainsforum.com/user/htdocs/pricesearch/includes/database.php on line 22
PHP Warning: mysql_num_rows(): supplied argument is not a valid MySQL result resource in /home/default/thebargainsforum.com/user/htdocs/pricesearch/includes/database.php on line 27
backfilling reviews...[done]

Submitted by support on Mon, 2006-10-09 11:32

That's interesting. Is there any problem with the moderate reviews feature within the admin interface or does that work OK?

Submitted by support on Mon, 2006-10-09 11:34

Ah - another thing to point out - import.php expects the filename to be in the feeds directory, so you don't need to specify the relative path - it should be just:

php import.php cdwow.xml

The automation scripts do have to be run from the scripts/ directory however (but that shouldn't make any difference in this case)

Submitted by Scutter on Mon, 2006-10-09 12:30

I can confirm that the admin interface works as it should with me manually able to register and import the cdwow feed.

But from the command line, I get the errors mentioned.

Submitted by support on Mon, 2006-10-09 12:59

Hi,

I've just sent you a "debug" version of database.php that will print out the query that is failing, that should help find out what's going on...

Cheers,
David.

Submitted by Scutter on Tue, 2006-10-10 09:09

Right, thanks for the file, I have uploaded it to try it.
I also downloaded a fresh copy of the feed this time in csv format for testing.

backfilling reviews... [UPDATE `price_products` SET rating='0',reviews='0']
[SELECT product_name,AVG(rating) as rating,COUNT(id) as reviews FROM `price_reviews` WHERE approved <> '0' GROUP BY product_name]
PHP Warning: mysql_fetch_array(): supplied argument is not a valid MySQL result resource in /home/default/thebargainsforum.com/user/htdocs/pricesearch/includes/database.php on line 27
PHP Warning: mysql_num_rows(): supplied argument is not a valid MySQL result resource in /home/default/thebargainsforum.com/user/htdocs/pricesearch/includes/database.php on line 32

Submitted by support on Tue, 2006-10-10 09:24

Hi,

That's really strange - there's no reason why those queries should fail. I'm wondering if there is a MySQL permissions problem when running from the command line instead of as an Apache module. Can you try the following from the command prompt ($) in the Price Tapestry installation directory:

$php setup.php

It will print out all the HTML that the script normally generates, but the crucial thing is to see whether any of the DB tests fail, which would indicate that your CLI version of PHP does not have database access..... If you could post the output from the above that would help...

Cheers,
David.

Submitted by Scutter on Tue, 2006-10-10 10:32

I had to strip it down as the comment box thought it was suspicious :)

 <tr align="left">
          <td> <p>Checking database connection...PASS</p><p>Checking database selection...PASS</p><p>Checking database tables...PASS</p><p><small>Setup Completed!</small></p><p><small>Using PHP Version 4.3.11</small></p><p><small>Using MySQL Version 4.1.15-standard</small></p> </tr>
            </table>
          </td>
        </tr>

Submitted by murph on Wed, 2006-10-11 09:43

I have fetch.sh working via command line (sh -x fetch.sh), but the cron isn't. I have cPanel and I have literally just set one up using the path to fetch.sh. Is there something else I should try?

Submitted by Eddie on Tue, 2006-11-14 09:15

David,

I have no cpanel etc as such but an account area where I can set the cron job. However my host (having read this post - or so they say) tell me that:

"Hello Eddie,
You cannot call a .sh script from the browser. Please rename it to .cgi.
Please run the cronjob url in the browser to test it.
I hope it helps."

Is there any reason why fetch.cgi wouldn't work?

Eddie

Submitted by support on Tue, 2006-11-14 10:29

Hi Eddie,

That is unlikely to work i'm afraid - even if your server is configured to run a shell script when saved as .cgi you will have all sorts of timeout issues.

What happens if you just upload fetch.sh into your site via FTP; and then set the path to the file (you will have to figure out where on the server your web tree is located, you might be able to tell from your FTP client) as the cron job in cPanel... for example:

/home/users/eddie/vhosts/www.example.com/httpdocs/fetch.sh

That is the standard way of doing it; but it does require that you have permission to execute shell scripts on your server...

Cheers,
David.

Submitted by Eddie on Tue, 2006-11-14 11:06

David,

Thanks for this (and all other replies - nice result at the weekend for reading!)

Right... my hosts (which so far have been very reliable) are servage.net.

They 'span' the sites over many boxes blah blah.... as a result the features are limited on the back end. For instance I have their cpanel set up (which they designed) and not the standard cpanel that you come to expect with your hosting account!

The only way to set a cron job is shown in the example on www.shopsort.co.uk/cron.jpg

I have set the permissions to both 755 and 77 when testing for fetch.sh via FTP.

So far the hosts have said:

You cannot call a .sh script from the browser. Please rename it to .cgi.
Please run the cronjob url in the browser to test it.

And

As we do not support in scripting, so I am afraid that we can help you much regarding this. So, please check the code of your cronjob script with any experienced script writer. As they are the best person, who can guide you. And reagarding any issue with the server, I am forwarding this ticket to our admin and he will look into your request as the first thing tomorrow morning (Tuesday).
Thank you in advance for your patience.

So I am guessing this isn’t going to work. What do you think ?

Submitted by Eddie on Tue, 2006-11-14 13:26

David,

Still not getting my head round above!

Any ideas ?

Eddie

Submitted by support on Tue, 2006-11-14 13:55

Hi Eddie,

It looks from that screen shot as if you are only allowed to set up cron jobs for PHP or other web based scripts, not something like fetch.sh which is a Unix command line shell script - nothing to do with your web server.

This means that in order to automate your Price Tapestry setup you are going to need to develop PHP methods of downloading files and running the import. Do you still have the code you set up previously...

http://www.pricetapestry.com/node/24

Basically you want to use the same trick...

Hope this helps,
David.

Submitted by Eddie on Tue, 2006-11-14 14:09

David,

Thanks for that. It is all coming back to me.

I haven't got the code from before... I will let you know how I get on.

Cheers

Eddie

Submitted by Eddie on Tue, 2006-11-14 15:58

David,

My host just got back to me with:

"CGI script can be very hard to debug, so i would recommend the php way.

You can wrap the system commands up in the php system() function.

http://www.php.net/manual/en/function.system.php

Example:

system('/path/to/wget -O "/path/to/pt/feeds/merchantA.xml" "http://www.example.com/feeds/getfeed.php?id=1234"');

And for php scripts:
include("/path/to/pt/scripts/import.php");

Please change the paths in the following script to the correct values.

/shopsort.co.uk/cron/fetch.php

I hope it helps."

Will that help or should I just go for your last advice ?

Eddie

Submitted by support on Tue, 2006-11-14 16:03

Worth trying Eddie...

Submitted by Eddie on Tue, 2006-11-14 18:18

David,

Using the above, I added " round the MODIFIED to get it working and it prints out the following:

Warning: set_time_limit(): Cannot set time limit in safe mode in /xxxx/shopsort.co.uk/scripts/import.php on line 3 importing awProductFeed.xml...[0/930] importing awProductFeed.xml...[100/930] importing awProductFeed.xml...[200/930] importing awProductFeed.xml...[300/930] importing awProductFeed.xml...[400/930] importing awProductFeed.xml...[500/930] importing awProductFeed.xml...[600/930] importing awProductFeed.xml...[700/930] importing awProductFeed.xml...[800/930] importing awProductFeed.xml...[900/930] importing awProductFeed.xml...[done] backfilling reviews... backfilling reviews...[done]

Why the error, and what now?

Thanks

Eddie

Submitted by support on Tue, 2006-11-14 22:48

Hi Eddie,

It's a PHP warning rather than an error - and is part of PHP's "Safe Mode" operation.

As it happens, you didn't actually reach the time limit, so everything worked despite the warning. In order to not see this warning (and to prevent a longer import from failing in the future) you should see if you can get Safe Mode turned off on your account. Your host should be able to do this for you - if you explain that you are running a script that can take more than 30 seconds, and for this reason uses the set_time_limit() function to disable the timeout, they should be prepared to disable safe mode on your account.

For more information on Safe Mode...
http://uk.php.net/features.safe-mode

Cheers,
David.


Submitted by bwhelan1 on Tue, 2007-03-20 21:51

What changes should be made to fetch.sh to get feeds from the Shareasale FTP which does require authentication?

Bill Whelan

Computer Store

Submitted by support on Wed, 2007-03-21 05:39

Hello Bill,

wget supports FTP with authentication. There is a standard method for providing your username and password on the URL, which is as follows:

ftp://username:password@ftp.example.com/path/to/file.xml

Therefore, a line similar to the following in your fetch.sh file should do the trick:

/path/to/wget -O "/path/to/pt/feeds/merchantA.xml" "ftp://username:password@ftp.example.com/feeds/merchantA.xml"

Hope this helps,
Cheers,
David.

Submitted by bwhelan1 on Wed, 2007-03-21 17:38

David,

Using the following code for fetch.sh:

#!/bin/sh
#########
# FETCH #
#########
/usr/bin/wget -O "/home/xxxxxx/public_html/feeds/10851.zip" "ftp://username:password@datafeeds.shareasale.com/10851/10851.zip"
/usr/bin/wget -O "/home/xxxxxx/public_html/feeds/11733.zip" "ftp://username:password@datafeeds.shareasale.com/11733/11733.zip"
#########################
# DECOMPRESS AND DELETE #
#########################
/bin/gunzip -p /home/xxxxxx/public_html/feeds/10851.zip > /home/xxxxxx/public_html/feeds/10851.txt
rm /home/xxxxxx/public_html/feeds/10851.zip
/bin/gunzip -p /home/xxxxxx/public_html/feeds/11733.zip > /home/xxxxxx/public_html/feeds/11733.txt
rm /home/xxxxxx/public_html/feeds/11733.zip
##########
# IMPORT #
##########
/usr/local/bin/php /home/xxxxxx/public_html/scripts/import.php @MODIFIED

I get the following results after running it with cron:

--12:17:01-- ftp://username:*password*@datafeeds.shareasale.com/10851/10851.zip
          => `/home/xxxxxx/public_html/feeds/10851.zip'
Resolving datafeeds.shareasale.com... done.
Connecting to datafeeds.shareasale.com[198.65.110.138]:21... connected.
Logging in as username ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD /10851 ... done.
==> PASV ... done. ==> RETR 10851.zip ... done.
   0K .......... .......... .......... .......... .......... 215.52 KB/s
  50K .......... .......... .......... .......... .......... 694.44 KB/s
 100K .......... .......... .......... .......... .......... 510.20 KB/s
 150K .......... .......... .......... .......... .......... 515.46 KB/s
 200K .......... .......... .......... .......... .......... 632.91 KB/s
 250K .......... .......... .......... .......... .......... 364.96 KB/s
 300K .......... .......... .......... .......... .......... 657.89 KB/s
 350K .......... .......... .......... .......... .......... 462.96 KB/s
 400K .......... .......... .......... .......... .......... 384.62 KB/s
 450K .......... .......... .......... .......... .......... 666.67 KB/s
 500K .......... .......... .......... .......... .......... 446.43 KB/s
 550K .......... .......... .......... .......... .......... 495.05 KB/s
 600K .......... .......... .......... .......... .......... 537.63 KB/s
 650K .......... .......... .......... ...... 497.98 KB/s
12:17:03 (462.84 KB/s) - `/home/xxxxxx/public_html/feeds/10851.zip' saved [703335]
--12:17:03-- ftp://username:*password*@datafeeds.shareasale.com/11733/11733.zip
          => `/home/xxxxxx/public_html/feeds/11733.zip'
Resolving datafeeds.shareasale.com... done.
Connecting to datafeeds.shareasale.com[198.65.110.138]:21... connected.
Logging in as username ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD /11733 ... done.
==> PASV ... done. ==> RETR 11733.zip ...
No such file `11733.zip'.
/bin/gunzip: invalid option -- p
/bin/gunzip: invalid option -- p
Warning: main(../includes/common.php): failed to open stream: No such file or directory in /home/xxxxxx/public_html/scripts/import.php on line 5
Warning: main(../includes/common.php): failed to open stream: No such file or directory in /home/xxxxxx/public_html/scripts/import.php on line 5
Warning: main(../includes/common.php): failed to open stream: No such file or directory in /home/xxxxxx/public_html/scripts/import.php on line 5
Fatal error: main(): Failed opening required '../includes/common.php' (include_path='.:/usr/lib/php:/usr/local/lib/php') in /home/xxxxxx/public_html/scripts/import.php on line 5

Any ideas? Can I pay you to get this working?

Bill Whelan

Submitted by support on Wed, 2007-03-21 18:11

Hi Bill,

I notice that you are using gunzip, which doesn't have a -p option. The equivalent option (unzip to console) is -c, so try changing this section as follows:

/bin/gunzip -c /home/xxxxxx/public_html/feeds/10851.zip > /home/xxxxxx/public_html/feeds/10851.txt
rm /home/xxxxxx/public_html/feeds/10851.zip
/bin/gunzip -c /home/xxxxxx/public_html/feeds/11733.zip > /home/xxxxxx/public_html/feeds/11733.txt
rm /home/xxxxxx/public_html/feeds/11733.zip

Hope this helps!
Cheers,
David.

Submitted by bwhelan1 on Wed, 2007-03-21 19:25

David,

Would it be necessary to force an overwrite of the existing files with a -f since the files from the previous upload would still exist in the feeds folder?

Bill Whelan

Computer Store

Submitted by support on Wed, 2007-03-21 19:29

Hi Bill,

That shouldn't be necessary with -c, because the unzip is to standard output, which is then being redirected into the appropriate filename via ">"...

Cheers,
David.

Submitted by bwhelan1 on Wed, 2007-03-21 21:13

David,

I realize that I included a file that was not in my SAS FTP folder. After deleting the lines for that file I get this as an output for the cron job:

--15:45:00-- ftp://username:*password*@datafeeds.shareasale.com/10851/10851.zip
          => `/home/xxxxxxxx/public_html/feeds/10851.zip'
Resolving datafeeds.shareasale.com... done.
Connecting to datafeeds.shareasale.com[198.65.110.138]:21... connected.
Logging in as username ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD /10851 ... done.
==> PASV ... done. ==> RETR 10851.zip ... done.
   0K .......... .......... .......... .......... .......... 218.34 KB/s
  50K .......... .......... .......... .......... .......... 657.89 KB/s
 100K .......... .......... .......... .......... .......... 531.91 KB/s
 150K .......... .......... .......... .......... .......... 400.00 KB/s
 200K .......... .......... .......... .......... .......... 632.91 KB/s
 250K .......... .......... .......... .......... .......... 500.00 KB/s
 300K .......... .......... .......... .......... .......... 549.45 KB/s
 350K .......... .......... .......... .......... .......... 413.22 KB/s
 400K .......... .......... .......... .......... .......... 537.63 KB/s
 450K .......... .......... .......... .......... .......... 632.91 KB/s
 500K .......... .......... .......... .......... .......... 520.83 KB/s
 550K .......... .......... .......... .......... .......... 500.00 KB/s
 600K .......... .......... .......... .......... .......... 362.32 KB/s
 650K .......... .......... .......... ...... 646.50 KB/s
15:45:02 (464.72 KB/s) - `/home/xxxxxxxx/public_html/feeds/10851.zip' saved [703335]
Warning: main(../includes/common.php): failed to open stream: No such file or directory in /home/xxxxxxxx/public_html/scripts/import.php on line 5
Warning: main(../includes/common.php): failed to open stream: No such file or directory in /home/xxxxxxxx/public_html/scripts/import.php on line 5
Warning: main(../includes/common.php): failed to open stream: No such file or directory in /home/xxxxxxxx/public_html/scripts/import.php on line 5
Fatal error: main(): Failed opening required '../includes/common.php' (include_path='.:/usr/lib/php:/usr/local/lib/php') in /home/xxxxxxxx/public_html/scripts/import.php on line 5

It seems to download the feed, unzip it and then delete the zip file just fine, but the import fails. If I peel off the import code from the script and run a separate cron job for the import, it works fine. I should probably just live with that, but do you have any ideas why it fails to work when included in fetch.sh?

Bill Whelan

Computer Store

Submitted by bwhelan1 on Thu, 2007-03-22 01:48

David,

It appears to be working with the following:

#!/bin/sh
#########
# FETCH #
#########
/usr/bin/wget -O "/home/xxxxxxxx/public_html/feeds/10851.zip" "ftp://username:password@datafeeds.shareasale.com/10851/10851.zip"
#########################
# DECOMPRESS AND DELETE #
#########################
/bin/gunzip -c /home/xxxxxxxx/public_html/feeds/10851.zip > /home/xxxxxxxx/public_html/feeds/10851.txt
rm /home/xxxxxxxx/public_html/feeds/10851.zip
##########
# IMPORT #
##########
php /home/xxxxxxxx/public_html/scripts/import.php @MODIFIED

I just changed this:

/usr/local/bin/php /home/xxxxxxxx/public_html/scripts/import.php @MODIFIED

To this:

php /home/xxxxxxxx/public_html/scripts/import.php @MODIFIED

and now it works. Thanks for all your help.

Bill Whelan

Computer Store

Submitted by SteveC on Fri, 2007-07-06 10:40

Hi David,
I'm getting similar problems to the last post - although the reason given by Bill makes no sense to me - he doesn't appear to have changed anything. Anyway here's what I'm getting:

Warning: main(../includes/common.php): failed to open stream: No such file or directory in /pathToPricetapestry/scripts/import.php on line 5
Fatal error: main(): Failed opening required '../includes/common.php' (include_path='.:') in /pathToPricetapestry/scripts/import.php on line 5

It looks like it's looking for /includes/common.php in /scripts/import.php !?

Thanks, Steve.

Submitted by support on Fri, 2007-07-06 10:43

Hi Steve,

This sounds like PHP on your platform is not configured to behave as if it is running from the script directory. This is not a usual configuration, but it can happen.

The solution is to include a change directory command (cd) to the scripts directory in your automation script before calling import.php. For example:

cd /path/to/pricetapestry/scripts
php import.php @MODIFIED

That should help...
Cheers,
David.

Submitted by SteveC on Fri, 2007-07-06 11:25

Thanks David,
The additional cd command worked.

Steve.

Submitted by tbbd2007 on Sun, 2007-08-05 11:40

David

I have done all as directed above, but am experiencing a couple of problems.

They are:
1) Manually running fetch.sh via Putty shows the following 'importing 1st4phonecards.xml...[done]
importing avon.xml...[done]
importing 24electric.xml...[done]
importing 247electrical.xml...[done]
importing 7dayshop.xml...[done]
importing 911_experience.xml...[done]
importing advanced_mp3_players.xml...[done]
importing a_quarter_of.xml...[done]
importing a1_gifts.xml...[done]
importing activity_gifts.xml...[done]
importing abm_opticians.xml...[done]
importing activity_superstore.xml...[done]
importing alaskan_air_conditioners.xml...[done]
importing amalgam_shop.xml...[done]
backfilling reviews...[done]'. When I check the feeds they are all a fraction of what their size should be.
2) PT hasn't imported them at all, the merchant names are still there and registered, but there are no products for them.
3) The directory where fetch.sh resides is filling up with files with names from the download URLs, i.e. there is a file named 'datafeed_products.php?user=xxxxx&password=xxxxx&mid=xxxx&format=xml&dtd=1.2' from the downloaded feed of '{link saved}'.

Initially the shell script wouldn't run, but now having downloaded a trial of a bash shell script editor it does. I have checked and it is uploading as Ascii.

Some help with this would be brilliant as this is far beyond me.

Kind regards

Stephen

Submitted by support on Sun, 2007-08-05 11:59

Hello Stephen,

This sounds like the automation of wget is not working correctly, and therefore the actual .xml files are not the feeds themselves - it sounds like the files with the long names taken from the feed URL probably contain the feed content.

Can you post an example of one of your wget lines (don't forget to remove any sensitive information like your password etc.) and i'll take a look for you...

Cheers,
David.

Submitted by tbbd2007 on Sun, 2007-08-05 12:11

David

One of the lines is '/usr/bin/wget -o "/homepages/x/xxxxx/htdocs/xxxxx/xxxxx/feeds/xxxxx.xml" "http://datafeeds.productserve.com/datafeed_products.php?user=xxxxx&password=xxxxx&mid=xxxx&format=xml&dtd=1.2"'.

Yours

Stephen

Submitted by support on Sun, 2007-08-05 12:16

Hi Stephen,

I think you need to use -O (upper case) instead of -o, which means something different.

It should then work fine!

Cheers,
David.

Submitted by tbbd2007 on Sun, 2007-08-05 12:27

David

Thanks for that, I had just spotted that myself when I was getting the list of wget commands, 'O' and 'o' are separate commands, silly mistake!

They have all been checked, but most have permission errors.

Thanks for your help.

Stephen

Submitted by support on Sun, 2007-08-05 16:13

Hi Stephen,

Is everything working now, or do you still need help with the permission errors? The main thing of course when setting up an automation script is that whatever user the cron job is running as has write access to the feeds folder... This can sometimes catch people out because it works when they are testing their shell script, but then when running via cron you get permission errors...

Cheers,
David.

Submitted by tbbd2007 on Sun, 2007-08-05 16:48

David

I'm still getting the same problem, I'm on a shared hosting package and I have tested my password, but it isn't the same as the root. How do I either get round this or find out the root password?

Stephen

Submitted by support on Sun, 2007-08-05 17:38

Hi Stephen,

The usual solution is to make the feeds folder "world writable". This will give any user on your system, including whatever user your cron job is running as, full write access to the folder. Don't worry - it doesn't have any bearing on HTTP access to the folder.

To do this, login to your account, and try using the following commands:

cd path/to/pricetapestry
chmod a+w feeds

This will make the feeds/ directory world writable, and should solve any permission problems.

Hope this helps,
Cheers,
David.

Submitted by tbbd2007 on Sun, 2007-08-05 19:03

David

That does essentially work, apart from receiving error messages from AW, but the messages state I need to apply for permission from them, which I have now done.

This is going to be good I hope!

Thanks again for all your help!

Stephen

Submitted by tbbd2007 on Mon, 2007-08-06 11:04

David

I have got permission from AW now and successfully downloaded their feeds, however every code line in the XML file is filled with 'CDATA' before and after the data and PT won't parse this information. How do I get around that?

Stephen

Submitted by support on Mon, 2007-08-06 11:22

Hi Stephen,

There shouldn't be any problem with Affiliate Window feeds - I use plenty of them on my sites, and CDATA is a standard XML feature that should be handled correctly (it allows fields to contain things like HTML that would otherwise break the XML).

What are the actual symptoms of the problem you are having? Are the feeds auto-detecting correctly? Can you see the first product record when registering the feed etc.?

If you're not sure where the problem lies, you could email me a link to one of the feeds directly from your site (i.e. the name of the feed in your /feeds directory and the URL of your site) and I will download it and check it out on my test server. Replying to your reg code or forum registration email is the easiest way to reach me...

Cheers,
David.

Submitted by SteveC on Mon, 2007-08-20 10:41

David,
could you suggest a suitable mod that ensures the import does NOT occur if something goes wrong - I've recently had a situation where it looks like the feed download part failed but it still overwrote the existing feed with a file size of zero and then tried to import it, resulting in all products being lost.
I'd probably like to extend this to write the results of any failure to a log and auto email it to myself.

cheers, Steve.

Submitted by support on Mon, 2007-08-20 10:44

Hi Steve,

Yes - i'll look into this, shouldn't be too hard. Are you using a standard Linux shell script as described above, using WGET to download the feeds?

Cheers,
David.

Submitted by SteveC on Mon, 2007-08-20 11:09

David,
yes, script above - the only difference is that I have to change to the scripts directory before calling import.php on my system. I'm also using one script per affiliate network and these are set via cron to run on different days so I don't get the problem of an empty database if anything goes wrong! Affiliate Window are currently moving their feeds to a new server so I presume there are others having this same problem!

thanks, Steve.

Submitted by support on Mon, 2007-08-20 11:14

Hi Steve,

The safest, and quickest way to implement this I think is to modify the import function in includes/admin.php to abandon the import if the file size is zero. This will mean you only have to make one change, and your automation scripts can stay as they are.

To add this check, look for the following code on line 312:

global $admin_importCallback;

..and ADD the following block of code immediately AFTER that line:

if (!filesize($filename))
{
  return $filename." empty, aborting";
}

That should do the trick!

Cheers,
David.

Submitted by SteveC on Mon, 2007-08-20 13:58

David, Perhaps this will work fine on my rented linux box but I develop on windows and get the following error:

Warning: filesize() [function.filesize]: stat failed for feed3.xml in C:\Program Files\Apache Software Foundation\Apache2.2\htdocs\includes\admin.php

If this is a platform issue can you suggest the best way to enable the code to run safely on any platform?

Steve.

Submitted by support on Mon, 2007-08-20 14:02

Hi Steve,

Ah - i've come across filesize() issues on Windows before. As a quick fix, if you're not likely to have 0 byte feed problems on your dev server, you can suppress that warning by changing the code as follows:

if (!@filesize($filename))
{
  return $filename." empty, aborting";
}

(the @ in front of any PHP internal function suppresses any warnings generated by that function)

It also occurred to me that you could use PHP's mail() function to send your alert email, changing the code further still as follows:

if (!@filesize($filename))
{
  mail("your@email.com","Empty File Alert!",$filename);
  return $filename." empty, aborting";
}

This is assuming that mail() is capable of sending email from your server(s)...

Cheers,
David.

Submitted by SteveC on Mon, 2007-08-20 15:05

David,
the warning no longer displays in browser but the entire import still aborts unless I comment out this bit of code (still on my dev server). yes - using the mail function here is ideal!

Steve.

Submitted by support on Mon, 2007-08-20 15:12

Hi Steve,

Ooops - yes that would be a problem. I've looked at the docs for filesize() and it returns FALSE on error, so the following version should trap this...

if ( (!@filesize($filename)) && (@filesize($filename)!==FALSE) )
{
  mail("your@email.com","Empty File Alert!",$filename);
  return $filename." empty, aborting";
}

Hope this helps,
Cheers,
David.

Submitted by redspan on Thu, 2007-11-01 11:32

Hello,

I've made some progress with this and I have a cron job on a Linux server that runs the fetch file, and it fetches the first test file. In each test I've done, the result has been a new file in the right directory with the right name, but the file size is always 0. These tests were done with feeds from Webgains. I've tested all these files manually, so I know they work if downloaded in XML format (no compression).

Here are some of the responses the server has sent me. First I tried a gzip version:

08:43:02 (11.85 MB/s) - `/var/www/vhosts/mydomain.co.uk/httpdocs/feeds/BaggaMenswear.txt.gz' saved [87/87]
gzip: /var/www/vhosts/mydomain.co.uk/httpdocs/feeds/BaggaMenswear.txt.gz: not in gzip format
Content-type: text/html X-Powered-By: PHP/4.3.9
importing BaggaMenswear.xml...[0/155]
importing BaggaMenswear.xml...[done]
backfilling reviews...
backfilling reviews...[done]

I tried a zip version too, and that didn't work (empty file), so I tried an uncompressed xml file:

11:22:01 (13.83 MB/s) - `/var/www/vhosts/mydomain.co.uk/httpdocs/feeds/BaggaMenswear.xml' saved [87/87]
Content-type: text/html X-Powered-By: PHP/4.3.9
importing BaggaMenswear.xml...[0/0]
importing BaggaMenswear.xml...[done]
backfilling reviews...
backfilling reviews...[done]

Same result - the download runs but I get an empty file, whereas the manual method works.

Any suggestions?

Thanks,
Ben

Submitted by support on Thu, 2007-11-01 11:39

Hi Ben,

Could you post the shell script you have written? (Remove any passwords etc. - it's the structure I need to have a look at.)

Cheers,
David.

Submitted by redspan on Thu, 2007-11-01 12:31

Hi David,

Here is the one for the uncompressed XML feed:

#!/bin/sh
#########
# FETCH #
#########
/usr/bin/wget -O "/var/www/vhosts/mydomain.co.uk/httpdocs/feeds/BaggaMenswear.xml" "http://content.webgains.com/affiliates/datafeed.html?action=download&campaign=22473&username=xxxx@mydomain&password=xxxxxxx&format=xml&fields=standard&allowedtags=all&programs=1054&categories=all"
#########################
# DECOMPRESS AND DELETE #
#########################
##########
# IMPORT #
##########
/usr/bin/php /var/www/vhosts/wardrobe-workout.co.uk/httpdocs/scripts/import.php @MODIFIED

For the gzip and zip versions I changed the extensions and added the appropriate lines under Decompress, for example:

/usr/bin/unzip -p /var/www/vhosts/mydomain.co.uk/httpdocs/feeds/BaggaMenswear.zip > /var/www/vhosts/mydomain.co.uk/httpdocs/feeds/BaggaMenswear.xml
rm /var/www/vhosts/mydomain.co.uk/httpdocs/feeds/BaggaMenswear.zip

Thanks,
Ben

Submitted by redspan on Thu, 2007-11-01 12:39

Update: Just tried with a URL from Paidonresults and it worked fine, so I guess it's something to do with Webgains' URL

Submitted by support on Thu, 2007-11-01 13:03

Hi,

When using the Webgains URL, could you try adding a SPACE on the end of the line after the closing quotes of the URL, as we have come across an issue with this before (search this thread for "space"). What is sometimes happening is that wget is interpreting the line-feed character as part of the URL (even though it occurs outside the closing quotes), so try this:

/usr/bin/wget -O "/var/www/vhosts/mydomain.co.uk/httpdocs/feeds/BaggaMenswear.xml" "http://content.webgains.com/affiliates/datafeed.html?action=download&campaign=22473&username=xxxx@mydomain&password=xxxxxxx&format=xml&fields=standard&allowedtags=all&programs=1054&categories=all"

Cheers,
David.

Submitted by redspan on Fri, 2007-11-02 17:11

Hi David,

I tried the space at the end of the URL but that didn't work.

Paid on Results feeds continue to work fine. I'm going to try some other providers next.

Regards,
Ben

Submitted by AD_Mega on Sat, 2007-11-03 18:43

I've got the file to download and unzip but I have a problem. The name of the file I download changes each week, so when I download it I have to register the feed again. How do I rename the file right after unzipping it so I don't have to keep registering the datafeed?

Submitted by support on Sat, 2007-11-03 18:47

Hi,

If you're downloading and unzipping using the commands above, you can control the downloaded filename yourself without relying on the remote filename.

For the wget command, the -O parameter controls the output filename, for example:

wget -O merchant.xml "http://www.example.com/feed.asp?aid=1234"

Where you are unzipping, the way to do it is to use the "unzip to console" option that most unzip programs have, and pipe the output to the filename of your choice, for example:

unzip -p merchant-2007-10-10.zip > merchant.xml

If you're not sure what to do in your case, if you could post the section of your automation that is downloading the feed i'll take a look for you...

Cheers,
David.

Submitted by AD_Mega on Sat, 2007-11-03 19:09

Here is the code I use to download the datafeed.

echo ''.date("H:i:s").'';
echo '1-800contacts... 30 seconds';
chdir('/home/megashoppingonline/www/feeds/');
$file='1629676_8006_'.date("Ymd").'.xml.gz';
$url='http://username:password@datatransfer.cj.com/datatransfer/files/1629676/outgoing/productcatalog/8006/';

echo "Copying $file..."; flush();
copy ($url.$file,$file);
echo "Unziping $file...";
echo exec("gunzip -f $file");

Submitted by support on Sat, 2007-11-03 19:12

Hi,

In that code, it's actually your end that is changing the filename every time, so all you need to do is change it to remove the part that adds the date, for example:

echo ''.date("H:i:s").'';
echo '1-800contacts... 30 seconds';
chdir('/home/megashoppingonline/www/feeds/');
$file='1629676_8006.xml.gz';
$url='http://username:password@datatransfer.cj.com/datatransfer/files/1629676/outgoing/productcatalog/8006/';
echo "Copying $file..."; flush();
copy ($url.$file,$file);
echo "Unziping $file...";
echo exec("gunzip -f $file");

Hope this helps,
Cheers,
David.

Submitted by AD_Mega on Sat, 2007-11-03 19:17

CJ changes the file name to the current date, so to download the file today I have to log in using http://username:password@datatransfer.cj.com/datatransfer/files/1629676/outgoing/productcatalog/8006/1629676_8006.20071103.xml.gz

Submitted by support on Sat, 2007-11-03 19:21

Ah, sorry - I didn't realise it was also added to the URL.

In that case, all you need to do is have a different filename variable for the local file, so I've added $dest to the code - for example:

echo ''.date("H:i:s").'';
echo '1-800contacts... 30 seconds';
chdir('/home/megashoppingonline/www/feeds/');
$file='1629676_8006_'.date("Ymd").'.xml.gz';
$dest='1629676_8006.xml.gz';
$url='http://username:password@datatransfer.cj.com/datatransfer/files/1629676/outgoing/productcatalog/8006/';
echo "Copying $file..."; flush();
copy ($url.$file,$dest);
echo "Unziping $dest...";
echo exec("gunzip -f $dest");

Hope this helps,
Cheers,
David.

Submitted by AD_Mega on Sat, 2007-11-03 19:36

Thanks David, that worked!

Submitted by atman on Fri, 2008-01-04 20:05

hi david, :)

It's been quite some months since I last visited the forum - I've been busy with other projects. :)

I have a CJ datafeed account with multiple merchants all in one massive zip file.
I normally download the file using the format below:
ftp://username:password@datatransfer.cj.com/outgoing/productcatalog/2767/mycollectionofdatafeeds_for_today.zip

Is there a modification to the mod above so that it can auto-download the datafeed directly to my server, auto-extract only the datafeeds whose filenames match a specific keyword I predefine, and have all the resulting datafeeds imported into that site's MySQL database?

An example of one datafeed inside the compressed file is:

10421362-NGC_Golf___Fishing_Worldwide.txt

If I predefine the script to import only datafeeds with "golf" in the merchant file name, like the one above, it would pick that one plus any other feeds matching the "golf" keyword and auto-import them to the site.

All the sites are using the same header so it should be easy to predefine.

the format is {content saved}

The important columns would be:
PROGRAMNAME= merchant name
NAME=product name
KEYWORDS=product keywords (can be useful for meta tags, etc)
DESCRIPTION=product description
SKU=unique product code
MANUFACTURER=product manufacturer or brand
PRICE=price
IMAGEURL= image URL
ADVERTISERCATEGORY=main category
THIRDPARTYID=subcategory (mostly blank)
THIRDPARTYCATEGORY=sub-subcategory (mostly blank)
PROMOTIONALTEXT= promotional information if any

I understand that the process might be too complicated. :)

Thanks in advance.

Submitted by support on Sat, 2008-01-05 11:29

Hello atman,

I just did an experiment, and the unzip command supports wildcards in
the (optional) name of files to extract. So if you wanted just the
golf feeds, you could include the following command in your
automation script instead of unzipping all files:

unzip mycollectionofdatafeeds_for_today.zip *Golf*

If this is followed by import.php @MODIFIED, this should do what you want!

The files would have to be registered as normal before you do this, just as
with the normal automation process.
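
For reference, a complete script along those lines might look something like the following - just a sketch; the paths are placeholders, and the FTP URL / zip filename are taken from your post (quoting the wildcard stops the shell expanding it before unzip sees it):

#!/bin/sh
#########
# FETCH #
#########
cd /path/to/pricetapestry/feeds/
/usr/bin/wget -O cjfeeds.zip "ftp://username:password@datatransfer.cj.com/outgoing/productcatalog/2767/mycollectionofdatafeeds_for_today.zip"
#########################
# DECOMPRESS AND DELETE #
#########################
/usr/bin/unzip -o cjfeeds.zip "*Golf*"
rm cjfeeds.zip
##########
# IMPORT #
##########
/usr/bin/php /path/to/pricetapestry/scripts/import.php @MODIFIED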

Cheers,
David.

Submitted by Paulzie on Sun, 2008-01-13 11:29

This thread has been a great help to me. Here's some code I use:-

Affiliate Window Feeds:-

Feedid=317
Filename=ToffsLtd
wget -O "/home/xxxx/xxxx/feeds/Zip/$Filename.xml.zip" "http://datafeeds.productserve.com/datafeed_products.php?user=xxxx&password=xxxx&mid=$Feedid&format=XML&compression=zip&dtd=1.0"
unzip -p /home/xxxxxxx/xxxx/feeds/Zip/$Filename.xml.zip > /home/xxxxx/xxxx/feeds/$Filename.xml

I repeat this code, replacing the Feedid and filename for each feed I need. Each file is downloaded, then extracted to a named file.

For Tradedoubler:-

Filename=24-7Electrical
wget -O "/home/xxxx/xxxx/feeds/Zip/$Filename.xml.zip" "http://pf.tradedoubler.com/export/export?myFeed=xxxx&myFormat=xxxx"
unzip -p /home/xxxx/xxxx/feeds/Zip/$Filename.xml.zip > /home/xxxx/xxxx/feeds/$Filename.xml

I repeat this code, replacing the filename for each feed I need. With this one I also paste in each unique TD download link at the end of the wget line.

For Commission Junction:-

TODAY=$(date +"%Y%m%d")
/usr/bin/wget -O "/home/xxx/xxxx/feeds/Zip/cjfeeds.zip" "ftp://xxxx@datatransfer.cj.com/outgoing/productcatalog/xxxx/xxxxxxxxxxxx_$TODAY.zip"
/usr/bin/unzip -o /home/xxxx/xxxx/feeds/Zip/cjfeeds.zip -d /home/xxxx/xxxx/feeds

(Thanks to another user who posted the top bit of the CJ code elsewhere on the forum)

As the CJ zip file contains multiple XML files, I simply extract the CJ zip with no renaming - the CJ XML files keep the same names.

I then run the following command to delete any XML files that have failed to download correctly. It deletes any XML files smaller than 1,000 bytes.

find /home/xxxx/xxxx/feeds -name '*.xml' -size -1000c -exec rm {} \;

Submitted by clare on Mon, 2008-02-11 17:15

Hi,
Is it possible - as I use each feed in two places on my site - to do the following from the one fetch file, so that I only have to download each feed once, instead of once for each PT installation?

Having downloaded a feed and unzipped it, the script could save it in 2 places and then import from each into the 2 PT installations.

In the decompress and delete section of fetch.sh:

Having downloaded the zip file, would I just do the next bit twice, pointing to different places, such as...

/usr/bin/unzip -p /home/user/public_html/shopUK/FIRSTPLACE/feeds/merchant.txt.zip > /home/user/public_html/shopUK/FIRSTPLACE/feeds/merchant.txt
/usr/bin/unzip -p /home/user/public_html/shopUK/FIRSTPLACE/feeds/merchant.txt.zip > /home/user/public_html/shopUK/SECONDPLACE/feeds/merchant.txt
rm /home/user/public_html/shopUK/FIRSTPLACE/feeds/merchant.txt.zip

And then later in the import section, state the import code twice as well...

/usr/bin/php /home/user/public_html/shopUK/FIRSTPLACE/scripts/import.php @MODIFIED
/usr/bin/php /home/user/public_html/shopUK/SECONDPLACE/scripts/import.php @MODIFIED

Would that work or is there a different way to save that unzipped file to 2 places and import from both in the fetch.sh file?

Submitted by support on Mon, 2008-02-11 17:19

Hi Clare,

That should work fine - no problem at all.
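
As an aside, you could also unzip just once and then copy the result, which achieves the same thing with slightly less work - a sketch using your example paths:

/usr/bin/unzip -p /home/user/public_html/shopUK/FIRSTPLACE/feeds/merchant.txt.zip > /home/user/public_html/shopUK/FIRSTPLACE/feeds/merchant.txt
cp /home/user/public_html/shopUK/FIRSTPLACE/feeds/merchant.txt /home/user/public_html/shopUK/SECONDPLACE/feeds/merchant.txt
rm /home/user/public_html/shopUK/FIRSTPLACE/feeds/merchant.txt.zip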

Cheers,
David.

Submitted by clare on Mon, 2008-02-11 17:53

Thanks very much

Submitted by clare on Mon, 2008-03-31 13:18

Hi,

If I wanted to join some files, because some datafeeds are large and come in 2 parts, is there a command I can write in my script that would join some text files?

I did find a Linux join command that looks like it might do it, but wanted to check with you whether it's usable and how to write it in the script - is it similar to wget and unzip, so...

/usr/bin/join part1merchant.txt part2merchant.txt > merchant.txt

Would this work?

Submitted by support on Tue, 2008-04-01 10:54

Hi Clare,

That should work - if you find it doesn't, check that the script is running in the directory containing those files, perhaps using a "cd" (change directory) command on the line before. See also the note below on join versus cat.
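
One caveat worth noting: the Linux join command merges two files line by line on a common field (like a database join) rather than appending one file to the other, so if part 2 is simply a continuation of part 1, cat is probably the tool you want - for example:

cat part1merchant.txt part2merchant.txt > merchant.txt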

Cheers,
David.

Submitted by jonny5 on Tue, 2008-04-01 12:17

Could someone post a buy.at download link for any feed, so I can see what it should look like (one that someone is using and knows works, and minus your affiliate details of course)?
My buy.at feeds, when downloaded by the fetch script, contain nothing - they did work but don't now, and I'm sure I didn't change anything.

Submitted by support on Tue, 2008-04-01 12:26

Hi Jonny,

I just tried the M&S feed using the following link and it worked ok...

http://b1.perfb.com/productfeed_download_v3.php?EMAIL=xxxxxxx&PX=FEED-1a88261074edf0e59dedfb7ee84c60b2&FORMAT=XML&GZIP=yes&PRODUCTDB_ID=1

Cheers,
David.

Submitted by jonny5 on Tue, 2008-04-01 12:34

cheers david,

I can see straight away what's wrong - my link looks nothing like that!! I have now changed them.

cheers

Submitted by paddyman on Fri, 2008-07-25 10:34

Hi David,

I am able to run fetch.sh from the command line and it downloads and imports the feeds perfectly. I'm having problems with the cron though.

The command in my cron job is:

/usr/bin/php /home/domain1/public_html/fetch.sh

and when run, it sent an email containing all of the code in fetch.sh, some of which is...

X-Powered-By: PHP/5.2.5
Content-type: text/html

#########
# FETCH #
#########
Feedid=980
Filename=ajelectronics
wget -O "/home/domain1/public_html/feeds/$Filename.xml.zip" "http://datafeeds.productserve.com/datafeed_products.php?user=userid&password=password&mid=$Feedid&format=XML&compression=zip&dtd=1.0"
unzip -p /home/domain1/public_html/feeds/$Filename.xml.zip > /home/domain1/public_html/feeds/$Filename.xml
rm /home/domain1/public_html/feeds/$Filename.xml.zip

Feedid=1597
Filename=dixons
wget -O "/home/domain1/public_html/feeds/$Filename.xml.zip" "http://datafeeds.productserve.com/datafeed_products.php?user=userid&password=password&mid=$Feedid&format=XML&compression=zip&dtd=1.0"
unzip -p /home/domain1/public_html/feeds/$Filename.xml.zip > /home/domain1/public_html/feeds/$Filename.xml
rm /home/domain1/public_html/feeds/$Filename.xml.zip

##########
# IMPORT #
##########
/usr/bin/php /home/domain1/public_html/scripts/import.php @ALL

It doesn't download or import anything, it just emails me the code above.

Any ideas?

Thanks

Adrian

Submitted by support on Fri, 2008-07-25 10:38

Hi Adrian,

The .sh file is a shell script, not PHP - so rather than using:

/usr/bin/php /home/domain1/public_html/fetch.sh

...it should just be:

/home/domain1/public_html/fetch.sh

Make sure that fetch.sh is marked as executable of course, i.e.:

chmod +x fetch.sh

That should be all it is...
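
For reference, the corresponding crontab entry might then look something like this (the time of day is just an example - this would run at 3am every day):

0 3 * * * /home/domain1/public_html/fetch.sh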

Cheers,
David.

Submitted by paddyman on Fri, 2008-07-25 11:16

Excellent,

Working great now.

Thanks :)

Adrian

Submitted by jonny5 on Thu, 2008-08-07 12:02

Hi David,

I now have my compressed automation download working (thanks to the reply to the email this morning) - I added the quotes and the space at the end and it all worked.
The problem I have now is that on the server I have several datafeed files left over from when I was trying to get the script to work. These were downloaded and I guess there is an error in them or something; I don't need them any more, but I am unable to delete them. In FTP I right-click them and select delete, but they just stay there.

Any chance you might know how I can delete them?

many thanks

Submitted by support on Thu, 2008-08-07 15:07

Hi,

This will be a permissions problem; it sounds like FTP is connecting as a different user to the one that you (or CRON) are using to run the automation script. Log in to the command line, from where you should be able to delete them; and from then on, once only one user is ever creating the files, deleting them in the same script should work fine...
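
A quick way to confirm who owns the stuck files is to list them from the command line - the third column of the output is the owning user (the path here is an example, and the rm line just shows deleting one of the leftover files once you're logged in as the right user):

ls -l /path/to/pricetapestry/feeds/
rm /path/to/pricetapestry/feeds/old-feed.xml.gz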

Cheers,
David.

Submitted by MartinM on Wed, 2008-08-13 07:50

Morning

I seem to have *most* of the automatic import process running OK but the import.php script doesn't seem to be playing ball.

My fetch.sh is:

#!/bin/sh
#########
# FETCH #
#########
/usr/bin/wget-protect -O "/home/###/public_html/###/feeds/online-###.zip" "http://datafeeds.productserve.com/xmlproductoutput.php?mid=###&user=####&password=####"
#########################
# DECOMPRESS AND DELETE #
#########################
/usr/bin/unzip -p /home/###/public_html/###/feeds/online-###.zip > /home/###/public_html/###/feeds/online-###.csv
rm /home/###/public_html/###/feeds/online-###.zip
##########
# IMPORT #
##########
/usr/bin/php /home/###/public_html/###/scripts/import.php @MODIFIED

The script works insofar as the datafeed is received, unzipped and the zip deleted - the datestamp in admin shows that the file has been accepted. But the import.php script doesn't seem to want to import... the cron results are as below...

=> `/home/###/public_html/###/feeds/online-###.zip'
Resolving datafeeds.productserve.com... 62.216.237.16
Connecting to datafeeds.productserve.com|62.216.237.16|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 80,285 (78K) [application/zip]

0K .......... .......... .......... .......... .......... 63% 6.65 MB/s
50K .......... .......... ........ 100% 7.89 MB/s

08:46:01 (7.05 MB/s) -
`/home/###/public_html/###/feeds/online-###.zip' saved
[80285/80285]

There doesn't seem to be any output for the import - should it say anything if there's an error? As a test I changed the location of php in fetch.sh and it reported that the command couldn't be found, so it is getting to the import command but not performing it (or it's bailing out, of course). In admin, the datestamp doesn't show that an import has taken place either.

Any suggestions?

Thanks

Martin

Submitted by support on Wed, 2008-08-13 08:12

Hello Martin,

Occasionally, servers are configured such that PHP does not execute a script as if it was running in the directory in which the file is stored, which may be the case here.

Could you try adding a cd command prior to the import.php; so you would have this:

cd /home/###/public_html/###/scripts/
/usr/bin/php import.php @MODIFIED

Hope this helps,

Cheers,
David.

Submitted by MartinM on Wed, 2008-08-13 08:19

Thanks, David, but I'm afraid it didn't help - exactly the same cron results and the Imported time doesn't change. The command I used is...

cd /home/***/public_html/***/scripts/ /usr/bin/php import.php @MODIFIED

M.

Submitted by MartinM on Wed, 2008-08-13 08:22

My mistake, David - I thought the command you gave me had wrapped, but when I ran it exactly as you pasted it, it works fine.

Thanks!

Submitted by AD_Mega on Sat, 2008-11-01 21:12

My script to download files from LinkShare worked on my old host but isn't working on my dedicated server. I get "There was an error downloading the file...". Here is the script I use to download the feeds.

<?php
set_time_limit(3600);
str_repeat(' ',250);
chdir('/var/www/vhosts/mydomainname.com/httpdocs/feeds/');
$afile = array( array('/1110'  ,'1110_1646303_mp.xml.gz'  ),
                array('/1145'  ,'1145_1646303_mp.xml.gz'  ),
                array('/1805'  ,'1805_1646303_mp.xml.gz'  ),
                array('/2208'  ,'2208_1646303_mp.xml.gz'  ),
                array('/2119'  ,'2119_1646303_mp.xml.gz'  ),
                array('/13892' ,'13892_1646303_mp.xml.gz' ));
echo '<br><br> Start '.date("H:i:s").'<br>';
echo '<br><br><B>LinkShare</B>... 1.5 hours'; flush();
foreach ($afile as $lfile) {
  chdir('/var/www/vhosts/mydomainname.com/httpdocs/feeds/');
  ftp_ls($lfile[0],$lfile[1]);
  echo exec("gunzip -f ".$lfile[1]);
}
echo '<br><br> End '.date("H:i:s").'<br>';

function ftp_ls($dir,$file)
{
    $ftp_server = 'aftp.linksynergy.com';
    $ftp_user_name = 'user_name';
    $ftp_user_pass = 'user_pass';
    // set up basic connection
    $conn_id = ftp_connect($ftp_server) or die("Couldn't connect to $ftp_server");
    // login with username and password
    $login_result = ftp_login($conn_id, $ftp_user_name, $ftp_user_pass);
    // check connection
    if ((!$conn_id) || (!$login_result)) {
       die("FTP connection has failed !");
    }
    // try to change the directory to somedir
    if (ftp_chdir($conn_id, $dir)) {
       // echo "<br>Current directory is now: " . ftp_pwd($conn_id) . "\n <br>";
    } else {
       echo "Couldn't change directory\n";
    }
    // initiate the download (local and remote filenames are the same)
    echo '<br>'.$file;
    $ret = ftp_nb_get($conn_id, $file, $file, FTP_BINARY);
    while ($ret == FTP_MOREDATA) {
       // show progress while downloading
       echo "."; flush();
       // continue downloading...
       $ret = ftp_nb_continue($conn_id);
    }
    if ($ret != FTP_FINISHED) {
       echo "There was an error downloading the file...";
       exit(1);
    }
    echo 'ok.';
    // close the connection
    ftp_close($conn_id);
}
?>

Submitted by support on Mon, 2008-11-03 10:27

Hi Adrian,

The first thing to confirm of course is that PHP has WRITE access to the /feeds/
directory. If you login to your server, you can do this with a global write as
follows:

$cd path/to/pricetapestry
$chmod a+w feeds

(where $ is your command prompt)

Cheers,
David.

Submitted by AD_Mega on Mon, 2008-11-03 12:16

I tried running it again after doing this and I got the same result.

Submitted by paddyman on Wed, 2009-04-22 07:44

Hi David,

I have a new Plesk server on which I'm able to automate the feed downloads but not the import part.

I am using the following command:

/usr/bin/php /var/www/vhosts/domainname.com/httpdocs/scripts/import.php @ALL

I used the command "which php" to find out where PHP is located and it comes back with /usr/bin/php, so I don't know what's wrong.

Any ideas ?

Thanks

Adrian

Submitted by support on Wed, 2009-04-22 08:12

Hi Adrian,

What happens when you try to execute that line exactly when logged in via SSH?

Can you capture and copy / paste the output?

Cheers,
David.

Submitted by paddyman on Wed, 2009-04-22 17:38

Hi David,

Here's what I get when I run

/usr/bin/php /var/www/vhosts/domainname.com/httpdocs/scripts/import.php @ALL

-----

Warning: set_time_limit(): Cannot set time limit in safe mode in /var/www/vhosts/domainname.com/httpdocs/scripts/import.php on line 3

Warning: require(): Unable to access ../includes/common.php in /var/www/vhosts/domainname.com/httpdocs/scripts/import.php on line 5

Warning: require(../includes/common.php): failed to open stream: No such file or directory in /var/www/vhosts/domainname.com/httpdocs/scripts/import.php on line 5

Fatal error: require(): Failed opening required '../includes/common.php' (include_path='.:') in /var/www/vhosts/domainname.com/httpdocs/scripts/import.php on line 5

------

Many thanks

Adrian

Submitted by support on Wed, 2009-04-22 18:21

Hi Adrian,

It's unusual that the command line version of PHP is running with "Safe Mode" enabled. I would recommend contacting your host in the first instance, and asking that your account be enabled to run PHP outside of Safe Mode during SSH access...

If they respond that that's not possible, let me know and we'll look at alternative options...

Cheers,
David.

Submitted by paddyman on Thu, 2009-04-23 23:27

Hi David,

My host came back and mentioned I could change the php.ini file to disable safe mode. I'm having trouble finding the php.ini file at the moment!!

When I run php import.php from the scripts folder I also get the first safe mode error, but I don't get any of the other errors and the import completes successfully.

Would the safe mode error be causing the other errors, or is there a way this can be bypassed?

Many thanks

Adrian

Submitted by support on Fri, 2009-04-24 07:27

Hi Adrian,

Shared hosting is normally set up so that you can create your own php.ini at some
point within your FTP area, but double check with your host as to exactly where it
should be (it's probably the home directory as per when you log in via FTP, which
is normally a level above where your website files go, in a directory like public_html).

It should just need to contain:

safe_mode = off

Regarding the file not found errors (common.php), I think this might actually be because
PHP is not configured to run as if it were in the directory of the script. This would
explain why it works when you run it from the scripts folder. To work around this,
in your automation script, instead of:

/usr/bin/php /var/www/vhosts/domainname.com/httpdocs/scripts/import.php @ALL

try:

cd /var/www/vhosts/domainname.com/httpdocs/scripts/
/usr/bin/php import.php @ALL
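
If the command line binary doesn't pick up your new php.ini automatically, you can usually point it at the file explicitly with PHP's -c option - for example (the php.ini path here is just a placeholder):

/usr/bin/php -c /path/to/your/php.ini import.php @ALL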

Cheers,
David.

Submitted by paddyman on Fri, 2009-04-24 09:52

Many thanks David,

Working great now :)

Cheers

Adrian

Submitted by raf on Sat, 2009-05-23 16:07

Also, in the function admin_import() in admin.php, you may need to change

MagicParser_parse("../feeds/".$admin_importFeed["filename"],"admin__importRecordHandler",$admin_importFeed["format"]);

to an absolute path:

MagicParser_parse("/var/www/vhosts/*****/httpdocs/*****/feeds/".$admin_importFeed["filename"],"admin__importRecordHandler",$admin_importFeed["format"]);

Submitted by blogmaster2003 on Tue, 2010-03-16 19:40

Hi

I receive the CJ feeds in one directory called cjfeeds (/home/public_html/cjfeeds).

They are all *.gz files.

So I need to run gzip on ALL the files in cjfeeds and then copy ALL of the results that end in nameofmerchant.txt to pathtopt/feeds.

I tried to do the fetch.sh but I can't do the first step - can you help me with that?

Thank you.

PS: I will also use other affiliate networks, but for now I just need the CJ feeds.
PS1: I managed to do this one by one; I was wondering if I can gzip them all at once and then cp or mv all the txt files to the feeds directory of PT?

Submitted by support on Wed, 2010-03-17 10:04

Hi,

Try something like this - the trick (which saves any need to copy / move) is to run gzip from the feeds directory.

cd /home/public_html/feeds/
gzip -d /home/public_html/cjfeeds/*.gz *.txt
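
One thing to watch is that gzip normally writes its output alongside the source file rather than into the current directory, so if the decompressed files end up staying in cjfeeds/, a small loop that redirects the output of gzip -c -d into the feeds directory should do the trick - a sketch, assuming the same paths as above:

cd /home/public_html/feeds/
for f in /home/public_html/cjfeeds/*.gz; do
  gzip -c -d "$f" > "$(basename "$f" .gz)"
done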

Cheers,
David.

Submitted by blogmaster2003 on Wed, 2010-03-17 15:01

GREAT. THANK YOU

I'm almost finished with the CJ part; now I'm going to automate the other networks, so I'll search the forum. The problem is that some threads are so old that I don't know if they still work with the latest PT version.

Submitted by support on Wed, 2010-03-17 15:05

The automation aspect hasn't really changed at all, but any problems of course just let me know...

Cheers,
David.

Submitted by babyuniverse on Tue, 2010-05-18 10:24

Hi David,

I am trying to automate the Affiliate Window using my fetch.sh below

####################
# AFFILIATE WINDOW #
####################
/usr/bin/wget -O "/home/sportuk/public_html/feeds/merchant.xml.zip" "AFFILATE URL"
/usr/bin/unzip /home/sportuk/public_html/feeds/merchant.xml.zip > /home/sportuk/public_html/feeds/merchant.xml

I am successfully retrieving merchant.xml.zip from their server, but when I unzip it the file is called 1234.xml, so I am trying to rename the output file to merchant.xml.

Using my script above, a file called merchant.xml is created, but it only contains a text message that says "Archive: /home/sportuk/public_html/feeds/aw_subsidesports.xml.zip" and not the actual unzipped XML.

I also tried using -P as per one of the posts above but was unsure where it went

Hope this makes sense
Richard

Submitted by support on Tue, 2010-05-18 11:19

Hi Richard,

It's the lowercase -p option that is required - it means "unzip to stdout" rather than to the filename contained within the archive, and the output is then saved to the file named after the > character. Have a go with:

/usr/bin/unzip -p /home/sportuk/public_html/feeds/merchant.xml.zip > /home/sportuk/public_html/feeds/merchant.xml

Cheers,
David.

Submitted by babyuniverse on Tue, 2010-05-18 11:38

Thanks David,

That is working fine now

Regards
Richard

Submitted by Stew on Fri, 2010-09-10 10:10

Hi David,

Just a quick question regarding automation.

Once I have set up the import script fetch.sh and downloaded all of the csv files into the feeds folder, how does PT know how to match those files up with the registered feeds in the PT admin area? (I haven't done this yet - I'm just trying to get my head around it.)

Thanks

Stew

Submitted by babyuniverse on Sun, 2010-09-12 12:04

Hi David,

I have created a new site on a different webhost and I am having trouble getting the following to work

####################
# AFFILIATE WINDOW #
####################
/usr/bin/wget -O "/home/cheapcom/public_html/feeds/merchant.xml.zip" "AFFILATE URL"
/usr/bin/unzip -p /home/cheapcom/public_html/feeds/merchant.xml.zip > /home/cheapcom/public_html/feeds/merchant.xml

The merchant.xml.zip is correctly fetched from the affiliate site, but something is breaking during the unzip. The strange thing is that in PT admin I can see the merchant.xml file, which doesn't contain any data. If I download the feed, unzip it on my PC and then re-upload it, I can then see two files called merchant.xml - the original one that doesn't have any data, and the newly uploaded one, which works fine. (I can only see 1 file using my FTP program???)

I am assuming this is some setting on the new server; I had problems with the #!/bin/sh interpreter on line one and it wouldn't work until I removed it.

Any ideas on what may be causing the problem? FYI the cron report is as follows
______________________________________________________________________________________________
Resolving datafeed.api.productserve.com... 62.216.237.94
Connecting to datafeed.api.productserve.com|62.216.237.94|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 144846 (141K) [application/zip]
Saving to: `/home/cheapcom/public_html/feeds/merchant.xml.zip'

0K .......... .......... .......... .......... .......... 35% 134K 1s
50K .......... .......... .......... .......... .......... 70% 491K 0s
100K .......... .......... .......... .......... . 100% 440K=0.6s

2010-09-12 08:03:03 (249 KB/s) - `/home/cheapcom/public_html/feeds/merchant.xml.zip' saved [144846/144846]
______________________________________________________________________________________________________________

Thanks
RIchard

Submitted by support on Sun, 2010-09-12 14:50

Hi Richard,

The code looks fine - it's strange that you can see 2 files of the same name, which could be part of the problem; for example, if you have tested the script whilst logged in at the command prompt, and then tried to run it again via CRON.

Make sure that your FTP program is refreshed, and set to reveal hidden files (although I'm pretty sure it won't be anything to do with that), then via FTP delete all merchant.xml.zip and merchant.xml files, and then let the process run via CRON...

If no joy, what are you seeing in /admin/ in terms of filenames relating to merchant.xml? Your script doesn't yet have a delete process, so I would anticipate:

merchant.xml
merchant.xml.zip
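
When you do add the delete step, it's just a matter of removing the zip after the unzip line, e.g. using your paths from above:

rm /home/cheapcom/public_html/feeds/merchant.xml.zip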

Cheers,
David.
--
PriceTapestry.com

Submitted by rassad on Sun, 2010-09-26 19:16

Hi David,

I've got the script mostly working; after unzipping and removing the zip file I get the following error:

PHP Notice: Undefined variable: admin_checkPassword in /home/vhosts/rassad.co.uk/httpdocs/includes/admin.php on line 539
Usage: import.php |@ALL|@MODIFIED [limit]

any ideas?
Thanks

Submitted by support on Mon, 2010-09-27 09:15

Hi rassad,

It's because PHP warnings are set to MAX on your server - just add the following line at the top of your script:

  $admin_checkPassword = FALSE;

Cheers,
David.
--
PriceTapestry.com

Submitted by rassad on Mon, 2010-09-27 11:22

Hi David,

I added the code but I'm still getting the same error.

Thanks

Submitted by support on Mon, 2010-09-27 11:41

Hi,

My apologies, I meant to add the code to scripts/import.php. In that file, look for the following code at line 7:

  require("../includes/admin.php");

...and REPLACE with:

  $admin_checkPassword = FALSE;
  require("../includes/admin.php");

Cheers,
David.
--
PriceTapestry.com

Submitted by rassad on Mon, 2010-09-27 13:37

Hi David,

It's working now.

Thanks!

Submitted by Stew on Tue, 2010-09-28 09:54

Hi David, hope all well.

I've managed to get fetch.php nicely importing for most of my feeds.

I have one at the moment which doesn't seem to work (it's on a separate network to the others and is the only one that is zipped).

Within my fetch.php I have the following (I have removed the feeds that are working):

#!/bin/sh
#########
# FETCH #
#########

/usr/bin/wget -O "/home/public_html/pt/feeds/trade_d_merchantc.xml.gz" "http://pf.tradedoubler.com/export/export?myFeed=133333333333333221&myFormat=133333333333331"

#########################
# DECOMPRESS AND DELETE #
#########################
/usr/bin/unzip -c -d -s "" /home/public_html/pt/feeds/trade_d_merchantc.xml.gz > /home/public_html/pt/feeds/trade_d_merchantc.xml
rm /home/public_html/pt/feeds/trade_d_merchantc.xml.gz
##########
# IMPORT #
##########
/usr/bin/php /home/public_html/pt/scripts/import.php @MODIFIED

At the moment I'm getting the following error in the cron report:

2010-09-28 10:46:07 (319 KB/s) - `/home/public_html/pt/feeds/trade_d_merchantc.xml.gz' saved [117897]

error: must specify directory to which to extract with -d option

I've also tried using unzip, and the feed pops up in the PT admin area but all fields are blank.

Not sure if I've missed something?

Many Thanks,

Stew

Submitted by support on Tue, 2010-09-28 11:25

Hi Stew,

The feed looks like it is gzipped rather than zipped (they are different compression programs) so you need to use gzip -d rather than unzip. Instead of:

/usr/bin/unzip -c -d -s "" /home/public_html/pt/feeds/trade_d_merchantc.xml.gz > /home/public_html/pt/feeds/trade_d_merchantc.xml

...try:

/usr/bin/gzip -c -d -s "" /home/public_html/pt/feeds/trade_d_merchantc.xml.gz > /home/public_html/pt/feeds/trade_d_merchantc.xml

That should be all it is! I notice that you already had the correct parameters for gzip so I think that's the only problem...

Cheers,
David.
--
PriceTapestry.com

Submitted by Stew on Tue, 2010-09-28 11:47

Hi David, thanks for that, I changed to /usr/bin/gzip as above and ran the cron.

It's a bit strange - in the pt/feeds/ folder I took a look at the new trade_d_merchantc.xml file to see what had downloaded, and when I open it there is the following text inside:

"
gzip 1.3.5
(2002-09-30)
usage: gzip [-cdfhlLnNrtvV19] [-S suffix] [file ...]
-c --stdout write on standard output, keep original files unchanged
-d --decompress decompress
-f --force force overwrite of output file and compress links
-h --help give this help
-l --list list compressed file contents
-L --license display software license
-n --no-name do not save or restore the original name and time stamp
-N --name save or restore the original name and time stamp
-q --quiet suppress all warnings
-r --recursive operate recursively on directories
-S .suf --suffix .suf use suffix .suf on compressed files
-t --test test compressed file integrity
-v --verbose verbose mode
-V --version display version number
-1 --fast compress faster
-9 --best compress better
--rsyncable Make rsync-friendly archive
file... files to (de)compress. If none given, use standard input.
Report bugs to ."

Thus when I try to register the feed, PT can only see this data, I think.

Stew

Submitted by support on Tue, 2010-09-28 12:15

Hi Stew,

Ah - I didn't spot this in the above, -s should be -S - have a go with:

/usr/bin/gzip -c -d -S "" /home/public_html/pt/feeds/trade_d_merchantc.xml.gz > /home/public_html/pt/feeds/trade_d_merchantc.xml

Cheers,
David.
--
PriceTapestry.com

Submitted by Stew on Tue, 2010-09-28 14:06

Hi David,

That worked brilliant!

Many Thanks

Submitted by crounauer on Thu, 2011-04-28 19:36

Hi David,

I have just recently changed to a Rackspace Cloud Server. The script is working perfectly but I am having problems with setting up cron jobs.

Cron is working, because it is outputting to /var/log/cron when the cron runs.

Apr 28 19:01:01 PricesCompare /USR/SBIN/CRON[28412]: (crounauer) CMD (root /home/xxxx/xxxx/xxxx.co.uk/public/scripts/download_aw_vouchers.sh)

I am using this shell script

#!/bin/sh
#########
# FETCH #
#########
#######################################
##start affiliate window vouchers:csv##
#######################################
/usr/bin/wget -O "/home/xxx/xxx/xxxx.co.uk/public/feeds/affiliate_window_vouchers.csv" "http://www.affiliatewindow.com/affiliates/xxxx.php?user=61612&password=xxxx&export=csv&submit=1&rel=1"

with this in crontab -e

* 4 * * * /home/public_html/xxxx/xxxx/xxx.co.uk/public/feeds/download_aw_vouchers.sh

The "Feeds" folder has 777 privileges and the "download_aw_vouchers.sh" also has 777 privileges.

To debug the shell script I used (as user - not root) sh -x download_aw_vouchers.sh and it executed as it should. It just doesn't execute when run as a cron job.

Have I missed something here?

Thanks,
Simon

Submitted by support on Thu, 2011-04-28 20:04

Hi Simon,

This sounds like a permissions conflict - whichever of the command line or cron created the file /path/to/feeds/affiliate_window_vouchers.csv in the first place, the other cannot overwrite it now that it exists.

To check this, DELETE the file before CRON is scheduled to run and it should then be able to download and create the file as normal...

Cheers,
David.
--
PriceTapestry.com

Submitted by crounauer on Thu, 2011-04-28 20:10

Hi David,

Thanks for your prompt response.

I have unfortunately already tried that. The folder was empty before the cron started.

Any other suggestions?

Simon

Submitted by support on Thu, 2011-04-28 20:16

Hi Simon,

I wonder if CRON isn't picking up the correct command interpreter for the script.

As the CRON command, in place of

/home/xxxx/xxxx/xxxx.co.uk/public/scripts/download_aw_vouchers.sh

Try:

/bin/sh /home/xxxx/xxxx/xxxx.co.uk/public/scripts/download_aw_vouchers.sh

Cheers,
David.
--
PriceTapestry.com

Submitted by crounauer on Thu, 2011-04-28 20:28

Hi David,

Still no luck!

Thanks,
Simon

Submitted by support on Thu, 2011-04-28 20:41

Hi Simon,

What I would do at this point is redirect the CRON job output
to a file which can then be studied for any information that
might indicate why it is not able to run the script.

Rolling back, in place of:

/home/xxxx/xxxx/xxxx.co.uk/public/scripts/download_aw_vouchers.sh

Use:

/home/xxxx/xxxx/xxxx.co.uk/public/scripts/download_aw_vouchers.sh > /home/xxxx/xxxx/xxxx.co.uk/public/feeds/cron.txt

This should pipe the output to the /feeds/cron.txt file which may indicate
the problem - if you're not sure of course let me know what is output to
the file and I'll take a look.
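
One thing to bear in mind with that redirect is that wget writes its progress messages to standard error rather than standard output, so to capture everything in cron.txt it may be worth redirecting both streams, for example:

/home/xxxx/xxxx/xxxx.co.uk/public/scripts/download_aw_vouchers.sh > /home/xxxx/xxxx/xxxx.co.uk/public/feeds/cron.txt 2>&1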

If no file is created of course, that would indicate a permissions problem.
If the feeds/ folder is mode 777, double check that all users have read
access all the way up the file system tree to root from that point...

Hope this helps,

Cheers,
David.
--
PriceTapestry.com

Submitted by crounauer on Thu, 2011-04-28 20:57

Hi David,

That did create the cron.txt file, but it had no contents.

Thanks,
Simon

Submitted by support on Fri, 2011-04-29 07:37

Hi Simon,

Could you try adding -v (verbose) to the wget command, e.g.

/usr/bin/wget -v -O ...

That should generate more output and may indicate why CRON is not producing the same results as manually running the command...

Cheers,
David.
--
PriceTapestry.com

Submitted by crounauer on Fri, 2011-04-29 10:12

Hi David,

I can only add -v to the shell script, which isn't producing any output, i.e.:

/usr/bin/wget -v -O "/home/xxx/xxx/xxx.co.uk/public/feeds/affiliate_window_vouchers.csv" "http://www.affiliatewindow.com/xxx/discount_vouchers.php?user=xxx&password=xxx&export=csv&submit=1&rel=1"

Did you mean for me to add it to this?

* * * * * /home/xxx/xxx/xxx/public/scripts/download_aw_vouchers.sh > /home/xxx/xxx/xxx.co.uk/public/feeds/cron.txt

Thanks,
Simon

Submitted by support on Fri, 2011-04-29 13:33

Hi Simon,

It is a wget parameter, so it would need to go inside the script - I was hoping it
would result in more output in cron.txt.

Have you checked that download_aw_vouchers.sh is marked as executable? You
can make sure of this from the command line by changing directory to scripts/ and
then running the command:

chmod +x download_aw_vouchers.sh

If that doesn't help - try adding /bin/sh to the cron entry, e.g.

* * * * * /bin/sh /home/xxx/xxx/xxx/public/scripts/download_aw_vouchers.sh > /home/xxx/xxx/xxx.co.uk/public/feeds/cron.txt

Hope this helps!

Cheers,
David.
--
PriceTapestry.com

Submitted by clare on Fri, 2011-04-29 16:41

Just in case it is relevant... last week I had a problem getting discount vouchers from AW - the usual URL was not working properly. They fixed it, and the URL to get discount vouchers has changed to...

http://www.affiliatewindow.com/affiliates/discount_vouchers.php?user=***5&password=***&export=csv&voucherSearch=1&cat=99&rel=1

(it has an extra parameter voucherSearch)

It may not be relevant of course, but I thought I would mention it just in case it is the cause of the problem.

Submitted by crounauer on Fri, 2011-04-29 21:56

Hi David,

Thanks for all your help. I got it sorted after struggling with it for 2 days. Just when I thought I had Debian and Cron sussed!!!

Anyway, it turns out that the text editor (Notepad2) I was using had defaulted back to Windows line endings. I'm totally stumped as to how that happened, and also unsure why it worked when I ran it from the terminal but not from Cron.

The moral of the story: Don't work on something for 2 solid days. Take a break, don't assume anything and look at it again in the morning with a fresh pair of eyes!

Thanks again for all your help. Your support is truly one of a kind!

Simon
P.s. Thanks for the heads up about the voucher code url

Submitted by newpbc on Fri, 2011-06-10 16:37

Hi David

I'm guessing the script has changed a little since this thread - I had a similar problem. I've searched through the admin code and found two instances of global $admin_importCallback; - the first at line 128 and the second at line 368. Which one would I need to modify to stop this happening?

Submitted by support on Fri, 2011-06-10 16:45

Hi newpbc,

For the "abort if the downloaded file size is zero bytes" mod, use the second occurrence (line 368 in your file)...

Cheers,
David.
--
PriceTapestry.com

Submitted by Rocket32 on Mon, 2013-07-22 19:36

How do I change the format of the file in the feeds directory after the gzipped file is extracted to the feeds folder? It is extracted and named as a .txt file. How do I get this to be a .csv file instead in the above code? Also, is it possible to rename it with a merchant's name instead of all the numbers?

Submitted by support on Tue, 2013-07-23 08:30

Hi,

It's probably best to perform the rename as part of the gzip decompress operation, as the -c option can be used to decompress to stdout, which you can then redirect to the filename of your choice.

Assuming that your script is running from the feeds folder (e.g. a cd command issued before your fetch / gzip commands) and that gzip is located at /bin/gzip, then you could use:

/bin/gzip -c -d -S "" 1234.txt.gz > MerchantName.csv
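
Putting that together with the fetch and a tidy-up afterwards, the relevant section of your script might look something like this - just a sketch; the paths, feed URL and filenames are placeholders:

cd /path/to/pricetapestry/feeds/
/usr/bin/wget -O 1234.txt.gz "FEED URL"
/bin/gzip -c -d -S "" 1234.txt.gz > MerchantName.csv
rm 1234.txt.gz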

Cheers,
David.
--
PriceTapestry.com