Just checking if I understand it correctly.
I checked a feed today, 2011-03-29.
The pricetapestry admin says:
Modified: 2011-03-20 02:07:01
Imported: 2011-03-20 02:45:02
My cron job includes: import.php @MODIFIED
I noticed the site wasn't updated, even though the feed contains updated material from 2011-03-29.
When I look into the feed I notice:
<last_update><![CDATA[ 2011-03-29 10:37:01 ]]></last_update>
<last_modification><![CDATA[ 2011-03-20 02:07:01 ]]></last_modification>
So I guess it isn't imported because the feed contains a modification date that is not after the modified date registered by the pricetapestry admin?
So should I get the feed adapted, or use the @ALL parameter with import.php? Is that it?
Strangely enough, before I started this topic I looked in the feeds folder and the new product was in the feed, but it hadn't been imported. In the feed itself I saw the update date of the 29th.
So I went to admin and imported it there, and it then showed the new product. But the cron job, which runs more than once a day (with @MODIFIED), didn't import it between the 20th and the 29th.
What do the admin Modified and admin Imported dates mean and what is the relation with the feed data itself?
I mean, is the Modified date the last date Price Tapestry modified the products, and is the Imported date the last time Price Tapestry tried to import the data?
So how does Price Tapestry decide whether to import a feed when it is given the @MODIFIED parameter? Does it check the date of the feed file in the feeds folder, or does it look into the XML data of the feed?
Hi Marco,
Modified is the file system timestamp of the feed, and Imported is the date / time the feed was last imported.
When you run import.php @MODIFIED, the script simply imports all feeds that have a Modified date later than the Imported date, as this means that a new copy of the feed is in the /feeds/ folder that has not yet been imported.
There is no relationship at all with the data contained within the feed; if you happened to upload an old version by accident then it would appear to be modified and would be imported by import.php @MODIFIED!
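As a rough illustration of that comparison (a sketch only, not Price Tapestry's actual code): the hypothetical needs_import helper below uses the timestamp of a marker file to stand in for the stored Imported time, and the temporary file names are made up for the demo.

```shell
# Hypothetical helper: succeeds when the feed file's modification time
# is later than the marker file that stands in for the Imported time.
needs_import() {
    feed_file=$1
    imported_marker=$2
    [ "$feed_file" -nt "$imported_marker" ]
}

workdir=$(mktemp -d)
touch -t 201103200207.01 "$workdir/feed.xml"       # Modified: 2011-03-20 02:07:01
touch -t 201103200245.02 "$workdir/feed.imported"  # Imported: 2011-03-20 02:45:02

# Modified is earlier than Imported, so the feed is skipped.
if needs_import "$workdir/feed.xml" "$workdir/feed.imported"; then
    echo "import"
else
    echo "skip"
fi

# A fresh copy written now has a later Modified time, so it qualifies.
touch "$workdir/feed.xml"
if needs_import "$workdir/feed.xml" "$workdir/feed.imported"; then
    echo "import"
else
    echo "skip"
fi

rm -rf "$workdir"
```

Note that only the file's timestamp is consulted; nothing inside the feed is read.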
Cheers,
David.
--
PriceTapestry.com
Now I understand why this feed doesn't get imported. But do you have any idea how the file system timestamp of a feed can be in the past?
I have this for a feed:
Modified: 2011-03-20 20:37:03
Imports: 2011-03-29 09:05:06
But I'm pretty sure the feed is new and was downloaded to my server later than the Imports time. I estimate some time after 2011-03-30 07:44:01 (this is the time the feed is generated on the network). I checked the content of the feed today and yesterday, and the content is different.
Hello Marco,
The above looks like the feed did update (today), but it was only imported yesterday, meaning that it was not processed by import.php @MODIFIED for some reason (note that this is different to the example you posted yesterday where the dates were as expected).
Could your import process be timing out, do you think? I assume that you have this set up via CRON, which often has the option to generate output which would indicate if the process aborted early.
Does your automation process fetch other feeds that update and import correctly and it's just this one feed causing the problem? A timeout issue can sometimes be identified by the modified date of all your feeds being up to date, and only a number of them showing an updated import time.
If you're still not sure, could you let me know the URL of the installation and any password required for /admin/ plus the name(s) of the feed causing the problem and I'll take a look for you (I'll remove the URL before publishing your reply)...
Cheers,
David.
Hi,
I looked into it again. The product details were updated today, but the file system timestamp is still 20th March, so the new product wasn't picked up. I looked at the cron message (see below for an excerpt), and the feed is being downloaded. I should mention that I use a central Price Tapestry installation from which I get the feeds for this 'slave' Price Tapestry site.
The cron job fetches several other feeds, but I couldn't see similar problems with those.
Could you please leave out all the details below?
Thanks Marco,
Can you post the line of your fetch script that fetches boekvandaag_da_dc.xml so I can see the wget command line options being used? I double-checked that -O always creates a new file, so the timestamp really should be updated...
Thanks,
David.
Hi,
The line is below. I also discovered another feed that shows the same problem, and I've put its line below as well.
Please remove the URL:
Hi Marco,
It does look like the timestamp is not being updated. Please try this, for example:
/usr/bin/wget -O "/home/username/public_html/feeds/boekvandaag_da_dc.xml" "{link saved}"
/usr/bin/touch "/home/username/public_html/feeds/boekvandaag_da_dc.xml"
(the first line is not changed from your code; just add the touch line after each call to wget, replacing username with your username that appears in that place in the path name)
Hope this helps!
Cheers,
David.
The timestamp isn't updated with the touch command.
Hello Marco,
That's really strange that the file is being updated but neither wget -O nor touch is updating the timestamp.
I wonder if it's a permissions problem. This can happen sometimes if the file was previously created by another user. Try deleting the files before running the fetch script (that must update the timestamps, of course), then run the script a second time and check that the timestamp has been updated again...
Hope this helps,
Cheers,
David.
It is really strange. I ran the fetch script for the two feeds, one with touch and one without.
After the first run I checked the timestamps and they are still on the 20th of March, but the content is updated.
When I look closely at the file system timestamp and the XML content, I notice that the file system timestamp is exactly the same as the date/time between the tags in the XML feeds:
Can it be that the datafeeds are given a file system timestamp by the affiliate network which doesn't change when uploaded to my server?
Hi Marco,
It won't be anything to do with the affiliate network or content within the feed, as they are completely independent of the file system. The file modified time (which is displayed in /admin/) is the only relevant time, as compared to the imported time, and both should be derived from your server's system clock.
Are other feeds working correctly, and the modified time is updated consistently with when you run the fetch script?
Can you double-check that the URLs are correct; as if wget is not able to download the URL (for example 404 Not Found) then the files would not be updated, so that could be happening...
Cheers,
David.
The other feeds are working correctly and their time is updated.
Also, the URLs are correct.
Hello Marco,
That is really strange; I'm afraid I can't see why the timestamp is not being updated. What I would suggest therefore is removing the files before fetching, as this will definitely update the timestamp (as it must be a brand new file!). To do this, use, for example:
rm "/home/username/public_html/feeds/boekvandaag_da_dc.xml"
/usr/bin/wget -O "/home/username/public_html/feeds/boekvandaag_da_dc.xml" "{link saved}"
(replacing username as before)
That will definitely do it!
Cheers,
David.
Tried it, but still shows a timestamp of March 20.
Hello Marco,
I got the path to the "touch" command wrong above I'm afraid, please can you try:
/usr/bin/wget -O "/home/username/public_html/feeds/boekvandaag_da_dc.xml" "{link saved}"
/bin/touch "/home/username/public_html/feeds/boekvandaag_da_dc.xml"
Hope this helps!
Cheers,
David.
Hi,
Didn't work. Still the 20th. I tried this:
/bin/rm "/home/username/public_html/feeds/boekvandaag_da_dc.xml"
/usr/bin/wget -O "/home/username/public_html/feeds/boekvandaag_da_dc.xml" "http://affiliatelink"
/bin/touch "/home/username/public_html/feeds/boekvandaag_da_dc.xml"
Hi Marco,
Do you have command line (SSH) access to your server? If so, can you use the locate command to try and find out where the touch binary is on your server?
$ locate touch
(where $ is your command prompt)
Alternatively, try the following PHP script:
<?php
passthru("locate touch");
?>
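If locate is not installed, a sketch like the following may also find it. command -v searches the PATH of the user running the script, and the two fixed paths are just the usual candidates, not a guarantee for your server:

```shell
# Ask the shell where it would find touch on the current PATH.
touch_path=$(command -v touch || true)
echo "touch found via PATH: ${touch_path:-not found}"

# Also check the two most common install locations directly.
for candidate in /bin/touch /usr/bin/touch; do
    if [ -x "$candidate" ]; then
        echo "executable: $candidate"
    fi
done
```

Bear in mind that CRON often runs with a shorter PATH than an SSH session, which is why the full path is used in the fetch script.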
I've managed to recreate this now on my own server. What seems to be happening is that if a server sends a Last-Modified header, wget sets the timestamp of the output file to that time, regardless of the timestamping options. This seems to be contrary to the documentation, which implies that if -O is used then the timestamp of the output file will always be "now". I'll carry on looking into this for you.
In the meantime, if the output of the locate command points to your touch binary, have a go at updating the fetch script to use whatever path touch is installed at, which should do the trick.
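Another thing that may be worth trying, as a sketch only: newer wget releases have a --no-use-server-timestamps option that asks wget not to copy the server's Last-Modified time onto the local file (check wget --help on your server to see whether your version supports it). The fetch_feed function name below is made up for the example, and touch is called without a path, so adjust that to wherever touch lives on your server:

```shell
# Hypothetical wrapper: download a feed and make sure the local file's
# timestamp reflects the download time rather than the server's header.
fetch_feed() {
    url=$1
    dest=$2
    # Stop here if the download fails, so a stale file is not touched.
    wget --no-use-server-timestamps -O "$dest" "$url" || return 1
    # Belt and braces: set the modified time to "now" after the download.
    touch "$dest"
}

# Usage (path and URL as in your existing fetch script):
# fetch_feed "{link saved}" "/home/username/public_html/feeds/boekvandaag_da_dc.xml"
```

Either measure alone should be enough; using both just means the script still works if one of them is unavailable.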
Hope this helps,
Cheers,
David.
I'll try this out. By the header, do you mean the one in the XML feed?
It is possible for me to leave this out of the feed.
Hi Marco,
It's an HTTP header returned by the server, so I don't think it will be configurable (although check the feed URL parameters / documentation, as it's not impossible that you can request for it to be left out). However, if it _is_ accurate then there's no reason why it shouldn't be useful, as it would prevent the feed being unnecessarily re-imported; but of course if the HTTP Last-Modified header is wrong then that's not an option unfortunately...
Cheers,
David.
Hm, too bad. I heard from somebody that touch won't do anything because the file already exists. According to him, touch is only used to create a new empty file.
As an alternative workaround:
Is it possible to do an import.php @ALL for only some specific feeds?
Hello Marco,
The touch command should update the timestamp of an existing file...
http://unixhelp.ed.ac.uk/CGI/man-cgi?touch
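You can check this for yourself with a throwaway file; the following is just a sketch using made-up temporary file names, where the reference file stands in for the last import time:

```shell
demo_dir=$(mktemp -d)
feed="$demo_dir/feed.xml"
reference="$demo_dir/reference"

printf '<products/>\n' > "$feed"
touch -t 201103200207.01 "$feed"       # back-date the existing file
touch -t 201103290905.06 "$reference"  # stands in for the last import time

if [ "$feed" -nt "$reference" ]; then echo "before: newer"; else echo "before: older"; fi

touch "$feed"                          # existing file: only the timestamp changes

if [ "$feed" -nt "$reference" ]; then echo "after: newer"; else echo "after: older"; fi
```

This should print "before: older" and then "after: newer", with the file contents left as they were.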
However, yes - you can certainly import the files individually. In addition to @ALL or @MODIFIED, you can specify a filename to import; for example:
import.php boekvandaag_da_dc.xml
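If several feeds are affected, the same idea can be wrapped in a loop. This is only a sketch: the import_feeds function name is made up, and it assumes import.php is run with the command-line php binary from the directory that contains it, so adjust both to match how your CRON job already calls import.php:

```shell
# Hypothetical helper: import each named feed in turn.
# $1 is the directory containing import.php (an assumption; use your own),
# the remaining arguments are feed filenames from the /feeds/ folder.
import_feeds() {
    scripts_dir=$1
    shift
    for feed in "$@"; do
        # Run in a subshell so the cd does not affect the caller.
        (cd "$scripts_dir" && php import.php "$feed") ||
            echo "import failed: $feed" >&2
    done
}

# Usage (directory is a placeholder):
# import_feeds "/path/to/your/scripts" boekvandaag_da_dc.xml
```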
Hope this helps,
Cheers,
David.
Hi Marco,
If the modified date reported by /admin/ is earlier than the imported time (which it is by about 35 minutes) then you will have definitely imported the latest version from your /feeds/ folder.
I would imagine in this case that it is a data error in the feed, most likely a transient error, so I would try re-downloading it later and then importing @MODIFIED again. If you want to be doubly sure you can use an import.php @ALL, but I'd be confident that you have imported the latest version based on the dates / times reported by /admin/...
Cheers,
David.