Note: The latest distribution includes sitemap.php upgraded to the sitemaps.org schema version 0.9. Download for earlier distributions here
Greetings,
Our sitemaps show they are:
http://www.google.com/schemas/sitemap/0.84/
The new schema is 0.9 - is there an update we need to make? We are still having problems with Bing saying that sitemap.php contains no URLs.
Thanks!
Hi David,
After five days of "Pending" at Bing, they have once again displayed:
"The feed was empty."
Hi,
That's interesting. Bear in mind that if you are able to submit multiple sitemaps then you can submit the per-feed sitemaps manually. The easiest way is simply to browse to /sitemap.php and then copy the individual feed sitemap URLs, which are of the format:
sitemap.php?filename=MerchantName.xml
(where MerchantName.xml is the feed filename)
Note that in accordance with the sitemaps protocol each file is limited to 50,000 URLs - if you have feeds with more than 50,000 products then additional pages can be mapped using the start parameter; e.g.
sitemap.php?filename=MerchantName.xml&start=50000
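If you do end up submitting per-feed sitemaps by hand, the list of URLs for a large feed can be generated rather than typed. The following is only a rough sketch: sitemap_page_urls() is a hypothetical helper (it is not part of the distribution), the base URL is a placeholder, and it assumes that start is a zero-based product offset that advances in steps of 50,000.

```php
<?php
// Hypothetical helper (not part of the distribution): list the sitemap URLs
// needed to cover one feed, at most $limit products per page.
// Assumption: "start" is a zero-based offset into the feed's products.
function sitemap_page_urls($baseURL, $filename, $productCount, $limit = 50000)
{
  $urls = array();
  for ($start = 0; $start < $productCount; $start += $limit)
  {
    $url = $baseURL."sitemap.php?filename=".urlencode($filename);
    // The first page needs no start parameter; later pages pass the offset.
    if ($start > 0) $url .= "&start=".$start;
    $urls[] = $url;
  }
  return $urls;
}

// Example: a feed of 120,000 products needs three pages.
foreach (sitemap_page_urls("http://www.example.com/", "MerchantName.xml", 120000) as $url)
{
  print $url."\n";
}
```

The product count per feed has to come from your own database or feed statistics; the 120,000 above is made up for the example.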
Cheers,
David.
--
PriceTapestry.com
Hi David,
Submitting individual sitemaps may be a last resort - nearly 150 from one site...
Today Bing finally acknowledged ONE of our sitemaps with "Success" - even shows 77 pages of the 200K+ as indexed, LOL...
Bing has FINALLY recognized all of our sitemaps as "Sitemap Index" and has crawled them on multiple days.
Just had to keep resubmitting them until they choked on them - and gave in...
Hi David,
I noticed a little bug with the sitemap and accented characters. For example, I have a URL like this:
domain.com/Intégral.html
but in the sitemap, the same URL is written like this:
domain.com/Int%E9gral.html
which gives me...36321 errors (404) if I trust Google Webmaster Tools :)
I don't know if it comes from the new schema, or if it already existed with protocol 0.84...
Thanks !
Bak
Hello Bak,
That's interesting - the correct encoding should be:
domain.com/Int%C3%A9gral.html
Please could you check if there are any changes to your tapestry_productHREF() function in includes/tapestry.php, as this is where the product name is passed through urlencode(). Line 69 should read:
return $config_baseHREF."product/".urlencode(tapestry_hyphenate($product["normalised_name"])).".html";
I just tested using your example and the above does encode correctly (to domain.com/Int%C3%A9gral.html), so if there are any changes to the above line, let me know what's currently in place (as it was probably changed for a reason!) and I'll check it out for you. If that all looks good, perhaps you could let me know the working URL of a product that causes this problem (I'll remove it before publishing your reply) and the filename of the feed from which it comes, and I'll check it out further for you...
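To illustrate the difference between the two forms, here is a minimal sketch. It assumes the only variable is the character encoding of the string handed to urlencode(); the strings are written as byte escapes so the result does not depend on how the script file itself is saved, and iconv() is assumed to be available (it is enabled by default in most PHP builds).

```php
<?php
// The same word as raw bytes in two encodings.
$utf8   = "Int\xC3\xA9gral";  // "é" stored as UTF-8 (two bytes)
$latin1 = "Int\xE9gral";      // "é" stored as ISO-8859-1 (one byte)

// urlencode() works byte by byte, so the output reflects the input encoding.
print urlencode($utf8)."\n";   // Int%C3%A9gral
print urlencode($latin1)."\n"; // Int%E9gral

// If a name is held as ISO-8859-1, converting it first yields the UTF-8 form.
$converted = iconv("ISO-8859-1", "UTF-8", $latin1);
print urlencode($converted)."\n"; // Int%C3%A9gral
```

So a %E9 in the sitemap would suggest the product name reached urlencode() as ISO-8859-1 rather than UTF-8.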
Cheers,
David.
--
PriceTapestry.com
Hello David,
After some research, I think I understand what is happening. I was mistaken in thinking the problem came from the URL encoding (I see "Intégral.html" in my browser while Google sees "Int%E9gral.html", but that's fine), sorry about that (90% of my URLs contain accented characters, so it was easy to place the blame there).
No, the problem is the removed products, which return an HTTP 404 response. With a catalog of 85,000 products and probably 1 or 2% of these products removed from the feed, or with changes in their name or whatever, it creates a lot of alerts from Google Webmaster Tools:
"Google detected a significant increase in the number of URLs that return a 404 (Page Not Found) error. Investigating these errors and fixing them where appropriate ensures that Google can successfully crawl your site's pages".
I found an answer to this problem here: http://www.pricetapestry.com/node/4854. My job now is to create a very user-friendly landing page :)
Thanks David, and sorry about any inconvenience this may cause.
Bak
Hi,
Here are the changes for 0.9, which I'll update in the distribution. In sitemap.php, look for the following code at line 14:
print "<urlset xmlns='http://www.google.com/schemas/sitemap/0.84' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:schemaLocation='http://www.google.com/schemas/sitemap/0.84 http://www.google.com/schemas/sitemap/0.84/sitemap.xsd'>";
...and REPLACE with:
print "<urlset xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>";
Next, look for the following code at line 48:
print "<sitemapindex xmlns='http://www.google.com/schemas/sitemap/0.84' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:schemaLocation='http://www.google.com/schemas/sitemap/0.84 http://www.google.com/schemas/sitemap/0.84/siteindex.xsd'>";
...and REPLACE with:
print "<sitemapindex xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>";
Let me know if this doesn't resolve the Bing issue. sitemap.php itself does not contain URLs, as it returns a sitemap index, which Bing should be following if it supports the sitemap XML protocol correctly... I'll investigate further...
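For reference, a sitemap index in the 0.9 format is just a list of <sitemap><loc>...</loc></sitemap> entries pointing at the per-feed sitemaps. The sketch below is not the code from sitemap.php; sitemap_index_xml() is a hypothetical function and the URLs are placeholders, shown only to illustrate the shape of the document a crawler should receive.

```php
<?php
// Hypothetical helper: build a 0.9 sitemap index from a list of sitemap URLs.
function sitemap_index_xml($sitemapURLs)
{
  $xml  = "<?xml version='1.0' encoding='UTF-8'?>";
  $xml .= "<sitemapindex xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>";
  foreach ($sitemapURLs as $url)
  {
    // "&" in a query string must be written as "&amp;" inside XML.
    $xml .= "<sitemap><loc>".htmlspecialchars($url, ENT_QUOTES, "UTF-8")."</loc></sitemap>";
  }
  $xml .= "</sitemapindex>";
  return $xml;
}

print sitemap_index_xml(array(
  "http://www.example.com/sitemap.php?filename=MerchantName.xml",
  "http://www.example.com/sitemap.php?filename=MerchantName.xml&start=50000"
));
```

If the index served by your site looks like this but a search engine still reports it as empty, the index itself contains no product URLs by design, so the report may simply mean the per-feed sitemaps have not been followed yet.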
Cheers,
David.
--
PriceTapestry.com