Sitemap Protocol 0.84 is now deprecated.

Submitted by Convergence on Fri, 2012-07-06 21:38

Note: The latest distribution includes a sitemap.php upgraded to sitemaps.org schema version 0.9. The upgraded file can also be downloaded for earlier distributions.

Greetings,

Our sitemaps show they are:
http://www.google.com/schemas/sitemap/0.84/

The new schema is 0.9 - is there an update we need to make? We are still having problems with Bing saying that sitemap.php contains no URLs.

Thanks!

Submitted by support on Sun, 2012-07-08 15:24

Hi,

Here are the changes for 0.9, which I'll also add to the distribution. In sitemap.php, look for the following code at line 14:

print "<urlset xmlns='http://www.google.com/schemas/sitemap/0.84' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:schemaLocation='http://www.google.com/schemas/sitemap/0.84 http://www.google.com/schemas/sitemap/0.84/sitemap.xsd'>";

...and REPLACE with:

print "<urlset xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>";

Next, look for the following code at line 48:

    print "<sitemapindex xmlns='http://www.google.com/schemas/sitemap/0.84' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:schemaLocation='http://www.google.com/schemas/sitemap/0.84 http://www.google.com/schemas/sitemap/0.84/siteindex.xsd'>";

...and REPLACE with:

    print "<sitemapindex xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>";

Let me know if this doesn't resolve the Bing issue. sitemap.php itself does not contain URLs because it returns a sitemap index, which Bing should follow if it supports the sitemap XML protocol correctly... I'll investigate further...
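
For reference, here is a minimal sketch of the two document types under the 0.9 namespace. The helper names sitemap_index_xml() and sitemap_urlset_xml() and the example.com URLs are hypothetical and are not taken from the distribution's sitemap.php; only the namespace and the element names (sitemapindex, sitemap, urlset, url, loc) come from the sitemaps.org protocol.

```php
<?php
// Sketch only: hypothetical helpers, not the distribution's sitemap.php.
const SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9';

// Escape a URL for use as XML text (turns & into &amp; and so on).
function sitemap_escape(string $url): string
{
    return htmlspecialchars($url, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// Sitemap index: one <sitemap><loc> entry per child sitemap URL.
function sitemap_index_xml(array $sitemapUrls): string
{
    $xml  = "<?xml version='1.0' encoding='UTF-8'?>";
    $xml .= "<sitemapindex xmlns='" . SITEMAP_NS . "'>";
    foreach ($sitemapUrls as $url) {
        $xml .= "<sitemap><loc>" . sitemap_escape($url) . "</loc></sitemap>";
    }
    return $xml . "</sitemapindex>";
}

// URL set: one <url><loc> entry per page URL.
function sitemap_urlset_xml(array $pageUrls): string
{
    $xml  = "<?xml version='1.0' encoding='UTF-8'?>";
    $xml .= "<urlset xmlns='" . SITEMAP_NS . "'>";
    foreach ($pageUrls as $url) {
        $xml .= "<url><loc>" . sitemap_escape($url) . "</loc></url>";
    }
    return $xml . "</urlset>";
}

// Example: an index that points at one per-feed sitemap (placeholder host).
print sitemap_index_xml(array(
    'http://www.example.com/sitemap.php?filename=MerchantName.xml',
)) . "\n";
```

An index like this lists no page URLs itself, so a crawler has to fetch each child sitemap to find them.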

Cheers,
David.
--
PriceTapestry.com

Submitted by Convergence on Thu, 2012-07-12 21:09

Hi David,

After five days of "Pending" at Bing, they have once again displayed:

"The feed was empty."

Submitted by support on Fri, 2012-07-13 07:41

Hi,

That's interesting. Bear in mind that if you are able to submit multiple sitemaps then you can submit the per-feed sitemaps manually. The easiest way is simply to browse to /sitemap.php and then copy the individual feed sitemap URLs, which are of the format:

sitemap.php?filename=MerchantName.xml

(where MerchantName.xml is the feed filename)

Note that in accordance with the sitemaps protocol each file is limited to 50,000 URLs - if you have feeds with more than 50,000 products then additional pages can be mapped using the start parameter; e.g.

sitemap.php?filename=MerchantName.xml&start=50000
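
The paging rule above can be sketched roughly as follows. This is an illustration under assumptions: the function name feed_sitemap_urls() is hypothetical, the product count is passed in rather than read from the database, and start is assumed to be a zero-based offset that advances in steps of 50,000.

```php
<?php
// Sketch only: hypothetical helper, not part of the distribution.
// Assumes `start` is a zero-based product offset.
const SITEMAP_MAX_URLS = 50000; // per-file limit in the sitemaps protocol

// Return the sitemap.php URLs needed to cover every product in one feed.
function feed_sitemap_urls(string $filename, int $productCount): array
{
    $urls = array();
    for ($start = 0; $start < $productCount; $start += SITEMAP_MAX_URLS) {
        $url = 'sitemap.php?filename=' . urlencode($filename);
        if ($start > 0) {
            // The first page needs no start parameter.
            $url .= '&start=' . $start;
        }
        $urls[] = $url;
    }
    return $urls;
}

// A feed with 120,000 products needs three pages.
print implode("\n", feed_sitemap_urls('MerchantName.xml', 120000)) . "\n";
```

For that 120,000 product feed the sketch lists the plain URL first, then the same URL with start=50000 and start=100000.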

Cheers,
David.
--
PriceTapestry.com

Submitted by Convergence on Fri, 2012-07-13 23:31

Hi David,

Submitting individual sitemaps may be a last resort - nearly 150 from one site...

Submitted by Convergence on Sun, 2012-07-22 19:46

Today Bing finally acknowledged ONE of our sitemaps with "Success" - even shows 77 pages of the 200K+ as indexed, LOL...

Submitted by Convergence on Fri, 2012-07-27 04:20

Bing has FINALLY recognized all of our sitemaps as "Sitemap Index" and has crawled them on multiple days.

Just had to keep resubmitting them until they choked on them - and gave in...

Submitted by support on Fri, 2012-07-27 09:29

Thanks for the update!

Cheers,
David.
--
PriceTapestry.com

Submitted by Bakalinge on Mon, 2013-04-15 21:06

Hi David,

I noticed a little bug with the sitemap and accented characters. For example, I have a URL like this:

domain.com/Intégral.html

but in the sitemap, the same URL is written like this:

domain.com/Int%E9gral.html

which gives me...36321 errors (404) if I trust Google Webmaster Tools :)

I don't know if this comes from the new schema, or if it already existed with protocol 0.84...

Thanks !

Bak

Submitted by support on Tue, 2013-04-16 08:23

Hello Bak,

That's interesting - the correct encoding should be:

domain.com/Int%C3%A9gral.html

Please could you check if there are any changes to your tapestry_productHREF() function in includes/tapestry.php, as this is where the product name is passed through urlencode(). Line 69 should read:

return $config_baseHREF."product/".urlencode(tapestry_hyphenate($product["normalised_name"])).".html";

I just tested using your example and the above line does encode correctly (to domain.com/Int%C3%A9gral.html), so if your copy differs, let me know what's currently in place (as it was probably changed for a reason!) and I'll check it out for you. If it all looks good, please could you let me know the working URL of a product that causes this problem (I'll remove it before publishing your reply) and the filename of the feed from which it comes, and I'll check it out further for you...
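
As a quick way to see where the two forms can come from, here is a small sketch. It assumes the difference is down to the bytes handed to urlencode(): the name stored as ISO-8859-1 yields %E9, while the same name stored as UTF-8 yields %C3%A9. The is_utf8() helper is hypothetical and not part of includes/tapestry.php. Byte escapes are used so that the result does not depend on how the script file itself is saved.

```php
<?php
// "Intégral" as raw bytes in each encoding.
$latin1 = "Int\xE9gral";      // ISO-8859-1: the accented e is the single byte E9
$utf8   = "Int\xC3\xA9gral";  // UTF-8: the accented e is the two bytes C3 A9

print urlencode($latin1) . "\n"; // Int%E9gral
print urlencode($utf8) . "\n";   // Int%C3%A9gral

// Hypothetical helper: true if $s is valid UTF-8.
// preg_match() with the /u modifier fails on input that is not valid UTF-8.
function is_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

var_dump(is_utf8($latin1)); // bool(false)
var_dump(is_utf8($utf8));   // bool(true)
```

If product names in the database fail a check like this, that would suggest they are stored in a single-byte encoding, which is one possible explanation for %E9 appearing in the sitemap.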

Cheers,
David.
--
PriceTapestry.com

Submitted by Bakalinge on Tue, 2013-04-16 10:18

Hello David,

After some research, I think I understand what is happening. I was wrong to think the problem came from the URL encoding (I see "Intégral.html" in my browser while Google sees "Int%E9gral.html", but that's fine), sorry about that (90% of my URLs contain accented characters, so it was easy to place the blame there).

No, the problem is the removed products that return an HTTP 404 response. With a catalog of 85,000 products and probably 1 or 2% of these products removed from the feed, or renamed, or whatever, it creates a lot of alerts from Google Webmaster Tools:

"Google detected a significant increase in the number of URLs that return a 404 (Page Not Found) error. Investigating these errors and fixing them where appropriate ensures that Google can successfully crawl your site's pages".

That is not reassuring :)

I found an answer to this problem here: http://www.pricetapestry.com/node/4854. My job now is to create a very user-friendly landing page :)

Thanks David, and sorry about any inconvenience this may cause.

Bak