Support Forum



Google sitemap code

Submitted by cataleptic on Sat, 2009-09-19 18:14

(Edit: updated for 12/10B)

I'm using the sitemap code from a previous forum post.

It generates the sitemaps, but the URLs are wrong.

This is the code:

<?php // #!/usr/bin/php
/** Make a directory "sitemap" under your site path and save this script as sitemap.php **/
/** It will create files like this:
sitemap1.xml.gz
sitemap2.xml.gz
sitemap3.xml.gz
...
sitemap350.xml.gz
**/
  require("includes/common.php");
  $limit = 5000;
  $start = 0;
  $xml2 = "<?xml version='1.0' encoding='UTF-8'?>";
  $xml2 .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd">';
  $sql = "SELECT count(DISTINCT(name)) as resultc FROM `".$config_databaseTablePrefix."products`";
  database_querySelect($sql,$rows);
  $resultat1 = $rows[0]["resultc"];
  /** number of sitemap files needed; cast to an integer so the loop below compares numbers **/
  $resultat = (int)ceil($resultat1/$limit);
  for($i=1;$i < $resultat+1; $i++)
  {
    $xml = "<?xml version='1.0' encoding='UTF-8'?>";
    $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">';
    $sql = "SELECT DISTINCT(name) FROM `".$config_databaseTablePrefix."products` LIMIT ".$start.",".$limit;
    $sitemapBaseHREF = $config_baseHREF2;
    if (database_querySelect($sql,$rows)) {
      foreach($rows as $product) {
        $product["normalised_name"] = tapestry_normalise($product["name"]);
        $item = array();
        $item["name"] = $product["name"];
        $xml .= "\n<url>";
        $xml .= "\n<loc>".$sitemapBaseHREF.tapestry_productHREF($product)."</loc>";
        $xml .= "\n<lastmod>".date("Y-m-d")."</lastmod>";
        $xml .= "\n<changefreq>weekly</changefreq>";
        $xml .= "\n</url>";
      }
    }
    $xml .= "\n</urlset>";
    /** we save the sitemap files in the new directory "sitemap" **/
    $fp = fopen("sitemap/sitemap".$i.".xml", 'w+');
    fputs($fp, $xml);
    fclose($fp);
    /** compress the file where it was written, i.e. inside "sitemap" **/
    system('gzip -f sitemap/sitemap'.$i.'.xml');
    $start = $start + $limit;
    /** we add this file path to the index map sitemap.xml in the site root **/
    $xml2 .= "\n<sitemap>";
    $xml2 .= "\n<loc>".$sitemapBaseHREF.$config_baseHREF."sitemap/sitemap".$i.".xml.gz</loc>";
    $xml2 .= "\n<lastmod>".date("Y-m-d")."</lastmod>";
    $xml2 .= "\n</sitemap>";
  }
  $xml2 .= "\n</sitemapindex>";
  $fp = fopen("sitemap.xml", 'w+');
  fputs($fp, $xml2);
  fclose($fp);
/** finish **/
/** we can ping google, yahoo, msn **/
//system('wget -O RESU_GOOGLE "http://www.google.com/webmasters/tools/ping?sitemap=http%3A%2F%2Fwww.mydomain.co.uk%2Fsitemap.xml"');
//system('wget -O RESU_YAHOO "http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=http://www.mydomain.co.uk/sitemap.xml"');
//system('wget -O RESU_MSN "http://webmaster.live.com/webmaster/ping.aspx?siteMap=http://www.domain.co.uk/sitemap.xml"');
?>

This is the resulting URL in the Google sitemap:
http:///product/Wedgwood-Ceramic-Experience-for-Two.html

Where do I need to add the domain?

Submitted by support on Sun, 2009-09-20 13:56

Hi,

It looks like this is down to the $config_baseHREF2 variable, which appears to be empty at the moment. Add it to your config.php as follows:

  $config_baseHREF2 = "http://www.mydomain.com";

Note that there is no trailing "/". This is because the sitemap code above combines this with $config_baseHREF, which contains the / plus any additional path if required (but not in the case of your site)...
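As a minimal sketch of how the two values combine (the example values and the build_sitemap_url() helper are hypothetical, not part of Price Tapestry), the domain part carries no trailing slash because the path part already begins with one:

```php
<?php
// Illustration only: example values, not taken from any real config.php.
$config_baseHREF2 = "http://www.example.com"; // scheme and host, no trailing "/"
$config_baseHREF  = "/";                      // site path, begins and ends with "/"

// Hypothetical helper: joins the domain, the site path and a file path.
function build_sitemap_url($domain, $basePath, $file)
{
    // rtrim() guards against an accidental trailing slash on the domain
    return rtrim($domain, "/") . $basePath . $file;
}

// Domain set: a complete, absolute URL.
echo build_sitemap_url($config_baseHREF2, $config_baseHREF, "sitemap/sitemap1.xml.gz"), "\n";
// Domain empty: only the path is left, so the URL has no host.
echo build_sitemap_url("", $config_baseHREF, "sitemap/sitemap1.xml.gz"), "\n";
```

The rtrim() call is only a safety net in this sketch; the script in the first post concatenates the values directly, which is why the value in config.php must not end in "/".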

That should be all it is!

Cheers,
David.

Submitted by cataleptic on Wed, 2009-09-23 00:24

Brilliant,

That works. But the sitemap indexing has not added any URLs yet.

Submitted by support on Wed, 2009-09-23 09:09

Cool! Keep an eye on the statistics in your Webmaster Tools account; that should indicate once pages from the sitemap are being indexed...

Cheers,
David.

Submitted by DVDAFFAIRES on Mon, 2013-02-11 07:47

Hi David.

1 - I have added this file to my version 12/10B, as I have reached 16 million products.
It does not work well: when I compare its output with that of the file "sitemap.php", the URLs are not the same.

With caching, special characters pass through into the URL.
For example: ! \ . / ° % ...
The " - " (with spaces) is replaced by "---".
Would you have a change that cleans up the URLs with caching?

2 - In version 12/10B, where a name contains a "/", for example "green/blue", the "/" is not replaced by a "-" in the URL but removed.
This gives "greenblue".
Can you tell me which file needs to be changed to add a character to be replaced by a "-"?

Thank you in advance :o)

Cheers,
Raoul

Submitted by support on Mon, 2013-02-11 08:25

Hello Raoul,

I've modified the code in the original post to use the tapestry_productHREF() function so that product URLs will match exactly those being generated by your site...

Cheers,
David.
--
PriceTapestry.com

Submitted by DVDAFFAIRES on Mon, 2013-02-11 08:42

Hi David.
Excellent, everything matches now.

a big THANK YOU :)

Cheers,
Raoul

Submitted by DVDAFFAIRES on Mon, 2013-02-11 08:59

Sorry, I have a problem with ...

--- Log error
PHP Warning: number_format() expects parameter 2 to be long, string given in

--- autositemap.php
$resultat = number_format(ceil($resultat1/$limit),'');

Raoul

Submitted by support on Mon, 2013-02-11 09:11

Hi Raoul,

That should be:

$resultat = number_format(ceil($resultat1/$limit),0);
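One caveat, offered as a sketch rather than a tested change to the script: number_format() returns a string and, with its default separators, inserts a comma every three digits, so a count of 1,000 files or more comes back as something like "3,200" rather than a plain number for the loop condition. Casting the result of ceil() to an integer avoids this. The sitemap_file_count() helper name below is hypothetical:

```php
<?php
// Hypothetical helper: number of sitemap files needed for $total products.
function sitemap_file_count($total, $limit)
{
    // ceil() returns a float; the cast gives a plain integer for the loop
    return (int) ceil($total / $limit);
}

echo sitemap_file_count(16000000, 5000), "\n";      // 3200
echo number_format(ceil(16000000 / 5000), 0), "\n"; // 3,200 (a string with a separator)
```

With 16 million products and $limit = 5000 the difference matters; with $limit = 50000 the count is 320 and both forms give the same digits.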

Cheers,
David.
--
PriceTapestry.com

Submitted by DVDAFFAIRES on Mon, 2013-02-11 09:42

There are still some problems.
I changed:
$limit = 5000; >>> $limit = 50000;

system('gzip -f sitemap'.$i.'.xml'); >>> system('gzip -f sitemap/sitemap'.$i.'.xml');
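The path matters because the script writes each file into the sitemap/ directory, so gzip has to be pointed at the same location. As an alternative sketch that avoids the external gzip command, PHP's zlib extension (assumed here to be enabled) can write the compressed file directly; the write_gzipped_sitemap() helper name is hypothetical:

```php
<?php
// Hypothetical helper: writes $xml gzip-compressed to $path.
// Assumes the zlib extension is available (it provides gzencode()).
function write_gzipped_sitemap($path, $xml)
{
    // gzencode() produces gzip-format data; level 9 = maximum compression
    return file_put_contents($path, gzencode($xml, 9)) !== false;
}

$xml  = "<?xml version='1.0' encoding='UTF-8'?>\n<urlset></urlset>";
// A temporary location for this demonstration; the script itself
// would use a path such as "sitemap/sitemap1.xml.gz".
$file = sys_get_temp_dir() . "/sitemap1.xml.gz";
write_gzipped_sitemap($file, $xml);
```

This also removes the intermediate uncompressed .xml file, since the compressed data is written in one step.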

Now the files are compressed, but the URLs are missing the domain:

<url>
<loc>/Storm-Bag.html</loc>
<lastmod>2013-02-11</lastmod>
<changefreq>weekly</changefreq>
</url>

Cheers,
Raoul

Submitted by support on Mon, 2013-02-11 10:28

Hello Raoul,

I'll review the whole script later, as it was originally posted by another user. It looks like they use a $config_baseHREF2 variable, so you can define it, together with some code to make sure that it is set for both places where it is used, as follows. Where you have:

  require("includes/common.php");

...REPLACE with:

  require("includes/common.php");
  $config_baseHREF2 = "http://www.example.com"; // no trailing forward slash!
  $sitemapBaseHREF = $config_baseHREF2;

...replace example.com with your domain name and that should do the trick.

Hope this helps!

Cheers,
David.
--
PriceTapestry.com

Submitted by DVDAFFAIRES on Mon, 2013-02-11 11:14

Hi David,

It works like this.
I'll put the variable in config.php to keep things clean.

Cheers,
Raoul

Submitted by support on Mon, 2013-02-11 11:19

Great! Glad you're up and running.

Cheers,
David.
--
PriceTapestry.com