(Edit: updated for 12/10B)
I'm using the sitemap code from a previous forum post.
It makes the sitemaps, but the URLs are wrong.
This is the code:
<?php // #!/usr/bin/php
/** make a directory "sitemap" under your site path and save this script as sitemap.php **/
/** will create files like this :
sitemap1.xml.gz
sitemap2.xml.gz
sitemap3.xml.gz
...
sitemap350.xml.gz
**/
require("includes/common.php");
$limit = 5000;
$start = 0;
$xml2 = "<?xml version='1.0' encoding='UTF-8'?>";
$xml2 .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd">';
$sql = "SELECT count(DISTINCT(name)) as resultc FROM `".$config_databaseTablePrefix."products`";
database_querySelect($sql,$rows);
$resultat1 = $rows[0]["resultc"];
$resultat = number_format(ceil($resultat1/$limit),'');
for($i=1;$i < $resultat+1; $i++)
{
$xml = "<?xml version='1.0' encoding='UTF-8'?>";
$xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">';
$sql = "SELECT DISTINCT(name) FROM `".$config_databaseTablePrefix."products` LIMIT ".$start.",".$limit;
$sitemapBaseHREF = $config_baseHREF2;
if (database_querySelect($sql,$rows)) {
foreach($rows as $product) {
$product["normalised_name"] = tapestry_normalise($product["name"]);
$item = array();
$item["name"] = $product["name"];
$xml .= "\n<url>";
$xml .= "\n<loc>".$sitemapBaseHREF.tapestry_productHREF($product)."</loc>";
$xml .= "\n<lastmod>".date("Y-m-d")."</lastmod>";
$xml .= "\n<changefreq>weekly</changefreq>";
$xml .= "\n</url>";
}
}
$xml .= "\n</urlset>";
/** we save sitemap files in the new directory "sitemap" **/
$fp = fopen("sitemap/sitemap".$i.".xml", 'w+');
fputs($fp, $xml);
fclose($fp);
system('gzip -f sitemap'.$i.'.xml');
$start = $start + $limit;
/** we add this file path to the indexmap in the root "yourpath" sitemap.xml **/
$xml2 .= "\n<sitemap>";
$xml2 .= "\n<loc>".$sitemapBaseHREF.$config_baseHREF."sitemap/sitemap".$i.".xml.gz</loc>";
$xml2 .= "\n<lastmod>".date("Y-m-d")."</lastmod>";
$xml2 .= "\n</sitemap>";
}
$xml2 .= "\n</sitemapindex>";
$fp = fopen("sitemap.xml", 'w+');
fputs($fp, $xml2);
fclose($fp);
/** finish **/
/** we can ping google, yahoo, msn **/
//system('wget -O RESU_GOOGLE "http://www.google.com/webmasters/tools/ping?sitemap=http%3A%2F%2Fwww.mydomain.co.uk%2Fsitemap.xml"');
//system('wget -O RESU_YAHOO "http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=http://www.mydomain.co.uk/sitemap.xml"');
//system('wget -O RESU_MSN "http://webmaster.live.com/webmaster/ping.aspx?siteMap=http://www.domain.co.uk/sitemap.xml"');
?>
This is the resulting URL in the Google sitemap:
http:///product/Wedgwood-Ceramic-Experience-for-Two.html
Where do I have to add the domain code?
Brilliant,
That works. But the sitemap indexing has not added any URLs yet.
Cool! Keep an eye on the statistics in your Webmaster Tools account; that should indicate once pages from the sitemap are being indexed...
Cheers,
David.
Hi David.
1 - I have included this file in my version 12/10B, as I am reaching 16 million products.
It works badly: when I compare the result with the output of "sitemap.php", the URLs are not the same.
With caching, special characters pass through into the URL.
For example: ! \ . / ° % ...
The " - " (with spaces) is replaced by "---".
Would you have a change to clean the URLs when caching?
2 - In version 12/10B, where there is a "/", for example "green/blue", the "/" is not replaced by a "-" in the URL but removed.
This gives "greenblue".
Can you tell me which file needs to be changed to add characters to be replaced by a "-"?
Thank you in advance :o)
Cheers,
Raoul
Hello Raoul,
I've modified the code in the original post to use the tapestry_productHREF() function so that product URLs will match exactly those being generated by your site...
Cheers,
David.
--
PriceTapestry.com
Hi David.
Excellent, everything matches now.
a big THANK YOU :)
Cheers,
Raoul
Sorry, I have a problem with ...
--- Log error
PHP Warning: number_format() expects parameter 2 to be long, string given in
--- autositemap.php
$resultat = number_format(ceil($resultat1/$limit),'');
Raoul
Hi Raoul,
That should be:
$resultat = number_format(ceil($resultat1/$limit),0);
Cheers,
David.
--
PriceTapestry.com
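A minimal sketch of an alternative for the same line, assuming PHP 7 or later: since the value is only used as a loop bound, casting the result of ceil() to an integer avoids number_format() altogether (number_format() returns a string and inserts a thousands separator once the value reaches 1,000). The function name sitemap_file_count is hypothetical and not part of the original script.

```php
<?php
// Hypothetical helper (not in the original script): how many sitemap files
// are needed to hold $total URLs at $limit URLs per file.
function sitemap_file_count(int $total, int $limit): int
{
    // ceil() returns a float; cast it so the loop bound is a plain integer.
    return (int) ceil($total / $limit);
}

// Example: 12001 distinct product names at 5000 URLs per file.
echo sitemap_file_count(12001, 5000), "\n"; // prints 3
```

In the script above this would correspond to $resultat = (int) ceil($resultat1/$limit);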
There are still problems.
I changed:
$limit = 5000; >>> $limit = 50000;
system('gzip -f sitemap'.$i.'.xml'); >>> system('gzip -f sitemap/sitemap'.$i.'.xml');
Now the files are compressed, but the URLs lack the domain:
<url>
<loc>/Storm-Bag.html</loc>
<lastmod>2013-02-11</lastmod>
<changefreq>weekly</changefreq>
</url>
Cheers,
Raoul
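As an aside on the gzip path change above, here is a rough sketch of compressing the file from within PHP rather than calling the external gzip command, which sidesteps the working-directory issue. It assumes the zlib extension is available; the helper name write_gzipped_sitemap and the temporary path are illustrative only and do not come from the original script.

```php
<?php
// Hypothetical helper (not in the original script): write $xml to $path
// as a gzip file, without shelling out to the gzip command.
// Assumes the zlib extension is loaded.
function write_gzipped_sitemap(string $path, string $xml): bool
{
    $compressed = gzencode($xml, 9); // 9 = maximum compression level
    if ($compressed === false) {
        return false;
    }
    return file_put_contents($path, $compressed) !== false;
}

// Illustrative usage with a temporary path and an empty urlset.
$xml  = "<?xml version='1.0' encoding='UTF-8'?>\n<urlset></urlset>";
$path = sys_get_temp_dir() . "/sitemap1.xml.gz";
$ok   = write_gzipped_sitemap($path, $xml);
```

In the script above, $path would be "sitemap/sitemap".$i.".xml.gz", replacing the fopen/fputs/fclose/system sequence.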
Hello Raoul,
I'll review the whole script later, as it was originally posted by another user. It looks like they use a $config_baseHREF2 variable, so you can insert this, along with some code to make sure that it is set for both instances, as follows. Where you have:
require("includes/common.php");
...REPLACE with:
require("includes/common.php");
$config_baseHREF2 = "http://www.example.com"; // no trailing forward slash!
$sitemapBaseHREF = $config_baseHREF2;
...replace example.com with your domain name and that should do the trick...
Hope this helps!
Cheers,
David.
--
PriceTapestry.com
Hi David,
It works like this.
I'll put the variable in the config.php to be clean.
Cheers,
Raoul
Great! Glad you're up and running.
Cheers,
David.
--
PriceTapestry.com
Hi,
It looks like this is down to the $config_baseHREF2 variable, which appears to be empty at the moment. If you add this to your config.php as follows:
$config_baseHREF2 = "http://www.mydomain.com";
Note that there is no trailing "/". This is because the sitemap code above combines this with $config_baseHREF, which contains the / plus any additional path if required (but not in the case of your site)...
That should be all it is!
Cheers,
David.
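To illustrate the point about the trailing "/", here is a small sketch of how the two variables combine into a sitemap index URL. The values are assumed for illustration, and the function sitemap_index_loc is hypothetical; the script above simply concatenates the strings inline.

```php
<?php
// Assumed values for illustration only.
$config_baseHREF2 = "http://www.example.com"; // scheme + host, no trailing "/"
$config_baseHREF  = "/";                      // site path, starts and ends with "/"

// Hypothetical helper mirroring the concatenation used for the index entries.
function sitemap_index_loc(string $host, string $path, int $i): string
{
    // rtrim() guards against an accidental trailing "/" on the host part.
    return rtrim($host, "/") . $path . "sitemap/sitemap" . $i . ".xml.gz";
}

echo sitemap_index_loc($config_baseHREF2, $config_baseHREF, 1), "\n";
// prints http://www.example.com/sitemap/sitemap1.xml.gz

// With an empty host variable the domain is simply missing:
echo sitemap_index_loc("", $config_baseHREF, 1), "\n";
// prints /sitemap/sitemap1.xml.gz
```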