Difference in product count

Submitted by IG on Sat, 2012-12-15 09:50

Hi David

According to my feed registration page the total number of products is 407,698, whereas GWT states that the sitemap contains 452,871 products. Do you have any idea (I am sure you do) where the difference is coming from?

Best regards,
IG

Submitted by support on Sat, 2012-12-15 11:24

Hello IG,

I would normally suggest that any difference is down to the delay between the last full import (at which time the product count reported by /admin/ is updated) and the time Google last requested the full sitemap (i.e. each per-feed sitemap indexed by sitemap.php); however, I see there is quite a significant difference in this case.

One possibility as you're running such a large installation is that there are orphaned products that remain in the products table possibly as a result of previous de-registrations not having completed. I am making a small change to the de-register process in an update to Price Tapestry to be released in January which is that the `feeds` record is not deleted until all products associated with the feed being de-registered have been deleted. As it stands, the `feeds` record is deleted first, and if the script then times-out deleting products, orphaned products can remain.

There is a simple query to purge the products table of any such product records: create a file purge.php as follows, and then browse to the script once to perform the operation. If the script times out (that is, it finishes without displaying "Done."), simply repeat execution until it does complete.

purge.php

<?php
  set_time_limit(0);
  require("includes/common.php");
  // delete every product whose feed is no longer registered
  $sql = "DELETE FROM `".$config_databaseTablePrefix."products`
            WHERE filename NOT IN
              (SELECT filename FROM `".$config_databaseTablePrefix."feeds`)";
  database_queryModify($sql,$result);
  print "Done.";
?>
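
To see how many records such a purge would remove before running it, the same sub-query can be used in a SELECT COUNT(*). The following is only a self-contained sketch of that idea: it assumes the PDO SQLite driver is available, uses an in-memory database in place of the real MySQL one, and the `pt_` prefix, the sample rows and the countOrphans() helper are all made up for illustration.

```php
<?php
  // In-memory SQLite stands in for the real database (assumption for this sketch)
  $db = new PDO("sqlite::memory:");
  $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
  // hypothetical "pt_" prefix; the real one comes from $config_databaseTablePrefix
  $db->exec("CREATE TABLE pt_feeds (filename TEXT)");
  $db->exec("CREATE TABLE pt_products (id INTEGER PRIMARY KEY, filename TEXT)");
  $db->exec("INSERT INTO pt_feeds (filename) VALUES ('zalando.csv')");
  // two products from a registered feed, one left over from a de-registered feed
  $db->exec("INSERT INTO pt_products (filename) VALUES ('zalando.csv')");
  $db->exec("INSERT INTO pt_products (filename) VALUES ('zalando.csv')");
  $db->exec("INSERT INTO pt_products (filename) VALUES ('old.csv')");

  // count products whose filename has no matching feeds record
  function countOrphans(PDO $db)
  {
    $sql = "SELECT COUNT(*) FROM pt_products
              WHERE filename NOT IN
                (SELECT filename FROM pt_feeds)";
    return (int)$db->query($sql)->fetchColumn();
  }

  print "Orphaned products: ".countOrphans($db)."\n";
?>
```

If the count is zero, as turned out to be the case later in this thread, the purge has nothing to delete and the difference must come from somewhere else.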

Cheers,
David.
--
PriceTapestry.com

Submitted by IG on Sun, 2012-12-16 22:03

Hi David

The two figures in my first posting are both from December 14. I am updating my site every night and GWT states that the sitemap was processed on December 14.

My pt_products table has 407,698 products and after purging following your instructions above the number has not changed. Unfortunately, the question remains: why does GWT state that my sitemap has 452,871 records when there are only 407,698 products in the database?

Best regards,
Ivo

Submitted by support on Mon, 2012-12-17 10:07

Hi Ivo,

That's interesting; I can understand GWT reporting fewer URLs since the individual per-feed sitemaps don't actually take compared products into account; however I can't immediately think of a reason why there should be more reported.

Please could you post your sitemap URL (I'll remove the link before publishing your reply) and I'll fetch it myself and run a script to count the URL elements and see if I can spot what's going on...
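
For anyone wanting to do the same check themselves, a counting script could look roughly like the sketch below. It is not the script referred to above, just one possible approach: it assumes the DOM extension is enabled, countSitemapUrls() is a made-up helper name, and the inline sample (with example.com as a placeholder) stands in for a sitemap that would normally be fetched first, for example with file_get_contents().

```php
<?php
  // count the <url> elements in a sitemap document supplied as a string
  function countSitemapUrls($xml)
  {
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml))
    {
      return false; // not well-formed XML
    }
    return $doc->getElementsByTagName("url")->length;
  }

  // inline sample standing in for a fetched per-feed sitemap
  $sample = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url><loc>http://www.example.com/product/a.html</loc></url>
    <url><loc>http://www.example.com/product/b.html</loc></url>
  </urlset>';

  print "URL elements: ".countSitemapUrls($sample)."\n";
?>
```

Running this against each per-feed sitemap in turn and adding up the results gives a figure that can be compared directly with the "submitted" counts in GWT.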

Thanks,
David.

Submitted by IG on Fri, 2013-01-11 14:46

Hi David

I am not sure if you haven't found the time to run a script to count the elements or if I have forgotten to post my sitemap URL. I thought I did, but in any case, I think I have made some progress in understanding why there is a different product count.

I have a csv feed from Zalando with about 102,000 products. However, GWT says the feed contains 150,000 products. GWT shows under "Sitemaps" the following for this particular feed:

/produkte?pto_sitemap=xml&pto_filename=zalando.csv&start=100000 submitted: 50,000
/produkte?pto_sitemap=xml&pto_filename=zalando.csv&start=50000 submitted: 50,000
/produkte?pto_sitemap=xml&pto_filename=zalando.csv submitted: 50,000

This explains why my PT admin shows 48,000 fewer products than GWT, but I wonder whether this is a bug in GWT or a bug in my sitemap?

Cheers, IG

Submitted by support on Fri, 2013-01-11 15:01

Hello IG,

My apologies, there is indeed a bug in the WordPress plugin file pto_sitemap.php. In that file, look for the following code at line 66:

  $sitemapHREF .= "&amp;start=".$start;

...and REPLACE with:

  $sitemapHREF .= "&amp;pto_start=".$start;

Corrected in the distribution.
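
To illustrate why the wrong parameter name inflates the count, here is a small simulation. It is a sketch under assumptions rather than the plugin's actual code: it assumes the sitemap handler reads its offset only from `pto_start` and falls back to 0 when that parameter is missing, that each page holds at most 50,000 URLs, and it uses the approximate figure of 102,000 products mentioned above. The function names urlsServed() and totalSubmitted() are made up for the example.

```php
<?php
  // hypothetical model of the handler: offset comes from "pto_start", default 0
  function urlsServed($query, $totalProducts, $pageSize = 50000)
  {
    parse_str($query, $params);
    $start = isset($params["pto_start"]) ? (int)$params["pto_start"] : 0;
    return max(0, min($pageSize, $totalProducts - $start));
  }

  // add up the URLs served across all pages linked from the sitemap index,
  // where the index uses $paramName for the offset
  function totalSubmitted($paramName, $totalProducts, $pageSize = 50000)
  {
    $total = 0;
    for ($start = 0; $start < $totalProducts; $start += $pageSize)
    {
      $query = "pto_sitemap=xml&pto_filename=zalando.csv";
      if ($start) $query .= "&".$paramName."=".$start;
      $total += urlsServed($query, $totalProducts, $pageSize);
    }
    return $total;
  }

  print totalSubmitted("start", 102000)."\n";     // 150000
  print totalSubmitted("pto_start", 102000)."\n"; // 102000
?>
```

With `start` the offset is ignored, so all three pages return the first 50,000 products (150,000 in total); with `pto_start` the pages return 50,000, 50,000 and 2,000, which matches the product count in the database.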

Cheers,
David.
--
PriceTapestry.com