Support Forum



Getting total products processed

Submitted by Chani on Mon, 2006-04-10 08:49

Hi,

Is there any way of getting the total number of products the parser attempted to process (for each feed)? It would mean you could then display how many products were invalid and therefore not added to the site.

Many thanks!
Chani

Submitted by support on Mon, 2006-04-10 12:00

Hi Chani,

A couple of quick questions first:

i) Do you have an easy way to modify the database structure?

ii) Do you have command line access (Telnet or SSH) to your server?

iii) Are you able to import large feeds via your web browser without any timeout problems etc.?

Depending upon your answers I'll figure out the easiest way to go about this...

Submitted by Chani on Mon, 2006-04-10 12:34

Hi,

'Yes' to i) and ii) (phpMyAdmin for one - have already modified 'feeds' table), iii) should be able to, but Firefox has been a bit peculiar of late!

Submitted by support on Mon, 2006-04-10 15:09

I'm afraid this was always going to take a fair bit of coding; but I think the best way to do this would be to implement an "errors" column in the feeds table. Make it of type INT(11), default 0.
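For anyone following along, the column described above could be added with a statement along these lines. This is only a sketch: the "pt_" prefix is an assumed example value for $config_databaseTablePrefix, and the resulting SQL would still need to be run through phpMyAdmin or a similar tool.

```php
<?php
// Assumed example value; use the table prefix from your own configuration.
$config_databaseTablePrefix = "pt_";

// Build the statement that adds an INT(11) "errors" column defaulting to 0.
$sql = "ALTER TABLE `".$config_databaseTablePrefix."feeds` ADD `errors` INT(11) DEFAULT 0";

print $sql."\n";
```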

Then, in includes/admin.php, add a new global variable called $admin_importErrorCount wherever you have an instance of $admin_importProductCount, giving you the following code:

<?php
  global $admin_importProductCount;
  global $admin_importErrorCount;
?>

There are 2 instances of this, one in admin__importRecordHandler() and the other in admin_import().

Next, in admin_import() set $admin_importErrorCount to zero at the same point in the code as $admin_importProductCount is initialised as follows:

<?php
  $admin_importProductCount = 0;
  $admin_importErrorCount = 0;
?>

Next, modify the SQL statement at the end of admin_import() from:

<?php
$sql = "UPDATE `".$config_databaseTablePrefix."feeds` SET imported='".time()."',products='".$productCount."' WHERE filename='".database_safe($filename)."'";
?>

to

<?php
$sql = "UPDATE `".$config_databaseTablePrefix."feeds` SET imported='".time()."',products='".$productCount."',errors='".$errorCount."' WHERE filename='".database_safe($filename)."'";
?>

Finally, within admin__importRecordHandler(), you need to change each instance of "return;", in the cases where a record is discarded, to increment the error counter before returning. These are at the very top where you have:

<?php
/* check product record for minimum required fields */
if (!$record[$admin_importFeed["field_name"]] || !$record[$admin_importFeed["field_buy_url"]] || !$record[$admin_importFeed["field_price"]]) return;
?>

...change that to:

<?php
/* check product record for minimum required fields */
if (!$record[$admin_importFeed["field_name"]] || !$record[$admin_importFeed["field_buy_url"]] || !$record[$admin_importFeed["field_price"]])
{
  $admin_importErrorCount++;
  return;
}
?>

and lower down, after the user filters have been applied:

<?php
  if ($filter_dropRecordFlag) return;
?>

change that to:

<?php
  if ($filter_dropRecordFlag)
  {
    $admin_importErrorCount++;
    return;
  }
?>
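Taken together, the changes so far amount to the counting pattern sketched below. It is a simplified stand-in rather than the script's real handler: sketch_recordHandler() and the field names NAME, URL and PRICE are hypothetical, the database insert is replaced by a plain counter, and empty() is used in place of "!" only so that missing keys do not raise notices.

```php
<?php
// Hypothetical stand-ins for the script's globals.
$admin_importProductCount = 0;
$admin_importErrorCount = 0;
$admin_importFeed = array(
  "field_name" => "NAME",
  "field_buy_url" => "URL",
  "field_price" => "PRICE"
);

// Simplified handler: counts a discarded record before returning.
function sketch_recordHandler($record)
{
  global $admin_importFeed;
  global $admin_importProductCount;
  global $admin_importErrorCount;

  /* check product record for minimum required fields */
  if (empty($record[$admin_importFeed["field_name"]]) || empty($record[$admin_importFeed["field_buy_url"]]) || empty($record[$admin_importFeed["field_price"]]))
  {
    $admin_importErrorCount++;
    return;
  }

  // The real handler inserts into the database here; the sketch only counts.
  $admin_importProductCount++;
}

sketch_recordHandler(array("NAME" => "Widget", "URL" => "http://www.example.com/widget", "PRICE" => "9.99"));
sketch_recordHandler(array("NAME" => "No price", "URL" => "http://www.example.com/np", "PRICE" => ""));
sketch_recordHandler(array("NAME" => "", "URL" => "", "PRICE" => "1.00"));

print $admin_importProductCount." imported, ".$admin_importErrorCount." discarded\n";
```

With the three sample records, one is counted as imported and two as discarded.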

That will give you a new column in the feeds table which you can then display in admin/index.php. The quickest way to add it to the display would be to put the total number of records processed in brackets after the product count, as this will save having to add a new column to the table. To do this, change the following line:

<?php
print "<td align='right'>".($feeds[$filename]["products"] ? $feeds[$filename]["products"] : "&nbsp;")."</td>";
?>

to:

<?php
print "<td align='right'>".$feeds[$filename]["products"]." (".($feeds[$filename]["products"]+$feeds[$filename]["errors"]).")</td>";
?>
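The replacement line assumes both values are always present for every feed. A slightly more defensive variant might look like the sketch below; sketch_productCell() is a hypothetical helper, not part of the script, and the figures passed to it are illustrative only.

```php
<?php
// Hypothetical helper: formats "imported (total attempted)" for one feed row.
function sketch_productCell($feed)
{
  // Treat a missing value as zero (for example, a feed not yet imported).
  $products = isset($feed["products"]) ? (int)$feed["products"] : 0;
  $errors = isset($feed["errors"]) ? (int)$feed["errors"] : 0;

  // Total attempted = records imported + records discarded.
  return "<td align='right'>".$products." (".($products + $errors).")</td>";
}

print sketch_productCell(array("products" => 28, "errors" => 297))."\n";
```

For the sample row this prints a cell reading 28 (325).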

Hope this helps!

Submitted by Chani on Mon, 2006-04-10 18:16

Hi,

Brilliant, thanks :) I had a bit of a go at doing this yesterday, but just couldn't get it to come together. Good to know I was on the right track, just didn't know the script well enough!

For anyone else who wishes to use it, there was one thing above that didn't quite work, and that is the line which updates the feeds table has the wrong variable name for the errors field. It should be as follows:

<?php
$sql = "UPDATE `".$config_databaseTablePrefix."feeds` SET imported='".time()."',products='".$productCount."',errors='".$admin_importErrorCount."' WHERE filename='".database_safe($filename)."'";
?>

Thanks again!
Chani

Submitted by vertygo on Fri, 2006-04-14 00:38

I'm not implementing this, but I'd just like to say that the help on this forum both from the author and the users is amazing. I really don't think I've seen this kind of thing before, I'm so happy I purchased this product.

Submitted by SteveC on Wed, 2007-03-21 20:32

Hi, I'm trying to work out whether anybody got this working or whether I'm just having a bad day! I'm using some echo statements to try to work out what's gone wrong. The following is some code from the last section of the admin_import() function in includes/admin.php that has been amended as per this thread:

echo '<p> importProductCount : '."$admin_importProductCount".'</p>';
echo '<p> importErrorCount : '."$admin_importErrorCount".'</p>';
    $admin_importProductCount = 0;
    $admin_importErrorCount = 0; // MOD importErrorCount
    MagicParser_parse("../feeds/".$admin_importFeed["filename"],"admin__importRecordHandler",$admin_importFeed["format"]);
echo '<p> importProductCount : '."$admin_importProductCount".'</p>';
echo '<p> importErrorCount : '."$admin_importErrorCount".'</p>';
    $sql = "SELECT COUNT(*) AS productCount FROM `".$config_databaseTablePrefix."products` WHERE merchant='".database_safe($admin_importFeed["merchant"])."'";
    database_querySelect($sql,$rows);
    $productCount = $rows[0]["productCount"];
echo '<p> ProductCount : '."$productCount".'</p>';

and the result is:

importProductCount :

importErrorCount :

importProductCount : 325

importErrorCount : 0

ProductCount : 28

It seems 325 might well be the total number of records in the feed, but it's the latter figure, 28, which has actually been added to the database, and it checks out when querying the database from the command line.
I can't work out why importErrorCount is not being updated. Can anyone see what might be wrong here?

thanks, Steve.

Submitted by support on Thu, 2007-03-22 08:30

Hi Steve,

If the error count is zero the most likely cause is that it has not been declared as global within the admin__importRecordHandler() function in includes/admin.php. Make sure that you have included the following mod at the top of the function:

  global $admin_importErrorCount;

Hope this helps!
Cheers,
David.
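To illustrate the point about scope, here is a minimal sketch of how PHP treats a counter with and without the global declaration. The two function names are hypothetical and exist only for this demonstration.

```php
<?php
$admin_importErrorCount = 0;

// Without the global declaration the function works on its own local
// variable, so the script-level counter never changes.
function sketch_withoutGlobal()
{
  $admin_importErrorCount = 0; // local variable, unrelated to the outer one
  $admin_importErrorCount++;
}

// With the declaration the function updates the script-level counter.
function sketch_withGlobal()
{
  global $admin_importErrorCount;
  $admin_importErrorCount++;
}

sketch_withoutGlobal();
$afterWithout = $admin_importErrorCount;
print "after sketch_withoutGlobal: ".$afterWithout."\n";

sketch_withGlobal();
$afterWith = $admin_importErrorCount;
print "after sketch_withGlobal: ".$afterWith."\n";
```

The counter stays at 0 after the first call and becomes 1 after the second.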

Submitted by SteveC on Thu, 2007-03-22 10:10

Thanks David,
but this declaration is present - as per the instructions.

I've noticed there are two places where we increment $admin_importErrorCount within admin__importRecordHandler(): when any of the required fields are not present, and when the user has set a filter to drop specific records. Is it possible there are other ways to 'drop' records that aren't being accounted for?

thanks, Steve.

Submitted by support on Thu, 2007-03-22 10:16

Hi Steve,

The only way records would not make it to the import record handler function is if the feed is significantly corrupt and the records aren't actually being parsed as records by the parser.

One test you could do is to create a drop record filter on a particular string that you know exists and see if this causes your error count to be incremented and set. This would at least prove that the code is all correct and updates the database properly...

Cheers,
David.

Submitted by SteveC on Thu, 2007-03-22 11:31

Thanks David,
I tested using drop record and also by assigning empty fields in the data feed to required fields in the registration form. Both of these work properly, and by assigning empty fields to all three required registration fields I get the total number of records in the feed as an error figure in brackets, as expected.

My problem then is to add any records dropped before they reach the record handler so I get a more accurate total records figure so that I know when there's a problem.

I'm not sure how best to do this, but I noticed that $admin_importProductCount (see the code section in my initial post) seems to contain the number of records that should be eligible for import. This is then 'abandoned' and an SQL query is used to count the actual number of products in the database, and I can't work out why.

Thanks, Steve.

Submitted by support on Thu, 2007-03-22 11:48

Hi Steve,

I'm not totally sure about this as it sounds like the product count is incrementing even if a record is not importing because of a duplicate check fail. If you look in includes/admin.php for the following code near the end of the import handler function:

    if (database_queryModify($sql,$insertId))
    {
      $admin_importProductCount++;
    }

...and change this as follows:

    if (database_queryModify($sql,$insertId))
    {
      $admin_importProductCount++;
    }
    else
    {
      $admin_importErrorCount++;
    }

This might make up the difference...

Hope this helps!
Cheers,
David.

Submitted by SteveC on Thu, 2007-03-22 12:39

David,
this change didn't seem to make any difference, so I presume there weren't any duplicates detected in the feed. I think I might remove this change later as I'm not really worried about duplicates; I'm more worried about records being dropped for any other reason. This is important because I want a clear indication of how successful workarounds for problem feeds have been, i.e. how near I've got to getting all the products imported.

Is there a way to get the number of records dropped by the feed parser because it failed to parse it for whatever reason?

thanks, Steve.

Submitted by support on Thu, 2007-03-22 13:15

Hi Steve,

Unfortunately not, I'm afraid, because the parser only knows about a record once it has found all the valid components that make up a record, so you don't know that you have an invalid record at any point in the code.

If you are catching any reason for exiting the record handler without importing, then I'm confident that you will be counting all records that were not imported for a known reason (either required fields missing or drop record flag set).

Having said that, I still don't quite understand why you are seeing a product count of several hundred, zero errors and only a few products imported. Was this an earlier issue or do these numbers add up now?

Cheers,
David.

Submitted by SteveC on Thu, 2007-03-22 14:04

David,
It's related to another issue. I've had problems parsing the price from a complex string (resulting in dropped records?) and you suggested a suitable modification in the forum topic:

http://www.pricetapestry.com/node/948

Before I tried that I wanted to be able to get a before and after picture of how well this and any other mods/filters I apply have actually worked, by showing the number of records I've actually managed to add to the database out of the total number in the feed (not interested in duplicates).

As an example, one of the feeds has 2875 records. This is currently showing as 1654(1654), so I know it's lying to me! I want it to show 1654(2875) so that I can tell there are over 1000 records missing (possibly due to the decimalize function noted in the other forum topic?). We have now proved that user dropped records are counted, as are any records that don't have the required fields, so these aren't an issue.

Using this feed and the echo statements as shown in the previous code snippet we get:

importProductCount :

importErrorCount :

importProductCount : 2875

importErrorCount : 0

ProductCount : 1654

I don't know enough about the code to say how $admin_importProductCount is produced, but it does seem equal to the total number of records in the feed at this point (although for some reason you have decided not to use it?). However, note that no errors have been counted and only 1654 records were added to the database, which of course explains why 1654(1654) is displayed on the admin page.

Sorry for the long explanation but I hope this explains it all fully.

Thanks, Steve.
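As a quick check on the figures quoted in this post, the gap being described can be worked out as below. The variable names are hypothetical and the snippet is only a sketch of the arithmetic, not part of the script.

```php
<?php
// Figures quoted in the post above.
$feedRecords = 2875;   // records in the feed
$imported = 1654;      // products found in the database after import
$countedErrors = 0;    // value of $admin_importErrorCount after import

// Records neither imported nor counted as errors.
$unaccounted = $feedRecords - $imported - $countedErrors;

print "Unaccounted records: ".$unaccounted."\n";
```

That leaves 1221 records unaccounted for.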

Submitted by SteveC on Fri, 2007-03-23 13:24

Think I've solved this now.

All records were processed by Magic Parser OK. The errors were occurring in admin__importRecordHandler() when it tried to add the record to the database.

The confusing mismatch between $admin_importProductCount and the actual product count is caused by admin__importRecordHandler() incrementing $admin_importProductCount even when adding the record to the database fails.

So, to catch these errors as well without altering the meaning of the original code, replace the following snippet almost at the end of admin__importRecordHandler():

    if (database_queryModify($sql,$insertId))
    {
      $admin_importProductCount++;
    }

with:

    $result = database_queryModify($sql,$insertId);
    if ( $result ) {
      $admin_importProductCount++;
    }
    if ( $result < 1 ) {
      $admin_importErrorCount++;
    }

Steve.
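How the replacement above behaves depends on what database_queryModify() returns, which this thread does not spell out. The sketch below applies the same two checks to a few assumed return values; sketch_countResult() is a hypothetical helper and the values tried are for illustration only.

```php
<?php
// Applies the two checks from the snippet above to a given return value.
// The values tried below are assumptions, not documented return values.
function sketch_countResult($result, &$productCount, &$errorCount)
{
  if ( $result ) {
    $productCount++;
  }
  if ( $result < 1 ) {
    $errorCount++;
  }
}

foreach (array(1, 0, false, -1) as $result)
{
  $productCount = 0;
  $errorCount = 0;
  sketch_countResult($result, $productCount, $errorCount);
  print var_export($result, true)." => products +".$productCount.", errors +".$errorCount."\n";
}
```

A return value of 1 increments only the product count, while 0 and false increment only the error count. A negative value such as -1 is truthy in PHP and also less than 1, so it would increment both counters, which leaves the original product count behaviour unchanged while still recording the error.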