Only import existing products from a big feed

Submitted by Kees on Thu, 2017-09-21 11:06

Hey David,

I have a database with over 300,000 products (from about 30 feeds) and I have an external feed with 15 million products.

I'm using the "Product Mapping by EAN" script and I would like to know whether it is possible to import from the big feed only those products whose EAN already exists in my database. I don't want to import all 15 million products.

Greetings and thanks for your time,
Kees

Submitted by support on Thu, 2017-09-21 12:02

Hello Kees,

The optimal way to achieve this would be to create an in-memory look-up table of existing EAN values, and then apply a new filter, "Drop Record If Not UID", to the EAN field of the big feed. The look-up table can be populated at the start of any import process from the `pt_products` table as part of the admin_importSetGlobals() function in includes/admin.php, so this will work whatever import / CRON method you are using (e.g. if importing into a temporary table for zero down-time, the look-up table will still be created from the live `pt_products` table).

To give this a go, first edit includes/admin.php and look for the following code at line 561:

    global $admin_importFilters;

...and REPLACE with:

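    global $admin_importFilters;
    global $filter_dropRecordIfNotUIDs;
    global $config_uidField;
    // needed for the table name below; harmless if already declared in this function
    global $config_databaseTablePrefix;
    // build the in-memory look-up table of UID (EAN) values already in the products table
    $filter_dropRecordIfNotUIDs = array();
    $sql = "SELECT DISTINCT(".$config_uidField.") FROM `".$config_databaseTablePrefix."products`";
    database_querySelect($sql,$rows);
    foreach($rows as $row)
    {
      $filter_dropRecordIfNotUIDs[] = $row[$config_uidField];
    }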

And then to add the new filter, add the following code to includes/filter.php:

  /*************************************************/
  /* dropRecordIfNotUID */
  /*************************************************/
  $filter_names["dropRecordIfNotUID"] = "Drop Record If Not UID";
  function filter_dropRecordIfNotUIDConfigure($filter_data)
  {
    print "<p>There are no additional configuration parameters for this filter.</p>";
  }
  function filter_dropRecordIfNotUIDValidate($filter_data)
  {
  }
  function filter_dropRecordIfNotUIDExec($filter_data,$text)
  {
    global $filter_dropRecordFlag;
    global $filter_dropRecordIfNotUIDs;
    if($filter_dropRecordFlag)
    {
      return $text;
    }
    $filter_dropRecordFlag = !in_array($text,$filter_dropRecordIfNotUIDs);
    return $text;
  }
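
The look-up and filter steps above can also be tried outside Price Tapestry. The following is only a rough standalone sketch, not part of the distribution: `$existingRows`, `$bigFeed`, `sketch_buildLookup()` and `sketch_shouldDrop()` are hypothetical names made up for illustration, and the sample EAN values are invented. It assumes a variation in which the look-up table is keyed by EAN (so that membership is tested with isset() rather than in_array()) and empty EAN values are skipped.

```php
<?php
// Standalone sketch: all names below are hypothetical, not Price Tapestry functions.

// Stand-in for the rows a SELECT DISTINCT on the UID field might return.
$existingRows = array(
  array("ean" => "5012345678900"),
  array("ean" => "4006381333931"),
  array("ean" => ""), // a product with no EAN on record
);

// Build the look-up table keyed by EAN, skipping empty values.
function sketch_buildLookup($rows, $field)
{
  $lookup = array();
  foreach($rows as $row)
  {
    if ($row[$field] !== "")
    {
      $lookup[$row[$field]] = true;
    }
  }
  return $lookup;
}

// Returns TRUE if the record should be dropped, i.e. its EAN is not in the look-up.
function sketch_shouldDrop($ean, $lookup)
{
  return !isset($lookup[$ean]);
}

$lookup = sketch_buildLookup($existingRows, "ean");

// Stand-in for EAN values read from the big feed.
$bigFeed = array("5012345678900", "9999999999999", "", "4006381333931");

// Keep only the records whose EAN is already known.
$kept = array();
foreach($bigFeed as $ean)
{
  if (!sketch_shouldDrop($ean, $lookup))
  {
    $kept[] = $ean;
  }
}

print implode(",", $kept)."\n";
```

The keyed look-up is a design choice rather than a requirement: isset() on an array key is a hash look-up, whereas in_array() scans the list, which may matter when the filter runs against 15 million records. Using it in the real filter would mean changing both the code that populates the look-up table and the filter's Exec function to match.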

With that in place, add a new "Drop Record If Not UID" filter to the EAN field for the big feed and that should do the trick. Note that the above uses the $config_uidField field name from config.advanced.php, added as part of Automatic Product Mapping by Unique ID. The raw data for 300,000 EAN13 values is less than 5MB (300,000 × 13 bytes is about 3.9MB), so even allowing for PHP's per-element array overhead the memory requirement should be no problem.
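
If you want to check the memory requirement on your own server, here is a rough measurement sketch. It is standalone and uses synthetic 13-digit values in place of real EANs (the variable names are made up for illustration); the measured figure will vary with PHP version and platform.

```php
<?php
// Rough measurement sketch: synthetic 13-digit values stand in for real EANs.
$count = 300000;

$before = memory_get_usage();

$eans = array();
for($i = 0; $i < $count; $i++)
{
  // "5" followed by a zero-padded 12-digit counter gives a 13-character string
  $eans[] = "5".str_pad((string)$i, 12, "0", STR_PAD_LEFT);
}

$after = memory_get_usage();

// Raw character data only, ignoring PHP's per-element overhead
$rawBytes = $count * 13;

printf("raw data: %.1f MB\n", $rawBytes / 1000000);
printf("measured: %.1f MB\n", ($after - $before) / 1000000);
```

Comparing the measured figure against the memory_limit setting in php.ini should show how much headroom the import process has.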

Cheers,
David.
--
PriceTapestry.com