You are here:  » Mapping Product Names to UPC Dumbed Down


Mapping Product Names to UPC Dumbed Down

Submitted by Brice on Tue, 2018-01-16 19:23 in

1. I am finding a lot of information on Product Mapping & RegExp. My reason for hunting for a solution is because the feeds we receive sometimes have product titles that appear as you would expect for example: "RugRat Art" and other feeds will have the title as "Rug 8456 AR" the commonality between these is the Upc Code and MPN. When we search for the UPC in the system we get the products but its not recognizing them as the same product to compare against even though they technically are the same just different titles.

Learning as I go I believe I need to set a 'Product Mapping Reg Exp' but feeling lost on how to properly implement for all feeds coming in with a goal in the end of making the mismatched titles still correlate with the UPC and or the MPN during searches so the product can be compared. I believe all the information is under my nose but hoping you can dumb this down a little bit for me to actually grasp the solution.

2. IS it better to somehow MAP the titles so they actually match if the MPN and or the UPC match each other? For example, if a product has the same UPC and the same MPN can I make it so the title result on all matches equals the best name of "RugRat Art". My thought process to this is a lot of people will search by name over UPC or MPN. If the search by UPC it seems we can have success, if they search by "RugRat" they may not get the result that should include "Rug 8456 AR" But this may be one in the same as question 1 and I am making it more difficult then it is.

3. In relation with the 2 above questions, (if you want to separate this out into a topic on its own I dont blame you) So we get a large amount of feeds, we also have access to all the products coming in from these feeds as well. So I have access to the images, descriptions, upc, mpn and so on even though each feed may be from different suppliers. The feeds coming in can be a jumbled mess like the title being unrecognizable but the UPC will match and so will the MPN. Would it make more sense for us to somehow provide the pictures and titles rather then taking them from the feeds and linking the feeds up to our pictures and our titles by UPC and MPN...somehow?

I am sure you got the answer and appreciate your time once again.

Submitted by support on Wed, 2018-01-17 10:59

Hello Brice,

Regarding 1 & 2; before looking at manual mapping options I would consider using Automatic Product Mapping by Unique ID, and in your case using `upc` initially as the unique ID field, in other words by adding to config.advanced.php;

  $config_uidField = "upc";

I would try that as per the download and using the `upc` field in the first instance, and if you then wanted to consider `mpn` also I have a multiple ID field version of the script which you would be welcome to try - just let me know and I'll forward that by email.

Note that in order to ensure that the same title is chosen each time uidmap.php runs, the title used is the first alphabetically for that ID so this may or may not alleviate the need to maintain your own product name database per unique ID (which could become quite large). Finally regarding your own locally hosted images per unique ID, i'll follow up later today in another reply to this thread regarding how to achieve this...

Cheers,
David.
--
PriceTapestry.com

Submitted by support on Wed, 2018-01-17 12:20

Hi again,

Regarding 3; an easy way to manage local images per UPC would be to create a new folder "upcimages" containing the image files with the UPC as the base filename and ".jpg" as the extension in this example, so for UPC 1234567890 you would upload the image:

/upcimages/1234567890.jpg

And then in includes/admin.php, look for the following comment at line 448:

    /* check product record for minimum required fields */

...and REPLACE with:

    global $config_baseHREF;
    if ($importRecord["upc"])
    {
      if (file_exists("../upcimages/".$importRecord["upc"].".jpg"))
      {
        $importRecord["image_url"] = $config_baseHREF."upcimages/".$importRecord["upc"].".jpg";
      }
    }
    /* check product record for minimum required fields */

Hope this helps!

Cheers,
David.
--
PriceTapestry.com

Submitted by Brice on Thu, 2018-01-18 16:24

Oddly that didnt work. The database shows the mod so that is good. I did a import of two different feeds. Ran the uidmap.php.

So both Merchant feeds have Product ALPHA with the same upc code.

Merchant one has a good title and Merchant two has a bad title name.

Searching for the upc code will result in only Merchant 2 results. Merchant one is only accessible by manually searching and filtering for Merchant 1 .

So there is no comparison of the two merchants still and it seems its going with the poor naming convention.

Submitted by Brice on Thu, 2018-01-18 16:45

The Merchant 1 feed may be an isolated incident. However, UPC code search on some other items still does not bring them into compare. Just treats them as separate products.

Submitted by Brice on Thu, 2018-01-18 17:42

In regards to this answer, would it be possible to make this so not only does it pull the proper image according to upc but it also pulls our description and title as well?

In essence, we would provide title, description and image. Merchant feeds would still contain their own product names but when a customer searches for a product via upc it would bring them to the product page with our static information and then list all the merchants regardless if their title is poorly done or not.

IF our "static" product information is not in the database then the system would do what it normally does. So if product "APPLE" and upc "123456789" is not in our static list, then when this product is imported on a feed it would display as it normally would. Basically, I guess I am saying can it check our static product database and match up and if not match continue as normal?

Just brainstorming in the event we need to head down this path.

Submitted by Brice on Thu, 2018-01-18 18:48

*update*

Went over the instructions for the 50th time and spotted something I did wrong. I corrected the issue and went through the steps again.

This time I got a "working" message followed by Phase 0 done and then Phase 1 done. Looked impressive but the result in the end when searching by a UPC code resulted in the same. It gave me search results of 2 separate products but they both have the same UPC. Clicking into either product gave nothing as far as a comparison.

Submitted by support on Thu, 2018-01-18 19:21

Hello Brice,

I was going to ask if you were able to run uidmap.php in such a way that you can view the output which will confirm the products identified and the number of products matched by each ID so I am wondering if the upc values are not exactly identical for example by leading zeros e.g;

Merchant 1
name: New Edition Blue Widget
upc: 0123456789

Merchant 2
name: Blue Widget (New Edition)
upc: 123456789

To check if this might be the case, make a temporary modification to html/prices.php to show the upc in the price comparison table below the merchant's product name (Catalogue Product Name). To do this, look for the following code at line 57:

            <td class='hide-for-small-only'><?php print $product["original_name"]; ?></td>

...and REPLACE with:

            <td class='hide-for-small-only'>
              <?php print $product["original_name"]; ?>
              <?php if ($product["upc"]) print "<br />UPC: [".$product["upc"]."]"?>
            </td>

Then for products that have not been matched by uidmap.php, is there a difference (even if subtle) in the UPC value shown in the price comparison table?

Thanks,
David.
--
PriceTapestry.com

Submitted by Brice on Thu, 2018-01-18 19:35

* another Update*

I ran through it a few more times and noticed during the Phase 1 processing there where some that had a (2) at the end of the name. I believed this to be the matching products out of 2 feeds. I found that particular products upc code and searched by that number and it successfully showed the two feeds as a comparison. Not completely sure what happended officially I can only say success! So my apologies for the many replies to get to this point but it appears to be successfully matching the UPC. (just read your new update and I believe you are onto the problem I ran into. One feed led differently then the other)

Its random as far as the title goes as you mentioned. Some titles are perfect and others are a mess because of the feeds.

So this leaves me to believe if I want a perfect world I would need to keep a database of proper names/images/descriptions correct?

Submitted by support on Fri, 2018-01-19 09:42

Hello Brice,

Considering the name firstly, would choosing the longest instead of first alphabetically help? If you wanted to try this, edit scripts/uidmap.php and look for the following code at line 38:

  $sql2 = "SELECT SQL_CALC_FOUND_ROWS name FROM `".$config_databaseTablePrefix."products` WHERE ".$config_uidField."='".database_safe($product[$config_uidField])."' ORDER BY name LIMIT 1";

...and REPLACE with:

  $sql2 = "SELECT SQL_CALC_FOUND_ROWS name FROM `".$config_databaseTablePrefix."products` WHERE ".$config_uidField."='".database_safe($product[$config_uidField])."' ORDER BY LENGTH(name) DESC LIMIT 1";

Similarly, would using the longest description for compared results help? To try this on the front end in the first instance, edit html/product.php and look for the following code at line 2:

  $product_main = $product["products"][0];

...and REPLACE with:

  $product_main = $product["products"][0];
  foreach($product["products"] as $p)
  {
    if (strlen($p["description"])>strlen($product_main["description"]))
    {
      $product_main["description"] = $p["description"];
    }
  }

Also, if you would like to post a couple of examples of what should be the same but are not exactly identical UPC values I will take a look and see if a cleansing filter (similar to the EAN Fix filter I posted in this comment) might be an option...

Cheers,
David.
--
PriceTapestry.com

Submitted by Brice on Fri, 2018-01-19 16:02

Hi David,

No, going with the longest would not rectify the situation. Often the cleaner titles are shorter but thats maybe 90% of the time give or take.

As far as the upc values, it appears to be a leading 0 or ending 0. My mistake was testing the system and looking at the last 5 digits to confirm success. The reality was there was a leading zero to the upc.

For Example, 1 merchant had an incorrect upc:

123456789

but the actual upc should be:
1234567890

this example was more common.

I have tracked this down to a distributors feed where they have truncated the upc for some reason. I think if I go the route of having a database where I maintain and house the correct data it may make things prettier and easier but you may know more about that then I.

Submitted by Brice on Fri, 2018-01-19 16:11

And all the feeds we have been seeing so far have no description so that option of longest description is out as well.

The lack of description was another reason I thought that a separate master database was needed.

Submitted by support on Sat, 2018-01-20 09:45

Hello Brice,

I've helped users create master tables of product information previously but there's various approaches depending on the volume of data you are working with - can you give an approximate idea of the number of products that you need to manage names, description and image URLs for, and I'll point you in the right direction...

Cheers,
David.
--
PriceTapestry.com

Submitted by Brice on Sat, 2018-01-20 15:17

I can give an estimate currently. Number of products would be around 50,000 I believe and thats with some cushion for added measure. In the near future that may go to 100,000. I do not anticipate a list much bigger then that.

Submitted by support on Mon, 2018-01-22 09:57

Hello Brice,

Consider the concept of an "authoritative" merchant from which to take the name, description and image_url per UPC (if exists).

To experiment with this, first add a new config parameter to config.advanced.php following the $config_uidField parameter:

  $config_uidMerchant = "Merchant Name";

...where Merchant Name is the merchant you wish to use for names, images and descriptions. Next, edit uidmap.php and look for the following code beginning at line 73:

      $normalisedName = tapestry_normalise($product["name"]);
      $searchName = tapestry_search($normalisedName);
      $sql2 = "UPDATE `".$config_databaseTablePrefix."products` SET name='".database_safe($product["name"])."',normalised_name='".database_safe($normalisedName)."',search_name='".database_safe($searchName)."' WHERE ".$config_uidField."='".database_safe($product["uid"])."'";
      mysqli_query($link2,$sql2);

...and REPLACE with:

      $sql2 = "SELECT name,description,image_URL FROM `".$config_databaseTablePrefix."products` WHERE ".$config_uidField."='".database_safe($product["uid"])."' AND merchant='".database_safe($config_uidMerchant)."'";
      if ($result2 = mysqli_query($link2,$sql2))
      {
        $product = mysqli_fetch_assoc($result2);
        $normalisedName = tapestry_normalise($product["name"]);
        $searchName = tapestry_search($normalisedName);
        $sql2 = "UPDATE `".$config_databaseTablePrefix."products` SET
          name='".database_safe($product["name"])."',
          normalised_name='".database_safe($normalisedName)."',
          search_name='".database_safe($searchName)."',
          description='".database_safe($product["description"])."',
          image_url='".database_safe($product["image_url"])."'
          WHERE ".$config_uidField."='".database_safe($product[$config_uidField])."'";
        mysqli_query($link2,$sql2);
      }
      else
      {
        $normalisedName = tapestry_normalise($product["name"]);
        $searchName = tapestry_search($normalisedName);
        $sql2 = "UPDATE `".$config_databaseTablePrefix."products` SET name='".database_safe($product["name"])."',normalised_name='".database_safe($normalisedName)."',search_name='".database_safe($searchName)."' WHERE ".$config_uidField."='".database_safe($product["uid"])."'";
        mysqli_query($link2,$sql2);
      }

With that in place, next time uidmap.php runs as part of your cronjob, Merchant Name will be the source of all names, description and image_url values where a product from that merchant exists with each UPC, otherwise the automatic mapping functionality would be as before.

Now, if you wanted to create a reference feed (dummy merchant) simply containing your preferred names, descriptions and image URLs that would be no problem - it can be deleted from the database at the end of the import all process. The master feed must still contain the required fields of Buy URL and price but these can be placeholder values, for example as reference.csv (saved in /feeds/ folder)

upc,name,description,image_url,buy_url,price
123456789,Product Name 1,Description 1,http://www.example.com/image1.jpg,http://localhost/,1.00
987654321,Product Name 2,Description 2,http://www.example.com/image2.jpg,http://localhost/,1.00

Register the feed as normal using merchant name "reference", and don't forget to change config.advanced.php as follows;

  $config_uidMerchant = "reference";

When cron.php runs, the reference feed / merchant will be imported as all other feeds and used to set the preferred names, descriptions and image URLs. Finally, edit uidmap.php and look for the following code beginning at (now) line 107:

  $sql1 = "TRUNCATE `".$config_databaseTablePrefix."uidfix`";
  $result1 = mysqli_query($link1,$sql1);

...and REPLACE with:

  $sql1 = "TRUNCATE `".$config_databaseTablePrefix."uidfix`";
  $result1 = mysqli_query($link1,$sql1);
  $sql1 = "DELETE FROM `".$config_databaseTablePrefix."products` WHERE merchant='".database_safe($config_uidMerchant)."'";
  $result1 = mysqli_query($link1,$sql1);

Cheers,
David.
--
PriceTapestry.com

Submitted by Brice on Mon, 2018-01-22 16:20

For the first option, selecting one merchant would be assuming that one merchant has a feed that can fulfill all the requirements. The other issue with this is that feed may not contain all the products that the other feeds may contain that also need some adjustments to the products like title and description. So that one merchant may knock out an issue for 50% of other feeds but then we would still have the issue with the other 50%.

The other option of using a reference merchant I think is a good idea but wouldnt that merchant or 'reference' merchant be apart of the comparison process with all the feeds on the frontend?

Im assuming that its deleting the file from the system ($sql1 = "DELETE FROM `".$config_databaseTablePrefix."products` WHERE merchant='".database_safe($config_uidMerchant)."'") so that might remove it from the frontend?

Submitted by support on Mon, 2018-01-22 16:48

Your assumptions in both cases are correct, so with the second option if you add the DELETE FROM ... SQL, the reference merchant will never appear on your site as an actual merchant.

Cheers,
David.
--
PriceTapestry.com