You are here:  » Products showing twice


Products showing twice

Submitted by sgpratley on Sun, 2006-02-05 18:58 in

I'm getting the same products showing twice when I search by all merchant.
I'm assuning there's either a problem in their feed, or there's some difference in the product not covered by the fields in pricetapestry, however, haw can you get the search to just show a single product in the search result?

example: {link saved}

Submitted by support on Mon, 2006-02-06 08:33

Hi,

It does look like a problem with the feeds; in fact many feeds contain duplicate product names.

I've looked at writing the code to ignore duplicate product names during the import process; but what I found was that in many cases the first entry for a product is an invalid link producing a - "Sorry, this product is no longer available" type message. The second version might work fine.

Having said that, ignoring duplicates might be the best option. What do you think?

Submitted by sgpratley on Mon, 2006-02-06 11:24

If the name and the link are both the same I think they should be treated as a dupe and not shown. Not sure what anyone else thinks?

Submitted by searley on Mon, 2006-02-06 15:08

If the link and name are the same then yes it should be treated as a duplicate

part of the probblem i have seen with some of the feeds is that they have exactly the same name, and description, with different links, as there is a minor difference in the product that is not shown in the feed

Submitted by IG on Mon, 2006-02-06 16:42

I fully agree that if the title, description and image are exactly the same, it should be treated as a duplicate. There is no benefit of having visitors wondering why the "same" product is shown several times.

I have this problem a lot with feeds from fashion companies as the same product is available in different colours and/or different sizes.

Submitted by support on Mon, 2006-02-06 18:40

Thanks for your comments. I'm going to look at the best way of coding this up. It's not as straight forward as it sounds becuase of the volume of data involved Vs. import speed Vs resources on the server (memory etc.)

I'm doing some experiments with a unique key on the product table created against the fields required as unique per merchant; but that starts to create problems with key length (max 500 chars).

I'm on the case...

Submitted by lowndsy on Tue, 2006-02-07 01:48

I look forward to seeing the official solution to it. Until then I've fixed it by changing the sql for searches to this:

        $sql = "SELECT * FROM `".$config_databaseTablePrefix."products` WHERE description RLIKE '".database_safe(tapestry_search($parts[0]))."' OR search_name RLIKE '".database_safe(tapestry_search($parts[0]))."' GROUP BY price,search_name";

it adds a GROUP filter to it which deletes entries with identical name AND prices from the search results - the full results are still visible in the product detail page. Duplicates in the product page are fine by me because of the sort of product I am advertising, and the only mod I have made to that page is to display the description rather than the name in the price table. it also expands the search to check the description table as well, and uses RLIKE instead of a %LIKE% search.

I have also changed the search to sort by price rather than by name with the very simple change:
$sql .= " ORDER BY price";
and in my presentation of the page I've put from £price rather than just £price to explain that there might be more results.

Overall this is simplifying my search pages a lot - I do a lot of processing to sort the products into categories prior to display, and before I implemented this the pages were very messy. When it's done I'll post the URL - until then, dmorrison you already know the address - feel free to take a look.

Submitted by searley on Tue, 2006-02-07 08:43

HI the above removes some duplicates, although it does slow things down a little

Take a look at:

http://shoppingchanneluk.com/comp/search.php?q=Altered+Beast

based on the title i would say the 3 should be combined as 1 item, then the From price being £22.99

but that is where things get complicated, as there is no way of knowing the last item is different due to the poor title provided by the merchant

Submitted by support on Tue, 2006-02-07 09:16

Thanks for your search page SQL suggestions lowndsy; that is something I was going to look at after solving the duplicate products / merchant problem.

Anyway, I've come up with something that doesn't have too much impact on the import speed; and is effectively what I described above. However rather than create a unique key on the database over multiple fields, what I am doing is creating a hash value using PHP's md5() function of the fields against which duplicates should be filtered. This hash value is then stored in the database as "dupe_hash", and a UNIQUE index created against it. This has the effect of causing the INSERT query to fail for a duplicate product; solving the problem at the database layer rather than the application; meaning that there are no memory constraint issues.

This change is now in the current distribution.

Changes are to database.sql and includes/admin.php. As distributed, the code will create the duplicate hash value against the merchant name and product name only. If you wish to include other fields in the duplicate detection, simply uncomment the required lines within the admin__importRecordHandler() function (includes/admin.php). The code is as follows:

<?php
/* create dupe_hash value */
$dupe_key $admin_importFeed["merchant"];
// uncomment any additional fields that you wish to filter duplicates on (description not recommended)
$dupe_key .= $record[$admin_importFeed["field_name"]];
// $dupe_key .= $record[$admin_importFeed["field_description"]];
// $dupe_key .= $record[$admin_importFeed["field_image_url"]];
// $dupe_key .= $record[$admin_importFeed["field_buy_url"]];
// $dupe_key .= $record[$admin_importFeed["field_price"]];
$dupe_hash md5($dupe_key);
?>

Please note that to implement the change using the install script I am afraid that you will need to DROP your current database and run setup.php again to create the new table structure. However, the database changes could be implemented manually as follows by running mysql from the command line and executing the following queries:

DELETE FROM products;
UPDATE feeds SET imported=0,products=0;
ALTER TABLE products ADD dupe_hash VARCHAR( 32 ) NOT NULL;
CREATE UNIQUE INDEX dupe_filter ON products (dupe_hash);

You will then need to import all feeds again...

Submitted by Birgir on Thu, 2013-05-09 15:45

Hi David,

I got similar problem as described here. The datafeed contains multiple versions of the same product, i.e. the products come in different colours. I've been trying to apply the fix you suggest above, but I'm unable to find the code in the includes/admin.php file (I'm using the latest release of Tapestry).

In my scenario the price is the same, the image is the same and the buy url is the same, but the product name varies though, basically includes the different colours available. What I would ideally achieve here is to remove the duplicates, but capture the number of colours the product is available in. Example: If the product has 4 different colours, I would like to drop the 3 additional records, but be able to identify in the db that this product is available in 4 colours (based on the 3 that were dropped + the one that is kept).

Can you help me?

Thanks,
Birgir

Submitted by support on Thu, 2013-05-09 16:29

Hello Birgir,

There's a few ways around this, with varying levels of manual intervention.

Would it be practical to create Product Mapping entries for each product for which there are multiple product variations? If not, would it be practical to manually compile a list of colours, which the script can then search for in the product name at import time and populate a `colour` field from that?

A third way would be if there is 100% consistency in the way the colour is included into the name, e.g. with a known prefix such as "-" for example:

Widget - Red
Widget - Blue

...or alternatively it is _always_ the first word, for example:

Red Widget
Blue Widget

If neither of the first 2 options are practical would this be the case at all?

Cheers.
David.
--
PriceTapestry.com

Submitted by Birgir on Fri, 2013-05-10 14:27

David,

It would be practical to compile a list of colours, it would only be 20 or 30 items long. Also, there is 100% consistency in the way the colour is included in the name, e.g. [product name - red]. The colour name is always after the product name with known prefix of "-".

Sample:
Briggs & Riley Baseline Luggage Executive Toiletry Kit - Olive
Marker Sport and Travel Bags All Mountain Backpack - Black

Cheers,
Birgir

Submitted by support on Fri, 2013-05-10 15:46

Hi Birgir,

The consistency makes things easier!

The first step would be to add "colours" as a new field to your site following the standard instructions in this thread.

Don't worry about actually mapping the field on Feed Registration Step 2 because the following code will construct this field automatically during import.

Firstly, in includes/admin.php, look for the following code around line 231:

  if ($filter_dropRecordFlag) return;

....and REPLACE with:

  if ($filter_dropRecordFlag) return;
  global $productColours;
  preg_match('/(.*) \- ([A-Z-a-z]*)$/',$importRecord["name"],$matches);
  if (count($matches)==3)
  {
    $importRecord["name"] = $matches[1]; // the name without the colour
    $productColours[$importRecord["name"]][] = $matches[2]; // the colour
  }

If a name matching the format "Product Name - Colour" is found, the above code will extract the colour and then set $importRecord["name"] to the "Product Name" part, and then add the colour to a separate array $productColours.

Finally, after import for any given feed, we can update the `colours` field with the list of colours for each product. To do this, in the same file look for the following code at line 252:

  return "";

(this is at the end of the admin_import() function which is called for each feed being imported)

...and REPLACE with:

  global $productColours;
  if (is_array($productColours))
  {
    foreach($productColours as $name => $colours)
    {
      $colours = implode(",",$colours);
      $sql = "UPDATE `".$config_databaseTablePrefix."products` SET colours='".database_safe($colours)."' WHERE name='".database_safe($name)."' AND filename='".database_safe($filename)."'";
    }
    unset($productColours);
  }
  return "";

Then, you can display $mainProduct["colours"] (html/product.php) or $product["colours"] (all other html/* files where a product is in context) as described in the custom fields thread - let me know of course if you're not sure how to go about displaying the field exactly where you want it...

Cheers,
David.
--
PriceTapestry.com

Submitted by Birgir on Fri, 2013-05-10 18:23

Thanks David,

I will apply this solution to my setup tomorrow and come back to you.

Cheers,
Birgir

Submitted by Birgir on Fri, 2013-06-14 16:30

David,

Is it possible to make this mod feed specific?

Cheers,
Birgir

Submitted by support on Fri, 2013-06-14 16:58

Hello Birgir,

Yes - I should have included code in the above to reset the $productColours array - that should be all it is.

I just looked back at the last includes/admin.php that I forwarded to you containing this modification, so in that file, if you look for the following code at line 610:

  $admin_importProductCount = 0;

...and REPLACE with:

  global $productColours;
  unset($productColours);
  $admin_importProductCount = 0;

...that will ensure that it only applies on a per-feed basis!

Cheers,
David.
--
PriceTapestry.com

Submitted by Birgir on Mon, 2013-06-17 14:56

David,

I implemented this change to the includes/admin.php, but I'm not sure how I can make it feed specific.

I got 3 different product feeds, one that has the colour within the product name, i.e. product name - colour, and then another that has further product description within the name, i.e. product name - additional details. Those additional details are added to the colour field (as specified), but ideally I would like to identify which feed should extract the colour from the name and which one not.

Hope this makes sense and is not too complex to implement. I can of course ignore the colour all together and import the product names as is. What do you reckon?

Cheers,
Birgir

Submitted by support on Mon, 2013-06-17 15:48

Hello Birgir,

Where you made the following modification to includes/admin.php:

  global $productColours;
  preg_match('/(.*) \- ([A-Z-a-z]*)$/',$importRecord["name"],$matches);
  if (count($matches)==3)
  {
    $importRecord["name"] = $matches[1]; // the name without the colour
    $productColours[$importRecord["name"]][] = $matches[2]; // the colour
  }

...if you contained that within an IF condition based on the list of feed filenames that you want this logic to apply to, you could do this quite easily as follows:

  $useFeeds = array("filename1","filename2","filename3");
  if (in_array($admin_importFeed["filename"],$useFeeds))
  {
    global $productColours;
    preg_match('/(.*) \- ([A-Z-a-z]*)$/',$importRecord["name"],$matches);
    if (count($matches)==3)
    {
      $importRecord["name"] = $matches[1]; // the name without the colour
      $productColours[$importRecord["name"]][] = $matches[2]; // the colour
    }
  }

Hope this helps!

Cheers,
David.
--
PriceTapestry.com

Submitted by Birgir on Mon, 2013-06-17 17:27

David,

This is absolutely perfect.

Thank you for all your help.

Cheers,
Birgir