Products showing twice
I'm getting the same products showing twice when I search across all merchants.
I'm assuming there's either a problem in their feed, or some difference between the products that isn't covered by the fields in Price Tapestry. Either way, how can you get the search to show just a single product in the search results?
Example:
http://www.offerpages.co.uk/compare-prices/search.php?q=AEG+34942GM
If the name and the link are both the same I think they should be treated as a dupe and not shown. Not sure what anyone else thinks?
If the link and name are the same then yes, it should be treated as a duplicate.
Part of the problem I have seen with some of the feeds is that they have exactly the same name and description but different links, because there is a minor difference in the product that is not shown in the feed.
I fully agree that if the title, description and image are exactly the same, it should be treated as a duplicate. There is no benefit in having visitors wondering why the "same" product is shown several times.
I have this problem a lot with feeds from fashion companies as the same product is available in different colours and/or different sizes.
Thanks for your comments. I'm going to look at the best way of coding this up. It's not as straightforward as it sounds because of the volume of data involved vs. import speed vs. resources on the server (memory etc.).
I'm doing some experiments with a unique key on the product table created against the fields required as unique per merchant; but that starts to create problems with key length (max 500 chars).
I'm on the case...
I look forward to seeing the official solution to it. Until then I've fixed it by changing the SQL for searches to this:
$sql = "SELECT * FROM `".$config_databaseTablePrefix."products` WHERE description RLIKE '".database_safe(tapestry_search($parts[0]))."' OR search_name RLIKE '".database_safe(tapestry_search($parts[0]))."' GROUP BY price,search_name";
It adds a GROUP BY clause, which collapses entries with an identical name AND price into a single row in the search results; the full results are still visible on the product detail page. Duplicates on the product page are fine by me because of the sort of product I am advertising, and the only mod I have made to that page is to display the description rather than the name in the price table. It also expands the search to check the description field as well, and uses RLIKE instead of a LIKE '%...%' search.
I have also changed the search to sort by price rather than by name with the very simple change:
$sql .= " ORDER BY price";
and in my presentation of the page I've put "from £price" rather than just "£price" to explain that there might be more results.
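For anyone wanting a "from" price that is always the lowest one, here is a minimal sketch of a variant of the query above. It groups by search_name only and selects MIN(price) explicitly, because SELECT * combined with GROUP BY leaves it up to MySQL which row of each group is returned, and is rejected when the ONLY_FULL_GROUP_BY SQL mode is enabled. The function name build_grouped_search_sql(), the "pt_" table prefix and the from_price / offers aliases are made up for illustration and are not part of Price Tapestry; the pattern passed in is assumed to have been escaped already (for example with database_safe()).

```php
<?php
// Hypothetical helper (not part of Price Tapestry): builds a search query that
// returns one row per product name together with its lowest price.
function build_grouped_search_sql($tablePrefix, $escapedPattern)
{
    // $escapedPattern must already be escaped for use inside a quoted SQL string.
    $where = "description RLIKE '".$escapedPattern."'"
           . " OR search_name RLIKE '".$escapedPattern."'";

    return "SELECT search_name, MIN(price) AS from_price, COUNT(*) AS offers"
         . " FROM `".$tablePrefix."products`"
         . " WHERE ".$where
         . " GROUP BY search_name"
         . " ORDER BY from_price";
}

// "pt_" is an example table prefix; use your own $config_databaseTablePrefix.
$sql = build_grouped_search_sql("pt_", "altered beast");
echo $sql, "\n";
```

The offers column gives the number of rows behind each name, which could be used to decide whether to print "from" in front of the price at all.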
Overall this is simplifying my search pages a lot - I do a lot of processing to sort the products into categories prior to display, and before I implemented this the pages were very messy. When it's done I'll post the URL - until then, dmorrison you already know the address - feel free to take a look.
Hi, the above removes some duplicates, although it does slow things down a little.
Take a look at:
http://shoppingchanneluk.com/comp/search.php?q=Altered+Beast
Based on the title I would say the 3 should be combined as 1 item, with the "from" price being £22.99,
but that is where things get complicated, as there is no way of knowing that the last item is different, due to the poor title provided by the merchant.
Thanks for your search page SQL suggestions lowndsy; that is something I was going to look at after solving the duplicate products / merchant problem.
Anyway, I've come up with something that doesn't have too much impact on the import speed, and is effectively what I described above. However, rather than create a unique key on the database over multiple fields, what I am doing is using PHP's md5() function to create a hash value of the fields against which duplicates should be filtered. This hash value is then stored in the database as "dupe_hash", with a UNIQUE index created against it. This causes the INSERT query to fail for a duplicate product, which solves the problem at the database layer rather than in the application, so there are no memory constraint issues.
This change is now in the current distribution.
Changes are to database.sql and includes/admin.php. As distributed, the code will create the duplicate hash value against the merchant name and product name only. If you wish to include other fields in the duplicate detection, simply uncomment the required lines within the admin__importRecordHandler() function (includes/admin.php). The code is as follows:
<?php
/* create dupe_hash value */
$dupe_key = $admin_importFeed["merchant"];
// uncomment any additional fields that you wish to filter duplicates on (description not recommended)
$dupe_key .= $record[$admin_importFeed["field_name"]];
// $dupe_key .= $record[$admin_importFeed["field_description"]];
// $dupe_key .= $record[$admin_importFeed["field_image_url"]];
// $dupe_key .= $record[$admin_importFeed["field_buy_url"]];
// $dupe_key .= $record[$admin_importFeed["field_price"]];
$dupe_hash = md5($dupe_key);
?>
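To see what the hash does in isolation, here is a small self-contained sketch. The helper name make_dupe_hash() and the merchant names are made up for illustration. Unlike the distributed code, which concatenates the fields directly, this sketch joins them with a separator character so that field pairs such as ("ab", "c") and ("a", "bc") cannot produce the same key; that is an optional variation, not what ships.

```php
<?php
// Hypothetical helper: hashes the fields that define "the same product".
function make_dupe_hash(array $fields)
{
    // Joining with a separator (ASCII unit separator) keeps ("ab", "c")
    // distinct from ("a", "bc"); plain concatenation would not.
    return md5(implode("\x1f", $fields));
}

$a = make_dupe_hash(array("Example Merchant", "AEG 34942GM"));
$b = make_dupe_hash(array("Example Merchant", "AEG 34942GM"));
$c = make_dupe_hash(array("Other Merchant", "AEG 34942GM"));

var_dump($a === $b);   // same merchant and name give the same hash
var_dump($a === $c);   // a different merchant gives a different hash
var_dump(strlen($a));  // md5() returns 32 hex characters, hence VARCHAR(32)
```

Because the merchant name is part of the key, the same product from two different merchants is still imported twice, which is what a price comparison needs.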
Please note that to implement the change using the install script I am afraid that you will need to DROP your current database and run setup.php again to create the new table structure. However, the database changes could be implemented manually as follows by running mysql from the command line and executing the following queries:
DELETE FROM products;
UPDATE feeds SET imported=0,products=0;
ALTER TABLE products ADD dupe_hash VARCHAR( 32 ) NOT NULL;
CREATE UNIQUE INDEX dupe_filter ON products (dupe_hash);
You will then need to import all feeds again...
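If you want to check the behaviour of the unique index before touching a live database, the following is a rough sketch of the idea. It is an assumption-heavy demo, not the Price Tapestry import code: it uses an in-memory SQLite database through PDO (requires the pdo_sqlite extension) as a stand-in for MySQL, a cut-down products table, and made-up merchant names.

```php
<?php
// Assumption: SQLite via PDO stands in for MySQL purely for demonstration.
$db = new PDO("sqlite::memory:");
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Cut-down version of the products table with the new dupe_hash column.
$db->exec("CREATE TABLE products (merchant TEXT, name TEXT, dupe_hash VARCHAR(32) NOT NULL)");
$db->exec("CREATE UNIQUE INDEX dupe_filter ON products (dupe_hash)");

$rows = array(
    array("Example Merchant", "AEG 34942GM"),
    array("Example Merchant", "AEG 34942GM"), // duplicate within one merchant
    array("Other Merchant",   "AEG 34942GM"), // same product, other merchant
);

$insert = $db->prepare("INSERT INTO products (merchant, name, dupe_hash) VALUES (?, ?, ?)");
$imported = 0;
$skipped  = 0;

foreach ($rows as $row) {
    // Hash of merchant name + product name, as in the distributed default.
    $dupe_hash = md5($row[0].$row[1]);
    try {
        $insert->execute(array($row[0], $row[1], $dupe_hash));
        $imported++;
    } catch (PDOException $e) {
        // The unique index rejected the row. A real import would want to
        // check the error code before assuming it was a duplicate.
        $skipped++;
    }
}

echo "imported: $imported, skipped: $skipped\n";
```

The second row is rejected by the index while the third goes in, so the filtering happens without the script having to remember which products it has already seen.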
Hi,
It does look like a problem with the feeds; in fact many feeds contain duplicate product names.
I've looked at writing the code to ignore duplicate product names during the import process, but what I found was that in many cases the first entry for a product is an invalid link that produces a "Sorry, this product is no longer available" type of message. The second version might work fine.
Having said that, ignoring duplicates might be the best option. What do you think?