

Reducing manual effort in setting up Price Tapestry

Submitted by keymaster on Sun, 2006-07-30 20:06

Hi,

Many price comparison sites grow to around 25 to 40 feed files, with 1,500 to 3,000 products each. That's roughly 37,500 to 120,000 product records.

Merchants often name the same product just differently enough that many product-name filters have to be defined before the records will match.

Add category mappings on top of that, and setting up a price comparison site involves a lot of manual labour.

I was wondering whether there is a better way, and I'm putting the question to you because you often come up with ingenious and original solutions.

Might it be possible to use some algorithm to make a best "first guess" at mapping product names (and possibly category names)?

Perhaps when products are first imported, a configurable option could let the script "guess" the correct product name from the names already in the database. The first feed would still need manual product-name filters. But when the second feed is imported, the script could scan the names already in the database (which have been through the filters and are now in standard form) and make a best "first guess" at which of them most closely matches the name in the incoming record. The import code could then map the incoming name to that name automatically, without admin intervention, leaving one less mapping for the admin to define.
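As a rough sketch of the "first guess" step, assuming the standardised names are available as a plain Python list: the version below swaps in the standard library's `difflib`, which scores character-level similarity rather than the word-based weighting described further down. The function name `guess_product_name` and the 0.6 cutoff (difflib's default) are illustrative choices, not anything Price Tapestry provides.

```python
import difflib
from typing import List, Optional


def guess_product_name(feed_name: str, known_names: List[str],
                       cutoff: float = 0.6) -> Optional[str]:
    """Return the stored name most similar to feed_name, or None if no stored
    name reaches `cutoff` (difflib similarity ratio, 0.0 to 1.0)."""
    # Compare case-insensitively, but hand back the stored name as written.
    lowered = {name.lower(): name for name in known_names}
    matches = difflib.get_close_matches(feed_name.lower(), list(lowered),
                                        n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


# Names already standardised by the first feed's filters (made-up examples).
known = ["Canon PowerShot A620", "Nikon Coolpix L1", "Sony Cyber-shot DSC-W5"]
print(guess_product_name("Canon Powershot A620 Digital Camera", known))
```

Returning `None` below the cutoff leaves the record for a manual filter, which fits the idea that anything the script can't place is still mapped by hand.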

Most of the time, I suspect there will be several possible matches, but one of them will have the highest "weight", and that would be the best mapping. The original product name would have to be preserved in case the "first guess" is wrong, so the admin could go in and change it manually, but the admin would have far fewer filters to define.
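The bookkeeping described here (rank the candidates, keep the one with the highest weight, and preserve the merchant's original name so the admin can override a wrong guess) might look like this minimal sketch. The `Mapping` record, its field names and the `override` helper are hypothetical, and the weights are assumed to come from whatever scoring heuristic is chosen.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Mapping:
    original_name: str          # merchant's own name, kept so a bad guess can be undone
    mapped_name: Optional[str]  # best "first guess", or None when there were no candidates
    weight: float               # weight of the chosen candidate
    manual: bool = False        # True once the admin has overridden the guess


def choose_mapping(original_name: str, scores: Dict[str, float]) -> Mapping:
    """Pick the candidate with the highest weight, keeping the original name."""
    if not scores:
        return Mapping(original_name, None, 0.0)
    best = max(scores, key=lambda name: scores[name])
    return Mapping(original_name, best, scores[best])


def override(mapping: Mapping, correct_name: str) -> Mapping:
    """Admin correction: replace the guess but keep the merchant's original name."""
    return Mapping(mapping.original_name, correct_name, mapping.weight, manual=True)


m = choose_mapping("Sony DSC-W5 Camera",
                   {"Sony Cyber-shot DSC-W5": 0.8, "Sony Cyber-shot DSC-W7": 0.6})
print(m.mapped_name)
```

Keeping `original_name` on every record is what makes the automatic guess safe: a wrong mapping can always be traced back and corrected.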

I suspect that 80%-90% of the time, if the heuristic weighting is good enough, the administrator will not need to re-map anything manually. The administrator could glance over the mappings quickly and change only those that were mapped incorrectly.

That would cut the admin's work by up to a factor of 10.
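As a rough check on that figure, assume the admin's effort is proportional to the number of names that need fixing by hand, and ignore the time spent glancing over the correct ones. With N incoming names of which a fraction p is guessed correctly, the saving is:

```latex
\text{reduction factor} = \frac{N}{(1-p)\,N} = \frac{1}{1-p},
\qquad p = 0.8 \Rightarrow 5, \qquad p = 0.9 \Rightarrow 10
```

So 80% accuracy gives about a fivefold saving and 90% gives a tenfold one.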

What is missing is a heuristic which can decide: "this mapping has the highest probability of being correct".

A number of factors might be considered in weighting potential mappings:

- number of identical words in the product name
- same/similar category names
- same/similar prices
- relevance of the description field (if you want to get fancy)

etc.

So, for example, if two mappings are possible, but one of them has two identical words, the same category name and a similar price, while the other has only one identical word, a different category name and a vastly different price, the first mapping is clearly the better choice. I have simplified things, but the illustration should be clear.
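A minimal sketch of such a weighting, assuming each product has a name, a category and a price: the score adds one point per shared word, a bonus for an identical category, and a bonus scaled by how close the prices are. The weights, the `Product` record and the function names are hypothetical and would need tuning against real feeds; the description field is left out. The two candidates at the bottom mirror the example in the paragraph above.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Product:
    name: str
    category: str
    price: float


# Hypothetical weights; they would need tuning against real feeds.
WORD_WEIGHT = 1.0      # per word shared between the two product names
CATEGORY_WEIGHT = 2.0  # bonus when the category names match (case-insensitive)
PRICE_WEIGHT = 2.0     # maximum bonus, awarded for identical prices


def words(text: str) -> Set[str]:
    return set(text.lower().split())


def price_similarity(a: float, b: float) -> float:
    """1.0 for equal prices, approaching 0.0 as one price dwarfs the other."""
    if a <= 0 or b <= 0:
        return 0.0
    return 1.0 - abs(a - b) / max(a, b)


def weight(incoming: Product, candidate: Product) -> float:
    shared_words = len(words(incoming.name) & words(candidate.name))
    same_category = incoming.category.strip().lower() == candidate.category.strip().lower()
    return (WORD_WEIGHT * shared_words
            + CATEGORY_WEIGHT * (1.0 if same_category else 0.0)
            + PRICE_WEIGHT * price_similarity(incoming.price, candidate.price))


def best_candidate(incoming: Product, candidates: List[Product]) -> Product:
    return max(candidates, key=lambda c: weight(incoming, c))


# Two shared words, same category, similar price versus
# one shared word, different category, very different price.
incoming = Product("Canon PowerShot A620 Camera", "Digital Cameras", 199.99)
good = Product("PowerShot A620", "Digital Cameras", 189.00)
poor = Product("Canon Selphy CP400", "Printers", 79.99)
print(best_candidate(incoming, [poor, good]).name)  # PowerShot A620
```

Adding the factors together means a strong signal on one (say, a matching category) can make up for a weaker one elsewhere; the relative weights decide how much.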

Do you understand what I am suggesting? Is it possible? Do you have any ideas?

(I would be quite happy to hear that I have completely missed the boat and this is already possible in Price Tapestry, because otherwise I, and all your customers, have a lot of setup work ahead.)

Thanks very much and all the best.

Submitted by support on Mon, 2006-07-31 14:01

Hi there,

When I was developing Price Tapestry I did experiment with automatically matching products with slightly different names, and it turned out to be far more difficult than you might imagine.

It's one of those problems that is extremely simple for a human to solve, yet almost impossible for a computer!
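As an illustration of why (the example is not from the experiments mentioned above), a simple word-overlap score, here Jaccard similarity on lowercased word sets, rates two different cameras as closer than two names for the same camera. Canon sold the EOS 350D as the EOS Digital Rebel XT in North America, while the EOS 400D is a different model.

```python
def jaccard(a: str, b: str) -> float:
    """Shared words divided by the total distinct words across both names."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)


# Different cameras, near-identical names.
different_models = jaccard("Canon EOS 350D Digital SLR Body",
                           "Canon EOS 400D Digital SLR Body")
# Same camera under its North American name.
same_model = jaccard("Canon EOS 350D Digital SLR Body",
                     "Canon EOS Digital Rebel XT Body")

print(round(different_models, 3), same_model)  # 0.714 0.5
```

A human sees at once that the model number matters and the marketing name does not; a word-counting heuristic sees the opposite.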

Note that the big price comparison sites all employ armies of people to do the product matching between merchants, and running a large-scale site is a resource-intensive operation.

One idea that I have on the back burner is a mechanism by which users could contribute to, and subscribe to, a database of "same product names" as used by different merchants. Price Tapestry could then use this database to match products from different suppliers. I may look to incorporate this into the "Price Tapestry Network", which I am still looking to get off the ground once I've figured out how to handle the performance issues!
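Since the shared database is only an idea at this stage, the following is a hypothetical sketch of how an import might consult it: a table of (merchant, merchant's product name) pairs mapped to an agreed name, with case and whitespace normalised before lookup. The entry format, merchant names and function names are all made up for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical alias index: (merchant, merchant's product name) -> agreed product name.
AliasIndex = Dict[Tuple[str, str], str]


def normalise(text: str) -> str:
    """Lowercase and collapse runs of whitespace so trivial differences still match."""
    return " ".join(text.lower().split())


def build_index(entries: Iterable[Tuple[str, str, str]]) -> AliasIndex:
    """Build a lookup from (merchant, product name, agreed name) triples."""
    return {(normalise(m), normalise(n)): agreed for m, n, agreed in entries}


def agreed_name(index: AliasIndex, merchant: str, product_name: str) -> Optional[str]:
    """Return the agreed name, or None so the import can fall back to filters or a guess."""
    return index.get((normalise(merchant), normalise(product_name)))


index = build_index([
    ("Example Merchant A", "Canon EOS 350D Body Only", "Canon EOS 350D"),
    ("Example Merchant B", "Canon Digital Rebel XT (Body)", "Canon EOS 350D"),
])
print(agreed_name(index, "example merchant b", "Canon  Digital Rebel XT (Body)"))
```

An exact lookup like this sidesteps the matching problem entirely for any name someone has already mapped, which is where a shared, community-maintained table would pay off.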

Cheers,
David.