Reducing manual effort in setting up Price Tapestry

Hi,

Many price comparison sites can grow to around 25 to 40 feed files, with 1,500 to 3,000 products each. That's roughly 40,000 to 120,000 product records.

It seems merchants often name the same product just differently enough from each other that many filters on product name need to be defined before the records will match.

Compound that with category mappings, and there is a lot of manual labour in setting up a price comparison site.

I was wondering if there is a better way, and am posing this question to you with the observation that in many areas you often come up with ingenious and original solutions.

Might it be possible to use some algorithm to make a best "first guess" at the mapping of product names (and possibly category names)?

Perhaps when the products are first imported, you could provide a configurable option to permit the script to "guess" the correct product name based on the other product names already in the database. So, the first feed would require manual filters for product name. But the second feed imported could scan the product names already in the database (which have gone through the filters already and are now in the standard form), and make a best "first guess" on which product name in the database most closely matches the product name of the feed record being imported. The import code could automatically map the incoming feed product name into that name without user intervention. This would be one less mapping the admin has to define.

Most of the time, I suspect you will have several possibilities, but one of them will have the highest "weight". That would be the best mapping. The original product name would have to be preserved in case the "first guess" is wrong, so the admin could go in and manually change it, but there would be far fewer filters for the admin to define.
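To make the idea concrete, here is a rough sketch in Python (not Price Tapestry code, just an illustration). It assumes plain string similarity from the standard difflib module as the "weight"; the function names, the record fields and the 0.5 cut-off are all made up for the example.

```python
from difflib import SequenceMatcher


def normalise(name):
    # Lower-case and collapse whitespace so trivial differences do not count.
    return " ".join(name.lower().split())


def guess_product_name(feed_name, known_names, min_score=0.5):
    """Return (best_name, score); best_name is None if nothing is close enough.

    min_score is an arbitrary cut-off chosen for this sketch.
    """
    best_name, best_score = None, 0.0
    for candidate in known_names:
        score = SequenceMatcher(
            None, normalise(feed_name), normalise(candidate)
        ).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    if best_score < min_score:
        return None, best_score
    return best_name, best_score


def import_record(feed_name, known_names):
    # Keep the original name so the admin can undo a wrong guess later.
    guess, score = guess_product_name(feed_name, known_names)
    return {
        "original_name": feed_name,
        "name": guess if guess is not None else feed_name,
        "score": score,
        "guessed": guess is not None,
    }


known = ["Canon EOS 450D", "Nikon D60"]
record = import_record("Canon EOS 450D Camera", known)
print(record["name"], record["original_name"])
```

The admin would then only need to review the records flagged as guessed, perhaps sorted by score so the weakest guesses come first.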

I suspect that 80% to 90% of the time, if the heuristic weighting is good enough, no manual re-mapping by the administrator will be necessary. The administrator could glance at the mappings very quickly and only manually change what was incorrectly mapped.

That could cut the admin's work by a factor of 5 to 10.

What is missing is a heuristic which can decide: "this mapping has the highest probability of being correct".

A number of factors might be considered in weighting potential mappings:

- number of identical words in the product name
- same/similar category names
- same/similar prices
- relevance of the description field (if you want to get fancy)

etc.

So for example, if two potential mappings are possible, but one of them has two identical words, the same category name, and a similar price, while the other mapping has only one identical word, a different category name, and a vastly different price - obviously the first mapping is the better choice. I have simplified things, but the illustration should be clear.
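Something like the following Python sketch is what I have in mind for combining those factors. The weights (0.6 / 0.2 / 0.2), the field names and the scoring functions are all assumptions picked for illustration; a real heuristic would need tuning against actual feeds.

```python
# Hypothetical weights for each factor; they sum to 1.0.
WEIGHTS = {"name": 0.6, "category": 0.2, "price": 0.2}


def word_overlap(a, b):
    # Share of identical words: shared words / all distinct words.
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def price_similarity(p1, p2):
    # 1.0 for equal prices, approaching 0.0 as they diverge.
    if p1 <= 0 or p2 <= 0:
        return 0.0
    return min(p1, p2) / max(p1, p2)


def mapping_score(feed, candidate):
    name = word_overlap(feed["name"], candidate["name"])
    same_category = feed["category"].lower() == candidate["category"].lower()
    category = 1.0 if same_category else 0.0
    price = price_similarity(feed["price"], candidate["price"])
    return (
        WEIGHTS["name"] * name
        + WEIGHTS["category"] * category
        + WEIGHTS["price"] * price
    )


def best_mapping(feed, candidates):
    # Return the candidate with the highest weight, or None if there are none.
    if not candidates:
        return None
    return max(candidates, key=lambda c: mapping_score(feed, c))


feed = {"name": "Canon EOS 450D Camera", "category": "Cameras", "price": 499.0}
candidates = [
    {"name": "Canon EOS 450D", "category": "Cameras", "price": 489.0},
    {"name": "Canon Pixma Printer", "category": "Printers", "price": 59.0},
]
print(best_mapping(feed, candidates)["name"])
```

Here the first candidate shares three words, the category and nearly the same price, so it outscores the second, which shares only one word.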

Do you understand what I am suggesting? Is it possible? Do you have any ideas?

(I would be quite happy to hear that I have completely missed the boat and this is already possible in Price Tapestry, because otherwise I, and all your customers, have a lot of setup work ahead.)

Thanks very much and all the best.
