Product Mapping by RegExp

Product Mapping by RegExp (Regular Expression) enables you to create rules that match formats commonly found within product names. Such formats are typically unique to a particular manufacturer or brand, and cover a range of products within the same category - for example, televisions.

Whilst different merchants might use quite different product names overall, the model number within them is normally consistent, and it is this consistency that Product Mapping by RegExp uses to map those products together.

Example

LG televisions have model numbers in a standard format, e.g. "49UB850V": 2 digits, followed by one of "LB", "UB", "EA" etc., followed by a string of letters and digits up to the next word boundary. The following are two merchants' own names for the same product:

49UB850V 49" 3D LED TV Smart TV Ultra HD
LG 49UB850V 49 Inch 4K Ultra HD 3D LED TV

The following regular expression:

/\b(\d+(LB|UB|EA)\w+)\b/U

...will match the model number in each of these alternatives (and many more besides). Regular expression parsers, such as PHP's built-in parser as used by Price Tapestry, return the matched expression along with any bracketed sub-expressions. The matched expression(s) can then be combined with any other text to create the required common product name, in this case:

LG 49UB850V
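
The example above can be reproduced outside Price Tapestry as a quick check. The following is a minimal sketch in Python, used here only as a stand-in for PHP's parser. Python has no equivalent of the "U" modifier, so the ungreedy quantifiers are written out as "+?". The function name map_product_name is hypothetical and is not part of Price Tapestry.

```python
import re

# Python equivalent of /\b(\d+(LB|UB|EA)\w+)\b/U - the U modifier is
# emulated by writing each quantifier in its ungreedy form (+?).
MODEL_PATTERN = re.compile(r"\b(\d+?(LB|UB|EA)\w+?)\b")

merchant_names = [
    '49UB850V 49" 3D LED TV Smart TV Ultra HD',
    "LG 49UB850V 49 Inch 4K Ultra HD 3D LED TV",
]


def map_product_name(name):
    """Return the common product name, or None if no model number is found."""
    match = MODEL_PATTERN.search(name)
    if match is None:
        return None
    # group(0) is the full match; group(1) and group(2) would be the
    # bracketed sub-expressions.
    return "LG " + match.group(0)


mapped = [map_product_name(name) for name in merchant_names]
print(mapped)
```

Both merchant names reduce to the same common product name, which is what allows the two products to be mapped together.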

Anatomy of a Regular Expression

Regular expressions support modifiers that enable you to control the behaviour of the parser. In order for the parser to separate the modifiers from the expression, a delimiter character must be used. In PHP this can be almost any character that is not a letter, digit, backslash or white space, but it has become standard practice to use the forward slash "/" as the delimiter. The delimiter must appear before and after the expression, followed by any modifiers required:

/expression/modifiers

A commonly used modifier is "i", which makes the expression case insensitive:

/expression/i

Several characters have special meanings within the expression, but at the most basic level a string of alphanumeric characters will simply match that string - for example:

/Foo/i

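...would match "Foo", "FOO" or "foo".

The "U" modifier used in the LG example above inverts the greediness of quantifiers such as "+", so that they match as few characters as possible by default.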
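
To illustrate how the delimiter separates the expression from its modifiers, here is a minimal sketch in Python (a stand-in for PHP's parser, which does this internally). The helper names split_delimited and compile_delimited are hypothetical, and only the "i" modifier is handled; any other modifier, including "U", is rejected by this sketch.

```python
import re


def split_delimited(regexp):
    """Split a '/expression/modifiers' string into (expression, modifiers)."""
    if len(regexp) < 2:
        raise ValueError("expected delimiter, expression, delimiter")
    delimiter = regexp[0]
    end = regexp.rindex(delimiter)  # position of the closing delimiter
    if end == 0:
        raise ValueError("missing closing delimiter")
    return regexp[1:end], regexp[end + 1:]


def compile_delimited(regexp):
    """Compile a delimited expression, supporting only the 'i' modifier."""
    expression, modifiers = split_delimited(regexp)
    flags = 0
    for modifier in modifiers:
        if modifier == "i":
            flags |= re.IGNORECASE
        else:
            raise ValueError("modifier not supported in this sketch: " + modifier)
    return re.compile(expression, flags)


print(split_delimited("/Foo/i"))
print(bool(compile_delimited("/Foo/i").search("FOO")))
```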

Escape sequences (characters preceded by "\") enable matching certain conditions. Of most relevance to Product Mapping by RegExp is the word-boundary match, "\b". Whilst the above example would also match the "Foo" within "BarFooBar", the following expression:

/\bFoo\b/i

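...would only match "Foo" as a word on its own. A word boundary is any position between a word character (a letter, digit or underscore) and a non-word character such as white space or punctuation. It also occurs at the beginning or end of a string that starts or ends with a word character.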
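
The difference the word-boundary match makes can be seen with a few sample strings. This is a minimal sketch in Python, used as a stand-in for PHP's parser; the sample strings are illustrative only.

```python
import re

unanchored = re.compile(r"Foo", re.IGNORECASE)      # like /Foo/i
whole_word = re.compile(r"\bFoo\b", re.IGNORECASE)  # like /\bFoo\b/i

samples = ["BarFooBar", "Bar Foo Bar", "foo", "Foo-Bar"]

# For each sample, record (matches without \b, matches with \b).
results = {
    s: (bool(unanchored.search(s)), bool(whole_word.search(s)))
    for s in samples
}

for sample, (plain, bounded) in results.items():
    print(sample, plain, bounded)
```

Only "BarFooBar" is rejected by the word-boundary version; "Foo-Bar" still matches because the hyphen is not a word character.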

Also useful for matching manufacturer product numbers is "\d", which matches any digit, and is equivalent to the character class [0-9], which uses the range operator "-" inside square brackets to specify a character range.

The pipe character may be used within a bracketed sub-expression to match any one of several alternatives - for example:

/(Foo|Bar)/i

...would match "Foo" or "Bar" (case insensitively, by way of the "i" modifier).
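
Both the digit escape and the pipe can be tried out quickly. This is a minimal sketch in Python, used as a stand-in for PHP's parser. The re.ASCII flag is an addition specific to Python, where "\d" would otherwise also match non-ASCII digits.

```python
import re

# \d and the character class [0-9] pick out the same digits.
digits_escape = re.findall(r"\d", "49UB850V", re.ASCII)
digits_class = re.findall(r"[0-9]", "49UB850V")

# The pipe matches any one of the alternatives inside the brackets.
alternatives = re.compile(r"(Foo|Bar)", re.IGNORECASE)  # like /(Foo|Bar)/i
found = [m.group(1) for m in alternatives.finditer("foo BAR baz")]

print(digits_escape, digits_class, found)
```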

An excellent crib sheet for constructing regular expressions can be found here:

http://jkorpela.fi/perl/regexp.html

Creating a new Product Mapping by RegExp

  1. Go to Data Management > Product Mapping by RegExp
  2. Enter a name to describe the group of products to be mapped. Note that this is not the product name to be imported into the database, since the mapping will apply to any number of products covered by the configuration. Click Add to continue to the configuration page for the new mapping.
  3. To improve import performance, Product Mapping by RegExp entries can be configured to apply only if one or more trigger conditions are met. Trigger conditions can be applied to the Merchant, Category or Brand field of the product record being imported, and must be entered as a regular expression. For example, to trigger (case insensitively) only for products whose brand contains "LG", enter the following in the Brand box within the Trigger section:
    /LG/i
  4. Under Settings, enter the RegExp required, including brackets surrounding the sub-expressions you wish to extract in order to construct the Product Name. For example, to match LG Televisions as above, enter the following:
    /\b(\d+(LB|UB|EA)\w+)\b/U
  5. In the Product Name field, enter the resulting product name required, using the placeholder $0 for the full match, or $1, $2 etc. for the nth matched bracketed sub-expression of the RegExp. For example, to construct the full product name "LG [model number]", enter the following:
    LG $0
  6. Optionally, enter a custom Category and Brand field for the mapped products, which will override merchant provided values at import time.
  7. Click Test to view the results of the mapping if it were to be applied to the currently imported products.
  8. Click Save to store the new mapping.
  9. The new mapping will not take full effect until all affected feeds have been imported.
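
The way the Trigger, RegExp and Product Name settings from the steps above work together can be sketched as follows. This is a minimal illustration in Python, not Price Tapestry's actual implementation, which is written in PHP. The function apply_mapping and the dictionary keys "name" and "brand" are hypothetical, and the "U" modifier is emulated with ungreedy "+?" quantifiers.

```python
import re

TRIGGER_BRAND = re.compile(r"LG", re.IGNORECASE)          # /LG/i
MODEL_REGEXP = re.compile(r"\b(\d+?(LB|UB|EA)\w+?)\b")    # /\b(\d+(LB|UB|EA)\w+)\b/U
PRODUCT_NAME_TEMPLATE = "LG $0"


def apply_mapping(product):
    """Return the mapped name, or the original name if the mapping does not apply."""
    # Trigger: skip the (more expensive) main expression unless the brand matches.
    if not TRIGGER_BRAND.search(product["brand"]):
        return product["name"]
    match = MODEL_REGEXP.search(product["name"])
    if match is None:
        return product["name"]

    def fill(placeholder):
        # $0 is the full match; $1, $2 etc. are the bracketed sub-expressions.
        return match.group(int(placeholder.group(1))) or ""

    return re.sub(r"\$(\d+)", fill, PRODUCT_NAME_TEMPLATE)


lg_tv = {"name": "LG 49UB850V 49 Inch 4K Ultra HD 3D LED TV", "brand": "LG"}
other = {"name": "Sony 49UB850V Example", "brand": "Sony"}
print(apply_mapping(lg_tv))
print(apply_mapping(other))
```

Because the trigger in this sketch is not anchored, it fires for any brand value containing "LG" in either case, such as "LG Electronics".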