Product Mapping by RegExp (Regular Expression) enables you to create rules that match formats commonly found within product names. Such formats are typically unique to a particular manufacturer or brand and cover a range of products within the same category, for example televisions.
Whilst different merchants might use quite different product names overall, the model number within them is normally consistent, and it is this consistency that Product Mapping by Regular Expression uses to map those products together.
LG televisions have model numbers in a standard format, e.g. "49UB850V": 2 digits, followed by one of "LB", "UB", "EA" etc., followed by a string of letters and digits up to the next word boundary. The following are two merchants' own names for the same product:
49UB850V 49" 3D LED TV Smart TV Ultra HD
LG 49UB850V 49 Inch 4K Ultra HD 3D LED TV
The following regular expression:
...will match each of these alternatives (and many more besides). Regular expression parsers, such as PHP's built-in parser as used by Price Tapestry, return the matched expression along with any bracketed sub-expressions. The matched expression(s) can then be combined with any other text to create the required common product name, in this case:
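As a minimal sketch of the mapping described above, the following uses Python's re module as a stand-in for PHP's parser. The pattern, the map_product_name helper and the resulting name "LG 49UB850V" are assumptions made for illustration; they are not taken from Price Tapestry itself.

```python
import re

# Assumed pattern: two digits, one of a few series codes, then letters and
# digits up to the next word boundary. Illustrative only.
LG_TV = re.compile(r"\b(\d{2})(LB|UB|EA)([A-Z0-9]+)\b", re.IGNORECASE)


def map_product_name(merchant_name):
    """Return a common product name, or None if no model number is found.

    Hypothetical helper, not part of Price Tapestry.
    """
    match = LG_TV.search(merchant_name)
    if match is None:
        return None
    # group(0) is the whole matched model number; groups 1 to 3 are the
    # bracketed sub-expressions (screen size, series code, remainder).
    return "LG " + match.group(0).upper()


names = [
    '49UB850V 49" 3D LED TV Smart TV Ultra HD',
    "LG 49UB850V 49 Inch 4K Ultra HD 3D LED TV",
]
mapped = [map_product_name(name) for name in names]
```

Both merchant names yield the same model number, so both would be mapped to one product under this sketch.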
Anatomy of a Regular Expression
Regular Expressions support modifiers that enable you to control the behaviour of the parser. In order for the parser to separate the modifiers from the expression, a delimiter character must be used. Almost any character can serve as the delimiter, but it has become standard practice to use the forward slash ("/"). The delimiter must appear before and after the expression, followed by any modifiers required:
A commonly used modifier is "i", which makes the expression case insensitive:
Several characters have special meanings within the expression, but at the most basic level a string of alphanumeric characters will simply match that string - for example:
...would match "Foo", "FOO" or "foo".
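The delimiter, expression and modifier parts described above can be sketched in Python, which has no delimiter syntax of its own and passes modifiers as flags instead. The compile_delimited helper and the MODIFIER_FLAGS table are hypothetical names introduced here; the helper assumes the last occurrence of the delimiter closes the expression.

```python
import re

# Assumed mapping from PHP-style modifier letters to Python flags.
MODIFIER_FLAGS = {
    "i": re.IGNORECASE,
    "m": re.MULTILINE,
    "s": re.DOTALL,
    "x": re.VERBOSE,
}


def compile_delimited(expression):
    """Compile a '/pattern/modifiers' style expression (hypothetical helper)."""
    delimiter = expression[0]
    end = expression.rindex(delimiter)
    if end == 0:
        raise ValueError("closing delimiter not found")
    pattern = expression[1:end]
    modifiers = expression[end + 1:]
    flags = 0
    for letter in modifiers:
        flags |= MODIFIER_FLAGS[letter]
    return re.compile(pattern, flags)


foo = compile_delimited("/Foo/i")
results = [bool(foo.search(s)) for s in ("Foo", "FOO", "foo", "BarFooBar")]
```

With the "i" modifier all four sample strings match; without it, only strings containing "Foo" in exactly that case would.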
Escape sequences (characters preceded by "\") enable matching certain conditions. The most relevant to Product Mapping by RegExp is the word-boundary match, "\b". Whilst the above example would also match "BarFooBar", the following expression:
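...would only match "Foo" as a word on its own. The word-boundary escape sequence matches at the transition between a word character (letter, digit or underscore) and a non-word character such as white space or punctuation, as well as at the beginning or end of a string.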
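The effect of the word-boundary escape can be sketched by comparing two patterns side by side. The patterns and sample strings below are assumptions chosen for illustration, again using Python's re module rather than PHP.

```python
import re

loose = re.compile(r"Foo", re.IGNORECASE)        # matches anywhere in the text
strict = re.compile(r"\bFoo\b", re.IGNORECASE)   # matches only as a whole word

samples = ["BarFooBar", "Bar Foo Bar", "Foo", "Foo-Bar"]
loose_hits = [s for s in samples if loose.search(s)]
strict_hits = [s for s in samples if strict.search(s)]
```

Here strict_hits omits "BarFooBar" but keeps "Foo-Bar", because the hyphen is a non-word character and so forms a boundary.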
Also useful for matching manufacturer product numbers is "\d", which matches any digit and is equivalent to the character class [0-9], which uses the range operator "-" inside square brackets to specify a character range.
The pipe character may be used within a bracketed sub-expression to match any one of several alternatives - for example:
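A short sketch of the equivalence between the two forms follows. One assumption to note: in Python 3, "\d" also matches non-ASCII digits unless the re.ASCII flag is given, so the flag is used here to mirror the plain 0-9 behaviour described above.

```python
import re

with_escape = re.compile(r"\d+", re.ASCII)  # digit escape, ASCII digits only
with_class = re.compile(r"[0-9]+")          # explicit character range

text = "LG 49UB850V 49 Inch"
escape_runs = with_escape.findall(text)
class_runs = with_class.findall(text)
```

Both patterns pick out the same runs of digits from the sample name.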
...would match "Foo" or "Bar" (and case insensitive by way of the "i" modifier).
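Alternation inside a bracketed sub-expression can be sketched as follows. The pattern and sample text are assumptions for illustration, with re.IGNORECASE standing in for the "i" modifier.

```python
import re

either = re.compile(r"(Foo|Bar)", re.IGNORECASE)

# group(1) is the bracketed sub-expression, i.e. whichever alternative matched.
found = [m.group(1) for m in either.finditer("foo, BAR and Baz")]
```

Only "foo" and "BAR" are found; "Baz" matches neither alternative.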
An excellent crib sheet for construction of regular expressions can be found here:
Creating a new Product Mapping by RegExp