Regular Expression Mapping - Multiple Word Matches

Submitted by ChrisNBC on Mon, 2015-07-20 15:51

Hi David,

Hope all is going well.

I'm trying to add some more matching rules to the regular expression product mapping but am stuck! Please could you tell me what the syntax should be to match two words (in order) within a sentence?

Thanks in advance.

Regards
Chris

Submitted by support on Mon, 2015-07-20 16:13

Hi Chris,

If the two words are consecutive, then use \b (word boundary) anchors around the two words, e.g.

\bBlue Widget\b

However, if you only require that the two words exist but there can be anything (or nothing other than a space) in between, then use:

\bBlue(.+)Widget\b
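
To see how the two forms differ, here is a minimal PHP sketch that calls preg_match() directly. The product names are made up for illustration, and the "/" delimiters are added because preg_match() requires them; this is not the script's own mapping code.

```php
<?php
// Hypothetical merchant product names, for illustration only.
$names = ["Blue Widget", "Blue Large Widget", "Widget Blue", "BlueWidget"];

$consecutive = '/\bBlue Widget\b/';    // the two words side by side
$anyBetween  = '/\bBlue(.+)Widget\b/'; // at least one character in between

$results = [];
foreach ($names as $name) {
  $results[$name] = [
    "consecutive" => preg_match($consecutive, $name),
    "anyBetween"  => preg_match($anyBetween, $name),
  ];
  print $name.": ".$results[$name]["consecutive"]." ".$results[$name]["anyBetween"]."\n";
}
```

Only "Blue Widget" matches the first expression. Both "Blue Widget" and "Blue Large Widget" match the second, because .+ needs at least one character between the words (a single space is enough).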

Cheers,
David.
--
PriceTapestry.com

Submitted by ChrisNBC on Mon, 2015-07-20 16:36

Hi David,

Thanks for the quick response. I just tried the second version but it did not work. I double-checked that the words exist, and they do. The Product Name field is set to $0. I would be grateful for any suggestions you might have on how I could resolve this.

Thanks in advance.

Regards
Chris

Submitted by support on Mon, 2015-07-20 16:52

Hi Chris,

Have a go with the full expression value of:

/\b(Blue).+(Widget)\b/U

Here "/" is used as the expression delimiter. The "U" flag on the end makes the expression "U"ngreedy, so the shortest matching text will be used. In the reconstruction, $0 always returns the full text matched (including the wildcard component in the middle), with $1, $2 etc. returning the nth bracketed sub-expression, in this case "Blue" and "Widget", so using a Product Name of

$1 $2

...will return "Blue Widget" for anything containing Blue and Widget in that order...
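
As a rough illustration of the above, the sketch below compares the greedy and ungreedy forms and then rebuilds a name from a "$1 $2" template. The product name is made up, and the substitution loop is an assumed stand-in for the reconstruction step, not the script's actual code.

```php
<?php
// Hypothetical product name, for illustration only.
$name = "Blue Widget and Red Widget";

// Greedy: .+ takes as much as it can, so the match runs to the last "Widget".
preg_match('/\b(Blue).+(Widget)\b/', $name, $greedy);

// Ungreedy (U flag): .+ takes as little as it can, so the match stops
// at the first "Widget".
preg_match('/\b(Blue).+(Widget)\b/U', $name, $ungreedy);

// Rebuild a name from a "$1 $2" style template. This loop is an assumed
// stand-in for the substitution, not the script's actual code.
$template = '$1 $2';
$mapped = $template;
foreach ($ungreedy as $n => $text) {
  $mapped = str_replace('$'.$n, $text, $mapped);
}

print $greedy[0]."\n";
print $ungreedy[0]."\n";
print $mapped."\n";
```

Here $0 is the whole string for the greedy version but just "Blue Widget" for the ungreedy one, while $1 and $2 are "Blue" and "Widget" in both cases.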

Cheers,
David.

Submitted by ChrisNBC on Tue, 2015-07-21 08:49

Hi David,

Thanks for that. I realise the problem is that the regexp can't handle the space in the product names. I wondered if you could possibly tell me if there is a way to write the regexp to ignore spaces?

Thanks in advance.

Regards
Chris

Submitted by support on Tue, 2015-07-21 09:13

Hi Chris,

Could you perhaps give a couple of example merchant product names containing "Blue" and "Widget" in differing positions, with surrounding and in-between words, that are representative of the data you're working with and that you would like to map to "Blue Widget"? I'll check it out for you...

Thanks!

David.

Submitted by ChrisNBC on Wed, 2016-06-15 14:42

Hi David,

I'm experimenting with regex to improve matching again and wondered if you may be able to help. I'm using the regex below:

/\b(Design).+(Green)\b/U

The above matches Design Green but not when the words are lower case. I have tried the \i modifier on the end but it doesn't seem to work and wondered if you could tell me where I should be placing it?

Also, I wondered if you could possibly suggest a way to expand the regex above to find the words in the reverse order as well, i.e. "Green Design"?

Thanks in advance.

Best Regards
Chris

Submitted by support on Wed, 2016-06-15 14:59

Hi Chris,

Your regex already includes the U flag, so "i" just needs to be added to the end without any prefix, as the delimiters are already in place. To match the same in reverse order, you can use another bracketed expression with the two versions separated by a pipe; have a go with:

/(\b(Design).+(Green)\b|\b(Green).+(Design)\b)/Ui
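
A quick way to check how this expression behaves is to run it through preg_match() with a few sample names (made up here for illustration). One thing worth knowing is that the extra outer bracket shifts the numbering: $1 is now the whole match, and the two words arrive in $2 and $3 for the forward order or in $4 and $5 for the reverse order.

```php
<?php
$regexp = '/(\b(Design).+(Green)\b|\b(Green).+(Design)\b)/Ui';

// Hypothetical product names, for illustration only.
preg_match($regexp, "Design Lamp Green", $forward);
preg_match($regexp, "modern green sofa design", $reverse);
$noMatch = preg_match($regexp, "Green Lamp");

// Forward order fills groups 2 and 3; reverse order fills groups 4 and 5
// and leaves groups 2 and 3 as empty strings.
print_r($forward);
print_r($reverse);
```

The "i" flag makes the match case-insensitive, but the captured text keeps the case found in the product name, so the second example returns "green" and "design" in lower case.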

Cheers,
David.

Submitted by ChrisNBC on Wed, 2016-06-15 15:21

Hi David,

Thanks as always for the super quick reply.

The above works perfectly! I just can't believe the number of combinations I tried, and none of them without a prefix!!

Best regards
Chris

Submitted by ChrisNBC on Fri, 2016-06-17 11:33

Hi David,

Please could you tell me if the RegEx mapping is applied before or after all the filters are applied?

Additionally, could you suggest a simple way to extend the RegEx mapping to scan the description field as well as the product name?

Thanks in advance.

Regards
Chris

Submitted by support on Fri, 2016-06-17 13:37

Hi Chris,

Product Mapping by RegExp is applied after filters. If you want to change the sequence of any import-time processing, that's no problem: simply swap around the various sections of code within the admin__importRecordHandler() function in includes/admin.php. Let me know of course if you want to change the order at all and you're not sure what changes to make.

To have Product Mapping by RegExp consider the description as well as the product name, edit includes/admin.php and look for the following code at line 327:

        preg_match($v["regexp"],$importRecord["name"],$matches);

...and REPLACE with:

        preg_match($v["regexp"],$importRecord["name"]." ".$importRecord["description"],$matches);
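
To illustrate what this change does, here is a minimal sketch with a made-up import record and mapping rule (both hypothetical, only the preg_match() calls mirror the code above). The two words are split between the name and the description, so the expression only matches once the fields are joined.

```php
<?php
// Hypothetical import record and mapping rule, for illustration only.
$importRecord = [
  "name"        => "Deluxe Widget",
  "description" => "Available in Blue and Red.",
];
$v = ["regexp" => '/\b(Widget).+(Blue)\b/U'];

// Name only: "Blue" is not in the name, so there is no match.
$nameOnly = preg_match($v["regexp"], $importRecord["name"], $matches);

// Name and description joined by a space: both words are now in the subject.
$combined = preg_match($v["regexp"], $importRecord["name"]." ".$importRecord["description"], $matches);

print $nameOnly." ".$combined." ".$matches[0]."\n";
```

Note that "." does not match a newline by default, so if a description contains a line break between the two words, the "s" flag would also be needed on the expression.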

And for the same when using the "Test" function on the configuration page for a Product Mapping by RegEx, edit admin/productsmap_regexp_configure.php and look for the following code at line 121:

  $wheres[] = "NAME LIKE '%".database_safe($word)."%' ";

...and REPLACE with:

  $wheres[] = "CONCAT(name,' ',description) LIKE '%".database_safe($word)."%' ";

Then look for the following code at line 126:

  $sql = "SELECT name,original_name,merchant,price,category,brand FROM `".$config_databaseTablePrefix."products` WHERE ".$where;

...and REPLACE with:

  $sql = "SELECT name,description,original_name,merchant,price,category,brand FROM `".$config_databaseTablePrefix."products` WHERE ".$where;

And finally the following code at line 147:

  preg_match($productmap["regexp"],$importRecord["name"],$matches);

...and REPLACE with:

  preg_match($productmap["regexp"],$importRecord["name"]." ".$importRecord["description"],$matches);

Hope this helps!

Cheers,
David.

Submitted by ChrisNBC on Fri, 2016-06-17 14:05

Hi David,

Thanks for the above. Before I give it a try, I think there may be a line of replacement code missing for the solution as you say to look for the line of code below but then no replacement is given?

$wheres[] = "CONCAT(name,' ',description) LIKE '%".database_safe($word)."%' ";

Also, you mention the file includes/productsmap_regexp_configure.php. Did you mean to write admin/ not includes/ as I can only see that file in the admin/ folder?

Thanks in advance.

Regards
Chris

Submitted by support on Fri, 2016-06-17 14:11

Ah, sorry Chris - that was the replacement...! All corrected above (including the file name).

Cheers,
David.

Submitted by ChrisNBC on Wed, 2016-06-22 10:36

Hi David,

No problem... the above is working beautifully now. I wondered if you might be able to suggest a way to expand the "Trigger" fields to include other database fields?

Thanks in advance.

Regards
Chris

Submitted by support on Wed, 2016-06-22 10:57

Hi Chris,

You can add new trigger conditions for other fields by duplicating the code as required. For example, to add a trigger for a custom field "keywords", first run the following dbmod.php script to create the new field on the productsmap_regexp table:

<?php
  require("includes/common.php");
  // add the new trigger field to the productsmap_regexp table
  $sql = "ALTER TABLE `".$config_databaseTablePrefix."productsmap_regexp`
            ADD `trigger_keywords` VARCHAR(255) NOT NULL";
  database_queryModify($sql,$result);
  print "Done.";
?>

Next, edit includes/admin.php and look for the following code at line 320:

  preg_match(($v["trigger_brand"]?$v["trigger_brand"]:"/.*/"),$importRecord["brand"])

...and REPLACE with:

  preg_match(($v["trigger_brand"]?$v["trigger_brand"]:"/.*/"),$importRecord["brand"])
  &&
  preg_match(($v["trigger_keywords"]?$v["trigger_keywords"]:"/.*/"),$importRecord["keywords"])
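
For anyone wondering how the trigger test behaves, the sketch below wraps it in a hypothetical helper, trigger_matches(), which is not part of the script. An empty trigger falls back to "/.*/", which matches any value, so that field is effectively ignored; the rule only applies when every trigger matches. The rule and record values are made up for illustration.

```php
<?php
// Hypothetical helper wrapping the trigger test: an empty trigger falls
// back to "/.*/", which matches any value, so the field is ignored.
function trigger_matches($trigger, $value) {
  return preg_match(($trigger ? $trigger : "/.*/"), $value);
}

// Hypothetical rule and import record, for illustration only.
$v = ["trigger_brand" => "", "trigger_keywords" => "/garden/i"];
$importRecord = ["brand" => "Acme", "keywords" => "Garden, Outdoor"];

// Brand trigger is empty (always passes); keywords trigger matches "Garden".
$applies = trigger_matches($v["trigger_brand"], $importRecord["brand"])
        && trigger_matches($v["trigger_keywords"], $importRecord["keywords"]);

// With different keywords the keywords trigger fails, so the rule is skipped.
$importRecord["keywords"] = "Kitchen";
$appliesAfter = trigger_matches($v["trigger_brand"], $importRecord["brand"])
             && trigger_matches($v["trigger_keywords"], $importRecord["keywords"]);

var_dump($applies, $appliesAfter);
```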

Then edit admin/productsmap_regexp_configure.php and look for the following code at line 33:

  widget_validate("trigger_brand",FALSE,"regexp");

...and REPLACE with:

  widget_validate("trigger_brand",FALSE,"regexp");
  widget_validate("trigger_keywords",FALSE,"regexp");

And then the following code at line 49:

  `trigger_brand` = '".database_safe($_POST["trigger_brand"])."',

...and REPLACE with:

  `trigger_brand` = '".database_safe($_POST["trigger_brand"])."',
  `trigger_keywords` = '".database_safe($_POST["trigger_keywords"])."',

And then the following code at line 142:

  preg_match(($productmap["trigger_brand"]?$productmap["trigger_brand"]:"/.*/"),$importRecord["brand"])

...and REPLACE with:

  preg_match(($productmap["trigger_brand"]?$productmap["trigger_brand"]:"/.*/"),$importRecord["brand"])
  &&
  preg_match(($productmap["trigger_keywords"]?$productmap["trigger_keywords"]:"/.*/"),$importRecord["keywords"])

And finally the following code at line 260:

  widget_textBox("Brand","trigger_brand",FALSE,$productmap["trigger_brand"],"RegExp",12);

...and REPLACE with:

  widget_textBox("Brand","trigger_brand",FALSE,$productmap["trigger_brand"],"RegExp",12);
  widget_textBox("Keywords","trigger_keywords",FALSE,$productmap["trigger_keywords"],"RegExp",12);

Hope this helps!

Cheers,
David.