PT Settings - exclude special signs from config_normaliseRegExp

Submitted by CashNexus on Sat, 2019-08-31 14:32

Hello David,
By default config_advanced.php uses
$config_normaliseRegExp = "A-Za-z0-9".chr(0x80)."-".chr(0xFF)." \.";
which affects the 'normalised_name' SQL field and, in turn, URL generation.

Is it possible to exclude some special characters that I suppose fall in the range starting at chr(0x80)?
Going by https://www.sciencebuddies.org/science-fair-projects/references/ascii-table#asciitable
I'd like to exclude, for example, characters such as

Decimal 33 - ! - exclamation mark
Decimal 35 - # - number sign
Decimal 44 - , - comma
Decimal 47 - / - forward slash
Decimal 59 - ; - semi-colon

etc etc

Could you give an example of how to modify the default $config_normaliseRegExp (above) to exclude such characters?
It could be useful for general knowledge of PT settings.
Thanks in advance !
Have a nice weekend !
Best regards,

Submitted by support on Mon, 2019-09-02 11:14

Hello Serge,

By default the characters you list are already excluded, so I may have misunderstood your question. From the 0 to 127 ASCII range only alphanumerics (plus space and period, per the expression you quote) are permitted; all values above that are permitted because those bytes form part of UTF-8 encoded characters, which are required for non-ASCII text, e.g. accented vowels in some languages.
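To illustrate which characters survive the default expression, here is a minimal sketch. The helper `sketch_normalise` is hypothetical and is not the real `tapestry_normalise`; it only assumes that the setting is used as a negated character class, so every byte outside the class is removed.

```php
<?php
// Same value as the default quoted from config_advanced.php in this thread.
$config_normaliseRegExp = "A-Za-z0-9".chr(0x80)."-".chr(0xFF)." \.";

// Hypothetical helper (an assumption, not Price Tapestry code): removes every
// byte that is not listed in the given character class.
function sketch_normalise($text, $class)
{
    return preg_replace('/[^'.$class.']/', '', $text);
}

// "!", "#", ",", "/" and ";" are below 0x80 and not alphanumeric, so they go.
echo sketch_normalise("Widget! #1, A/B; test", $config_normaliseRegExp), "\n"; // Widget 1 AB test

// The two UTF-8 bytes of "é" (0xC3 0xA9) are in the 0x80-0xFF range and are kept.
echo sketch_normalise("Caf\xC3\xA9!", $config_normaliseRegExp), "\n"; // Café
```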

If you wanted to remove those characters from the displayed name then you could add a Search and Replace RegExp as a Global Filter to the Product Name field, using as the Search expression:

/(!|#|,|\/|;)/

...and Replace with nothing...

(in the pipe-separated list of values to match, / needs to be "escaped" by preceding it with \ because it is used as the delimiter character)

Hope this helps!

Cheers,
David.
--
PriceTapestry.com

Submitted by CashNexus on Mon, 2019-09-02 12:43

Hello David,
...hm... most likely I'm wrong somewhere, but let me understand.

Two points:
1) OK, if : (colon) should be excluded by default using the default $config_normaliseRegExp, why was a URL generated like
/1DNest%3A-Single-PC-License-51DNest%3A-Linear-Cutting-Optimizer-166574.html
where %3A is exactly the : (colon)?

2) I'm now writing a general rule to exclude all special characters from Product Name. Is the Search/Replace RegExp below correct for a backslash?
/(!|#|,|\/|;|\|)/
Or must the backslash also be escaped, like
/(!|#|,|\/|;|\\|)/
?

Submitted by support on Mon, 2019-09-02 13:12

Hi,

If ":" isn't specifically allowed by $config_normaliseRegExp check whether a modification has been applied permitting the character through the $allow parameter of tapestry_normalise when normalised_name is created during the import process... In includes/admin.php, normalised_name is generated by the following code:

    $normalisedName = tapestry_normalise($importRecord["name"]);

(line 462 in the latest distribution; in earlier distributions, search for the same code, it will be thereabouts).

If the above was modified, for example using:

    $normalisedName = tapestry_normalise($importRecord["name"],":");

...that would explain ":" appearing in product page URLs...
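As a rough sketch of that effect: the function below is a hypothetical stand-in for `tapestry_normalise`, assuming only that the `$allow` characters are appended to the permitted character class. The call to `rawurlencode` is just to show where `%3A` comes from and is not meant to reproduce how Price Tapestry builds its URLs.

```php
<?php
// Hypothetical stand-in for tapestry_normalise(); the only idea taken from the
// thread is an extra $allow list added to the permitted characters.
function sketch_normalise_allow($text, $allow = "")
{
    // Default class as quoted from config_advanced.php.
    $class = "A-Za-z0-9".chr(0x80)."-".chr(0xFF)." \.";
    return preg_replace('/[^'.$class.$allow.']/', '', $text);
}

$name = "1DNest: Single PC License";

$default   = sketch_normalise_allow($name);      // colon stripped
$withColon = sketch_normalise_allow($name, ":"); // colon kept

echo $default, "\n";                 // 1DNest Single PC License
echo rawurlencode($withColon), "\n"; // 1DNest%3A%20Single%20PC%20License
```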

Regarding using backslash in the Regex, the second version is correct - as it is the escape character it must also be escaped - so:

 /(!|#|,|\/|;|\\|)/
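A small check of the two variants with plain `preg_replace`, outside Price Tapestry, shows the difference. Note that inside a single-quoted PHP string each literal backslash in the expression has to be doubled, which is not needed when typing the expression into a filter form. The variable names are only for illustration.

```php
<?php
// First version from the question: "\|" is an escaped pipe, so this pattern
// removes "|" and never matches a backslash.
$first = '/(!|#|,|\/|;|\|)/';

// Second version: "\\" (written '\\\\' in a PHP literal) matches one backslash.
// The trailing "|" adds an empty alternative, which is harmless when the
// replacement is an empty string.
$second = '/(!|#|,|\/|;|\\\\|)/';

$subject = 'a\\b|c'; // the five characters a \ b | c

echo preg_replace($first, '', $subject), "\n";  // a\bc
echo preg_replace($second, '', $subject), "\n"; // ab|c
```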

Cheers,
David.
--
PriceTapestry.com

Submitted by CashNexus on Mon, 2019-09-02 13:50

YES, I've got it :) thank you for the hint !

Last detail: are there any other special characters that should be escaped with a backslash in the RegExp?
I suppose it is needed for the forward slash /, the backslash \ and the vertical bar | itself:

/(!|#|,|\/|;|\\|\||)/

Something else ?

Submitted by support on Mon, 2019-09-02 13:58

Hi,

Any of the metacharacters, plus the delimiter, need to be escaped. There's a list on the following page, which is a great crib sheet for RegExp construction...

http://jkorpela.fi/perl/regexp.html
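When the expression is built in PHP code rather than typed by hand, the built-in `preg_quote` can do this escaping, including the delimiter when it is passed as the second argument. A minimal sketch, where `build_strip_pattern` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: build a "remove these characters" pattern from a plain
// list, letting preg_quote() escape metacharacters and the "/" delimiter.
function build_strip_pattern(array $chars)
{
    $escaped = array();
    foreach ($chars as $ch) {
        $escaped[] = preg_quote($ch, '/');
    }
    return '/(' . implode('|', $escaped) . ')/';
}

// '\\' in a single-quoted PHP string is one literal backslash.
$pattern = build_strip_pattern(array('!', '#', ',', '/', ';', '\\', '|'));

echo preg_replace($pattern, '', 'a!b#c,d/e;f\\g|h'), "\n"; // abcdefgh
```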

Cheers,
David.
--
PriceTapestry.com

Submitted by CashNexus on Mon, 2019-09-02 17:50

Oops... while running some tests with Product Name I ran into this: if Product Name contains
& "ampersand"
or
< "less than"
the import just drops the record.
A Search and Replace RegExp of
/(&|,|<|;)/
does not help.
Do you have any hint for this case?

Submitted by support on Tue, 2019-09-03 08:04

Hi,

It sounds like those characters are actually HTML entity encoded in your feed, so removing & and semi-colon will leave just the text part of the encoded entity (e.g. "amp" for what should be &).

If that's the case, one thing you can do is add an HTML Entity Decode filter - the code is in this comment and that should do the trick...
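The effect can be reproduced with PHP's built-in `html_entity_decode`. This is only a sketch of the idea, not the filter code referred to above, and the sample feed value is made up for illustration.

```php
<?php
// Made-up feed value in which "&", "<" and ">" arrive entity encoded.
$fromFeed = "Salt &amp; Pepper &lt;Set&gt;";

// Stripping "&" and ";" without decoding leaves the entity names behind.
$stripped = preg_replace('/(&|,|<|;)/', '', $fromFeed);
echo $stripped, "\n"; // Salt amp Pepper ltSetgt

// Decoding first restores the literal characters, which can then be filtered.
$decoded = html_entity_decode($fromFeed, ENT_QUOTES, 'UTF-8');
echo $decoded, "\n"; // Salt & Pepper <Set>
```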

Cheers,
David.
--
PriceTapestry.com