Hello David,
By default, config_advanced.php uses
$config_normaliseRegExp = "A-Za-z0-9".chr(0x80)."-".chr(0xFF)." \.";
which affects the 'normalised_name' SQL field and, in turn, URL generation.
Is it possible to exclude some special characters that I suppose fall within the chr(0x80) range?
As per https://www.sciencebuddies.org/science-fair-projects/references/ascii-table#asciitable
I'd like to exclude, for example, signs such as
Decimal 33 - ! - exclamation mark
Decimal 35 - # - number sign
Decimal 44 - , - comma
Decimal 47 - / - forward slash
Decimal 59 - ; - semi-colon
Could you give an example of how to modify the default $config_normaliseRegExp (above) to exclude such signs?
It could be useful for general knowledge of PT settings.
Thanks in advance !
Have a nice weekend !
Best regards,
Hello David,
...hm...most likely I'm wrong somewhere...but let me try to understand.
Two points:
1) OK, if : (colon) is excluded by the default $config_normaliseRegExp, why has a URL been generated like
/1DNest%3A-Single-PC-License-51DNest%3A-Linear-Cutting-Optimizer-166574.html
where %3A is exactly the : (colon) ?
2) I'm now writing a general rule to exclude all special signs from Product Name. Is the Search/Replace Regex below correct for the backslash?
/(!|#|,|\/|;|\|)/
or must the backslash also be escaped, like
/(!|#|,|\/|;|\\|)/
?
Hi,
If ":" isn't specifically allowed by $config_normaliseRegExp, check whether a modification has been applied that permits the character through the $allow parameter of tapestry_normalise when normalised_name is created during the import process. In includes/admin.php, normalised_name is generated by the following code:
$normalisedName = tapestry_normalise($importRecord["name"]);
(line 462 in the latest distribution - search for the same code in earlier distributions, it will be thereabouts).
If the above was modified, for example using:
$normalisedName = tapestry_normalise($importRecord["name"],":");
...that would explain ":" appearing in product page URLs...
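As a rough illustration of that effect, here is a minimal PHP sketch. It is not the actual tapestry_normalise() code: sketch_normalise() is a hypothetical stand-in that simply strips every byte outside the allow-list, and replacing spaces with hyphens before URL encoding is an assumption made only to mirror the URL quoted above.

```php
<?php
// Default allow-list from config_advanced.php, as quoted in this thread.
$config_normaliseRegExp = "A-Za-z0-9".chr(0x80)."-".chr(0xFF)." \.";

// Hypothetical stand-in for tapestry_normalise(); the real function may differ.
function sketch_normalise($text, $regExp, $allow = "")
{
  // Remove every byte that is not in the allow-list or in the extra $allow characters.
  return preg_replace('/[^'.$regExp.preg_quote($allow, '/').']/', '', $text);
}

$name = "1DNest: Single PC License";

// Default behaviour: the colon is not in the allow-list, so it is removed.
$default = sketch_normalise($name, $config_normaliseRegExp);

// With ":" passed as $allow, the colon survives normalisation...
$allowed = sketch_normalise($name, $config_normaliseRegExp, ":");

// ...and is then percent-encoded as %3A when the URL is built.
$url = urlencode(str_replace(" ", "-", $allowed));

echo $default."\n";
echo $url."\n";
```

If the colon shows up as %3A in your URLs, something along these lines is letting it through before the URL is encoded.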
Regarding using backslash in the Regex, the second version is correct - as it is the escape character it must also be escaped - so:
/(!|#|,|\/|;|\\|)/
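To see the difference outside of the admin form, here is a minimal PHP sketch (plain PHP, not Price Tapestry code). It uses nowdoc strings so that each pattern is exactly what you would type into the Search field, with no extra PHP string escaping. The product name is made up, and the trailing | before the closing bracket is left out, since it only adds an empty alternative that changes nothing when replacing with nothing.

```php
<?php
// Pattern as typed into the filter form: \\ matches one literal backslash.
$escaped = <<<'RE'
/(!|#|,|\/|;|\\)/
RE;

// Unescaped variant: \| is read as an escaped pipe, so it matches "|" and not "\".
$unescaped = <<<'RE'
/(!|#|,|\/|;|\|)/
RE;

// Made-up product name; in a single-quoted PHP string, \\ is one backslash.
$name = 'Acme! #1, A/B; C\\D';

$withEscaped   = preg_replace($escaped, '', $name);
$withUnescaped = preg_replace($unescaped, '', $name);

echo $withEscaped."\n";   // backslash removed
echo $withUnescaped."\n"; // backslash still present
```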
Cheers,
David.
--
PriceTapestry.com
YES, I've got it :) thank you for the hint !
Last detail - are there any other special signs that should be escaped with a backslash in Regex?
I suppose it should be done for the forward slash /, the backslash \ and the vertical bar itself |
/(!|#|,|\/|;|\\|\||)/
Hi,
Any of the metacharacters plus the delimiter need escaping; there's a list on the following page, which is a great crib sheet for RegExp construction...
http://jkorpela.fi/perl/regexp.html
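If you would rather not escape by hand, PHP's preg_quote() can do it when a pattern is built in code. The sketch below is only an illustration; build_strip_pattern() is a hypothetical helper, not part of Price Tapestry, and the resulting pattern would still need to be pasted into the filter form yourself.

```php
<?php
// Hypothetical helper: build a "match any of these literal characters" pattern.
function build_strip_pattern(array $chars)
{
  // preg_quote() escapes regex metacharacters; the second argument
  // additionally escapes the "/" delimiter.
  $quoted = array_map(function ($c) { return preg_quote($c, '/'); }, $chars);
  return '/('.implode('|', $quoted).')/';
}

// In single-quoted PHP strings, '\\' is one literal backslash.
$pattern = build_strip_pattern(array('!', '#', ',', '/', ';', '\\', '|'));
$clean = preg_replace($pattern, '', 'a|b\\c/d!e');

echo $pattern."\n";
echo $clean."\n";
```

The exact pattern printed can vary slightly between PHP versions, because the set of characters preg_quote() escapes has grown over time, but the extra escapes do not change what is matched.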
Cheers,
David.
--
PriceTapestry.com
Oops...while running some tests with Product Name I ran into a situation: if Product Name contains
& "ampersand"
or
< "less than"
the import just drops such a record...
The Search and Replace Regex
/(&|,|<|;)/
does not help...
Do you have any hint for such a case?
Hi,
It sounds like those characters are actually HTML entity encoded in your feed, so removing & and semi-colon will leave just the text part of the encoded entity (e.g. "amp" for what should be &).
If that's the case, one thing you can do is add an HTML Entity Decode filter - the code is in this comment and that should do the trick...
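As a rough illustration of why the order matters (plain PHP, not the filter code from that comment), the sketch below assumes a made-up feed value in which & and < arrive entity encoded:

```php
<?php
// Made-up feed value, HTML entity encoded.
$raw = 'Salt &amp; Pepper &lt; 5kg';

// Stripping first: "&" and ";" are removed, leaving the entity names behind.
$stripped = preg_replace('/(&|,|<|;)/', '', $raw);

// Decoding first turns the entities back into the real characters...
$decoded = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');

// ...which the same expression can then remove.
$cleaned = preg_replace('/(&|,|<|;)/', '', $decoded);

echo $stripped."\n";
echo $decoded."\n";
echo $cleaned."\n";
```

So if you use both, the decode filter needs to run before the Search and Replace RegExp filter.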
Cheers,
David.
--
PriceTapestry.com
Hello Serge,
By default the characters you list will be excluded, so I may have misunderstood your question - only alphanumerics (plus space and ".") are permitted from the 0 to 127 ASCII range; all values above that are permitted as these bytes form part of UTF-8 encoded characters, required for non-ASCII text, e.g. accented vowels in some languages.
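To check this for yourself, here is a minimal sketch that applies the default value as a negated character class. How Price Tapestry uses the setting internally is an assumption here, and the sample strings are made up:

```php
<?php
// Default value from config_advanced.php, as quoted in this thread.
$config_normaliseRegExp = "A-Za-z0-9".chr(0x80)."-".chr(0xFF)." \.";

// Assumption: anything NOT in the allow-list is stripped.
$pattern = '/[^'.$config_normaliseRegExp.']/';

// ASCII punctuation such as ! # , / ; is outside the allow-list and is removed.
$ascii = preg_replace($pattern, '', "Hello, World! #1/2; ok");

// UTF-8 accented letters are made of bytes in the 0x80-0xFF range, so they are kept.
$accented = preg_replace($pattern, '', "Caf\xC3\xA9 cr\xC3\xA8me");

echo $ascii."\n";
echo $accented."\n";
```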
If you wanted to remove those characters from the displayed name then you could add a Search and Replace RegExp as a Global Filter to the Product Name field, using as the Search expression:
/(!|#|,|\/|;)/
...and Replace with nothing...
(in the pipe-separated list of values to match, / needs to be "escaped" by preceding it with \, as it is used as the delimiter character)
Hope this helps!
Cheers,
David.
--
PriceTapestry.com