Descriptions Without Punctuation

Submitted by 2fer on Fri, 2015-07-10 19:09

Hi David,
A new feed I'm integrating with an existing shop has descriptions without punctuation. On my machine I can use GREP filters to insert ". " (period and space) before any capital letter inside a word, but in the shop there isn't a way, without adding thousands of filters, to find and replace all possible combinations of ([a-z])([A-Z]) in the descriptions to make readable sentences out of them. To make it worse, the descriptions contain proprietary terms in the form of AbcdeFghij, so I need to split the capitals across two filters and then go back and undo one unwanted result. The feed updates daily, so it isn't something I want to do offline and upload every day.

The only regex filters my version has don't do find/replace edits. Has this been addressed in a newer version? (Mine is 13a I think) Would you have a tweak for older versions to enable this? As always, thanks in advance for any suggestions.
2fer

Submitted by support on Sat, 2015-07-11 09:58

Hi 2fer,

There's a Search and Replace RegExp filter in this comment.

If you're not sure of any search expressions required to match your text, just let me know a couple of examples and I'll try and help out...

Hope this helps!

Cheers,
David.
--
PriceTapestry.com

Submitted by 2fer on Mon, 2015-07-13 17:14

Thank you, David,
You always come through with the tools we need! Now, if I want to replace "abcdeFghij" with "abcde. Fghij" but leave "klmNop" as is, do I have something close to correct syntax here?
Search: ([a-z])([A-M-O-Z])
Replace: \1[.] \2

RegExp is not the same as GREP so I have to ask.
Thanks!
2fer

Submitted by support on Tue, 2015-07-14 09:06

Hi,

Have a go with:

Search:

/\b([a-z]*)([A-Z]{1})([a-z]*)\b/

Replace:

$1. $2$3

In the search expression, \b is the word boundary anchor, and then the 3 character matching sections, ([a-z]*) for the lowercase last word of the sentence, ([A-Z]{1}) for single capital letter, and then ([a-z]*) for the remainder of the first word of the next sentence. The $n back references can then be used to reconstruct with the correct punctuation.
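
As a rough illustration of how the three groups behave, here is a minimal sketch in Python rather than in the filter itself. Python's re module writes back references as \1 where the filter uses $1, and the sample descriptions are made up. One thing it suggests is that, because ([a-z]*) can match nothing, an ordinary capitalised word also matches, so requiring at least one lowercase letter with + in the first group may be safer.

```python
import re

# Pattern from the post above, without the / delimiters.
STAR = re.compile(r"\b([a-z]*)([A-Z]{1})([a-z]*)\b")
# Variant that requires at least one lowercase letter before the capital.
PLUS = re.compile(r"\b([a-z]+)([A-Z]{1})([a-z]*)\b")

REPLACEMENT = r"\1. \2\3"  # Python spelling of "$1. $2$3"


def punctuate(pattern, text):
    """Insert '. ' in front of the capital letter matched by group 2."""
    return pattern.sub(REPLACEMENT, text)


run_on = "great fitMachine washable"       # made-up sample
capitalised = "Great fitMachine washable"  # made-up sample

print(punctuate(STAR, run_on))       # great fit. Machine washable
print(punctuate(STAR, capitalised))  # . Great fit. Machine washable
print(punctuate(PLUS, capitalised))  # Great fit. Machine washable
```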

There's a good RegExp crib sheet for working out this kind of thing here...

https://www.cs.tut.fi/~jkorpela/perl/regexp.html

Hope this helps!

Cheers,
David.
--
PriceTapestry.com

Submitted by 2fer on Tue, 2015-07-14 21:21

Thank you, David. I am getting the first part to work, using:

Search: /\b([a-z]+)([A-F])\b/
Replace: $1. $2

but the second filter isn't changing anything at all. It is identical to the first filter except that for "Find", in place of [A-F] it uses [H-Z]. Maybe I need to try it with ([A-Z]^G) to do it in one step. No matter which version I get to work there are one or two words that will need plain text search/replace when it is done.

I am going to leave it alone for a bit and try again this evening. When I tried your suggested codes, the entire description disappeared for all products. It is better half fixed than the way it was, so I'll try again later. Thank you for the RegExp link, it helps.
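
On the one-step idea above: in standard regular expression syntax, ^ outside square brackets is a start-of-line anchor, so ([A-Z]^G) would not exclude G. A character class made of two ranges is one way to skip a single letter. A minimal sketch in Python, assuming the filter accepts the same class syntax (Python's re module uses \1 where the filter uses $1; the sample words are from this thread or made up):

```python
import re

# [A-FH-Z] covers A-F and H-Z, i.e. every capital letter except G.
SKIP_G = re.compile(r"\b([a-z]+)([A-FH-Z])([a-z]+)\b")


def split_words(text):
    """Insert '. ' before an inner capital, unless that capital is G."""
    return SKIP_G.sub(r"\1. \2\3", text)


print(split_words("abcdeFghij"))  # abcde. Fghij
print(split_words("abcGhij"))     # abcGhij (left alone)
print(split_words("abcdeHijk"))   # abcde. Hijk
```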

Submitted by support on Wed, 2015-07-15 08:52

Hi,

Without creating a ridiculously complex regexp, I think the best approach, if manageable, would be your simplified version of the Search and Replace RegExp, followed in sequence by a series of normal Search and Replace filters to convert the modified proprietary terms back into the correct version!
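
The two-stage idea can be sketched roughly as follows, in Python rather than as actual filters. The term "klmNop" is the placeholder used earlier in this thread, the sample description is made up, and the RESTORE table is a hypothetical stand-in for the plain Search and Replace filters:

```python
import re

# Stage 1: split wherever a capital sits inside a lowercase word.
SPLIT = re.compile(r"\b([a-z]+)([A-Z])([a-z]+)\b")

# Stage 2 (hypothetical table): each proprietary term as it looks
# after the split, mapped back to its correct form.
RESTORE = {"klm. Nop": "klmNop"}


def clean(description):
    text = SPLIT.sub(r"\1. \2\3", description)
    for broken, correct in RESTORE.items():
        text = text.replace(broken, correct)  # plain text, no regex
    return text


print(clean("soft klmNop liningMachine washable"))
# soft klmNop lining. Machine washable
```

The order matters: the plain replacements have to run after the RegExp step, since they look for the already-split form of each term.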

Cheers,
David.
--
PriceTapestry.com

Submitted by 2fer on Wed, 2015-07-15 15:38

Thank you David. I had planned to do that rather than trying a series of RegExp filters. There are only 3 proprietary terms that would need to be re-attached. I had set it up and at first was happy with success, but after examining several products I could see that it was not doing what needed to be done.

I took a closer look at things and I see that the filter was not responsible for the sentence structure I had given it credit for. The filter isn't doing anything. If I search the raw text datafeed, it shows exactly the same punctuation as I'm getting with the filter. The first few sentences have normal punctuation; after that it turns into run-on text. Commas tipped me off.

It may be a matter of the order of things because there was already a large number of filters in place to remove "(TM)" and "(R)" from Titles and Descriptions, replace "&" and Drop Record filters - about 30 filters before I added the RegExp for Descriptions. I will keep trying.

A basic question: I added this filter from the post referenced to the 'includes/filter.php' file and uploaded the edited file. Does it make any difference where it sits in that file (near the top vs. further down the page)? And am I supposed to do anything other than upload the file? I mean, do I need to re-register the feed after the filter is in place?

Thank you!
2fer

Submitted by support on Wed, 2015-07-15 17:47

Hi,

The position of the code in includes/filter.php doesn't matter at all, and no need to re-register the feed, just re-import after editing filters for the changes to be applied...

To check that the filter is taking effect: without any delimiters, just using plain text, it will behave just like Search and Replace, so you could add a test instance using:

Search: Foo
Replace: Bar

... where Foo is a word appearing in one of your descriptions, and making sure that is applied before expanding the search operator to a regular expression...

Cheers,
David.
--
PriceTapestry.com

Submitted by 2fer on Thu, 2015-07-16 05:00

Thanks David,
I knew it was trying to do something because previous efforts caused unwanted changes. It was just waiting for better instructions. Some patient reading up on RegExp showed me the error of my efforts. I do have it working now, using
Search: /\b([a-z]+)([A-F])([a-z]+)\b/
Replace: $1. $2$3

It was not doing anything because the "\b/" after "([A-Z])" sent it to find only patterns that end with an upper case letter. Works fine using GREP, but RegExp works differently. I found some good resources in addition to the link you shared that helped me see what I was doing wrong. I believe I have another datafeed that this can be applied to as well. Thank you again for great support!
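
For anyone hitting the same thing, here is a minimal sketch of the difference, in Python rather than in the filter (Python's re module uses \1 where the filter uses $1, and the sample word is the placeholder from this thread):

```python
import re

# \b straight after the capital only matches when the word ENDS on that capital.
ENDS_ON_CAPITAL = re.compile(r"\b([a-z]+)([A-F])\b")
# A third group lets the rest of the word follow the capital.
WITH_TAIL = re.compile(r"\b([a-z]+)([A-F])([a-z]+)\b")

word = "abcdeFghij"

print(ENDS_ON_CAPITAL.search(word))                    # None
print(WITH_TAIL.sub(r"\1. \2\3", word))                # abcde. Fghij
print(ENDS_ON_CAPITAL.sub(r"\1. \2", "abcdeF ghij"))   # abcde. F ghij
```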