Manage feed with grep from command line

Submitted by Retro135 on Thu, 2017-05-04 19:14

I have one huge feed (675 MB+) that now takes 14-15 hours to import (as I map products to more than one category). I am trying to reduce it before I upload it to Price Tapestry (PT).

The command line I use to delete records works a charm:
grep -vwE "(word1|word2|word3)" /Directory/feed.txt > /Directory/feed1.txt

What I can't figure out:
1. I want to delete duplicates in the product name field and keep one, as PT does. How can I do this from the command line? This alone would reduce the feed by at least 75% (most are multiple sizes, which I don't need).

2. How can I delete records using multiple strings such as "gift certificate|Gift Certificates|another string"?

Thank you for any help with this!

Submitted by support on Fri, 2017-05-05 06:40

Hi,

I'm afraid there wouldn't be any straightforward method on the command line of remembering existing names and dropping duplicates that way (it would require parsing the feed, so that may as well be left to Price Tapestry). However, it would be no problem to use multi-word strings within the list of keywords to match; the following should work fine:

grep -vwEi "(word1|word2|gift certificate)" /Directory/feed.txt > /Directory/feed1.txt

(note the addition of the "i" flag to make the matching case insensitive)
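As a rough illustration of how those flags combine, here is a small sketch run against a made-up feed. The pipe-delimited layout, the product names and the temporary directory are all assumptions for the demo, not the real feed. One thing worth checking on your own data: because of -w, "gift certificate" on its own would not match the plural "Gift Certificates" (the trailing "s" is a word character), so the sketch uses "certificates?" to cover both forms.

```shell
#!/bin/sh
# Hypothetical sample feed; the real feed's layout is not shown in this thread.
workdir=$(mktemp -d)
cat > "$workdir/feed.txt" <<'EOF'
Blue Widget|9.99
Gift Certificate|25.00
gift certificates|50.00
Red Gadget|4.50
Gadgets Galore|7.25
EOF

# -v  print only lines that do NOT match
# -w  the match must be a whole word (or whole run of words)
# -E  extended regex, so ( | ) and ? work without backslashes
# -i  ignore case
grep -vwEi "(gift certificates?|gadget)" "$workdir/feed.txt" > "$workdir/feed1.txt"

cat "$workdir/feed1.txt"
# Prints:
# Blue Widget|9.99
# Gadgets Galore|7.25
```

"Gadgets Galore" survives because -w refuses to match "gadget" inside the longer word "Gadgets"; drop the -w flag if you would rather match partial words too.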

Cheers,
David.
--
PriceTapestry.com