Strip HTML

Submitted by chrisst1 on Wed, 2012-06-20 11:05

Hi David

Does PT strip all HTML tags across all fields by default? If not, can we alter includes/admin.php to include all fields, and also allow br and b tags on description only?

Thanks

Chris

Submitted by support on Wed, 2012-06-20 11:25

Hi Chris,

There's no HTML stripping by default, but if you want to strip all HTML from name, category and brand, and from description with the exception of br and b tags, then sure. In includes/admin.php, look for the following comment at line 216:

    /* apply user filters */

...and REPLACE with:

    $importRecord["name"] = strip_tags($importRecord["name"]);
    $importRecord["category"] = strip_tags($importRecord["category"]);
    $importRecord["brand"] = strip_tags($importRecord["brand"]);
    $importRecord["description"] = strip_tags($importRecord["description"],"<br><b>");
    /* apply user filters */

Bear in mind that this is applied prior to any user filters or category / product mapping, in case any changes are required to any existing mappings in place...
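
As a rough illustration of what the second argument to strip_tags does, here is a minimal standalone sketch. The helper name cleanDescription and the sample markup are made up for the example and are not part of Price Tapestry.

```php
<?php
// Hypothetical helper, not part of Price Tapestry: strips every tag
// except <br> and <b>, mirroring the description line above.
function cleanDescription($html)
{
    // The second argument lists the tags that strip_tags() should keep.
    return strip_tags($html, "<br><b>");
}

$sample = '<p>Solid <b>oak</b> table<br>with <i>two</i> drawers</p>';

echo cleanDescription($sample), "\n"; // Solid <b>oak</b> table<br>with two drawers
echo strip_tags($sample), "\n";       // Solid oak tablewith two drawers
```

Note that removing a br tag outright joins the words either side of it, and that attributes on the allowed tags are left in place, so it may be worth checking a few real descriptions after the first import.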

Cheers,
David.
--
PriceTapestry.com

Submitted by chrisst1 on Wed, 2012-06-20 12:25

Thanks David

In our old PT version we were also using the following filters, am I safe to add these back as well?

{code saved}

Submitted by support on Wed, 2012-06-20 12:29

Hi Chris,

That will work fine, but in the latest version make sure that it is inserted prior to where the $importRecord array is constructed. Just before the following comment (line 146) would be the best place to insert:

    /* create array on which to apply filters etc. */

Cheers,
David.
--
PriceTapestry.com

Submitted by chrisst1 on Thu, 2012-06-21 10:34

Thanks David

They all worked fine. I am also looking to drop all records that contain noimage.gif, nopic.gif etc. in the image_url field. I've seen something close on the forum recently (http://www.pricetapestry.com/node/3274); could I adapt that to work on the image URL?

Chris

Submitted by support on Thu, 2012-06-21 10:42

Hi Chris,

That will work fine - use as the first 2 lines of the mod:

$stopWords = array("noimage","nopic"); // add to as required all lower case
$checkFields = array("image_url"); // list of fields to check for stop words in

Cheers,
David.
--
PriceTapestry.com

Submitted by chrisst1 on Thu, 2012-06-21 13:01

David

Getting there

Is it OK to add additional fields to the following, e.g. image_url, so that the record is dropped if the field is blank?

/* check product record for minimum required fields */
if (!$importRecord["name"] || !$importRecord["buy_url"] || !$importRecord["price"]) return;

Also, can we drop records if the image URL is incomplete, i.e. missing a filename?

Chris

Submitted by support on Thu, 2012-06-21 13:15

Hi Chris,

Sure, that line can be extended to any other fields as required; e.g.

if (!$importRecord["name"] || !$importRecord["buy_url"] || !$importRecord["price"] || !$importRecord["image_url"]) return;
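
If the list of required fields keeps growing, the same test could be written as a loop. This is only a sketch: hasRequiredFields and the sample records are hypothetical and are not part of Price Tapestry.

```php
<?php
// Hypothetical helper, not part of Price Tapestry: returns TRUE only if
// every field named in $required is present and non-empty in $record.
function hasRequiredFields($record, $required)
{
    foreach ($required as $field)
    {
        // empty() treats a missing key, "", "0", 0 and NULL as empty,
        // the same values the !$importRecord[...] test rejects.
        if (empty($record[$field])) return FALSE;
    }
    return TRUE;
}

$required = array("name", "buy_url", "price", "image_url");

$complete = array(
    "name"      => "Example Widget",
    "buy_url"   => "http://www.example.com/widget",
    "price"     => "9.99",
    "image_url" => "http://www.example.com/images/widget.jpg",
);
$noImage = $complete;
$noImage["image_url"] = "";

var_dump(hasRequiredFields($complete, $required)); // bool(true)
var_dump(hasRequiredFields($noImage, $required));  // bool(false)
```

Assuming the helper were defined somewhere admin.php can see it, the check itself would become: if (!hasRequiredFields($importRecord, $required)) return;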

To check for a valid filename, you could use a Drop Record If Not RegExp as a Global Filter on the Image URL field, and as the match text:

(gif|jpg|jpeg|png)

(plus any other extensions). Do bear in mind this may cause you to miss images from merchants that don't use common file extensions; although unusual, it's not totally unheard of for image URLs to be something along the lines of

http://www.example.com/images/123345/

If you knew that to be the case for a certain merchant then instead of using Global Filters just use the Drop Record If Not RegExp as described above as a per-feed filter...
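
To get a feel for which URLs that match text would keep, here is a small standalone sketch using preg_match directly. The / delimiters, the third test URL and the stricter pattern are assumptions made for the example; it does not show how the filter itself is implemented.

```php
<?php
// Loose pattern: the match text from above, wrapped in / delimiters.
// It matches the extension text anywhere in the URL.
$loose = '/(gif|jpg|jpeg|png)/';

// Stricter variant (an assumption, not from the thread): requires a dot
// before the extension and allows only a query string after it.
$strict = '/\.(gif|jpe?g|png)(\?.*)?$/i';

$urls = array(
    "http://www.example.com/admin/ProductImages/TRAM12NEW_600.jpg",
    "http://www.example.com/images/123345/",
    "http://www.example.com/pngs/123345/",
);

foreach ($urls as $url)
{
    // preg_match() returns 1 on a match and 0 otherwise.
    printf("%-62s loose=%d strict=%d\n", $url, preg_match($loose, $url), preg_match($strict, $url));
}
```

The first URL passes both patterns and the second fails both, so it would be dropped. The third passes the loose pattern only because "png" happens to appear in the path, which is the kind of false match the stricter variant is meant to avoid.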

Cheers,
David.
--
PriceTapestry.com

Submitted by chrisst1 on Thu, 2012-06-21 13:51

Thanks David

Good point, I've come across those URLs in the past.

I shall ponder on that one.

Chris

Submitted by chrisst1 on Thu, 2012-06-21 15:19

Hi David

Using this url as an example
http://www.example.com/admin/ProductImages/TRAM12NEW_600.jpg

The code below seems to only drop records if the stop words contain "admin" (lowercase) but not "TRAM12NEW_600"

$stopWords = array("word1","word2"); // add to as required all lower case

$checkFields = array("image_url"); // list of fields to check for stop words in

foreach($checkFields as $checkField)
{
  foreach($stopWords as $stopWord)
  {
    if (strpos(strtolower($importRecord[$checkField]),$stopWord) !== FALSE) return;
  }
}

Any ideas?

Chris

Submitted by support on Thu, 2012-06-21 16:47

Hi Chris,

Ah - this wasn't clear in the other thread. Since the code uses strtolower to make the check case insensitive, the keywords in $stopWords must be all lowercase, e.g.

  $stopWords = array("tram12new_600");

The other thread was published before PHP's range of case-insensitive string functions was introduced, making this the most robust implementation compatible with all PHP versions...
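
Where compatibility with very old PHP versions is not a concern, one of those case-insensitive functions, stripos (available from PHP 5), removes the need to lowercase the stop words. The following is a minimal sketch only; containsStopWord is a hypothetical helper and not part of Price Tapestry.

```php
<?php
// Hypothetical helper, not part of Price Tapestry: returns TRUE if any
// stop word occurs in any of the listed fields, ignoring case.
function containsStopWord($record, $checkFields, $stopWords)
{
    foreach ($checkFields as $checkField)
    {
        foreach ($stopWords as $stopWord)
        {
            // stripos() compares case-insensitively, so neither side
            // needs lowercasing first.
            if (stripos($record[$checkField], $stopWord) !== FALSE) return TRUE;
        }
    }
    return FALSE;
}

$record = array("image_url" => "http://www.example.com/admin/ProductImages/TRAM12NEW_600.jpg");

var_dump(containsStopWord($record, array("image_url"), array("TRAM12NEW_600")));    // bool(true)
var_dump(containsStopWord($record, array("image_url"), array("noimage", "nopic"))); // bool(false)
```

Within the mod, the equivalent call would be along the lines of: if (containsStopWord($importRecord, $checkFields, $stopWords)) return;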

Cheers,
David.
--
PriceTapestry.com