Removing Invalid Characters from feeds?

Submitted by scorpionwsm on Tue, 2007-07-03 20:50

Hi David,

All set up on the dedicated server now and uploading products to www.comparecatalogues.co.uk. I have imported a few XML feeds but, as with some of the CSV feeds, there is usually some messed-up code such as &amp; in them. Is there any way that these can be replaced with the right words?

Most of the feeds contain the same errors, so

&amp;

...will be replaced by "and" or "&"

Regards

Mark

Submitted by support on Tue, 2007-07-03 21:03

Hi Mark,

The script aggressively strips "problem" characters such as & and some punctuation characters in order to maintain clean URLs. If you are using feeds that contain HTML entities, the best solution is to modify the import record handler to allow the & and ; characters through.

In includes/admin.php, these characters are stripped from the product name by the following code on line 157:

$record[$admin_importFeed["field_name"]] = tapestry_normalise($record[$admin_importFeed["field_name"]]);

Change this as follows, adding the optional parameter to the tapestry_normalise() function that specifies additional characters to permit:
$record[$admin_importFeed["field_name"]] = tapestry_normalise($record[$admin_importFeed["field_name"]],"&;");

Similarly, for the description, change the following code on line 163:

$record[$admin_importFeed["field_description"]] = tapestry_normalise($record[$admin_importFeed["field_description"]],",'\'\.%!");

to...
$record[$admin_importFeed["field_description"]] = tapestry_normalise($record[$admin_importFeed["field_description"]],",'\'\.%!&;");
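
To illustrate the idea of an "additional characters to permit" parameter, here is a minimal sketch. The function example_normalise() is a hypothetical stand-in written for this example; it is not the real tapestry_normalise(), whose base character set and internals may well differ.

```php
<?php
// Hypothetical stand-in for illustration only; NOT the real tapestry_normalise().
// Keeps letters, digits and spaces, plus any extra characters passed in $allow.
function example_normalise($text, $allow = "")
{
    // preg_quote() escapes the extra characters so they are safe inside [...]
    $pattern = "/[^A-Za-z0-9 " . preg_quote($allow, "/") . "]/";
    // Remove every character that is not in the permitted set
    return preg_replace($pattern, "", $text);
}

echo example_normalise("Fish &amp; Chips!"), "\n";       // Fish amp Chips
echo example_normalise("Fish &amp; Chips!", "&;"), "\n"; // Fish &amp; Chips
```

Without the second argument the & and ; are stripped and the entity is left as a stray "amp"; with "&;" permitted the entity survives intact.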

Strictly speaking these entities should not be in the feeds - there is no stipulation that content from a feed is ultimately going to end up being served as HTML. These modifications will allow HTML entities through...
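
If the aim is to store the real characters rather than the entities, another option might be to decode the entities before the field is normalised. The sketch below uses PHP's built-in html_entity_decode(); the sample product name and the assumption that the feed is UTF-8 are illustrative only, and a decoded & would still need to be permitted as described above or it will be stripped again.

```php
<?php
// Assumption: the feed text is UTF-8; change the third argument if yours is not.
$name = "Marks &amp; Spencer &pound;10 Voucher";

// html_entity_decode() is a standard PHP function that converts
// HTML entities back into the characters they represent.
$decoded = html_entity_decode($name, ENT_QUOTES, "UTF-8");

echo $decoded, "\n"; // Marks & Spencer £10 Voucher
```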

Cheers,
David.

Submitted by scorpionwsm on Fri, 2007-07-13 00:14

Hi David,

Don't worry about the last one, I found it in here, knew it was somewhere.

For people who find that, when they import a product feed and the product name contains a £ sign, the product displays something other than the pound sign and clicking it goes to a "product not found" page.

I'm sure David will know of a better way of curing this, so that it does show a £ sign in the name but

in includes/admin.php

Find lines similar to this one

$record[$admin_importFeed

then add this

$record[$admin_importFeed["field_name"]] = str_replace("£","",$record[$admin_importFeed["field_name"]]);

Add that line alongside the similar lines, so that it strips the £ out of the product name.

Then your products will be found.
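
One thing to watch with the str_replace() line above: a literal £ typed into the PHP file is saved in whatever encoding the file uses, so it may not match the bytes in the feed. A minimal sketch that handles both common cases follows. strip_pound() is a hypothetical helper name, not part of the script, and it assumes a feed that is not valid UTF-8 is in a single-byte encoding such as ISO-8859-1.

```php
<?php
// Hypothetical helper, not part of the script: removes the pound sign from a
// string whether the feed is UTF-8 (bytes C2 A3) or ISO-8859-1 (byte A3).
function strip_pound($text)
{
    // preg_match() with the /u modifier only succeeds on valid UTF-8 input
    if (preg_match("//u", $text) === 1) {
        return str_replace("\xC2\xA3", "", $text);
    }
    // Otherwise assume a single-byte encoding such as ISO-8859-1
    return str_replace("\xA3", "", $text);
}

echo strip_pound("\xC2\xA3" . "10 Gift Card"), "\n"; // 10 Gift Card
echo strip_pound("\xA3" . "10 Gift Card"), "\n";     // 10 Gift Card
```

It could then be used in place of the str_replace() call, for example by assigning strip_pound($record[$admin_importFeed["field_name"]]) back to the same field.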