John Lewis XML feed causing strange issue

Submitted by sdhunter on Thu, 2010-11-25 00:18

Hi,

When importing the John Lewis feed, PT seems to be turning ' into 39 in the database. Also, " is being turned into 375quot.
This means that descriptions and titles are inserted with these stray characters and look bad.
Is this caused by the wrong character set being used during import, or is the import stripping characters incorrectly?

Thanks

Steve

Submitted by support on Thu, 2010-11-25 09:21

Hi Steve,

What's happening here is that the feed contains data that is already HTML entity encoded, so the ' is represented as & # 3 9 ;. When the product name is normalised, the &, # and ; characters are removed, leaving the 39 that you are seeing.
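
To illustrate the effect, here is a minimal sketch. The function `example_normalise` is a hypothetical stand-in that keeps only letters, digits and spaces; it is an assumption made for illustration and not the actual Price Tapestry normalisation code.

```php
<?php
// Hypothetical stand-in for the name normalisation step; the real
// Price Tapestry routine may differ. It keeps letters, digits and spaces.
function example_normalise($text)
{
    return preg_replace('/[^A-Za-z0-9 ]/', '', $text);
}

// The entity is built from pieces so that it is not decoded when this
// page is displayed.
$entity = '&' . '#39;';
$name = 'Who knows what you' . $entity . 'll find';

// The &, # and ; are stripped, so only the digits of the entity remain.
print example_normalise($name) . "\n"; // Who knows what you39ll find
```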

The solution would be a new filter, HTML Entity Decode, which will convert these entities before they are inserted into the database. To do this, add the following new filter code to your includes/filter.php (just before the closing PHP tag):

  /*************************************************/
  /* HTML Entity Decode */
  /*************************************************/
  $filter_names["htmlEntityDecode"] = "HTML Entity Decode";
  function filter_htmlEntityDecodeConfigure($filter_data)
  {
    print "<p>There are no additional configuration parameters for this filter.</p>";
  }
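  function filter_htmlEntityDecodeValidate($filter_data)
  {
    // No parameters, so there is nothing to validate.
  }
  function filter_htmlEntityDecodeExec($filter_data,$text)
  {
    // ENT_QUOTES also decodes apostrophe entities, which the default
    // flags in older PHP versions leave untouched.
    return html_entity_decode($text, ENT_QUOTES);
  }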

With the above code in place, add a new HTML Entity Decode filter to the product name and description fields for the feed, and that should solve the problem...

Cheers,
David.
--
PriceTapestry.com

Submitted by sdhunter on Sat, 2010-11-27 16:07

David,

I have added the filter and applied it to the name and description fields in my feed, but it still happens after I re-import.

Submitted by sdhunter on Sat, 2010-11-27 16:10

I just searched the feed, and & # 3 9 ; actually appears in it as & a m p ; # 3 9 ;

' Who knows what you'll find with this '

Submitted by support on Sat, 2010-11-27 18:05

Hi Steve,

Try using Search and Replace instead of HTML Entity Decode: search for & # 3 9 ; and replace with an apostrophe. That should do the trick...

Assuming that the field in the XML is not enclosed within CDATA tags, it is correctly encoded, and the XML parser will already have turned & a m p ; back into & by the time the filter runs. However, if the above doesn't work first time, search for the actual data from the feed that you identified, i.e. & a m p ; # 3 9 ;
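
If you would rather handle double-encoded data in code than with Search and Replace, one possible approach is to decode repeatedly until the text stops changing. The sketch below is only an illustration: `example_decode_nested_entities` and its `$maxPasses` parameter are hypothetical names, not part of Price Tapestry, and the 'UTF-8' charset is an assumption about the feed.

```php
<?php
// Hypothetical helper, not part of Price Tapestry: decode entities
// repeatedly until the text stops changing, up to $maxPasses times.
function example_decode_nested_entities($text, $maxPasses = 3)
{
    for ($i = 0; $i < $maxPasses; $i++) {
        // ENT_QUOTES so that apostrophe entities are decoded as well.
        $decoded = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
        if ($decoded === $text) {
            break; // nothing left to decode
        }
        $text = $decoded;
    }
    return $text;
}

// The double-encoded apostrophe is built from pieces so that it is not
// decoded when this page is displayed.
$raw = 'you' . '&' . 'amp;#39;' . 'll find';
print example_decode_nested_entities($raw) . "\n"; // you'll find
```

Bear in mind that repeated decoding will also convert any entity text that was meant to appear literally, so a targeted Search and Replace is the more conservative option.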

Cheers,
David.
--
PriceTapestry.com

Submitted by sdhunter on Sun, 2010-11-28 21:08

Done that and it's working. Cheers, David.