Support Forum



Character encoding issue while importing UTF-8 RSS

Submitted by ruudkok on Fri, 2009-10-23 10:56

Hi,

I'm having problems with character encoding.
I'm trying to import an RSS feed (http://onedayonly.nl/rss) but prices in descriptions are broken when imported.

The XML is set as UTF-8: <?xml version="1.0" encoding="UTF-8"?>
My PT install is set as UTF-8: $config_charset = "utf-8";
My MySQL db is set as UTF-8: utf8_general_ci (field description; type = text)

The price in this case should be € 52,95 but in my database I find € 5295

What's going on?

Submitted by support on Fri, 2009-10-23 11:15

Hi,

Does the euro sign appear correctly if you browse to the RSS feed directly? It sounds like an encoding error, but that would normally stop the parser.

If it looks OK when browsing the RSS feed directly, check what character set the browser is using, as it may have auto-detected ISO-8859-1 and ignored the utf-8 declaration in the RSS header. If that's the case, and your site still needs to be in utf-8, add a UTF8 Encode filter to the description field for this feed and then re-import; that should fix it...
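
One caveat, sketched below: the filter only helps when the data really is ISO-8859-1. Assuming the UTF8 Encode filter behaves like PHP's utf8_encode() (an assumption, not checked against the filter's source), applying it to text that is already UTF-8 encodes every byte a second time. The function names here are made up for illustration, and the validity check is only a heuristic, since some ISO-8859-1 byte runs also happen to be valid UTF-8.

```php
<?php
// Sketch only: latin1_to_utf8() mimics what utf8_encode() does, i.e. it
// treats every input byte as one ISO-8859-1 character. Names are hypothetical.
function latin1_to_utf8(string $s): string
{
    $out = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $c = ord($s[$i]);
        // Bytes below 0x80 are plain ASCII; the rest become two-byte sequences.
        $out .= $c < 0x80 ? $s[$i] : chr(0xC0 | ($c >> 6)) . chr(0x80 | ($c & 0x3F));
    }
    return $out;
}

// True when the string is already well-formed UTF-8.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Only convert when the input is NOT already valid UTF-8.
function encode_if_needed(string $s): string
{
    return is_valid_utf8($s) ? $s : latin1_to_utf8($s);
}

$euro = "\xE2\x82\xAC";                          // euro sign, already UTF-8 (3 bytes)
echo bin2hex(latin1_to_utf8($euro)), "\n";       // c3a2c282c2ac (double-encoded, 6 bytes)
echo bin2hex(encode_if_needed($euro)), "\n";     // e282ac (left untouched)
echo bin2hex(encode_if_needed("caf\xE9")), "\n"; // 636166c3a9 (ISO-8859-1 e-acute converted)
```

If the imported text comes out longer and more garbled after adding the filter, that points to the feed already being valid UTF-8, in which case the filter should be removed again.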

Cheers,
David.

Submitted by ruudkok on Fri, 2009-10-23 15:21

The feed looks fine when browsing the RSS directly. Check http://onedayonly.nl/rss
Even with the character set forced to UTF-8 in Safari, everything still looks as it should (a normal-looking euro sign).

Adding the UTF8 Encode filter results in € 5995, so it only got worse.

What to do?

Submitted by support on Fri, 2009-10-23 15:30

Hi,

I checked the feed and I'm slightly confused, as the euro signs appear to be encoded as HTML entities, so one would expect to see just "euro" (as the & and ; are stripped at import). Those two characters are easy to permit. In includes/admin.php, look for the following section of code beginning at line 161:

    if ($admin_importFeed["field_description"])
    {
      $record[$admin_importFeed["field_description"]] = strip_tags($record[$admin_importFeed["field_description"]]);
      $record[$admin_importFeed["field_description"]] = tapestry_normalise($record[$admin_importFeed["field_description"]],",'\'\.%!");
    }

...and either comment out or delete that entire block to import the description "as is", or alternatively REPLACE it with the following code to permit the & and ; characters:

    if ($admin_importFeed["field_description"])
    {
      $record[$admin_importFeed["field_description"]] = strip_tags($record[$admin_importFeed["field_description"]]);
      $record[$admin_importFeed["field_description"]] = tapestry_normalise($record[$admin_importFeed["field_description"]],",'\'\.%!&;");
    }
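
To illustrate the difference, here is a minimal sketch. normalise_sketch() is a made-up stand-in for tapestry_normalise() (the real function may well behave differently); it simply keeps letters, digits, spaces and the characters listed in its second argument. The final html_entity_decode() call is an optional extra step, not part of the change above, for anyone who would rather store a real UTF-8 euro sign than the entity.

```php
<?php
// normalise_sketch() is a hypothetical stand-in, NOT the real tapestry_normalise():
// it keeps letters, digits, spaces and whatever is listed in $allow.
function normalise_sketch(string $text, string $allow = ''): string
{
    $pattern = '/[^A-Za-z0-9 ' . preg_quote($allow, '/') . ']/';
    return preg_replace($pattern, '', $text);
}

$raw = '&euro; 52,95';                        // entity-encoded price, as in the feed

echo normalise_sketch($raw, ',.%!'), "\n";    // euro 52,95   (& and ; stripped)

$kept = normalise_sketch($raw, ',.%!&;');
echo $kept, "\n";                             // &euro; 52,95 (entity survives)

// Optional: turn the surviving entity into an actual UTF-8 character.
echo html_entity_decode($kept, ENT_QUOTES | ENT_HTML401, 'UTF-8'), "\n"; // € 52,95
```

With the entity kept intact, the browser renders the euro sign as long as the description is output without escaping the ampersand again.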

Hope this helps,

Cheers,
David.

Submitted by support on Sat, 2009-10-24 16:35

Hi ruudkok,

I noticed your earlier post immediately before I posted the above reply. Let me know if you're still having any problems with this feed...

Cheers,
David.