Hi,
I'm having problems with character encoding.
I'm trying to import an RSS feed (http://onedayonly.nl/rss) but prices in descriptions are broken when imported.
The XML is set as UTF-8: xml version="1.0" encoding="UTF-8"
My PT install is set as UTF-8: $config_charset = "utf-8";
My MySQL database is set as UTF-8: utf8_general_ci (field description; type = text)
The price in this case should be € 52,95 but in my database I find € 5295
What's going on?
The feed looks fine when browsing the RSS directly. Check http://onedayonly.nl/rss
With the character set forced to UTF-8 in Safari, everything still looks as it should (a normal-looking euro sign).
Adding a UTF8 Encoding filter results in € 5995, so it only got worse.
What to do?
Hi,
I checked the feed and I'm slightly confused, as the euro signs appear to be encoded as HTML entities, so one would expect to see just "euro" (as the & and ; are stripped at import). They're easy to permit. In includes/admin.php, look for the following section of code beginning at line 161:
if ($admin_importFeed["field_description"])
{
  $record[$admin_importFeed["field_description"]] = strip_tags($record[$admin_importFeed["field_description"]]);
  $record[$admin_importFeed["field_description"]] = tapestry_normalise($record[$admin_importFeed["field_description"]],",'\'\.%!");
}
...and either comment out or delete that entire block to import the description "as is", or alternatively REPLACE it with the following code to permit the & and ; characters:
if ($admin_importFeed["field_description"])
{
  $record[$admin_importFeed["field_description"]] = strip_tags($record[$admin_importFeed["field_description"]]);
  $record[$admin_importFeed["field_description"]] = tapestry_normalise($record[$admin_importFeed["field_description"]],",'\'\.%!&;");
}
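To illustrate why the entity collapses to plain "euro" unless & and ; are permitted, here is a minimal sketch. It assumes the normalise step simply drops every character that is not a letter, digit, space or in the allow-list; normalise_sketch is a hypothetical stand-in written for this post, not the actual tapestry_normalise code, so treat it as an approximation only.

```php
<?php
// Hypothetical stand-in for an allow-list filter (NOT the real tapestry_normalise).
// Keeps A-Z, a-z, 0-9, space and any extra characters passed in $allow.
function normalise_sketch(string $text, string $allow = ""): string
{
    // preg_quote escapes regex-special characters so they are safe inside [...]
    $pattern = '/[^A-Za-z0-9 ' . preg_quote($allow, '/') . ']/';
    return preg_replace($pattern, '', $text);
}

// Without & and ; in the allow-list the entity is reduced to the bare word.
$stripped = normalise_sketch("&euro; 52,95", ",.%!");   // "euro 52,95"

// With & and ; permitted the entity survives the import intact.
$kept = normalise_sketch("&euro; 52,95", ",.%!&;");     // "&euro; 52,95"

// The stored entity can then be turned into a real UTF-8 euro sign if needed.
$decoded = html_entity_decode($kept, ENT_QUOTES, "UTF-8");
```

If the entity is kept in the database, a browser will render it as a euro sign when the description is output as HTML; html_entity_decode is only needed where a literal UTF-8 character is wanted instead.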
Hope this helps;
Cheers,
David.
Hi ruudkok,
I noticed your earlier post immediately before I posted the above reply. Let me know if you're still having any problems with this feed...
Cheers,
David.
Hi,
Does the euro sign appear correctly if you browse to the RSS feed directly? It sounds like an encoding error, but that would normally stop the parser.
If it looks OK when browsing the RSS feed directly, check which character set the browser is using, as it may have auto-detected ISO-8859-1 and ignored the utf-8 declaration in the RSS header. If that's the case and your site still needs to be in utf-8, adding a UTF8 Encode filter to the description field for this feed (and then re-importing) should fix it...
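As a rough illustration of what such a conversion involves, and why applying it to text that is already UTF-8 makes things worse, here is a minimal sketch. It assumes the mbstring extension is available; to_utf8_once is a hypothetical helper written for this post, not the filter's actual code. Note also that ISO-8859-1 itself has no euro sign, so a feed containing a raw euro character is more likely to be UTF-8 or Windows-1252.

```php
<?php
// Hypothetical helper: convert to UTF-8 only when the input is not already valid UTF-8.
// $assumedSource is a guess at the feed's real encoding, not something detected.
function to_utf8_once(string $text, string $assumedSource = "ISO-8859-1"): string
{
    if (mb_check_encoding($text, "UTF-8")) {
        return $text; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($text, "UTF-8", $assumedSource);
}

// A lone 0xE9 byte (e-acute in ISO-8859-1) is not valid UTF-8, so it is converted.
$converted = to_utf8_once("\xE9");

// The UTF-8 euro sign (E2 82 AC) is already valid and passes through unchanged.
$unchanged = to_utf8_once("\xE2\x82\xAC");

// Converting unconditionally treats each of those three bytes as a separate
// ISO-8859-1 character, which produces six bytes of garbage (double encoding).
$doubled = mb_convert_encoding("\xE2\x82\xAC", "UTF-8", "ISO-8859-1");
```

The validity check is the design choice that matters here: an encode step is only safe when the source really is in the assumed single-byte encoding.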
Cheers,
David.