Some users building sites in non-English languages run into a problem because some of their feeds are UTF-8 encoded whereas others are ISO-8859-1 encoded. Since you must choose a single character set for the entire site, a solution is to apply PHP's utf8_encode() and utf8_decode() functions as required against the product name and description fields of any feed that is not in the same character encoding as your site. The following code provides two additional filters that run any selected field through these functions.
To implement this, choose the character encoding of the majority of your feeds and set it as the character encoding for the entire site in config.php, for example:
$config_charset = "UTF-8";
Then, for any feeds that are ISO-8859-1 encoded, use the new "UTF8 Encode" filter, which can be installed by adding the following code to includes/filter.php:
Note: These filters are now included as part of the distribution.
<?php
  /*************************************************/
  /* UTF8 Encode                                   */
  /*************************************************/
  $filter_names["utf8Encode"] = "UTF8 Encode";

  function filter_utf8EncodeConfigure($filter_data)
  {
    print "<p>There are no additional configuration parameters for this filter.</p>";
  }

  function filter_utf8EncodeValidate($filter_data)
  {
    // nothing to validate, the filter has no parameters
  }

  function filter_utf8EncodeExec($filter_data,$text)
  {
    // convert ISO-8859-1 text to UTF-8
    return utf8_encode($text);
  }

  /*************************************************/
  /* UTF8 Decode                                   */
  /*************************************************/
  $filter_names["utf8Decode"] = "UTF8 Decode";

  function filter_utf8DecodeConfigure($filter_data)
  {
    print "<p>There are no additional configuration parameters for this filter.</p>";
  }

  function filter_utf8DecodeValidate($filter_data)
  {
    // nothing to validate, the filter has no parameters
  }

  function filter_utf8DecodeExec($filter_data,$text)
  {
    // convert UTF-8 text to ISO-8859-1
    return utf8_decode($text);
  }
?>
Alternatively, if the majority of your feeds are ISO-8859-1 encoded, choose that as the main character set for the site and use the UTF8 Decode filter against UTF-8 encoded feeds.
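To make the effect of the two filters concrete, here is a minimal sketch of the byte-level conversion in each direction. It uses mb_convert_encoding() instead of utf8_encode() and utf8_decode(), because those two functions are deprecated as of PHP 8.2. It assumes the mbstring extension is available. The helper names latin1_to_utf8 and utf8_to_latin1 are made up for illustration and are not part of the distribution.

```php
<?php
// Hypothetical helpers (not part of the distribution); assumes ext/mbstring is loaded.

// ISO-8859-1 -> UTF-8, the same conversion utf8_encode() performs.
function latin1_to_utf8($text)
{
    return mb_convert_encoding($text, "UTF-8", "ISO-8859-1");
}

// UTF-8 -> ISO-8859-1, the same direction as utf8_decode().
function utf8_to_latin1($text)
{
    return mb_convert_encoding($text, "ISO-8859-1", "UTF-8");
}

// "Café" in ISO-8859-1: the e-acute is the single byte 0xE9.
$latin1 = "Caf\xE9";
$utf8 = latin1_to_utf8($latin1);

print bin2hex($utf8) . "\n";                 // 436166c3a9 (e-acute is now 0xC3 0xA9)
print bin2hex(utf8_to_latin1($utf8)) . "\n"; // 436166e9
```

If you wanted to try this inside the filters, the two Exec functions would be the place to call such helpers, but that is only a suggestion and has not been tested against the distribution.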
In my case I have both UTF-8 and ISO-8859-1 feeds, so I modified tapestry.php by adding this function:
function is_utf8($string) {
  // From http://w3.org/International/questions/qa-forms-utf-8.html
  return preg_match('%^(?:
      [\x09\x0A\x0D\x20-\x7E]            # ASCII
    | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
    | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
    | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
    | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
    | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
    | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
    | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
  )*$%xs', $string);
}
Then, in the tapestry_normalise function, I added:
if (!is_utf8($text)) $text = utf8_encode($text);
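The same detect-then-convert idea can be sketched without the regular expression, using mb_check_encoding() for the validity test. This is only a sketch under two assumptions: the mbstring extension is available, and any text that is not valid UTF-8 is ISO-8859-1. The function name ensure_utf8 is hypothetical.

```php
<?php
// Hypothetical helper; assumes ext/mbstring and that non-UTF-8 input is ISO-8859-1.
function ensure_utf8($text)
{
    // Leave the text alone if its bytes already form valid UTF-8.
    if (mb_check_encoding($text, "UTF-8")) {
        return $text;
    }
    // Otherwise treat it as ISO-8859-1 and convert it.
    return mb_convert_encoding($text, "UTF-8", "ISO-8859-1");
}

print bin2hex(ensure_utf8("Caf\xE9")) . "\n";     // 436166c3a9 (converted)
print bin2hex(ensure_utf8("Caf\xC3\xA9")) . "\n"; // 436166c3a9 (unchanged)
```

Note that this is not an exact replacement for is_utf8(): the regular expression also rejects control characters other than tab, line feed and carriage return, whereas mb_check_encoding() only checks that the byte sequence is well-formed UTF-8.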