

Special characters in XML

Submitted by olovl on Wed, 2006-02-08 11:33

Hello!

We purchased Price Tapestry yesterday and I am very impressed.
Thank you for a good product; it saved me a lot of time not having to build it myself :)

Anyway, I have a small problem as well.
When I import any XML or CSV file, the import function strips out all special characters, such as "Å Ä Ö".
How can I get around that?

/Olov

Submitted by support on Wed, 2006-02-08 11:38

Hi,

This is caused by the normalisation function. If you look in the file includes/tapestry.php, you will see this function:

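<?php
  function tapestry_normalise($text,$allow = "")
  {
    // Treat hyphens as spaces
    $text = str_replace("-"," ",$text);
    // Remove every character that is not A-Z, a-z, 0-9, a space or one of $allow
    $text = preg_replace('/[^A-Za-z0-9'.$allow.' ]/','',$text);
    // Collapse runs of two or more spaces into a single space
    $text = preg_replace('/[ ]{2,}/',' ',$text);
    return $text;
  }
?>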

What this does is remove every character other than A-Z, a-z, 0-9 and spaces (plus any characters passed in $allow) from a product name, or from any other field processed with this function, so that the result is safe to use within a URL. You can override this by disabling the line that performs the invalid character removal; simply comment it out so that you have this:

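<?php
  function tapestry_normalise($text,$allow = "")
  {
    // Treat hyphens as spaces
    $text = str_replace("-"," ",$text);
    // Invalid character removal disabled so that accented letters are kept:
    // $text = preg_replace('/[^A-Za-z0-9'.$allow.' ]/','',$text);
    // Collapse runs of two or more spaces into a single space
    $text = preg_replace('/[ ]{2,}/',' ',$text);
    return $text;
  }
?>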

You should then find that the special characters work OK; however, look out for any problems that may occur if you use the search engine friendly URL option via .htaccess. Let me know how you get on...!
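If you would rather keep accented letters but still strip punctuation, one option is to make the character class Unicode-aware instead of commenting the line out. The sketch below assumes the feed text is UTF-8 and uses PCRE's `\p{L}` (any letter) and `\p{N}` (any number) classes together with the `/u` modifier; the name `tapestry_normalise_unicode` is hypothetical and not part of Price Tapestry.

```php
<?php
// Hypothetical variant of tapestry_normalise(): instead of deleting every
// non-ASCII character, keep any Unicode letter (\p{L}) or number (\p{N}).
// Assumes UTF-8 input; the /u modifier makes PCRE treat the text as UTF-8.
function tapestry_normalise_unicode($text, $allow = "")
{
    // Treat hyphens as spaces, as the original function does
    $text = str_replace("-", " ", $text);
    // Drop everything except letters, numbers, spaces and the $allow characters
    $text = preg_replace('/[^\p{L}\p{N}' . $allow . ' ]/u', '', $text);
    // Collapse runs of two or more spaces into a single space
    $text = preg_replace('/[ ]{2,}/', ' ', $text);
    return $text;
}

echo tapestry_normalise_unicode("Åbo - Ärlig Öl!"), "\n"; // prints "Åbo Ärlig Öl"
```

With `/u`, `preg_replace()` returns `NULL` if the input is not valid UTF-8, so ISO-8859-1 feeds would need converting first. The result can still contain non-ASCII letters, so on its own it does not make names safe for mod_rewrite URLs.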

Thanks,
David.

Submitted by olovl on Wed, 2006-02-08 11:58

Thanks, that worked well.

However, I do use mod_rewrite.
But I guess I could modify the URLs in the same way when they are created? I don't want "å ä ö" in the URLs, but would prefer to replace them with "a a o" rather than removing them.

Submitted by support on Wed, 2006-02-08 12:01

Hi,

I will look into this. Is there a particular name for these special characters, so that I can research when and where they occur and write the code to replace the appropriate character ranges with non-accented equivalents?
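These are letters with diacritics (accents). In Unicode, most of those used in Western European languages, including Å (U+00C5), Ä (U+00C4) and Ö (U+00D6), fall in the Latin-1 Supplement block, U+00C0 to U+00FF. A minimal sketch of replacing them with unaccented equivalents uses `strtr()` with a lookup array. The function name `transliterate_accents` and the map itself are illustrative assumptions covering only a handful of letters, and the code assumes UTF-8 input (for ISO-8859-1 feeds the keys would need to be in that encoding).

```php
<?php
// Minimal sketch: map accented Latin letters to unaccented ASCII with strtr().
// The map is an illustrative assumption covering only some common letters;
// extend it for the languages in your feeds. Assumes UTF-8 input.
function transliterate_accents($text)
{
    $map = array(
        'å' => 'a', 'ä' => 'a', 'à' => 'a', 'á' => 'a', 'â' => 'a',
        'Å' => 'A', 'Ä' => 'A', 'À' => 'A', 'Á' => 'A', 'Â' => 'A',
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
        'É' => 'E', 'È' => 'E', 'Ê' => 'E', 'Ë' => 'E',
        'ö' => 'o', 'ó' => 'o', 'ò' => 'o', 'ô' => 'o',
        'Ö' => 'O', 'Ó' => 'O', 'Ò' => 'O', 'Ô' => 'O',
        'ü' => 'u', 'ú' => 'u', 'ù' => 'u', 'û' => 'u',
        'Ü' => 'U', 'Ú' => 'U', 'Ù' => 'U', 'Û' => 'U',
        'ß' => 'ss', 'ç' => 'c', 'Ç' => 'C',
    );
    // With an array argument, strtr() tries the longest key first and never
    // re-scans text it has already replaced, so one pass covers the whole map.
    return strtr($text, $map);
}

echo transliterate_accents("Åbo Ärlig Öl"), "\n"; // prints "Abo Arlig Ol"
```

Because UTF-8 byte sequences never overlap one another, byte-wise matching in `strtr()` cannot accidentally hit part of a different character.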

Thanks,
David.

Submitted by olovl on Wed, 2006-02-08 12:07

This is how I handle URLs in a phpBB forum with mod_rewrite:

function make_url_friendly($url)
{
   $url = trim($url);
   $url = strtolower($url);
   // Fix for most recent topics block
   // or else a b is shown in every url
   $find = array('<b>',
      '</b>');
   $url = str_replace ($find, '', $url);
   $url = preg_replace('/<(\/{0,1})img(.*?)(\/{0,1})\>/', 'image', $url);
   $find = array(' ',
      '&quot;',
      'quot',
      '&',
      'amp;',
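      "\r\n", // double quotes, so these are real line breaks
      "\n",   // rather than a literal backslash followed by r or n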
      '/',
      '\\',
      '+',
      '<',
      '>');
   $url = str_replace ($find, '-', $url);
   // Replace accented letters with their unaccented equivalents
   $find = array('é',
      'è',
      'ë',
      'ê',
      'É',
      'È',
      'Ë',
      'Ê');
   $url = str_replace ($find, 'e', $url);
   $find = array('í',
      'ì',
      'î',
      'ï',
      'Í',
      'Ì',
      'Î',
      'Ï');
   $url = str_replace ($find, 'i', $url);
   $find = array('ó',
      'ò',
      'ô',
      'Ó',
      'Ò',
      'Ô');
   $url = str_replace ($find, 'o', $url);
   $find = array('ö',
       'Ö');
   $url = str_replace ($find, 'o', $url);
   $find = array('á',
      'à',
      'â',
      'Á',
      'À',
      'å',
      'Å',
      'Â');
   $url = str_replace ($find, 'a', $url);
   $find = array('ä',
       'Ä');
   $url = str_replace ($find, 'a', $url);
   $find = array('ú',
      'ù',
      'û',
      'Ú',
      'Ù',
      'Û');
   $url = str_replace ($find, 'u', $url);
   $find = array('ü',
       'Ü');
   $url = str_replace ($find, 'ue', $url);
   $find = array('ß');
   $url = str_replace ($find, 'ss', $url);
   $find = array('ç');
   $url = str_replace ($find, 'c', $url);
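   // Drop anything other than a-z, 0-9, hyphens and tag brackets,
   // collapse repeated hyphens, then strip any leftover <...> tags
   $find = array('/[^a-z0-9\-<>]/',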
      '/[\-]+/',
      '/<[^>]*>/');
   $repl = array('',
      '-',
      '');
   $url = preg_replace ($find, $repl, $url);
   $url = str_replace ('--', '-', $url);
   return $url;
}

Not as efficient as a regular expression, but it works :)
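A more compact variant of the same idea puts all the accented letters in one `strtr()` map and lets a single regular expression handle everything else. This is a sketch, not a drop-in replacement: `make_url_slug` is a hypothetical name, it assumes UTF-8 input, and it keeps the letter choices from `make_url_friendly()` (for example ü to "ue" and ß to "ss").

```php
<?php
// Hypothetical compact variant of make_url_friendly(): one strtr() pass for
// accented letters, then one regular expression for everything else.
// Assumes UTF-8 input; the map is illustrative, not exhaustive.
function make_url_slug($text)
{
    // Remove HTML tags, then decode entities such as &quot; and &amp;
    $text = html_entity_decode(strip_tags($text), ENT_QUOTES, 'UTF-8');
    // Accented letters to ASCII, using the same choices as make_url_friendly()
    $map = array(
        'á' => 'a', 'à' => 'a', 'â' => 'a', 'å' => 'a', 'ä' => 'a',
        'Á' => 'a', 'À' => 'a', 'Â' => 'a', 'Å' => 'a', 'Ä' => 'a',
        'é' => 'e', 'è' => 'e', 'ë' => 'e', 'ê' => 'e',
        'É' => 'e', 'È' => 'e', 'Ë' => 'e', 'Ê' => 'e',
        'í' => 'i', 'ì' => 'i', 'î' => 'i', 'ï' => 'i',
        'Í' => 'i', 'Ì' => 'i', 'Î' => 'i', 'Ï' => 'i',
        'ó' => 'o', 'ò' => 'o', 'ô' => 'o', 'ö' => 'o',
        'Ó' => 'o', 'Ò' => 'o', 'Ô' => 'o', 'Ö' => 'o',
        'ú' => 'u', 'ù' => 'u', 'û' => 'u',
        'Ú' => 'u', 'Ù' => 'u', 'Û' => 'u',
        'ü' => 'ue', 'Ü' => 'ue', 'ß' => 'ss', 'ç' => 'c',
    );
    $text = strtolower(strtr($text, $map));
    // Turn every run of characters other than a-z and 0-9 into one hyphen
    $text = preg_replace('/[^a-z0-9]+/', '-', $text);
    // Drop hyphens left at either end
    return trim($text, '-');
}

echo make_url_slug("Ärlig <b>Öl</b> & Bröd"), "\n"; // prints "arlig-ol-brod"
```

Unlike the original, this drops `<img>` tags instead of turning them into "image", and it trims hyphens from both ends of the result.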