Non-UK Characterset

Submitted by madstock on Wed, 2006-03-01 10:13

I have knocked together a French site that incorporates the TD Kelkoo feeds at http://www.france.madstock.com. However, any "special characters" (i.e. accented letters) are dropped from the database data. Is there any way to change this, please?

(Incidentally thanks for the CRON help, worked a treat)

Submitted by support on Wed, 2006-03-01 11:16

Hiya,

The reason for this is the quite aggressive normalisation that goes on in includes/tapestry.php. Look for the following line:

$text = preg_replace('/[^A-Za-z0-9'.$allow.' ]/e','',$text);

If you comment the above line out the characters will be preserved; however you will need to look out for any problems with the URL if you are using mod_rewrite, as some characters may not be URL safe...
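
As an alternative to commenting the line out, the same filter could be made Unicode-aware so that accented letters survive while punctuation is still stripped. The following is only a sketch: the function name normalise_text_utf8() is hypothetical (it is not the function in includes/tapestry.php), it assumes the text is valid UTF-8, and it drops the /e modifier, which is not needed here and was removed in PHP 7.

```php
<?php
// Hypothetical helper, not the function in includes/tapestry.php.
// Assumes $text is valid UTF-8; preg_replace() returns null otherwise.
function normalise_text_utf8($text, $allow = '')
{
    // \p{L} matches a letter and \p{N} a numeric character, in any script.
    // preg_quote() keeps characters such as "-" literal inside the class.
    $pattern = '/[^\p{L}\p{N}' . preg_quote($allow, '/') . ' ]/u';
    return preg_replace($pattern, '', $text);
}

echo normalise_text_utf8('Capacité: 512 Mo!'), "\n";   // Capacité 512 Mo
```

The URL caveat still applies: the result can contain non-ASCII letters, so it would need encoding before being used in a rewritten URL.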

Submitted by madstock on Wed, 2006-03-01 11:24

Thanks - this allowed the characters to be presented, but now they appear to be somewhat screwy (even in descriptions), in that for example:

Capacité displays as CapacitÃ©

Would I perhaps have to re-register the feeds?

Submitted by support on Wed, 2006-03-01 11:33

If you play around with the Character Encoding options in your browser (View menu in Firefox; I think it's the same in IE), can you fix it by selecting an alternative character set?

If so, that would indicate that something needs changing in the HTTP headers or the HTML head in order to force the correct character set...

Submitted by iman on Wed, 2006-03-01 11:56

David is right.

You seem to have declared a non-UTF-8 charset in your headers while serving UTF-8 content.

That's where IE gets confused.

Either change the web page charset, or decode the UTF-8 with the utf8_decode() function.
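
A minimal sketch of the second option, assuming the page stays declared as ISO-8859-1 and the feed data is UTF-8. The helper name utf8_to_latin1() is made up for illustration, and it uses iconv() rather than utf8_decode(), which is deprecated in recent PHP versions; the iconv extension is assumed to be available.

```php
<?php
// Hypothetical helper: converts UTF-8 text to ISO-8859-1 for a page
// that declares charset=iso-8859-1. Requires the iconv extension.
// Returns false if a character has no ISO-8859-1 equivalent.
function utf8_to_latin1($text)
{
    return iconv('UTF-8', 'ISO-8859-1', $text);
}

$utf8   = "Capacit\xC3\xA9";        // "Capacité" as UTF-8 bytes
$latin1 = utf8_to_latin1($utf8);    // "é" becomes the single byte 0xE9
echo strlen($utf8), ' bytes -> ', strlen($latin1), " bytes\n";   // 9 bytes -> 8 bytes
```

Characters outside ISO-8859-1 cannot be represented this way, so changing the page charset to UTF-8 is likely the more general fix.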

Happy coding, I.

PS: by the way, does Kelkoo have an affiliate programme, or are you just testing?

Submitted by support on Wed, 2006-03-01 12:39

It might be worth adding the following change to includes/header.php:

immediately following:

print "<head>";

add:

print "<meta http-equiv='Content-Type' content='text/html;charset=utf-8' >";

Let us know if that works. Your server may be forcing a character set in the HTTP headers, so that is where we need to look next if this doesn't help...
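
If the server does turn out to be sending its own charset, one thing to try is setting the HTTP header from PHP as well, since browsers give the HTTP header priority over the meta tag. This is a sketch only: content_type_header() is a hypothetical helper, and the call would have to sit near the top of includes/header.php, before anything is printed.

```php
<?php
// Hypothetical helper that builds the header line; the name is illustrative.
function content_type_header($charset)
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// header() only works before any output has been sent.
if (!headers_sent()) {
    header(content_type_header('utf-8'));
}
```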

Submitted by madstock on Wed, 2006-03-01 14:27

Okey doke - I changed the type to Unicode and, after removing the character normalisation and reimporting a test feed with lots of grave accents etc., data from the feed displayed marvellously.

Unfortunately everything else went belly up, with any other special characters (in the non-dynamic portions of pages such as the search form, footer, nav bar, products.php etc.) becoming distorted terribly.

I have tried several character sets and doc types to no avail, and don't really know what to try next!
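
One plausible cause, not confirmed in this thread, is that the static parts of the site are saved as ISO-8859-1 while the feed data is UTF-8, so no single declared charset suits both. Re-saving the PHP files as UTF-8 in the editor would address that at the source. Failing that, a sketch like the following could convert static strings on output; ensure_utf8() is a hypothetical helper and it assumes anything that is not valid UTF-8 is ISO-8859-1.

```php
<?php
// Hypothetical helper: returns $text as UTF-8, assuming any text that is
// not already valid UTF-8 is ISO-8859-1. Requires the iconv extension.
function ensure_utf8($text)
{
    // With the /u modifier, preg_match() only succeeds on valid UTF-8.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }
    return iconv('ISO-8859-1', 'UTF-8', $text);
}

print ensure_utf8("Recherche avanc\xE9e");   // static text stored as ISO-8859-1
```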

As for Kelkoo France, yes, they have an affiliate programme through Tradedoubler, which is ideal for the purposes of this site (although it would not be too hot for a price comparison site, as it links to, err, a price comparison site).

'Tis all very odd.

Submitted by support on Wed, 2006-03-01 18:48

Hi madstock,

Could you email me or post the URL to the feed you are trying to use, and I'll have a play around on the dev server and see what I can come up with. If you post the filename I'll edit it out as soon as I've downloaded it...

Cheers,
David.

Submitted by madstock on Wed, 2006-03-01 18:51

Sure, one of the feeds is at:
[snip]

Thanks for all of your help!

Submitted by support on Wed, 2006-03-01 18:55

Got it - thanks!

I'll do some experiments and see how best to deal with extended characters. This does need sorting, as there will be more "international" users once I've got the simple translation engine in place.

I've never worked on non-English language sites before - what's the consensus with regard to creating a search engine friendly URL for a product name that contains characters outside the 0-127 (7-bit ASCII) range?
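
One common approach, offered here only as a sketch and not as the method the script ends up using, is to transliterate accented letters to their plain ASCII equivalents before building the URL. The slugify() helper and its mapping table are hypothetical; the table covers only common French letters and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper; the mapping covers only common French letters
// and this file is assumed to be saved as UTF-8.
function slugify($name)
{
    $map = array(
        'à' => 'a', 'â' => 'a', 'ç' => 'c', 'é' => 'e', 'è' => 'e',
        'ê' => 'e', 'ë' => 'e', 'î' => 'i', 'ï' => 'i', 'ô' => 'o',
        'ù' => 'u', 'û' => 'u', 'ü' => 'u',
        'À' => 'A', 'Ç' => 'C', 'É' => 'E', 'È' => 'E',
    );
    $ascii = strtolower(strtr($name, $map));
    // Collapse every run of other characters into a single hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $ascii);
    return trim($slug, '-');
}

echo slugify('Capacité Mémoire 512 Mo'), "\n";   // capacite-memoire-512-mo
```

The trade-off is that two names differing only by accents map to the same URL, so the lookup on the receiving page would need to allow for that.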

Submitted by madstock on Wed, 2006-03-01 19:08

I don't know personally, as I don't use mod_rewrite. However, Google France appears to be fairly okay with it:

http://www.google.fr/search?hl=fr&q=allinurl%3A%2F%C3%A7

http://www.google.fr/search?hl=fr&q=allinurl%3A%2F%C3%A9

http://www.google.fr/search?hl=fr&q=allinurl%3A%2F%C3%80

etc.
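
For reference, the %C3%.. sequences at the end of those search URLs are the percent-encoded UTF-8 bytes of ç, é and À. A short sketch using PHP's rawurldecode() and rawurlencode() shows the round trip; the sample product word is just an illustration.

```php
<?php
// The escape sequences from the three search URLs above.
foreach (array('%C3%A7', '%C3%A9', '%C3%80') as $escaped) {
    echo $escaped, ' => ', rawurldecode($escaped), "\n";
}

// Encoding goes the other way (string given as UTF-8 bytes).
echo rawurlencode("Capacit\xC3\xA9"), "\n";   // Capacit%C3%A9
```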

Submitted by support on Wed, 2006-03-01 19:11

It's a shame that browsers convert them back into percent-encoded values as soon as they get the chance :(

I'll have a play and see how best to handle them...