Feed prematurely ending due to character

Submitted by frychiko on Sat, 2008-07-12 03:56

Hi,
I have a feed from OnSale.com and only a portion of the feed is being imported.
I managed to track down where the import ends:

<programname>OnSale</programname>
<programurl>http://www.onsale.com</programurl>
<catalogname>tomtom - 7495046</catalogname>
<lastupdated>07/10/2008 02:51:51 AM</lastupdated>
<name>CAT57500</name>
<keywords>GEFEN A938 GEFEN</keywords>
<description>CAT57500</description>

There's a strange character in between CAT5 and 7500 which is terminating the import.
I'm not sure what ASCII value it is, but it shows up as a small fat dot (it's not displaying properly here).

How can I get around this?

cheers,
Peter

Submitted by support on Sat, 2008-07-12 05:48

Hello Peter,

Are you working with any feeds that require special characters, i.e. accented characters? The problem here is that a spurious character in the feed does not conform to the character set of the file, and so PHP's XML parser abandons the parse.
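To see exactly which byte is involved, a quick check along these lines can help. This is only a sketch: the helper name `find_non_ascii` is hypothetical, and the 0x95 byte in the sample is just a stand-in, since the real character in the feed is unknown.

```php
<?php
// Hypothetical helper: report the offset and hex value of every byte
// above 0x7F, i.e. anything outside plain 7-bit ASCII.
function find_non_ascii($data)
{
    $found = array();
    $length = strlen($data);
    for ($i = 0; $i < $length; $i++) {
        $byte = ord($data[$i]);
        if ($byte > 0x7F) {
            $found[$i] = sprintf('0x%02X', $byte);
        }
    }
    return $found;
}

// Sample line; the 0x95 byte is an assumed placeholder for the stray character.
print_r(find_non_ascii("<name>CAT5\x957500</name>"));
```

Running it over the line where the import stops shows the byte offset and value of each suspect character.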

If you don't need any special characters, a solution that has worked before is to strip out any multi-byte characters from the data before handing it to the parser; but this is only suitable if none of your feeds contain extended characters. Let me know if this is the case and I'll email you the fix, otherwise we'll look into alternative solutions...
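A minimal sketch of that stripping idea, assuming the feed data is already in a string: it is an illustration rather than the actual Magic Parser fix, and the helper name `strip_non_ascii` is hypothetical. It removes every byte above 0x7F, which covers multi-byte sequences as well as single-byte extended characters.

```php
<?php
// Hypothetical helper: drop every byte outside 7-bit ASCII.
// This also destroys legitimate accented characters, so it is only
// suitable when none of the feeds need them.
function strip_non_ascii($data)
{
    // No /u modifier, so the pattern matches raw bytes rather than UTF-8 characters.
    return preg_replace('/[\x80-\xFF]/', '', $data);
}

// Sample record; the 0x95 byte is an assumed placeholder for the stray character.
$raw = "<name>CAT5\x957500</name>";
$clean = strip_non_ascii($raw);

// $clean is what would then be handed to xml_parse() instead of $raw.
echo $clean, "\n";
```

The trade-off is the one mentioned above: any feed that does rely on extended characters would lose them.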

Cheers,
David.

Submitted by frychiko on Sat, 2008-07-12 06:03

Hi David,

I don't think any of my files contain extended characters so I'll go for your fix. Do you have my email? I don't really want to post it out in the open.

cheers,
Peter

Submitted by support on Sat, 2008-07-12 06:04

Hi Peter,

I'll send you the fix - I do have your email address (the one you used to register on the forum).

Cheers,
David.

Submitted by frychiko on Sun, 2008-10-12 14:12

Hi David,

I've finally finished my site and uploaded it to my server, but it looks like the modified function MagicParser_parse_xml() does not work on the server (imports seem to stop abruptly and I tracked it down to the xml_parse() function). It all works fine on my local development machine.

Is there any reason you know of why it would work locally and not on the other server??

cheers,
Peter

Submitted by support on Mon, 2008-10-13 08:05

Hello Peter,

The first thing I would do is remove the fix: if your server is running a later version of PHP with different libraries, the original problem shouldn't occur. So please restore the original MagicParser.php and have another go; if that still causes the problem we'll look at it again...

Cheers,
David.

Submitted by frychiko on Mon, 2008-10-13 09:46

Hi David,

Prior to that though, what I did was talk to server support, and it looked like the script was timing out at around 1 minute (even with set_time_limit(0) explicitly called in the import function).

So, I set the limit to 0 in my .htaccess file and now it seems to run, but it runs indefinitely. It's been running for over an hour, but checking the database I don't see any records being inserted, so perhaps it's hung somewhere...

For reference, on my localhost LAMP setup, importing this particular XML feed takes 9 minutes.

Anyway I will try what you suggested now..

cheers,
Peter

Submitted by support on Mon, 2008-10-13 09:50

Hi Peter,

Thanks - let me know how you get on. The problem is something I'm aware of; it has to do with a very specific version of the underlying XML libraries used by PHP (so you can't even pin it down to a particular PHP version), whereby the parser does not correctly identify the character set of the data...

Cheers,
David.

Submitted by frychiko on Mon, 2008-10-13 13:07

I tested a little more before trying the original xml_parse. With the modified xml_parse, it seems like about half of the feeds import and the rest hang (5 out of 11 feeds).

OK, now I have plugged in the original xml_parse and am testing on my local LAMP setup. Mind you, I have the Magic Parser functions converted into classes.

I am getting error messages with some feeds:

A PHP Error was encountered
Severity: Notice
Message: Undefined property: Magic_parser::$xml_current_record
Filename: libraries/magic_parser.php
Line Number: 94

And then..

A Database Error Occurred
Error Number: 1267
Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='
SELECT * FROM (`category_temp`) WHERE `name` = 'Networking : Adapters & Accessories : Adapters – Wired Desktop : Adapters- Fast Ethernet' AND `merchant_id` = '22'

Judging from that SQL statement, it looks like funny/non-standard characters have something to do with it?

Some feeds import fine though..

Submitted by support on Mon, 2008-10-13 13:32

Hi,

As this was quite a while ago now, can you send me the modified parser that I sent to you at the time, so that I know the code you are using? I will then have another look at it...

This looks like a different problem now: having been allowed through the parser, the illegal characters are causing a problem in the database. I think I know a solution for both...

Cheers,
David.

Submitted by frychiko on Mon, 2008-10-13 13:37

OK David, I've sent the modified file to your email address.

Peter