Hi,
I was wondering what the best way to import a large datafeed file (195MB) is. I currently do not have SSH access to my hosted account, though I may be able to get this later. I've made changes in my .htaccess file to increase the memory limit for data imports and the PHP execution time. However, I'm still getting the following error:
Also, if there is any other way to do the automated import without SSH access, that would be much appreciated. I can request to receive my feed as an HTTP POST or via FTP. The feed can be formatted as XML, zipped CSV, or a zipped tab- or pipe-delimited file.
Thanks!
Paul
Hello Paul,
As XML is a stream format (continuous data without newlines), it is not currently possible to import it in chunks. For CSV (or tab/pipe-delimited) files, however, it is possible, and I have a slow import tool to do just that. It imports in chunks of 100 products and then waits for 2 seconds before reloading the page (using an HTTP meta refresh) to import the next 100. Click here, then extract and upload to your /admin/ folder.
If you then browse to /admin/feeds_import_slow.php you will see a list of all registered non-XML feeds, and you can then click to import as normal. As you mentioned that CJ can supply your feeds in other formats, I would recommend a pipe-delimited file as this tends to be more robust than comma separation.
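To illustrate the chunking idea, here is a minimal sketch (this is not the actual feeds_import_slow.php source; the `read_chunk` helper and the chunk size constant are my own, purely illustrative): each page load processes only one window of records, and the meta refresh drives the next window.

```php
<?php
// Hypothetical sketch of the chunked-import pattern: each request
// handles one window of records so no single run exhausts memory
// or the PHP execution time limit.
define("CHUNK_SIZE", 100);

// Return up to $size pipe-delimited records from $path, skipping
// the first $offset records.
function read_chunk($path, $offset, $size)
{
  $rows = array();
  $fp = fopen($path, "r");
  $i = 0;
  while (($line = fgets($fp)) !== false)
  {
    if ($i >= $offset)
    {
      $rows[] = explode("|", rtrim($line, "\r\n"));
      if (count($rows) >= $size) break;
    }
    $i++;
  }
  fclose($fp);
  return $rows;
}

// After importing a chunk, the page would emit something like:
//   <meta http-equiv="refresh"
//         content="2;url=feeds_import_slow.php?offset=NEXT">
// ...until a chunk comes back empty, at which point the import is done.
?>
```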
Regarding automation: as I understand it, CJ can FTP the files to you on a regular basis, which is the best option to choose (rather than HTTP POST). Whether the import can then be performed automatically depends on how the CRON process works on your hosting account. In most cases where you don't have SSH access, you are normally only able to schedule a script to run via HTTP, in which case you could try scheduling feeds_import_slow.php with the filename parameter included, for example:
http://www.example.com/admin/feeds_import_slow.php?filename=Merchant.csv
(where Merchant.csv is the filename of the pipe delimited feed that is FTP'd to your server by CJ)
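For reference, a crontab entry for this might look something like the following (the wget path, schedule, and URL are all placeholder assumptions; adjust them for your host):

```shell
# Hypothetical crontab entry: request the import URL once a day at 3am.
# The wget location and the URL/filename are assumptions for your setup.
0 3 * * * /usr/bin/wget -q -O /dev/null "http://www.example.com/admin/feeds_import_slow.php?filename=Merchant.csv"
```

One caveat: a plain wget request will not follow the meta refresh between chunks, so check whether your host's HTTP scheduler fetches pages like a browser; if not, the import may stop after the first chunk.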
Hope this helps!
Cheers,
David.
Hi David,
Thanks again for the quick help! My huge datafeed is now running, thankfully. As far as the automation, I now have a cron job running that looks something like this:
{link saved}
It is set up to run once a day. Assuming the datafeed is being delivered by CJ to the /feeds directory, does it look like I'm all set?
Thx,
Paul
Whoops I forgot to mention that the file that CJ sends me is a zipped file so it looks like this:
Zappos_com-Product_Catalog_1.txt.gz
Can feeds_import_slow.php read this format, or does it have to be uncompressed first? If so, do you know of a way I can automate the unzipping process, so that the cron job will work?
Thx again, I really appreciate the help.
Paul
Hi Paul,
I think you mentioned that CJ could supply an uncompressed feed so that would be the best option. Let me know if that's not possible and I'll have a think about ways to uncompress without SSH...
Cheers,
David.
Hi David,
Unfortunately they can only supply a compressed file. I thought they could FTP a non-compressed file, but they cannot.
Paul
Hi Paul,
In that case, let's look at options using PHP's gzopen() function.
Taking it a step at a time, the first requirement is that your /feeds/ directory is writeable by PHP. The easiest way to do this is normally via your FTP program. In the remote window, right-click on the feeds folder of your installation, and look for "Permissions..." or maybe "Properties..." and then Permissions. Then, give write access to all users - Owner / Group / World.
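To confirm the permissions actually took effect, a quick check script could be dropped into /admin/ (this is a hypothetical helper of my own, not part of the software; it simply tries to create and delete a test file, which is more reliable on some hosts than is_writable() alone):

```php
<?php
// Hypothetical helper: can PHP actually write to a directory?
// Tries to create and remove a temporary file rather than trusting
// is_writable(), which can be misleading on some hosting setups.
function dir_is_writable($dir)
{
  $test = rtrim($dir, "/")."/.write_test";
  $fp = @fopen($test, "w");
  if (!$fp) return false;
  fclose($fp);
  unlink($test);
  return true;
}

print dir_is_writable("../feeds/") ? "writable" : "NOT writable";
?>
```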
Let me know if that's possible, and I'll look into a gzopen() script...
Cheers,
David.
Hi David,
Yep, it's possible. The feeds directory is writeable by PHP. I was able to give users read/write permission on that directory.
Paul
Thanks Paul,
I'll look at a script using gzopen() first thing tomorrow...
Cheers,
David.
Hi Paul,
OK, 2 methods to try. The first will attempt to use the gzip binary on your server, and if this works it would be preferable to the PHP method. In both cases, the filename can be supplied in the filename= parameter exactly as above and is expected to be in the ../feeds/ folder, so these scripts can be run from /admin/ just like feeds_import_slow.php.
Using gzip:
gunzip1.php
<?php
// Decompress a .gz feed in ../feeds/ using the server's gzip binary.
// basename() stops directory traversal via the filename parameter,
// and escapeshellarg() keeps the shell command safe.
$filename = "../feeds/".basename($_GET["filename"]);
// -c: write to stdout, -d: decompress, -S "": accept any suffix
$command = "/usr/bin/gzip -c -d -S \"\" ".escapeshellarg($filename).
           " > ".escapeshellarg($filename.".ungzipped");
exec($command);
if (@filesize($filename.".ungzipped"))
{
  // Replace the compressed file with the decompressed version
  unlink($filename);
  rename($filename.".ungzipped",$filename);
  print "OK!";
}
else
{
  @unlink($filename.".ungzipped");
  print "FAILED";
}
?>
Using PHP:
gunzip2.php
<?php
// Decompress a .gz feed in ../feeds/ using PHP's zlib functions.
// basename() stops directory traversal via the filename parameter.
$filename = "../feeds/".basename($_GET["filename"]);
$fp = gzopen($filename,"rb");
if (!$fp)
{
  print "FAILED";
  exit();
}
$fpo = fopen($filename.".unzipped","w");
// Read and inflate 1KB at a time so memory use stays low
while(!gzeof($fp))
{
  fwrite($fpo,gzread($fp,1024));
}
gzclose($fp);
fclose($fpo);
// Replace the compressed file with the decompressed version
unlink($filename);
rename($filename.".unzipped",$filename);
print "OK!";
?>
...for example, if uploaded to /admin/ (with the .gz file in /feeds/), browse to:
http://www.example.com/admin/gunzip1.php?filename=Zappos_com-Product_Catalog_1.txt.gz
Hope this helps!
Cheers,
David.
Hello David,
I tested the feeds_import_slow.php script, and I have a problem.
The script empties the table, but it does not add the data.
Do you think it is a restriction on the server?
I seem to have the same problem with /scripts/import.php.
Do you have any ideas?
Thank you for your suggestions.
Mick
Hello Mick,
Is a normal import (click Import on the /admin/ homepage) working correctly?
Cheers,
David.
Hello David,
It seems that my CSV file was the problem.
I managed to get a correct import with another CSV file.
Thanks, Mick
Thanks again man. I'm going to try to implement your recommendations tonight!
Paul
Sorry, I forgot to add the error:
Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request.
Please contact the server administrator, webmaster@bagbloom.heelcandyads.com and inform them of the time the error occurred, and anything you might have done that may have caused the error.
More information about this error may be available in the server error log.
Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.
Paul