Support Forum



Importing large feeds last.

Submitted by AD_Mega on Fri, 2008-04-04 22:45

I have a cron job to import the modified feeds each night. My problem is that the largest feed (over 700,000 products) is always the second feed to be imported. It takes 5 or more hours to import. During the import my site often cannot be reached. Sometimes the server is rebooted, which kills the script, and the feeds aren't imported. Is there a way that I can make the larger feeds import last? I have two other large feeds, one with 500,000 products and the other with 300,000.

Submitted by support on Sat, 2008-04-05 10:53

Hi Adrian,

I can certainly look at a programmatic way to do this, but first of all, is it an option for you to arrange your cron job along the following lines:

1) Download small feeds
2) import.php @MODIFIED
3) Download large feeds
4) import.php @MODIFIED

Cheers,
David.

Submitted by jonny5 on Sat, 2008-04-05 11:09

Do they all need downloading every night?

Or what you could do is artificially break the feeds up into smaller feeds.

On most affiliate networks, if you have a large feed, you can create a separate feed for each category (or group of categories) in it. So instead of one large feed you can have two, or however many categories there are. You may then find that there are certain categories you are not even interested in, so you can leave them out.

Submitted by AD_Mega on Sat, 2008-04-05 23:05

I think I do that with the feed that's 700,000, but I think the others come in a zipped file from CJ, so I have a cron job to download and unzip the file.

Submitted by support on Sun, 2008-04-06 10:35

Hi Adrian,

I understand. In that case a sort by file size would be required, and I've just thought of a very quick and easy modification to do that, assuming that the file size is roughly proportional to the number of products.

In scripts/import.php, look for the following code on line 75:

    $sql = "SELECT * FROM `".$config_databaseTablePrefix."feeds`";

(this is immediately below the line that checks for the @MODIFIED option)

Try changing this as follows, and it will ensure that the smallest feeds (or rather, those with the fewest products) are imported first:

    $sql = "SELECT * FROM `".$config_databaseTablePrefix."feeds` ORDER BY products";
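To show what the ORDER BY products clause does to the import order, here is a minimal sketch that sorts a few made-up feed rows the same way in plain PHP. The filenames, product counts and the compare_by_products() function are hypothetical, purely for illustration:

```php
<?php
// Hypothetical rows, shaped like the filename/products columns of the feeds table.
$rows = array(
  array('filename' => 'large.xml',  'products' => 700000),
  array('filename' => 'small.xml',  'products' => 1200),
  array('filename' => 'medium.xml', 'products' => 300000),
);

// Equivalent of ORDER BY products (ascending): fewest products first.
function compare_by_products($a, $b)
{
  if ($a['products'] == $b['products']) return 0;
  return ($a['products'] < $b['products']) ? -1 : 1;
}
usort($rows, 'compare_by_products');

// Collect the filenames in the order they would be imported.
$order = array();
foreach ($rows as $feed)
{
  $order[] = $feed['filename'];
}
print implode(', ', $order)."\n";
```

With these sample rows the 700,000 product feed ends up last in the list.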

Cheers,
David.

Submitted by AD_Mega on Mon, 2008-04-07 04:29

I had something a little different in my import.php file. I added the "ORDER BY products" to the if statements.

  if ($filename == '@ALL')
  {
    if (database_querySelect("SELECT filename, products FROM `".$config_databaseTablePrefix."feeds` ORDER BY products",$rows))
    {
      foreach($rows as $feed)
      {
        if (file_exists('../feeds/'.$feed['filename']))
        {
          import();
        }
      }
    }
  }
  elseif($filename == '@MODIFIED')
  {
    if (database_querySelect($sql = "SELECT filename, products, imported FROM `".$config_databaseTablePrefix."feeds` ORDER BY products",$rows))
    {
      foreach($rows as $feed)
      {
        if (file_exists('../feeds/'.$feed['filename']) && ($feed['imported'] < filemtime('../feeds/'.$feed['filename'])))
        {
          import();
        }
      }
    }
  }
  else
  {
    if (database_querySelect("SELECT filename, products FROM `".$config_databaseTablePrefix."feeds` WHERE filename='".database_safe($filename)."'",$rows))
    {
      $feed = $rows[0];
      import();
    }
  }
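
As a side note, the @ALL and @MODIFIED branches above apply almost the same checks, so the selection could be pulled into one function. This is only a sketch: select_feeds() is a hypothetical helper, not part of the script, and the demo uses throwaway files in the system temp directory with made-up timestamps:

```php
<?php
// Hypothetical helper: returns the feeds that should be imported, keeping the
// order of $rows (assumed to be already sorted, e.g. by ORDER BY products).
function select_feeds($rows, $feedDir, $modifiedOnly)
{
  $selected = array();
  foreach ($rows as $feed)
  {
    $path = $feedDir.$feed['filename'];
    // skip feeds whose file is not present
    if (!file_exists($path)) continue;
    // in modified-only mode, skip files not changed since the last import
    if ($modifiedOnly && !($feed['imported'] < filemtime($path))) continue;
    $selected[] = $feed;
  }
  return $selected;
}

// Demo with throwaway files and made-up timestamps.
$feedDir = sys_get_temp_dir().'/feeds_demo_'.getmypid().'/';
mkdir($feedDir);
touch($feedDir.'small.xml', 2000);  // modified after the recorded import time
touch($feedDir.'large.xml', 1000);  // not modified since the recorded import time

$rows = array(
  array('filename' => 'small.xml',   'products' => 1200,   'imported' => 1500),
  array('filename' => 'missing.xml', 'products' => 5000,   'imported' => 0),
  array('filename' => 'large.xml',   'products' => 700000, 'imported' => 1500),
);

$all      = select_feeds($rows, $feedDir, false);  // like @ALL
$modified = select_feeds($rows, $feedDir, true);   // like @MODIFIED

// Remove the throwaway files again.
unlink($feedDir.'small.xml');
unlink($feedDir.'large.xml');
rmdir($feedDir);
```

In the real script each selected feed would then be passed to import() in turn.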

Submitted by support on Mon, 2008-04-07 08:04

Hi Adrian,

That should work fine also - has it done the trick?

Cheers,
David.

Submitted by AD_Mega on Mon, 2008-04-07 10:43

Yes, it's importing the small feeds first. Thanks.

Submitted by AD_Mega on Tue, 2008-05-27 21:59

I have a cron job that imports twice a day. I do this because feeds are FTP'd to my server at different times during the day. I get the larger feeds later at night. Sometimes not all the feeds have been updated by the time the next import cron job starts. This causes high CPU usage due to having two imports running at the same time. How can I prevent another import from starting if one is already in progress?

Submitted by support on Wed, 2008-05-28 08:23

Hi Adrian,

Here's what I'd do... First, create a new folder in your scripts directory, called "semaphore". Make this folder writable by everyone so that PHP can create a temporary file in this directory to indicate that an import is taking place. You can normally make a folder writable by all/world/everybody using your FTP program - right-click on the new folder name, and then look for "Permissions..." or perhaps "Properties" then "Permissions...".

I'm also assuming that your import shell script is CD'ing to the scripts directory before running import.php - if not it would be worth modifying your shell script so that this is the case.

Now to create the semaphore; add the following code at the very top of scripts/import.php (just after the opening PHP tag)

  $semaphore = "semaphore/importing.txt";
  // abort if the semaphore file exists - already importing!
  if (file_exists($semaphore)) exit();
  // create the semaphore file to indicate that we are now importing
  touch($semaphore);

...and then at the very end of the script (just before the closing PHP tag), add:

  // delete the semaphore to indicate that we have finished
  unlink($semaphore);
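
One thing to be aware of with a plain semaphore file: if the script is killed part-way through (for example by a server reboot), the file is left behind and blocks every later import until it is removed by hand. A different technique that avoids this is a file lock taken with flock(), which the operating system releases when the process ends. A minimal sketch, assuming flock() is supported on your server; acquire_import_lock() and the lock file name are hypothetical:

```php
<?php
// Hypothetical helper: try to take an exclusive, non-blocking lock.
// Returns the open file handle on success, or false if the lock is held elsewhere.
function acquire_import_lock($lockFile)
{
  $handle = fopen($lockFile, 'c');  // create if missing, never truncate
  if ($handle === false) return false;
  if (!flock($handle, LOCK_EX | LOCK_NB))
  {
    fclose($handle);
    return false;
  }
  return $handle;
}

// Demo path in the system temp directory; a real script would use its own folder.
$lockFile = sys_get_temp_dir().'/importing_demo.lock';

$lock = acquire_import_lock($lockFile);
// abort if another import already holds the lock
if ($lock === false) exit();

// ... the import would run here ...

// release the lock to indicate that we have finished
flock($lock, LOCK_UN);
fclose($lock);
```

The lock file itself can stay in place between runs; only the lock on it matters.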

Hope this helps!

Cheers,
David.

Submitted by AD_Mega on Fri, 2008-05-30 18:45

I believe the program is working. I know it has prevented another import from starting. I just have to wait to see if that importing.txt file is deleted after the import has finished. My import.php had been running for over 24 hours. I believe it's because it never finished importing before the new updated feeds were FTP'd to my site. I had a sitemap generator scanning my site, so maybe that slowed things down. I'll check tomorrow to see if the import script finished before the next cron job.

Just checked: the program doesn't delete the importing.txt file once it has finished importing.

Submitted by AD_Mega on Mon, 2008-06-02 17:49

I think I found the problem. Shouldn't unlink($semaphore); be before exit(); in the import.php file?

Submitted by support on Mon, 2008-06-02 17:51

Ooops - sorry Adrian - yes that would make sense!
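
Rather than adding unlink($semaphore) before every exit(), one option is to register the clean-up as a shutdown function, which PHP runs when the script ends, including when it ends through exit(). A minimal sketch; remove_semaphore() is a hypothetical helper, and the temp directory path is only there to make the demo self-contained:

```php
<?php
// Demo path; in import.php this would be the semaphore/importing.txt file.
$semaphore = sys_get_temp_dir().'/importing_demo_'.getmypid().'.txt';

// abort if the semaphore file exists - already importing!
if (file_exists($semaphore)) exit();

// create the semaphore file to indicate that we are now importing
touch($semaphore);

// Hypothetical clean-up function: removes the semaphore if it is still there.
function remove_semaphore($path)
{
  if (file_exists($path)) unlink($path);
}

// Run the clean-up when the script ends, whichever exit() it ends through.
register_shutdown_function('remove_semaphore', $semaphore);

// ... the import would run here ...
```

Because the clean-up is registered after the "already importing" check, a second copy of the script that aborts early does not delete the first copy's semaphore.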

Cheers,
David.

Submitted by AD_Mega on Tue, 2008-06-03 07:47

Now I'm getting this warning:

  Warning: unlink(semaphore/importing.txt): No such file or directory in /var/www/html/scripts/import.php on line 97

Submitted by support on Tue, 2008-06-03 08:12

Hi Adrian,

That indicates that PHP does not have write access to the semaphore/ directory, meaning that the touch() command is not able to create the file.

Try checking the permissions using your FTP program - right-click on the directory, and then look for Permissions... or perhaps Properties... and then permissions. The easiest thing to do is to give all write access (User/Group/World)...

That should be all it is...
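
If the permissions look right and the warning persists, a quick check like the one below can narrow it down. It is only a sketch: check_semaphore_dir() is a hypothetical helper, and it prints the working directory because the relative path semaphore/ is resolved against it:

```php
<?php
// Hypothetical diagnostic: reports why touch() might fail for a directory.
function check_semaphore_dir($dir)
{
  if (!is_dir($dir)) return 'missing';
  if (!is_writable($dir)) return 'not writable';
  return 'ok';
}

// The directory that relative paths such as "semaphore/..." are resolved against.
print 'Working directory: '.getcwd()."\n";

// Status of the semaphore folder as seen from that working directory.
print 'semaphore/: '.check_semaphore_dir('semaphore')."\n";
```

A result of "missing" would suggest the script is not running from the scripts directory rather than a permissions problem.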

Cheers,
David.

Submitted by AD_Mega on Wed, 2008-06-04 15:12

The file is being created in the semaphore folder, but for some reason it's not being deleted when the import finishes. I have given the folder all write access. I think the problem is that it's looking for the file in the feeds directory.

Submitted by support on Wed, 2008-06-04 16:19

Hi Adrian,

It should be looking in the scripts/ directory, as PHP in most configurations internally "changes directory" to the script directory. Also, since you say the file is being created, it should certainly try and delete the file from the same place.
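
One caveat: when a script is run from the command line (as a cron job would run it), PHP's CLI does not change into the script's directory, so a relative path depends on where the job was started and on any chdir() call made along the way. A way to rule that out is to build the path from the script's own location. A minimal sketch, where the chdir() call only simulates the working directory changing part-way through:

```php
<?php
// Build the path from this file's own location rather than the working directory.
$semaphore = dirname(__FILE__).'/semaphore/importing.txt';

// Simulate the working directory changing while the script runs.
chdir(sys_get_temp_dir());

// $semaphore still refers to the semaphore folder next to this script.
print $semaphore."\n";
```

With this form, touch() and unlink() refer to the same file regardless of the directory the script happens to be in.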

Could you perhaps try the following script, in the scripts/ directory:

test.php:

<?php
  $semaphore = "semaphore/test.txt";
  // abort if the semaphore file exists - already importing!
  if (file_exists($semaphore)) exit();
  // create the semaphore file to indicate that we are now importing
  touch($semaphore);
  // delete the semaphore to indicate that we have finished
  unlink($semaphore);
?>

This should create and then delete a file, test.txt, so could you see if this also leaves test.txt in place without deleting it... that will give us an easier test bed on which to work out where the problem lies...

Cheers,
David.

Submitted by AD_Mega on Wed, 2008-06-04 17:29

OK, I put test.php in the scripts directory and ran it. I guess the program was able to delete test.txt, as I don't see a file in the scripts or the semaphore directory.

Submitted by support on Wed, 2008-06-04 17:33

Hi Adrian,

Could you email me your modified scripts/import.php and I'll take a look...

Cheers,
David.