

Some feeds importing a lot slower than the others...

Submitted by paul30 on Fri, 2009-01-09 06:02 in

Hello David, I have 5 XML feeds from 5 different merchants, all of them from Commission Junction, and the import speed differs seriously between feeds.

3 of the feeds import at a rate of around 10,000 products a minute, while the other 2 manage around 12,000 in 30 minutes. Do you have any clues as to why this might happen, or what I should check on my end?

Thanks a lot!
Paul

Submitted by support on Fri, 2009-01-09 09:23

Hi Paul,

If the slower feeds contain lots of pre-amble records, this can cause the parser to slow down whilst it finds the actual records you're interested in (although once it does, it should parse at the same speed as the quicker feeds).

This can happen if, for example, there are lots of CATEGORY records at the top of the feed before the PRODUCT records begin. So that I can check this out, could you email me a link to your site and the filenames of 2 feeds - 1 that imports quickly and 1 slowly? I will then download them from your /feeds/ folder to my test server and have a look...

Cheers,
David.

Submitted by Keeop on Thu, 2009-01-15 08:26

Hi David,

Not 100% related but I am having severe speed issues when importing feeds to the point where it is now taking almost 20 hours to import around 750k records. If I completely blat the products table, recreate it using the SQL file and then run a full import, this will all happily run in around 10 hours. However, it will then slow down for each subsequent import and it's not long before it's back to 24 hours again.

Do you know of any way I can speed this up? Any ways of optimizing the process and/or database? I can't blame the platform as it's a dedicated box with 2GB RAM and dual 2GHz Xeon CPUs, so this should be more than enough! It seems to be the size of the DB that's the problem, as if I run imports on a load of other DBs with no more than around 80k records in each, they all complete much, much quicker. For instance, a DB with a tenth of the records, around 80k, will complete in half an hour!

So, it seems to be the size of the database that's the problem. Would changing the products table to a different table type (storage engine) make any difference?

Cheers.
Keeop

Submitted by support on Mon, 2009-01-19 08:40

Hello Keeop,

It is almost certainly the DELETE action that takes the most time when importing a single feed into a very large database. It is normally best to import all feeds at the same time, as in that case the script TRUNCATEs the products table instead. Would you be able to try that as a comparison...?
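To sketch the difference (table names here assume the default empty table prefix, and the merchant name is just an example - the actual statements in the script include $config_databaseTablePrefix):

```sql
-- Single-feed import: the merchant's existing rows are removed first,
-- which on a very large table means row-by-row deletion plus index
-- maintenance:
DELETE FROM `products` WHERE merchant='ExampleMerchant';

-- Import-all (@ALL): the whole table is emptied in one operation, which
-- MySQL implements by dropping and recreating the table - effectively
-- instant regardless of row count:
TRUNCATE `products`;
```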

Cheers,
David.

Submitted by Keeop on Thu, 2009-01-22 08:34

Hi David,

Spot on! Using 'truncate' the import process took around 5 hours, so much better, thanks. Now I just need to think of a way to keep the site 'live' while running these imports. Maybe a second products table to switch to before running the import, and back to the original after? Any thoughts or ideas would be appreciated!

Cheers.
Keeop

Submitted by support on Thu, 2009-01-22 10:31

Hello Keeop,

Yes - I have just tried a method to do this on my test server and it works fine! It creates a temporary table, imports to that table, and when it has all finished just DROPs the original products table and RENAMEs the temporary table back to products.
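In SQL terms, the sequence is roughly as follows (table names assume the default empty table prefix, and `products_import` is the name used for the temporary table):

```sql
-- Build an empty copy of the live table with the same schema
CREATE TABLE `products_import` LIKE `products`;

-- ...all feeds are then imported into products_import while the
-- existing products table carries on serving the live site...

-- Swap the new table into place once the import has finished
DROP TABLE `products`;
RENAME TABLE `products_import` TO `products`;
```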

It involves quite a few changes; could you email me your includes/admin.php and scripts/import.php (I assume you are using the import.php script to do this) and I will make the changes for you.

Cheers,
David.

Submitted by Keeop on Thu, 2009-01-22 13:13

Hi David,

Email sent, thanks. What was the performance hit out of interest? The other option is to just redirect to a 'Site Unavailable' page while updating but I'm not sure how to do this and I'd rather keep the site accessible.

Cheers.
Keeop

Submitted by support on Thu, 2009-01-22 13:21

Thanks - followed up by email.

Cheers,
David.

Submitted by chrisst1 on Thu, 2009-01-22 14:58

Hi David

If possible, can you post these modifications on the forum or make them downloadable, as we have exactly the same problems as our DBs have got bigger?

Chris

Submitted by support on Thu, 2009-01-22 15:16

Hi Chris,

Sure - make sure you make back-ups of the modified files in case of any problems.

includes/admin.php

Look for the following code on line 260:

    $sql = sprintf("INSERT INTO `".$config_databaseTablePrefix."products` SET

...and REPLACE this with:

    global $importALL;
    if ($importALL)
    {
      $table = "products_import";
    }
    else
    {
      $table = "products";
    }
    $sql = sprintf("INSERT INTO `".$config_databaseTablePrefix.$table."` SET

Look for the following code on line 416:

    $sql = "DELETE FROM `".$config_databaseTablePrefix."products` WHERE merchant='".database_safe($admin_importFeed["merchant"])."'";
    database_queryModify($sql,$insertId);

...and REPLACE this with:

    global $importALL;
    if ($importALL)
    {
      $table = "products_import";
    }
    else
    {
      $table = "products";
      $sql = "DELETE FROM `".$config_databaseTablePrefix.$table."` WHERE merchant='".database_safe($admin_importFeed["merchant"])."'";
      database_queryModify($sql,$insertId);
    }

...and finally look for the following code on line 424

    $sql = "SELECT COUNT(*) AS productCount FROM `".$config_databaseTablePrefix."products` WHERE merchant='".database_safe($admin_importFeed["merchant"])."'";

...and REPLACE this with:

    $sql = "SELECT COUNT(*) AS productCount FROM `".$config_databaseTablePrefix.$table."` WHERE merchant='".database_safe($admin_importFeed["merchant"])."'";

scripts/import.php

Look for the following code starting at line 53:

  if ($filename == "@ALL")
  {
    $sql = "TRUNCATE `".$config_databaseTablePrefix."products`";
    database_queryModify($sql,$result);
    $sql = "SELECT * FROM `".$config_databaseTablePrefix."feeds`";
    if (database_querySelect($sql,$rows))
    {
      foreach($rows as $feed)
      {
        if (file_exists("../feeds/".$feed["filename"]))
        {
          import();
        }
      }
    }
  }

...and REPLACE this entire block with the following code:

  if ($filename == "@ALL")
  {
    $importALL = TRUE;
    $sql = "CREATE TABLE `".$config_databaseTablePrefix."products_import` LIKE `".$config_databaseTablePrefix."products`";
    database_queryModify($sql,$result);
    $sql = "SELECT * FROM `".$config_databaseTablePrefix."feeds`";
    if (database_querySelect($sql,$rows))
    {
      foreach($rows as $feed)
      {
        if (file_exists("../feeds/".$feed["filename"]))
        {
          import();
        }
      }
    }
    $sql = "DROP TABLE `".$config_databaseTablePrefix."products`";
    database_queryModify($sql,$result);
    $sql = "RENAME TABLE `".$config_databaseTablePrefix."products_import` TO `".$config_databaseTablePrefix."products`";
    database_queryModify($sql,$result);
  }

...and that should do the trick!
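One small variation worth noting (an assumption on my part, not part of the mod above): between the DROP and the RENAME there is a brief instant when no products table exists, so a query arriving at exactly that moment would fail. MySQL's RENAME TABLE can rename several tables in one atomic statement, which closes that window:

```sql
-- Atomic swap: both renames happen as one operation, so queries always
-- see either the old or the new products table, never neither
RENAME TABLE `products` TO `products_old`,
             `products_import` TO `products`;
DROP TABLE `products_old`;
```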

Cheers,
David.

Submitted by chrisst1 on Wed, 2009-02-04 13:49

Hi David

Thanks for posting this mod, it has reduced import time considerably, although big feeds (250k plus) are still hogging server resources. However, we have found that if you make several copies of the offending feed and use the Import Range filter in batches of around 50k, the problem has eased a lot. Which leads to my next question: can you put together some simple copy-feed / duplicate code that we can use?

Chris

Submitted by support on Wed, 2009-02-04 17:00

Hi Chris,

Just to clarify; do you mean a tool to register a new feed exactly as per an existing feed (although you would have to change the merchant name) and copy all filters etc.?

Cheers,
David.

Submitted by chrisst1 on Wed, 2009-02-04 17:29

Hi David

Yes that sounds better than what I was thinking of. We usually download several copies of the same feed and label as merchant1 2 3 etc. Merchant mapping would be useful in this case.

Chris

Submitted by support on Thu, 2009-02-05 11:05

Hi Chris,

I'll follow up by email with a modification for you to try...

Cheers,
David.

Submitted by Keeop on Sat, 2009-02-07 12:35

Hi David,

Finally got round to running all this last night and can safely say it works a treat, so thanks for that. I did make one change, which I hope you can validate: I changed the DELETE and its associated execute statement in includes/admin.php, as follows:

    global $importALL;
    if ($importALL)
    {
      $table = "products_import";
    }
    else
    {
      $table = "products";
      $sql = "DELETE FROM `".$config_databaseTablePrefix.$table."` WHERE merchant='".database_safe($admin_importFeed["merchant"])."'";
      database_queryModify($sql,$insertId);
    }

...just so the DELETE is only called when importing a single feed rather than during a full import. Does this make sense?

Cheers.
Keeop

Submitted by support on Sat, 2009-02-07 14:42

Hi Keeop - yes that makes sense - it's unnecessary if using the temporary table.

Cheers,
David.

Submitted by Alex on Mon, 2018-07-16 08:01

Hi David,

I would also like to do the import to a temp table, but I would like to include the uidmap process before the drop/rename step within the import.

Do you have a solution for that?

Submitted by support on Mon, 2018-07-16 13:19

Hi Alex,

It's no problem to remove the table swap-over from the end of cron.php and modify uidmap.php to operate against `products_import` instead of `products` and then do the swap-over / reviews import at the end.

To give this a go, edit scripts/cron.php and look for the following code beginning at line 101:

  $sql = "DROP TABLE `".$config_databaseTablePrefix."products`";
  database_queryModify($sql,$result);
  $sql = "RENAME TABLE `".$config_databaseTablePrefix."products_import` TO `".$config_databaseTablePrefix."products`";
  database_queryModify($sql,$result);
  print chr(13)."backfilling reviews... ";
  admin_importReviews();
  print chr(13)."backfilling reviews...[done]\n";

...and DELETE (as this will now be done in uidmap.php)

Then edit scripts/uidmap.php and use your text editor's Search and Replace function as follows:

Search:

`".$config_databaseTablePrefix."products`

Replace:

`".$config_databaseTablePrefix."products_import`

(3 instances)

Finally, look for the following code at line 87:

  print "Done.\n";

...and REPLACE with:

  $sql = "DROP TABLE `".$config_databaseTablePrefix."products`";
  database_queryModify($sql,$result);
  $sql = "RENAME TABLE `".$config_databaseTablePrefix."products_import` TO `".$config_databaseTablePrefix."products`";
  database_queryModify($sql,$result);
  print chr(13)."backfilling reviews... ";
  admin_importReviews();
  print chr(13)."backfilling reviews...[done]\n";
  print "Done.\n";

Cheers,
David.
--
PriceTapestry.com

Submitted by Alex on Mon, 2018-07-16 17:51

Hi David,

thanks for the quick reply (as always :)
Could it be that the changes for line 260 in includes/admin.php are on other lines in the most recent version? (I can't find the line with $sql = sprintf etc.)

And can you check for the other files that need to be edited?

Submitted by support on Mon, 2018-07-16 18:04

Hi Alex,

Zero down-time import (for cron.php / import.php @ALL) has been in the distribution since 13/03A so the only changes required are those to scripts/cron.php and scripts/uidmap.php (from Automatic Product Mapping by Unique ID) as described above...

Cheers,
David.
--
PriceTapestry.com