Cron not doing import

Submitted by Brice on Fri, 2018-01-19 19:05

I have gotten cron.php to run automatically, and all feeds on the feed management screen now show a Modified status that is in line with the cron runs.

However, it does not do the import process. From my reading over the last few days, my understanding is that cron.php should handle it all, but that if it does not I could run import.php.

So far no success on importing.

I am doing the same thing I did for cron.php but just with import.php:

/scripts/import.php"

then tried

/scripts/import.php @MODIFIED"
/scripts/import.php @ALL"

Am I missing something?

Submitted by support on Sat, 2018-01-20 09:32

Hello Brice,

cron.php certainly should be running import - it does the equivalent of fetch.php @ALL followed by import.php @ALL. The only thing I am wondering is whether the process is being timed out by your server, or perhaps being terminated, which can happen as importing is a relatively resource-intensive process.

Don't forget that you can monitor progress from the /admin/ home page - you should see the feed Modified date / times update until all have been refreshed, and then you should see the Imported date / times update.

If you suspect process termination, what normally does the trick is to insert a 1 second sleep every 100 products during the import, as described in this comment, which reduces load average considerably.

I know that you have been making various modifications, but anything database specific that may have stopped import from working should also affect importing from /admin/. If that has stopped working as well, then you can enable database debug mode to view any MySQL error messages occurring by changing line 6 of config.advanced.php as follows:

  $config_databaseDebugMode = TRUE;

However, depending on server configuration it may not be possible to capture errors occurring during import as the page redirects upon completion, so it may be necessary to modify includes/database.php to exit() after displaying the error. To do this, look for the following code at line 56:

      print "[".$sql."][".mysqli_error($link)."]";

...and REPLACE with:

      print "[".$sql."][".mysqli_error($link)."]";exit();

If you're still not sure, could you confirm the actual cron.php command line that you are running, and I'll check it out further with you...

Cheers,
David.
--
PriceTapestry.com

Submitted by Brice on Sat, 2018-01-20 17:28

I added the 1 second sleep, but cron still did not work when it came to the import stage.

I removed that 1 second sleep and tried the "import" button on the feed management screen in the admin section and it imported 3760 products. Note the file had 10,111 products.

I then did a "SLOW import" in Admin and it showed it processing up to 10,111 but when it went back to the feed management screen in Admin it showed that feed had 3760 products imported and the import time was reflecting the current import time.

I tried a manual import in feed management for a different feed that has a total of 7,123 products, and it imported 6,835 products. A SLOW import showed it processing up to 7,123, but in the end it also produced a final import number of 6,835.

I didn't realize the full import wasn't coming through until this troubleshooting.

Submitted by support on Sat, 2018-01-20 17:41

Hello Brice,

Just a quick note - I will be back online properly as normal from Monday morning but bear in mind the validity requirements for a record to be imported; each product must have valid:

Product Name
Buy URL
Price

...and also be unique to that merchant, so if there are two products with the same name for the same merchant, the second would not be imported (but this can be disabled as described in this thread).

You can check why a feed has imported fewer products than expected with the Parse Analysis tool - from /admin/ go to Tools > Feed Utilities and click Parse Analysis next to a feed that you want to inspect. A trial parse will be performed, and a count will be displayed for each reason that products would not be imported. In the case of duplicates, a list of the duplicate products and their counts is also shown...
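As a rough illustration of the unique-name rule only (this is not the Parse Analysis code, and it ignores the Buy URL and Price checks), the sketch below counts first-seen and repeated product names in a feed file. It assumes a hypothetical comma-separated feed with one header row, the product name in the first column, and no quoted commas inside names; count_names is a made-up helper name.

```shell
#!/bin/sh
# count_names FILE
# Counts first occurrences and repeats of the product name (column 1).
# Assumptions: comma-separated, one header row, no quoted commas in names.
count_names() {
  awk -F',' '
    NR == 1 { next }              # skip the header row
    seen[$1]++ { dup++; next }    # name already seen: counted as a duplicate
    { uniq++ }                    # first time this name appears
    END { printf "Unique Names %d\nDuplicate Names %d\n", uniq, dup }
  ' "$1"
}

# Example usage (feed.csv is a placeholder file name):
# count_names feed.csv
```

Under this rule, a feed in which the same name appears three times would contribute 1 to the unique count and 2 to the duplicate count, regardless of any other field that differs between the rows.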

Cheers,
David.
--
PriceTapestry.com

Submitted by Brice on Sat, 2018-01-20 19:38

I performed the Parse Analysis as you mentioned and now I see the true issue. I parsed the file that had 10,110 products and the result was:

Duplicate Names 6304
Valid Records 3806

So import via feed manager manual button seems to work.

However, just my luck, I have a new issue to contend with on top of the others.

6304 Duplicate names is correct but technically not duplicate products.

There appear to be multiple products that have the same name but are different variations of the product. The UPC code, MPN and buy link are different, but the title remains the same for these products.

Example, RugRat 556 as the title may result in 3 products.

Title: Rugrat 556
Product 1: is the 2018 model
Product 2: same title is the 2015 model
Product 3: same name but a different style

All these products have the same title but different UPC codes. I investigated deeper and found out the merchant actually truncates the titles from the distributor feed because the titles the distributor provided are not pleasant to the eye. So the feed they provide to us is nicely done, but because they use identical titles and rely on the UPC code to distinguish the variations, those products are treated as duplicates at import and are dropped.

Furthermore, there appears to be a large number of merchants doing this as I look ahead in the feeds to come.

Great tool by the way. Saved me a lot of my weekend time investigating that issue.

I am sure you have a workaround and probably some things for me to try on Monday for the cron issue.

Thanks!

Submitted by support on Mon, 2018-01-22 09:27

Hello Brice,

I'll follow up in the thread regarding mapping about UPC. Regarding cron.php, could you confirm the full cronjob command line that you are using to invoke cron.php, and also confirm that when run, you are seeing all Modified date/times updated, but none of the Imported date/times update?

Cheers,
David.
--
PriceTapestry.com

Submitted by Brice on Mon, 2018-01-22 15:56

You are correct, I see the Modified dates and times update but NOT the Imported times.

Cron is as follows:

/web/cgi-bin/php5 "$HOME/html/ARM/scripts/cron.php"

I am doing this on a test server through the Cron Job Manager, and it is set to run every hour.

Submitted by support on Mon, 2018-01-22 17:12

Hello Brice,

I wouldn't recommend having cron.php set to run hourly on a test server, especially if you are also experimenting with uidmap.php on the same server but not as part of the cronjob - that could easily leave the database in an inconsistent state if the processes end up running at the same time...

Ideally, whilst you are working with uidmap.php outside of a cronjob, don't have a cronjob running at all - use Import All (manually from /admin/ ), followed by your manual / testing invocation of uidmap.php.

But to answer this in isolation, I notice that your cronjob command line does not have a change directory to the scripts/ folder as per the suggested command line derived by the script (from /admin/ go to Setup > CRON), where an example Option 1 command line might be:

cd /home/username/scripts/;/usr/bin/php cron.php

If you had not tried the Option 1 command line, or it is not available, then based on the command line you posted I would suggest:

cd $HOME/html/ARM/scripts;/web/cgi-bin/php5 cron.php

And then for later, once everything is set up and you want to run cron.php followed by uidmap.php, it would be:

cd $HOME/html/ARM/scripts;/web/cgi-bin/php5 cron.php;/web/cgi-bin/php5 uidmap.php
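To see what a script prints when it runs unattended, the same cd-then-run pattern can be wrapped so that output and exit status go to a log file. This is only a minimal sketch, assuming the cron job can call a POSIX shell script; run_in_dir and the log file location are hypothetical names and not part of Price Tapestry.

```shell
#!/bin/sh
# run_in_dir DIR LOGFILE COMMAND [ARGS...]
# Runs COMMAND from inside DIR, appending its output and exit status to LOGFILE.
run_in_dir() {
  dir=$1
  log=$2
  shift 2
  status=0
  # cd inside a subshell so the caller's working directory is left untouched
  ( cd "$dir" && "$@" ) >>"$log" 2>&1 || status=$?
  echo "exit status: $status" >>"$log"
  return "$status"
}

# Example usage with the paths from this thread (the log path is a placeholder):
# run_in_dir "$HOME/html/ARM/scripts" "$HOME/cron-import.log" /web/cgi-bin/php5 cron.php
```

If the run stops partway through, the log would then show whatever was printed before it ended, and a missing "exit status" line would suggest the whole job was killed rather than exiting by itself.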

Cheers,
David.
--
PriceTapestry.com

Submitted by Brice on Mon, 2018-01-22 18:52

Sorry, I should have added the detail that what I posted is not a command line entry per se. The host provides a Cron Manager. This is a copy of the instructions from that manager:

The installed versions of PHP 5 do not support the interpreter specification in a PHP file. In order to run a PHP script via Cron, you must specify the interpreter manually. For example:
PHP 5.3: /web/cgi-bin/php "$HOME/html/test.php"
PHP 5.2: /web/cgi-bin/php5 "$HOME/html/test.php"
Note: In this example script, "$HOME" represents the full path to your Shared Hosting account. The actual path to your account will be provided if you select the script from your account using the Browse button.

I click the browse button and manually select the script directory and the file to get the result of:

/web/cgi-bin/php5 "$HOME/html/ARM/scripts/cron.php"

I have been very careful not to overlap the running of cron and uidmap, only manually running uidmap after disabling the cron.

I ran the test using cd $HOME/html/ARM/scripts;/web/cgi-bin/php5 cron.php. It updated as usual: I can see the Modified dates and times have changed, but the Imported dates and times remain unchanged.

Submitted by support on Mon, 2018-01-22 19:21

Hello Brice,

I'd like to follow up by email if that would be OK, as it would be much easier to use actual rather than exemplified code in this case, in conjunction with the uidmap.php experiments and reference data...

If that would be OK, if you could email me your latest modified uidmap.php I'll check it out further with you...

Thanks,
David.
--
PriceTapestry.com