Pages not followed

Submitted by Deanh01 on Sat, 2015-04-18 18:42

Hi David
Google has alerted me that 1500 pages cannot be followed on one of my sites. I clicked on several of the links and each page is blank. Here is a link {link saved} . What would cause this? Is there a fix?

Thanks
Dean

Submitted by support on Sun, 2015-04-19 08:17

Hello Dean,

It looks like jump.php links have been indexed, and Google is re-crawling one for a product (by ID) that no longer exists.

I normally suggest excluding jump.php via robots.txt. Have a look at robots.txt.dist in the distribution for the recommended exclusions, which are based on the idea of leaving the merchant index and product pages open to all robots. Many users like to have category / brand pages indexed as well, so it depends on your requirements / site scope, but as a minimum I would add the following to robots.txt (creating the file if it doesn't exist):

User-agent: *
Disallow: /jump.php

However, that won't immediately remove existing jump.php links that have found their way into search engines, so one thing you can do is to have it return 410 (Gone) if the ID no longer exists. To do this, edit jump.php and look for the following code at line 6:

  database_querySelect($sql,$rows);

...and REPLACE with:

  if (!database_querySelect($sql,$rows))
  {
    header("HTTP/1.1 410 Gone");
    exit();
  }

Hope this helps!

Cheers,
David.
--
PriceTapestry.com

Submitted by Deanh01 on Sun, 2015-04-19 20:40

Hello David
I had to go back to the previous Price Tapestry to find that code on line 6. I used it and made the changes, but the link still comes up as a blank page.
I checked my robots.txt and I had:

User-agent: Mediapartners-Google*
Disallow:
User-agent: ia_archiver
Disallow: /
User-agent: *
Disallow: /categories.php
Disallow: /brands.php
Disallow: /reviews.php
Disallow: /category/
Disallow: /brand/
Disallow: /review/
Disallow: /admin/
Disallow: /search.php
Disallow: /jump.php

That should have stopped the crawler from indexing it, right?

Thank You
Dean

Submitted by support on Mon, 2015-04-20 08:37

Hi Dean,

The changes to jump.php won't affect the visual result of viewing the link, but in the background the HTTP 410 (Gone) header will have been sent which should cause a search engine to remove the page from its index.
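As a rough illustration of why the page still looks blank, here is a minimal Python sketch. It is not Price Tapestry code: the `JumpStandIn` server, the `KNOWN_IDS` set and the `id` query parameter name are all assumptions made up for the demo. The stand-in behaves like the patched jump.php (410 with an empty body for an unknown ID), and `fetch_status` reads the status code that a browser never shows you.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical product IDs that still exist (assumption for the demo).
KNOWN_IDS = {"1", "2"}


class JumpStandIn(BaseHTTPRequestHandler):
    """Local stand-in for the patched jump.php, not the real script."""

    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        product_id = query.get("id", [""])[0]
        if product_id in KNOWN_IDS:
            # Known product: redirect to a placeholder merchant URL.
            self.send_response(302)
            self.send_header("Location", "http://merchant.example/")
        else:
            # Unknown product: 410 Gone, with nothing in the body.
            self.send_response(410)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_status(host, port, path):
    """Return (HTTP status code, body length in bytes) for a GET request."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read()
        return response.status, len(body)
    finally:
        conn.close()


def demo():
    """Start the stand-in on a free local port and request a missing ID."""
    server = HTTPServer(("127.0.0.1", 0), JumpStandIn)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        return fetch_status("127.0.0.1", port, "/jump.php?id=999")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The same `fetch_status` function could be pointed at a live host on port 80 to confirm what a crawler would receive. The blank page and the 410 status are two views of the same response.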

Your robots.txt looks fine, so yes, this should prevent crawling of /jump.php (for a site installed in the top level of the domain). The warning may therefore just be the general one that Google Webmaster Tools (WMT) reports, which is the number of links it knows about on your site that are restricted by robots.txt. I think they provide this as a double-check mechanism to make sure that you're not inadvertently blocking something through robots.txt that you actually do want indexed. If you're still not sure, of course, let me know and I'll check it out further with you...
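If you want to double check the rules yourself, here is a small sketch using Python's standard `urllib.robotparser` on the robots.txt quoted above. The `is_allowed` helper and the sample paths are only illustrations, and Python's parser is not Google's, so treat the result as a sanity check rather than a guarantee of how Googlebot will behave.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules as posted earlier in this thread.
ROBOTS_TXT = """\
User-agent: Mediapartners-Google*
Disallow:
User-agent: ia_archiver
Disallow: /
User-agent: *
Disallow: /categories.php
Disallow: /brands.php
Disallow: /reviews.php
Disallow: /category/
Disallow: /brand/
Disallow: /review/
Disallow: /admin/
Disallow: /search.php
Disallow: /jump.php
"""


def is_allowed(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if the rules let this user agent fetch the path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Sample paths are assumptions; substitute URLs from your own site.
    for path in ("/jump.php?id=123", "/search.php?q=widget", "/"):
        print(path, is_allowed("Googlebot", path))
```

Paths are matched by prefix from the site root, which is why this only applies to a site installed in the top level.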

Cheers,
David.
--
PriceTapestry.com