Robots.txt - What to block and why

Submitted by paddyman on Tue, 2009-05-12 06:36

Hi David,

I noticed in another thread that you are using the following robots.txt:

User-agent: Mediapartners-Google*
Disallow:

User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /categories.php
Disallow: /brands.php
Disallow: /reviews.php
Disallow: /category/
Disallow: /brand/
Disallow: /review/
Disallow: /admin/
Disallow: /search.php
Disallow: /jump.php

Just wondering why you would recommend blocking the category and brand folders. Is it for duplicate content reasons?

Many thanks

Adrian

Submitted by support on Tue, 2009-05-12 08:38

Hi Adrian,

It's less about duplicate content; in my case it's more about saving bandwidth, as every product is ultimately linked to via the merchant index. My theory is that a search engine will only allocate so many resources towards indexing a site, so I would rather it concentrate those resources on the product pages.

For a niche site, where search engines are likely to cover the entire site, it may be better to allow unrestricted access, as the category and brand index pages (generated by search.php of course) can do very well in search engines.

Duplicate content shouldn't be an issue. A search engine may of course see multiple links to the same page (for a product with a category and brand value there will be links from the merchant, category and brand indexes), but I don't believe that to be a problem at all.
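
As a rough sketch of the niche-site case, assuming the same script layout as the robots.txt quoted at the top of this thread, the category and brand rules could simply be dropped. Which of the remaining rules to keep (for example /search.php or the review pages) is a judgment call; this variant is only an illustration, not a tested recommendation:

```
# Assumed variant for a small niche site: category and brand pages left crawlable
User-agent: Mediapartners-Google*
Disallow:

User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /admin/
Disallow: /jump.php
```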

Cheers,
David.

Submitted by paddyman on Wed, 2009-05-13 11:45

Thanks David.

Submitted by jonny5 on Mon, 2009-05-18 10:37

Hi David,

In Google Webmaster Tools I have noticed that I have a lot of pages with duplicate title tags.

These are all like:

/merchant/Saltrock/1.html
/merchant/Saltrock/10.html
/merchant/Saltrock/11.html
/merchant/Saltrock/12.html
/merchant/Saltrock/13.html
/merchant/Saltrock/14.html
/merchant/Saltrock/15.html
/merchant/Saltrock/16.html

So it's the list of all the items from a particular merchant. Is there anything I can restrict in robots.txt to stop this happening, or will this not make a difference?

Submitted by support on Mon, 2009-05-18 12:11

Hi,

I don't think it's a problem, but it would be straightforward to add the page number to the title so that the titles are unique. In search.php, look for the following code around line 154:

$header["title"] = htmlentities($q,ENT_QUOTES,$config_charset);

...and REPLACE this with:

$header["title"] = htmlentities($q,ENT_QUOTES,$config_charset)." Page".$page;

Cheers,
David.

Submitted by jonny5 on Mon, 2009-05-18 13:59

That will do great, many thanks.

Submitted by jim on Fri, 2009-05-29 02:35

$header["title"] = htmlentities($q,ENT_QUOTES,$config_charset)." Page".$page;

Is there any way to code this so that the page number only appears from page 2 upwards (i.e. on page 1 it does not show "Page 1")?

And a different question: how can you show something like (page 2/10), i.e. how many pages of search results there are?

And finally, how would you combine the two (i.e. nothing shows on the first page, but every other page has 2/10, 3/10, etc.)?

thanks David!

Submitted by support on Fri, 2009-05-29 09:02

Hi Jim,

Sure - replace the above line with:

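$header["title"] = htmlentities($q,ENT_QUOTES,$config_charset);
// only append the page number from page 2 onwards
if ($page > 1)
{
  // total pages, rounded up so that a partial last page still counts
  $totalPages = ceil($resultCount / $config_resultsPerPage);
  $header["title"] .= " Page ".$page."/".$totalPages;
}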
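
To try the logic outside search.php, the same steps can be wrapped in a small standalone function. This is only a sketch: buildPageTitle and its parameter names are hypothetical and not part of the script, and the values passed in below are made-up sample numbers.

```php
<?php
// Hypothetical helper (not part of the script): builds a page title that
// shows "Page N/M" only from page 2 onwards.
function buildPageTitle($q, $page, $resultCount, $resultsPerPage, $charset = "UTF-8")
{
  // escape the query the same way the snippet above does
  $title = htmlentities($q, ENT_QUOTES, $charset);
  // page 1 keeps the plain title; the extra check avoids dividing by zero
  if ($page > 1 && $resultsPerPage > 0)
  {
    // round up so that a partial last page still counts as a page
    $totalPages = (int) ceil($resultCount / $resultsPerPage);
    $title .= " Page ".$page."/".$totalPages;
  }
  return $title;
}

// Sample values only: 95 results at 10 per page gives 10 pages
echo buildPageTitle("Saltrock", 1, 95, 10), "\n"; // Saltrock
echo buildPageTitle("Saltrock", 2, 95, 10), "\n"; // Saltrock Page 2/10
```

Keeping the calculation in one function makes it easy to check the first-page and last-page cases before pasting the equivalent lines into search.php.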

Cheers,
David.

Submitted by jim on Fri, 2009-05-29 10:57

Perfect, thanks!