Support Forum

duplicate content, grid view, centralising databases...

Submitted by Paul1107 on Tue, 2009-03-03 15:25

Hi David,

I've been AWOL lately, mainly due to lots of stuff going on in the 9-5 world. However, I wondered whether you could help me with a couple of issues that I need to get resolved. It's more of a medium-term thing, as I've only limited time I can assign to this...

Anyway, one of my sites has been hit by what I think is a content duplication penalty from Google: it dropped from page 2 to page 45, and is now hovering at around page 25, so not good (still page 1 in Yahoo). I think the culprit may be the PT list view I'm currently using, which includes a full product description. I therefore thought I should probably change to the grid view, with only an image, title and perhaps a price, but no description. Does that seem feasible, or am I barking up the wrong tree?

Before I do that, however, I'd really like to centralise my database so several sites can use it. Can you point me in the right direction on that, please? Last time I tried it, following a thread I found on this forum, it didn't work and I had to revert to the previous setup.

One further thing, which is only an idea I'd like your feedback on: a lot has been written lately on the A4U forum about ECUs (easy content units), or product widgets, that some of the networks offer. They are very handy in a blog, just to drop into a post relating to the products you're writing about. It struck me that your PT programme would be great if it could be adapted to do something similar, with the benefit of cross-network products via your own data feeds and more customisation. In your opinion, could Price Tapestry be adapted to do a similar version of it? Do you think that is possible?

Sorry for waffling on a little too much. As always, I appreciate your help.

Paul

Submitted by support on Wed, 2009-03-04 09:06

Hi Paul,

Regarding duplicate content - are you using robots.txt to limit indexing of any part of your site? I set my sites up so that the /merchant/ index is the only crawlable route to every product page. For an example, have a look at the robots.txt on the demo site...

http://www.webpricecheck.co.uk/robots.txt
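
A minimal sketch of that kind of robots.txt (the paths here are placeholders based on a default Price Tapestry install, not copied from the demo site's file):

```
# Sketch only - adjust paths to match your installation.
# Idea: spiders reach product pages only via the merchant index.
User-agent: *
Disallow: /search.php
Disallow: /brand/
Disallow: /category/
# /merchant/ and /product/ stay crawlable by not being listed
```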

However, there is a lot of personal opinion with regards to whether or not this is the best approach. Many users have seen good results by going to the opposite extreme: logging search results and generating sitemaps for /search.php?q=Keyword etc.
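
With that approach, each logged keyword would end up as a sitemap entry along these lines (example.com and the keyword are placeholders; any ampersands in a query string must be entity-escaped as &amp;amp; inside the XML):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/search.php?q=digital+camera</loc>
  </url>
  <!-- one <url> entry per logged search keyword -->
</urlset>
```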

If you want to remove the description content from the search results, you can simply remove the following code from line 20 of html/searchresults.php:

<p><?php print substr($product["description"],0,250); ?></p>

Alternatively, if you want to display search results in a grid using a similar style to the featured products, drop me an email and I'll send you the alternative version of that file...

Centralising the database is really just a case of having one "master" Price Tapestry installation, and then using the same $config_database... settings for any other installations that you wish to share the database with - it should be as simple as that. If you're having trouble setting this up, let me know how you have set things up and what problems or errors you are seeing and I'll look into it for you.
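
As a sketch, the database section of config.php in each additional installation would carry the same values as the master install. The variable names follow the standard Price Tapestry config; the host and credential values below are placeholders:

```php
<?php
  // Shared "master" database settings - placeholder values.
  // Use identical values in the config.php of every installation
  // that should share the central database.
  $config_databaseServer   = "localhost";      // or the master DB host
  $config_databaseUsername = "ptuser";
  $config_databasePassword = "secret";
  $config_databaseDatabase = "pricetapestry";  // the shared database
?>
```

Note that if the sites are on different servers, the master database server must accept remote MySQL connections from the other hosts.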

Regarding "Easy Content Units" - have you come across the external display scripts that I have written, which enable you to show search results or product price comparison tables?

http://www.pricetapestry.com/node/2289

I am currently working on a more comprehensive version of this, effectively enabling you to run the whole of Price Tapestry within a page on another site running any CMS such as Joomla or WordPress...

Cheers,
David.

Submitted by Keeop on Wed, 2009-03-04 09:42

Hi Paul,

It wasn't a new site was it? If so, this is what Google seems to do. Once it first spots your site, it seems to place it quite highly in the search rankings, possibly to judge what sort of traffic it gets at that position. Depending on the percentage of traffic, it will then re-rate your site and place it accordingly in the search result rankings. Once this has completed, the site will then move up or down depending on the hits it gets, number of inbound links, page rank, and the other algorithms Google uses.

I don't work for Google so I'm not 100% sure that this is correct, but I have read a lot of similar posts on the web relating to this sort of thing happening, and the general consensus as to why it happens is as above. I've had the same happen to my new sites as well. There are arguments for and against registering your site with Google Webmaster Tools for this reason; if you do register it, only do so when you are 100% happy with the site, as that is when you'll get the boosted search rankings.

As for Yahoo, they use different algorithms, and search ranking seems to be based much more on the relevance of your domain name to the search keyword. I, like you, can have a page indexed between 10 and 20 on Google yet be on the first page of Yahoo.

Cheers.
Keeop

Submitted by Paul1107 on Wed, 2009-03-04 18:54

Hi David,

Thanks for your reply. I did set up robots.txt upon installation as per the instructions, but obviously not that well, as not so long ago I had several thousand pages indexed. This problem of mine is an assumption, though: with Google you don't quite know what the hell is wrong with your site, so it's a process of elimination, and duplicate content was the only reason I could think of, as I haven't used any black hat tricks or anything like that.

I'll have to look at the robots.txt again. However, as the main competitors to my site all use a similar format of image, title and price displayed in a grid view, I was thinking that perhaps I could kill two birds with one stone. I have the grid view file also, so unless there is a newer version available I am OK for the grid view, thanks.

I tried to centralise the database in January, I think at around about the time you were on holiday. It didn't work, so I reverted to the original. I will retrace my steps and give you a shout if necessary.

The external display script is the script I use exclusively on my site, so I am familiar with it, but I was thinking more about having a limited display of products, say 9 or 12, which would click through to the merchant. At the moment the external search script's results run into several pages, as you were kind enough to give me the code for pagination, so perhaps to achieve what I want I just have to extract the pagination, right? Looking forward to seeing your new script.

Sorry for the long reply

cheers

Submitted by Paul1107 on Wed, 2009-03-04 19:01

Hi Keeop,

Thanks for your input. The site I refer to is NOT a new site.

I am familiar with the new-site effect with Google, but the site in question was several months old at least. However, I was told that until the site gets to a certain level of daily traffic, I should avoid data feeds and too many affiliate links.

Perhaps it's just a case of putting the data feeds on too early, or maybe it's just a question of changing the robots.txt. Or has Google simply identified the site as an affiliate site and penalised it?

The problem is I don't really know, and I'm not expert enough to ascertain that myself.

Once again, thanks for your input.

Paul

Submitted by Keeop on Thu, 2009-03-05 13:05

Hi Paul,

I don't know about putting feeds on too early. I haven't noticed that specifically causing me problems but I'll keep an eye on the last couple of sites I made live to see if the same happens to them.

There may be issues with feeds in general, especially as they can now be loaded straight into Google via their 'Base' offering. This could potentially cause a massive amount of duplication, but it all depends on how 'Base' works, I guess.

As for all the affiliate links - again, they may be an issue. As long as you are using masked URLs, robots.txt and 'noindex, nofollow' on all the links, they should be OK. There might be a problem if you have loads of banners, as the spiders will be able to tell them from the image URLs. Thinking about it, this may also be an issue for the images in the data feeds. Maybe it would be an idea to treat the images like the URLs and mask them? I've got a different script/system that does that on some of my sites, but I don't think it makes much difference. What do you reckon?
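
To sketch that image-masking idea (everything here is hypothetical - the script name, table and column names are assumptions, not part of Price Tapestry): a small local script looks up the real feed image URL and redirects to it, so the page markup only ever contains a local path.

```php
<?php
  // image.php - hypothetical image-masking sketch.
  // Pages would reference <img src="/image.php?id=123"> instead of
  // the merchant's image URL from the feed.
  require("includes/common.php"); // assumed site bootstrap/DB connect
  $id = intval($_GET["id"]);      // cast to int to avoid SQL injection
  $sql = "SELECT image_url FROM products WHERE id = ".$id;
  if (database_querySelect($sql, $rows)) // assumed PT-style DB helper
  {
    header("Location: ".$rows[0]["image_url"]);
  }
  exit();
?>
```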

Cheers.
Keeop