How do I Handle Content Scrapers? Can They Hurt My Rankings?

By Daniel Scocco

This post is part of the weekly Q&A section. Just use the contact form if you want to submit a question.

Arun Basil asks:

Daniel,
Recently I have been getting some backlinks to my articles from sites that look genuine. These backlinks come within about 2-3 hours of posting content. My blog is not very popular, so I don’t think these people found my latest posts anywhere online (like Google or social sites). I used to think that these bloggers had found my posts while randomly surfing. But now this happens very often, and from different sites. The good thing is that these sites publish only excerpts from my blog and a link to the main article. But I do not get visitors from any of these sites.

My questions are:
1. How did they find my post within 3 hours of posting the content?
2. Will backlinks from spam sites affect my rankings?
3. Should I ask them to remove links to my site?
4. If someone else publishes excerpts from my blog, will Google consider them as copies of the same content?
5. These sites have PageRanks of 0 or 1. Will backlinks from such sites help me improve my PR?

It looks like you are talking about content scrapers. These are people who create websites in specific niches and, instead of writing content, just scrape it from other blogs and sites around the web. One way to scrape that content is through a blog’s RSS feed. There are many plugins and scripts that will automatically grab an RSS feed and output its content as new blog posts.
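
To make the mechanism concrete, here is a minimal Python sketch of what such a script might do with a feed: parse an RSS 2.0 document and turn each item into a short post that keeps a link to the source. The function name `make_scraped_posts`, the 30-word excerpt length and the example.com feed are hypothetical choices for illustration, not the code of any real plugin; fetching the feed over the network is left out.

```python
import re
import xml.etree.ElementTree as ET

# A tiny RSS 2.0 feed used for the demo below (example.com is a placeholder).
SAMPLE_FEED = """<rss version="2.0"><channel><title>My Blog</title>
<item><title>Hello</title><link>http://example.com/hello</link>
<description>&lt;p&gt;First post on my blog.&lt;/p&gt;</description></item>
</channel></rss>"""


def make_scraped_posts(feed_xml: str, excerpt_words: int = 30) -> list[dict]:
    """Turn every <item> of an RSS 2.0 feed into an excerpt-style post.

    Hypothetical illustration of what a scraper script does; not the code
    of any real plugin.
    """
    root = ET.fromstring(feed_xml)
    posts = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        description = item.findtext("description", default="")
        # The description carries escaped HTML; drop the tags, keep the words.
        words = re.sub(r"<[^>]+>", " ", description).split()
        excerpt = " ".join(words[:excerpt_words])
        if len(words) > excerpt_words:
            excerpt += "..."
        # An "excerpt scraper" keeps the source link; a full scraper would
        # copy the whole description instead.
        posts.append({"title": title, "body": excerpt, "source": link})
    return posts


if __name__ == "__main__":
    for post in make_scraped_posts(SAMPLE_FEED):
        print(post["title"], "|", post["body"], "|", post["source"])
    # Hello | First post on my blog. | http://example.com/hello
```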

Scrapers who republish 100% of the content they find on other websites are clearly violating copyright, and you could try to take them down. Scrapers that only republish excerpts, however, are probably protected by the “fair use” doctrine, so there isn’t much you can do about them (except making sure they link back to you, as I will show below).

Now let’s answer the 5 questions.

1. How did they find my post within 3 hours of posting the content?

As I mentioned before, those pseudo blogs have most likely added your RSS feed to their script, so every time you publish a new post the script picks it up and automatically publishes something about it on the scraping blog (either an excerpt or the full content).
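
The “get notified” part usually comes down to polling: the script re-reads your feed on a schedule and compares it with what it has already seen. Here is a minimal sketch, assuming items are identified by their `<guid>` (falling back to `<link>`); the function names are hypothetical and the download step is omitted.

```python
import xml.etree.ElementTree as ET


def find_new_items(feed_xml: str, seen_ids: set[str]) -> list[str]:
    """Return the IDs of feed items not seen on an earlier poll.

    Items are identified by <guid>, falling back to <link>. `seen_ids` is
    updated in place so the next poll only reports newer posts.
    """
    root = ET.fromstring(feed_xml)
    new_ids = []
    for item in root.iter("item"):
        item_id = item.findtext("guid") or item.findtext("link") or ""
        if item_id and item_id not in seen_ids:
            seen_ids.add(item_id)
            new_ids.append(item_id)
    return new_ids


def sample_feed(*urls: str) -> str:
    """Build a minimal RSS 2.0 feed whose items have the given guids."""
    items = "".join(f"<item><guid>{u}</guid></item>" for u in urls)
    return f'<rss version="2.0"><channel>{items}</channel></rss>'


if __name__ == "__main__":
    seen: set[str] = set()
    # First poll: both existing posts are "new" to the script.
    print(find_new_items(sample_feed("http://example.com/a", "http://example.com/b"), seen))
    # ['http://example.com/a', 'http://example.com/b']
    # Later poll, after the blog published post "c".
    print(find_new_items(sample_feed("http://example.com/c", "http://example.com/a", "http://example.com/b"), seen))
    # ['http://example.com/c']
```

Run every hour or so, a loop like this would explain why the excerpts show up within a few hours of your posts.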

2. Will backlinks from spam sites affect my rankings?

If you mean affect your rankings negatively, the answer is no. Inbound links from other sites will almost never hurt your search rankings. This is a necessary safeguard for Google and other search engines; otherwise it would be too easy to sabotage competing websites.

Notice that I said “almost never,” however, because in some situations inbound links could end up hurting a site’s ranking. But here I am talking about elaborate linking patterns designed to make a site look as if it is trying to manipulate Google’s index or engaging in spam. In other words, this would only happen if an expert SEO were deliberately trying to hurt your rankings, not as a result of content scrapers.

Linking out to bad neighborhoods and spam websites can hurt you a lot, though, so keep an eye on the pingbacks and trackbacks those sites send you: if you approve them, they usually show up on your pages as links to those sites.
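
One common way to triage trackbacks is to look at the page that sent one and confirm it really links to your post. The sketch below shows only that check (fetching the page is left out); `links_back`, `HrefCollector` and the myblog.example URLs are hypothetical. Note its limit: a scraper that republishes an excerpt with a link back passes it, so those trackbacks still deserve a manual look before you approve them.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def normalize(url: str) -> str:
    # Simplification: ignore a trailing slash and letter case. (URL paths can
    # be case-sensitive, so a stricter check would keep the path's case.)
    return url.strip().rstrip("/").lower()


def links_back(source_html: str, my_post_url: str) -> bool:
    """True if the page that sent the trackback contains a link to my post."""
    collector = HrefCollector()
    collector.feed(source_html)
    target = normalize(my_post_url)
    return any(normalize(href) == target for href in collector.hrefs)


if __name__ == "__main__":
    post = "http://myblog.example/2009/02/my-post/"
    scraper_page = '<p>Excerpt... <a href="http://myblog.example/2009/02/my-post">Read more</a></p>'
    spam_page = '<p>Cheap pills <a href="http://spam.example/">here</a></p>'
    print(links_back(scraper_page, post))  # True
    print(links_back(spam_page, post))     # False
```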

3. Should I ask them to remove links to my site?

As long as those links are not generating pingbacks and trackbacks, I wouldn’t worry too much about them. In fact, there is some chance that those links are passing link juice to your site and helping your search engine optimization.

Second, those links also help Google identify the original source of the content. Making sure that scraping sites link back to the original post is therefore a way to protect your site from search penalties.

If you want to make sure that people scraping your RSS feed link back to your original post, you can use the RSS Footer plugin for WordPress, which adds a line of your choice (such as a link to the original post) to every item in your feed.
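
The idea behind a feed footer is simple: append the same attribution line, with a link to the original post, to every item before the feed goes out, so anyone who republishes the item carries the link along. Here is a minimal Python sketch of that idea; it is not the plugin’s code, and `add_feed_footer`, the footer wording and the sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>Hello</title><link>http://example.com/hello</link>
<description>&lt;p&gt;First post.&lt;/p&gt;</description></item>
</channel></rss>"""


def add_feed_footer(feed_xml: str, blog_name: str) -> str:
    """Append an attribution line linking back to each item's original post.

    Same idea as a feed-footer plugin, sketched in Python; not the plugin's code.
    """
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        description = item.find("description")
        if description is None:
            description = ET.SubElement(item, "description")
        footer = (
            f'<p>This post first appeared on <a href="{escape(link)}">'
            f"{escape(blog_name)}</a>.</p>"
        )
        # <description> holds HTML as text; ElementTree re-escapes it on output.
        description.text = (description.text or "") + footer
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    out = add_feed_footer(SAMPLE_FEED, "My Blog")
    print(ET.fromstring(out).findtext("channel/item/description"))
    # <p>First post.</p><p>This post first appeared on <a href="http://example.com/hello">My Blog</a>.</p>
```

ElementTree rewrites namespace prefixes it has not been told about, so a real feed with extra namespaces (for example full-content `content:encoded` elements) would need more care than this sketch takes.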

4. If someone else publishes excerpts from my blog, will Google consider them as copies of the same content?

No. Google’s definition of duplicate content is: “substantive blocks of content within or across domains that either completely match other content or are appreciably similar.”

Excerpts are obviously not substantive blocks of content.
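
Google does not publish exactly how it detects duplicates, so the following is only an illustration using word shingling, a textbook near-duplicate technique: it measures what share of an article’s five-word runs also appear in a copy. The function names, the 500-word placeholder article and k = 5 are arbitrary assumptions.

```python
def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """All runs of k consecutive words (word "shingles") in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def fraction_copied(copy: str, original: str, k: int = 5) -> float:
    """Share of the original's shingles that also appear in the copy."""
    original_shingles = shingles(original, k)
    if not original_shingles:
        return 0.0
    return len(original_shingles & shingles(copy, k)) / len(original_shingles)


if __name__ == "__main__":
    # Placeholder 500-word "article" and a 30-word excerpt of it.
    article = " ".join(f"word{i}" for i in range(500))
    excerpt = " ".join(article.split()[:30])
    print(f"{fraction_copied(excerpt, article):.2f}")  # 0.05
    print(f"{fraction_copied(article, article):.2f}")  # 1.00
```

The numbers follow from simple counting: the 500-word article has 496 five-word shingles, and the 30-word excerpt covers 26 of them, about 5%, while a full copy covers all of them.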

5. These sites have PageRanks of 0 or 1. Will backlinks from such sites help me improve my PR?

Possibly. It depends on the number of links those sites send you, on whether or not the links are nofollowed, and on the overall quality and relevance of those websites.

Don’t expect to get a huge PR boost from scrapers though.
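
Whether the links are nofollowed is something you can check yourself. Below is a minimal Python sketch that lists the links on a page pointing at your domain and whether each carries `rel="nofollow"`; `LinkRelCollector`, `myblog.example` and the sample HTML are hypothetical. It only reports what the link tags say (it ignores page-wide robots meta tags), and how much a followed link is worth is still up to the search engine.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkRelCollector(HTMLParser):
    """Records (href, is_nofollow) for every <a> tag pointing at `domain`."""

    def __init__(self, domain: str) -> None:
        super().__init__()
        self.domain = domain.lower()
        self.links: list[tuple[str, bool]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        host = (urlparse(href).hostname or "").lower()
        # Count the bare domain and its subdomains (e.g. www.).
        if host == self.domain or host.endswith("." + self.domain):
            rel_values = (attr.get("rel") or "").lower().split()
            self.links.append((href, "nofollow" in rel_values))


if __name__ == "__main__":
    page = (
        '<p>Via <a href="http://myblog.example/post-1">My Blog</a>, '
        '<a rel="nofollow" href="http://www.myblog.example/post-2">again</a> '
        'and <a href="http://other.example/">elsewhere</a></p>'
    )
    collector = LinkRelCollector("myblog.example")
    collector.feed(page)
    for href, nofollow in collector.links:
        print(href, "nofollow" if nofollow else "followed")
    # http://myblog.example/post-1 followed
    # http://www.myblog.example/post-2 nofollow
```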

35 Responses to “How do I Handle Content Scrapers? Can They Hurt My Rankings?”

  • Barbara Ling, Virtual Coach

    RSS footer is a great plugin, I use it myself as well. I also use resources like http://www.copyscape.com and http://plagiarism.phys.virginia.edu/ to find where my content is duplicated too.

    Data points, Barbara

  • Rarst

    From my experience such scrapers don’t really target specific blogs, they rely on third party services such as Technorati that track posts and combine headlines by topics.

    I just ignore them and kill the trackbacks.

  • joe comp

    I don’t know about RSS Footer because I don’t use WordPress. This week Google is going to update PageRank again, and my blog is one of those that dropped 🙁

  • Rich

    Another informative Q and A here, Daniel. Incidentally, Google AdSense has just reminded its publishers to report any sites they find illegally copying and scraping content.

    Of course, this only helps if those sites also run Google AdSense.

    Here is the link to their blog just in case someone needs it – http://adsense.blogspot.com/2009/02/my-content-your-content-other-peoples.html

  • Mayank

    Hey Daniel, that is indeed a nice and informative article that clears up many doubts that newbie or amateur bloggers have. Content scrapers have always been a pain, but they don’t really hurt if you stay alert about what is happening around your blog.

  • eleena

    Are you still working your way through all the questions you received under your previous Q&A format or are you just selecting general questions that you believe are of interest?

  • Daniel Scocco

    @Rich, thanks for sharing.

  • SEO Tips

    Excellent article, nice Q&A very informative.

  • Calvin Loh

    If you use RSS Footer, remember to also state the Terms and Conditions of using your RSS Feeds clearly on your blog. In other words – don’t just say “Don’t republish my blog posts.” in RSS Footer, make sure you state this in your Terms and Conditions on an easily found page in your blog.

    Remember that RSS was designed to let other publishers grab your stuff easily. That’s why it is called Really Simple Syndication. When you push out your content on RSS, you are implicitly saying “Here I am! Publish me! Publish me!”

  • Marita

    Scrapers can find a site relevant to their niche very easily and quickly by using Google Alerts.

    But so can you! If you want to identify possible scraper sites, set up a Google Alert with the main keywords of your blog (incl. your domain) and Google Alerts will spit out a list of all pages using these keywords, which will also include scraper sites.

    http://www.google.com/alerts

  • Make Money Online

    Interesting article, I have been noticing more and more of these scraper sites sometimes within hours of me starting a new site!

  • Arun Basil Lal

    Daniel,

    That clears the mist I had in my mind regarding such spammers. As you said, some trackbacks came through Akismet and I never cared about them. Also, the plugin you mentioned is a definite workaround.

    And yes, as you said, the sites that publish excerpts are sending in some traffic.

    (By the way, this was a surprise for me; I had asked this question a long time back. Thanks, Daniel!)

    @Calvin Loh: I disagree with you. First, RSS doesn’t stand for Really Simple Syndication. It’s Rich Site Summary. (Refer: http://www.whatisrss.com/ )
    Second, I don’t think RSS was designed for other publishers; it was meant for readers who would like to know when their favorite site is updated. I would call RSS a Reader Subscription Service!

    @Marita: I have been using Google Alerts too. I use the All in One SEO Pack to make sure that all my titles include the site name. But I have never gotten an alert for a copied page, even when it has the site name in it. Maybe Google is not indexing them anymore.

    Cheers

  • diabetes man

    Thanks for sharing this link-building information. From your explanation, external links are not bad in the search engines’ eyes.

  • Sam Duvall

    If there are just a couple of sites, you could try blocking their servers’ IP addresses from accessing your site/feed.

  • Randy

    It answers my questions about those stupid scrapers. Also, thanks for mentioning Yoast’s footer plugin.

    I still have one question, though. How can I take down the scraper blog? In my case they are publishing the whole post, not just an excerpt. What should I do about it?

    TIA Daniel!

    -Randy

  • Daniel Scocco

    @Randy, you need to send them or their hosting company a DMCA takedown notice, and if necessary get lawyers involved.

  • Arun Basil Lal

    @Randy:

    I shed some light on how to take down such a blog here: http://www.millionclues.com/tutorials/fight-copyright-infringers

  • Calvin Loh

    @Arun

    Thanks for the link. I did some further checking: the latest specs (RSS 2.0) hosted by Harvard (at http://cyber.law.harvard.edu/rss/rss.html) says it is Really Simple Syndication. Ditto for the main Wikipedia entry.

    But the very first spec, 0.90, called it RDF Site Summary, while the second spec, 0.91, said RSS is a name, not an acronym 🙁

    I stopped tracing the history after that 😉

    Regardless, my main point stands – as a publisher, if you want to protect yourself:
    1) only publish excerpts
    2) make sure you publish the Terms and Conditions of using your RSS feed on your blog/website
    3) use the RSS Footer plugin

    @Sam Duvall

    To easily block a scraper, try this plugin: WP Block You. It lets you add the scraper’s IP address to your blog’s .htaccess file.

    OTOH, if he is only publishing your excerpts and you don’t want to give him the trackbacks, try the Simple Trackback Validation plugin (which checks that the blog post making the trackback has a permalink) and the Moderate Trackbacks plugin (which sends all trackbacks to the spam queue for you to moderate).

    Warning: I’ve only used Simple Trackback Validation, not the others. Honestly, since I’m also running Akismet and WP Spam Free at the same time, I don’t know if Simple Trackback Validation actually does anything.

    Copyblogger has a guest post called “Confessions of a Trackback Spammer…” which actually goes into further detail.

  • Bacterial Diseases

    I see these things as a bonus. I mean, what you’re really looking at is free backlinks and more chances for your content to be seen.

  • Tom – Stay At Home Business

    Good post! There will always be people who want to piggyback on others who write original and useful content. I would not worry about it too much, since you are the writer of the original article and you cannot possibly be held responsible if someone decides to scrape your content and put it up on their blog.

  • Rahul Jadhav

    Hey Daniel, I sent you a link to a blog that was scraping your blog content. Did you contact him?

  • Rahul Jadhav

    Hi Daniel, I have a Q.

    You have said before that using paid links is against Google’s policy and we get a PR penalty for it. However, I still see lots of blogs using paid blog reviews. Isn’t that similar to paid links, if not the same? You pay for a blog review and you get a link to your site in the post. What is your opinion?

  • Randy

    @Daniel Thanks! The RSS Footer plugin is working fine for me. I think there’s no need for a DMCA notice for now.

    @Arun Thanks for the link.

  • Niche

    Interesting. As long as they include all my links and send me some backlinks, who cares? If I am indexed first, there is no risk to my PR and more traffic all round.

    Sounds like a win win to me

  • Daniel Scocco

    @Rahul, not yet.

  • Dean Saliba

    Very good article.

    I think most of us have fallen victim to these scrapers at some time and it is bloody annoying.

    Good to see that if they link to my blog I might get better results in search engines and potential to get a better page rank.

  • Ajay

    From experience I can tell you that a content scraper can rank higher than your site using your own content. Google ranking is not determined only by who posts first and who owns the content, but also by how relevant that content is to the rest of the website/blog. If you normally write about “wine tasting” and suddenly post about “cars”, and a content scraper with a specialized blog on “cars” picks that post up, you may find him ranking higher than you. If your own post has a lot of comments and trackbacks, the chances of that are reduced, but the point I am making is that it is possible.

    thanks,

  • Tyrone

    Nice Q&A, The article is really very informative especially for the beginners.

  • Cherran

    It depends on who copies your content. If Google decides that the site copying the content is a trusted site, you are doomed. Frustrating, yes, but true. Here is an example.

    For the query “difference between plant cell and animal cell”,
    the fifth result is http://wiki.answers.com/Q/Difference_between_an_animal_cell_and_a_plant_cell

    This is user-generated content that is an exact copy of this article:
    http://www.differencebetween.net/science/difference-between-animal-and-plant-cells
    but the original article is nowhere to be found in the SERPs. (It actually dropped after the duplicate was posted and found by Google.)

    The original article was published on July 16, 2009. The current Answers.com answer was copied and pasted on Aug 11, 2009. Until the answer was posted, the original article ranked well, but after Google found the answer, the original article dropped out of the results.

    It is very unfair to treat duplicate content on the basis of trusted sites. Answers sites are like parasites: if they kill the host, the ecosystem will perish.

    It is unfair on many fronts. For one, a person’s hard work is exploited by others. And a competitor can easily be kicked out of the SERPs by simply copying his popular post and publishing it on an answers site such as Yahoo Answers or Answers.com.

    It is unfair and unethical, and I am so mad about that.
    Any thoughts on handling this issue?

  • organic chemistry

    The main issue here is when the scraping site’s blog has a higher PageRank or more backlinks than yours. In that case it can negatively affect your site, as search engines treat more backlinks as more trust and may therefore take the copy for the original.
    The link back is the solution; however, many scrapers do not include the link back, so that can be an issue.
    Overall, it can be a very frustrating situation when it happens.

  • Calvin

    To “organic chemistry” (comment #32),

    Officially, Google says that if the copier links back to the source, the source will be credited. However, let me relate my experience:

    I used to copy my blog posts to ezinearticles.com, with a link back to the original. After all, this is one of the recommended actions by Chris Knight (owner of EZA) and a few other IM gurus as well.

    Guess what: when searching for the keyword in Google, the copy on EZA can be found, but not the original on my blog.

    I do not know if Bing and Yahoo do any better than Google regarding this issue. Never had the time to test, and can’t be bothered either.

  • Bill Bolmeier

    Great info. What happened to me was one of my articles got linked to by a big site which drove tons of short term traffic to my blog and 6 backlinks.

    When I visit all 6 backlink sites, they are all scraped copies of the page on the big site that linked to me.

    3 of them came into my comments area for me to approve, 3 went to spam comments. I don’t see any difference between the ones marked as spam and the ones not marked as spam.

    I’ll keep searching and reading. I’m googling each site because sometimes other folks have reported those sites as spam.

    And then I’m wondering whether the big site that linked to me minds being scraped. Some sites don’t seem to mind.

    I’d love to approve all of them for the link juice, but I’m unsure whether it will hurt or help.

  • Best Netbooks

    Thanks for clearing that up, Daniel; I have often wondered the same things myself. Although Cherran’s point above is worrying.
