How do I Handle Content Scrapers? Can They Hurt My Rankings?

By Daniel Scocco

This post is part of the weekly Q&A section. Just use the contact form if you want to submit a question.

Arun Basil asks:

Daniel,
Recently I have been getting backlinks to my articles from sites that look like genuine sites. These backlinks come within about 2 to 3 hours of posting content. My blog is not very popular, so I don’t think these people found my latest posts anywhere online (like Google or social sites). At first I thought these bloggers had found my posts while randomly surfing. But this happens very often now, and from different sites. The good thing is that these sites publish only excerpts from my blog and a link to the main article. But I do not get any visitors from these sites.

My questions are:
1. How did they find my post within 3 hours of posting the content?
2. Will backlinks from spam sites affect my rankings?
3. Should I ask them to remove links to my site?
4. If someone else publishes excerpts from my blog, will Google consider them as copies of the same content?
5. These sites have a PageRank of 0 or 1. Will links from such sites help me improve my PR?

It looks like you are talking about content scrapers. These are people who create websites in specific niches and, instead of writing the content themselves, scrape it from other blogs and sites around the web. One common method is to scrape a blog’s RSS feed: there are many plugins and scripts that will automatically grab an RSS feed and republish its items as new blog posts.

Scrapers who republish 100% of the content they find on other websites are clearly violating copyright, and you could try to get them taken down. Scrapers that only republish excerpts, however, are probably protected under “fair use,” so there isn’t much you can do about them (other than making sure they link back to you, as I will show below).

Now let’s answer the 5 questions.

1. How did they find my post within 3 hours of posting the content?

As I mentioned before, it is likely that those pseudo-blogs simply added your RSS feed to their script, so every time you publish a new post the script picks it up from your feed and automatically writes about it on the scraping blog (either with an excerpt or with the full content).
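To make the mechanism more concrete, here is a minimal Python sketch of what such a script might do, assuming a standard RSS 2.0 feed. The feed contents, the example.com URLs and the function names `new_items` and `make_excerpt_post` are all hypothetical; a real scraper would download your feed on a schedule instead of reading a hard-coded string, and the actual plugins and scripts mentioned above may work differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed standing in for your blog's RSS feed (RSS 2.0 assumed).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>New Post</title>
      <link>https://example.com/new-post</link>
      <guid>https://example.com/new-post</guid>
      <description>This is the full text of a brand new post about blogging.</description>
    </item>
    <item>
      <title>Old Post</title>
      <link>https://example.com/old-post</link>
      <guid>https://example.com/old-post</guid>
      <description>An older post that the script has already republished.</description>
    </item>
  </channel>
</rss>"""


def new_items(feed_xml, seen_guids):
    """Return (title, link, description) for every item not seen on a previous poll.

    `seen_guids` is updated in place so the next poll skips these items.
    """
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        guid = item.findtext("guid") or item.findtext("link")
        if guid in seen_guids:
            continue
        seen_guids.add(guid)
        fresh.append((item.findtext("title") or "",
                      item.findtext("link") or "",
                      item.findtext("description") or ""))
    return fresh


def make_excerpt_post(title, link, description, max_words=8):
    """Build an excerpt-only post: the first few words plus a link to the original."""
    excerpt = " ".join(description.split()[:max_words])
    return f"{title}: {excerpt}... Read more: {link}"


if __name__ == "__main__":
    seen = {"https://example.com/old-post"}  # pretend this was republished earlier
    for title, link, description in new_items(SAMPLE_FEED, seen):
        print(make_excerpt_post(title, link, description))
```

Run every hour or so, a loop like this explains how excerpts can appear within a few hours of publishing: the scraper never needs to find your post through Google or social sites. A scraper that republishes full content would simply skip the excerpt step.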

2. Will backlinks from spam sites affect my rankings?

If you mean affect your rankings negatively, the answer is no. Links pointing to your site from other sites will almost never hurt your search rankings. Google and other search engines need this safeguard; otherwise it would be too easy to sabotage competing websites.

Notice that I said “almost never,” though, because in some situations inbound links could end up hurting a site’s ranking. But here I am talking about elaborate linking patterns designed to make a site look as if it were trying to manipulate Google’s index or engaging in spam. In other words, this would only happen if an expert SEO were deliberately trying to hurt your rankings, not as a result of content scrapers.

Linking out to bad neighborhoods and spam websites can hurt you a lot, though. If you approve the pingbacks and trackbacks those sites send you, your blog ends up publishing links to them, so keep an eye on those.

3. Should I ask them to remove links to my site?

As long as those links are not generating pingbacks and trackbacks on your blog, I wouldn’t worry too much about them. In fact, there is some chance that those links are passing link juice to your site and helping your search engine optimization.

Second, those links help Google identify the original source of the content. Making sure that scraping sites link back to your original post is therefore a way to protect your site from search penalties.

If you want to make sure that people scraping your RSS feed link back to your original post, you can use the RSS Footer plugin, which adds a line of text (such as a link to the original post) to every item in your feed.
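This is not the plugin’s actual code. It is a rough Python illustration of the same idea, assuming an RSS 2.0 feed: append a line to each item’s description that links to the original post, so anyone who republishes the item republishes the link too. The footer wording and the `add_feed_footer` name are hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical footer wording; {link} and {title} are filled in per item.
FOOTER_TEMPLATE = 'This post was originally published at <a href="{link}">{title}</a>.'


def add_feed_footer(feed_xml, template=FOOTER_TEMPLATE):
    """Append a link-back footer to the description of every item in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        link = item.findtext("link") or ""
        title = item.findtext("title") or link
        description = item.find("description")
        if description is None:
            description = ET.SubElement(item, "description")
        # Escape the values because they are being inserted into HTML.
        footer = template.format(link=html.escape(link), title=html.escape(title))
        description.text = (description.text or "") + "<p>" + footer + "</p>"
    # ElementTree escapes the HTML markup when serializing, which is how
    # HTML is normally carried inside an RSS <description>.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    feed = ("<rss version='2.0'><channel><item>"
            "<title>My Post</title>"
            "<link>https://example.com/my-post</link>"
            "<description>Full text of the post.</description>"
            "</item></channel></rss>")
    print(add_feed_footer(feed))
```

Because the footer sits at the end of the description, a scraper that copies the full item carries the link along, while one that only keeps the first few words may cut it off.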

4. If someone else publishes excerpts from my blog, will Google consider them as copies of the same content?

No. Google’s definition of duplicate content is: “substantive blocks of content within or across domains that either completely match other content or are appreciably similar.”

Excerpts are obviously not substantive blocks of content.

5. These sites have a PageRank of 0 or 1. Will links from such sites help me improve my PR?

Possibly. It depends on the number of links those sites send you, on whether or not the links are nofollowed, and on the overall quality and relevance of those websites.

Don’t expect a huge PR boost from scrapers, though.
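Whether a particular scraper’s link is nofollowed is something you can check yourself. Below is a minimal Python sketch, using only the standard library’s HTML parser, that lists the links on a page pointing to your domain and flags those carrying rel="nofollow". The `find_backlinks` helper and the example.com domain are hypothetical, and the sketch says nothing about how much value a search engine actually assigns to a followed link.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class BacklinkFinder(HTMLParser):
    """Collect <a> links that point at my_domain and record whether each is nofollowed."""

    def __init__(self, my_domain):
        super().__init__()
        self.my_domain = my_domain.lower()
        self.links = []  # list of (href, is_nofollow) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)  # HTMLParser lowercases attribute names
        href = attr_map.get("href") or ""
        host = (urlparse(href).hostname or "").lower()
        # Match the domain itself and any subdomain such as www.
        if host == self.my_domain or host.endswith("." + self.my_domain):
            rel_values = (attr_map.get("rel") or "").lower().split()
            self.links.append((href, "nofollow" in rel_values))


def find_backlinks(page_html, my_domain):
    """Return (href, is_nofollow) for every link on the page that points at my_domain."""
    finder = BacklinkFinder(my_domain)
    finder.feed(page_html)
    finder.close()
    return finder.links


if __name__ == "__main__":
    page = ('<p>Excerpt... <a href="https://example.com/post">Read more</a> '
            '<a rel="nofollow" href="http://www.example.com/other">source</a></p>')
    for href, nofollow in find_backlinks(page, "example.com"):
        print(href, "nofollow" if nofollow else "followed")
```

You would feed it the HTML of a scraper page (saved from your browser, for example) together with your own domain. Relative links are skipped, since on a scraper’s page they point at the scraper’s own site.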


35 Responses to “How do I Handle Content Scrapers? Can They Hurt My Rankings?”

  • Best Netbooks

    Thanks for clearing that up, Daniel. I have often wondered the same things myself, although Cherran’s point above is worrying.

  • Bill Bolmeier

    Great info. What happened to me was that one of my articles got linked to by a big site, which drove tons of short-term traffic to my blog and brought 6 backlinks.

    When I visit the 6 backlinking sites, they all turn out to be scraped copies of the big site’s page that linked to me.

    Three of them landed in my comments queue for approval, and three went to spam. I don’t see any difference between the ones marked as spam and the ones not marked as spam.

    I’ll keep searching and reading. I’m googling each site because sometimes other folks have reported those sites as spam.

    I’m also wondering whether the big site that linked to me minds being scraped. Some sites don’t seem to mind.

    I’d love to approve all of them for the backlink juice, but I’m unsure whether doing so will help or hurt.

  • Calvin

    To “organic chemistry” (comment #32),

    Officially, Google says that if the copier links back to the source, the source will be credited. However, let me relate my experience:

    I used to copy my blog posts to ezinearticles.com, with a link back to the original. After all, this is one of the practices recommended by Chris Knight (owner of EZA) and a few other IM gurus.

    Guess what: when searching for the keyword in Google, the copy on EZA can be found, but not the original on my blog.

    I do not know if Bing and Yahoo do any better than Google regarding this issue. Never had the time to test, and can’t be bothered either.

  • organic chemistry

    The main issue is when the scraping website owner’s blog has a higher PageRank or more backlinks than yours. In that case it can negatively affect your site, because the search engines will assume that more backlinks means more trust, and therefore that the scraper is the original owner.
    The link back is the solution; however, many scrapers don’t include the link back, so that can be an issue.
    Overall, it can be very frustrating when it happens.

  • Cherran

    It depends on who copies your content. If Google considers the site that copies the content a trusted site, you are doomed. Frustrating, yes, but true. Here is an example:

    For the query “difference between plant cell and animal cell”
    The fifth result is http://wiki.answers.com/Q/Difference_between_an_animal_cell_and_a_plant_cell

    This is user-generated content that is an exact copy of this article:
    http://www.differencebetween.net/science/difference-between-animal-and-plant-cells
    but the original article is nowhere to be found in the SERPs. (It actually dropped out after the duplicate content was posted and found by Google.)

    The original article was published on July 16, 2009. The current Answers.com answer was copied and pasted on August 11, 2009. Until the answer was posted, the original article ranked well, but after Google found the answer, the original article dropped out of the results.

    It is very unfair to resolve duplicate content in favor of trusted sites. Answer sites are like parasites: if they kill the host, the ecosystem will perish.

    It is unfair on many fronts. One is that a person’s hard work is exploited by others. Another is that a competitor can easily be kicked out of the SERPs by simply copying his popular post and publishing it on an answers site such as Yahoo Answers or Answers.com.

    It is unfair and unethical and I am so mad about that.
    Any thoughts on handling this issue?
