Large search engines attempt to filter their search results
by removing any results that duplicate the content of other
search results. Such filtering is commonly referred to as
the "duplicate content penalty".
It is important to understand and identify what "duplicate
content" actually is. Duplicate content is generally
defined as substantive blocks of text that are copied from
one site to another. Some webmasters try to use duplicated
content in an attempt to manipulate and influence search
engine rankings. The search community still occasionally
debates the legitimacy and existence of duplicate content
filters, but whether they exist today, or will exist
tomorrow, is really irrelevant. Most webmasters have simply
accepted the fact that the duplicate content penalty is
currently enforced by at least some of the major search
engines.
With that in mind, how does the search engine determine
which version of the content is the original, and which is
duplicated? It is difficult for the search engine to tell
which website is responsible for the original version of
any content, and some innocent websites might find
themselves penalized or banned for including duplicated
content. Judging from the observed behavior of search
engines, it is reasonable to assume that a search engine
will often retain the content listing from what it considers
to be the most 'trusted' source. It may look at the number
of incoming related links, the age of the domain, or any
other SEO factors that reinforce the reputation of the
domain that contains the duplicated content. If one of the
'copies' is considered by the search engine to be from a
reputable source, that copy may rank well, while the actual
source of the 'original' version may find itself unjustly
banned or penalized.
Representatives from the major search engines have all made
it clear that they prefer websites that contain unique
content. Webmasters who want to avoid any current or
future bans will do well to follow these simple guidelines
in order to avoid duplicate content penalties:
1. Redirects
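If you redesign your website, use permanent 301 redirects
to point each old URL at its new location, so that the old
and new addresses do not both serve the same content.
Redirects are a legitimate way of routing web traffic.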
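As a rough illustration of what a permanent redirect looks
like at the HTTP level, here is a minimal sketch using
Python's standard http.server module. The paths in REDIRECTS
and the names resolve_redirect, RedirectHandler and run are
hypothetical and exist only for this example; most sites
would configure redirects in their web server or CMS rather
than in a script like this.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping of retired URLs to their new locations.
REDIRECTS = {
    "/old-page.html": "/new-page.html",
    "/products.php": "/products/",
}


def resolve_redirect(path):
    """Return the new location for a retired path, or None if there is none."""
    return REDIRECTS.get(path)


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = resolve_redirect(self.path)
        if target is not None:
            # 301 tells browsers and crawlers the move is permanent.
            self.send_response(301)
            self.send_header("Location", target)
        else:
            # Anything not in the mapping is simply reported as missing.
            self.send_response(404)
        self.end_headers()


def run(port=8000):
    """Serve redirects on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Calling run() and requesting /old-page.html should return a
301 response whose Location header is /new-page.html.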
2. Unique
Each page within a website should be unique. Even if the
theme of a page is similar to that of another page, it must
contain unique and original content.
3. Multi-Language
If there are multiple language versions of a website,
consider using a different domain for different versions;
search engines do not view an article translated into a
variety of foreign languages as being duplicated content --
each language version is unique content in the eyes of the
search engine.
4. Unique Meta Tags
Each web page should contain unique meta tags.
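One way to check a site for repeated meta information is
sketched below. It assumes, for illustration only, that the
values worth comparing are the page title and the meta
description, and that the pages are already available as
HTML strings. MetaExtractor and find_duplicate_meta are
hypothetical names, not part of any SEO tool.

```python
from collections import defaultdict
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects the <title> text and the meta description of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def find_duplicate_meta(pages):
    """pages maps URL -> HTML; returns {(title, description): [urls]} for repeats."""
    seen = defaultdict(list)
    for url, html in pages.items():
        parser = MetaExtractor()
        parser.feed(html)
        seen[(parser.title.strip(), parser.description)].append(url)
    # Keep only title/description pairs shared by two or more pages.
    return {meta: urls for meta, urls in seen.items() if len(urls) > 1}
```

Any entry in the returned dictionary lists pages whose title
and description are the same and may need rewriting.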
5. Robots.txt
If you do have intentional duplicate content on your
website, be sure to have a "robots.txt" file for your site
to prevent the search engines from indexing the areas with
duplicated content (or any areas of the website that you
wish to remain private, for that matter).
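The sketch below shows what such rules might look like and
how to confirm that they block the intended URLs, using
Python's standard urllib.robotparser module. The /print/ and
/private/ directories, the example.com URLs and the
is_crawlable helper are assumptions made up for the example;
substitute the paths that hold duplicated content on your
own site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all crawlers out of two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /private/
"""


def is_crawlable(robots_txt, url, user_agent="*"):
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For instance, is_crawlable(ROBOTS_TXT,
"http://www.example.com/print/article.html") reports whether
the printer-friendly copy of an article is blocked while the
main version stays open to crawlers.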
6. Affiliate Twist
If you are promoting products or services using an
affiliate program, use unique and distinctive product
descriptions and web copy. If you simply use the same
descriptions provided by the product owner or service
provider, it's very likely that your copy could be viewed
as duplicated content.
7. Copyright
Include a copyright notice on your website.
8. Enforce
If you discover that another website is scraping your
unique web content and replicating it, enforce your
copyright! Search for copies with CopyScape at
http://www.copyscape.com/ , or use its "copy sentry" service
to receive notification of any infractions. If you discover
a copyright violation,
contact the website and politely request appropriate
changes.
If the changes are not made in a reasonable and
satisfactory amount of time, contact the ISP (web host) of
the infringing site, and file a DMCA complaint with Google
http://www.google.com/dmca.html .
9. Avoid Identical Content
Do everything you can to avoid serving a web page whose
content is identical or nearly identical to that of another
page. If for some reason you have two pages that contain
identical content, use a robots.txt file to block the search
engines from spidering one version of the page.
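To get a rough sense of how close two pages are, you can
compare their text yourself. The sketch below uses word
shingles and Jaccard similarity, a common near-duplicate
technique chosen here purely for illustration; it is not a
description of how any search engine or checking tool
actually scores pages. The function names and the shingle
size of three words are assumptions.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short texts are treated as a single shingle (or none).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=3):
    """Jaccard similarity of the two shingle sets: 0.0 (disjoint) to 1.0 (same)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0  # two empty texts are trivially identical
    return len(a & b) / len(a | b)
```

A score near 1.0 suggests that two pages share most of their
wording; where to draw the line is a judgment call.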
Other Tools:
Duplicate Page Checker -
http://www.webconfs.com/similar-page-checker.php
While it may still be debatable whether all the major
search engines currently employ a duplicate content
penalty, all have made it abundantly clear that they do not
have any desire to provide search results that rehash the
same content over and over. Actively avoid any potential
penalties by taking a proactive approach to building unique
content.
----------------------------------------------------
Sharon Housley manages marketing for FeedForAll
http://www.feedforall.com software for creating, editing,
and publishing RSS feeds and podcasts. In addition, Sharon
manages marketing for RecordForAll
http://www.recordforall.com audio recording and editing
software.