The research firm Incapsula has estimated that in 2013, just over 60% of all traffic on the internet was from bots. The good news is that half of that traffic comes from “good bots”, mainly search engines. The bad news is that the other half doesn’t. A solid third of internet traffic falls into a gray zone that runs from the merely annoying to the purely malicious.
Website developers have an unusual window on this when they look at the trackbacks of people (or, more likely, bots) who visit their page. Any website quickly attracts a series of spurious incoming links, like barnacles growing on a ship’s hull. Initially, these may seem harmless, and one might even take the attitude that any traffic is good traffic. But they have some real downsides.
The drawback that often becomes apparent first is that there is often enough traffic coming in from spurious links to max out simple traffic analysis software. It’s nice to know which URLs your last 50 visitors came from, but if 48 of those URLs are German porn sites, and you’re selling car insurance in Ohio, you are no longer getting useful data. The fix here is to upgrade your analytics package, but this is a red flag for a problem that’s harder to see.
Google’s PageRank algorithm takes into account where your incoming traffic is from. Generally, traffic is a good thing: the more traffic, the higher your listing in Google searches. But some traffic is counted as having a negative value. If you are buried in a network of links from junk websites, you start to look like a junk website yourself. And no one has time for that. So to get the PageRank your site deserves, you want to let Google know that “you’re not with those guys”. The Disavow tool is the way to do that.
Google has generally taken the attitude that everything about its PageRank algorithm is a feature, not a bug, and that they can already detect which of the websites linking to yours are junk. They also point out that many other factors are used in calculating your overall PageRank. In other words, they’ve created the Disavow tool, but at the same time they’re suggesting that it isn’t necessary for you to use it. That might or might not be true; if you’re here reading this, you probably are in the large club of web developers who think it is a good idea. And after all, you have very little to lose, and potentially something to gain, by disavowing traffic from junk sites.
The issue to be concerned with here is that you might cut too deep. By using Disavow, you are telling Google not to count certain traffic sources to your site in their algorithms, whatever those might be. Since the algorithm is a closely guarded secret, a developer can never know for sure if any particular site’s incoming links are helping or hurting their PageRank. So it’s worth being very careful in the choice of the sites you want Google to ignore. For instance, one developer we spoke to had used Disavow to ask Google to block Yandex.ru, which they erroneously assumed was a junk URL, probably because it is all in Cyrillic. In fact, Yandex is the fourth-largest search engine on earth. There are plenty of other stories like that one: some of the traffic to any site will come from unusual but legitimate sources.
It’s also worth taking a moment to note that the Disavow tool _does not_ perform some features you might have in mind when dealing with spam. It does not block traffic from junk sites, for instance. Nor does it affect the ability of bots and scrapers to access your site for their own purposes. All it does is tell Google to ignore the traffic from those sites when calculating your PageRank. Finally, if you yourself have set up spam links back to your site (or hired a third party to do so), then it makes more sense to delete those links than to disavow them.
The Disavow tool is located within the larger panoply of Google’s Webmaster Tools. If you are not yet using those tools, or you aren’t familiar with them, you’ll need to start doing so before you can use the Disavow tool effectively. That is a very easy setup process, but it’s outside the scope of this discussion. Once you’re within Google’s Webmaster Tools, you need to begin by creating a master list of the major inbound links that your unwanted site traffic is coming from. Although there are many other possible ways to get a list of inbound links to your site, for the purposes of these instructions we’re going to assume you are using Google’s Webmaster Tools to compile that master list.
If you are not familiar with the layout of these tools, you want to follow the links in this order: “Search Traffic”, “Links to Your Site”, “Who Links The Most”, “More”. From that screen, you have a range of options to help create a master list of inbound links to your site. Decide which of these are relevant to you, and copy the results into a text file. This file will contain all the major and/or recent inbound links to your site.
It is worth noting that you are now in a live-wire situation. If you were to accidentally upload the list you’re looking at to Google’s Disavow tool, you would be asking them to ignore _virtually all_ the current traffic to your website. We’re guessing that that’s not part of your marketing strategy. In fact, the risk of ignoring good traffic is serious enough that you probably want to create a separate blacklist file, and copy and paste specific URLs over into it. It is slightly more convenient, and thus tempting, to just go through your master list and delete the URLs you don’t want to blacklist. But that method allows much more room for error.
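If the master list is long, a short script can do the copying for you instead of hand-pasting. This is a minimal sketch, not a definitive tool: the filenames and the flagged domains are hypothetical stand-ins, and it assumes your master list has one full URL per line. The key design point is the same as above: you name the domains you have explicitly vetted as junk, rather than deleting lines from the master list.

```python
from urllib.parse import urlparse

# Hypothetical example: domains you have personally vetted as junk.
FLAGGED = {"spamsite.example", "linkfarm.example"}

def build_blacklist(master_path: str, blacklist_path: str) -> int:
    """Copy URLs whose domain is flagged from the master list into the
    blacklist file, and return how many were copied. URLs without a
    scheme (no "http://") will parse with an empty domain and be skipped."""
    kept = 0
    with open(master_path) as src, open(blacklist_path, "w") as dst:
        for line in src:
            url = line.strip()
            if not url:
                continue  # skip blank lines in the export
            domain = urlparse(url).netloc.lower()
            if domain in FLAGGED:
                dst.write(url + "\n")
                kept += 1
    return kept
```

Because the script only ever copies URLs that match your flagged set, an unvetted domain can never slip into the blacklist by accident, which is exactly the failure mode the delete-from-master approach invites.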
What you want to end up with is a .txt file with one entry per line, encoded in UTF-8 or 7-bit ASCII. You can consolidate multiple URLs from the same domain by using the format “domain:spamsite.com”, and with some obvious exceptions, this is usually a good practice.
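Put together, the finished file might look something like this (the domains here are hypothetical examples); lines beginning with `#` are treated as comments and ignored:

```
# Individual pages to disavow
http://spamsite.example/forum/thread-123.html
http://spamsite.example/forum/thread-456.html
# Disavow every link from an entire domain
domain:linkfarm.example
```

The `domain:` form is usually the safer bet for sites you have written off entirely, since new spam pages on that domain will be covered automatically.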
Now go to www.google.com/webmasters/tools/disavow-links-main. From there, after choosing the site if needed, you can upload your list of URLs to be ignored. When you upload your list of URLs to disavow, it overwrites any previous list you’ve uploaded. You aren’t adding to the list, you’re replacing it with a new list. This means you should create and save a master copy that you can edit as needed.
Don’t expect instant results. Google’s PageRank algorithm is complex and esoteric, and ultimately your page ranking is also going to be affected by various other factors: changes on other websites, traffic flows, the weather in Scandinavia, and who knows what else. All you can do here is stack the deck a little in your favor.