Referrer spam is becoming a problem. If you’re not familiar with referrer spam, it’s traffic from bots that impersonate a referral link. The pseudo traffic is designed to make their domain show up in your site analytics so that you’ll visit the site.
Why is Referrer Spam a Problem?
Aside from junking up your site analytics with useless data, it’s a big waste of time. We’ve heard from many of our customers here at Raven just how frustrating it is to explain what “semalt” is to their clients and why it doesn’t matter.
While it’s possible to create a filter in Google Analytics to filter out referrer spammers like semalt, all it does is mask the problem. Also, as Himanshu Sharma has written about, it may create data sampling problems. So instead of filtering out bad data after the fact, I’m going to show you how to block it at the source.
How To Stop Referrer Spam
The key to stopping referrer spam is to block it before it has a chance to register on your site as a referrer. The simplest way to do this is to add the following code to your .htaccess file.
## SITE REFERRER BANNING
# Dots are escaped so each pattern only matches a literal "."
RewriteEngine On
RewriteCond %{HTTP_REFERER} semalt\.com [NC,OR]
RewriteCond %{HTTP_REFERER} buttons-for-website\.com [NC,OR]
RewriteCond %{HTTP_REFERER} seoanalyses\.com [NC]
RewriteRule .* - [F]
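To make the rule chain above concrete, here is a rough Python model of what Apache does with it. It is only an approximation that assumes Python's re module treats these simple patterns the way Apache's PCRE engine does, and request_status is a hypothetical helper, not an Apache API. Each RewriteCond pattern is searched for anywhere in the Referer header ([NC] makes it case-insensitive), [OR] means a single match is enough, and the [F] flag then answers 403 Forbidden.

```python
import re

# Patterns from the RewriteCond lines, with the dots escaped so "."
# only matches a literal dot instead of any character.
BLOCKED_REFERRER_PATTERNS = [
    r"semalt\.com",
    r"buttons-for-website\.com",
    r"seoanalyses\.com",
]


def request_status(referer: str) -> int:
    """Return the HTTP status the rule chain would produce for this Referer."""
    for pattern in BLOCKED_REFERRER_PATTERNS:
        # Unanchored search, like RewriteCond; IGNORECASE mirrors [NC].
        if re.search(pattern, referer, re.IGNORECASE):
            return 403  # RewriteRule .* - [F]
    return 200  # no condition matched, so the request is served normally


if __name__ == "__main__":
    for referer in ["http://semalt.com/", "http://SEMALT.COM/crawler", "https://www.google.com/", ""]:
        print(repr(referer), request_status(referer))
```

Because the conditions are unanchored, semalt\.com also catches subdomains such as semalt.semalt.com without any extra pattern.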
Deflecting
Another technique you can use is a deflector, which redirects the traffic back to where it came from. Avi Wilensky, CEO of Promediacorp, prefers this method to just blocking them. He creates a text file named deflector.map that looks like this:
##
## deflector.map
##
## referer --> redirect target ("-" tells the rules below to send the visitor back to the referer)
http://semalt.com/ -
http://seoanalyses.com/ -
http://buttons-for-website.com/ -
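Then he adds the following rules. Apache only accepts the RewriteMap directive in the server or virtual host configuration, not in .htaccess (which explains the 500 errors some commenters report below), so the map declaration goes there while the remaining lines can live in .htaccess.
# Server or virtual host configuration only
RewriteMap deflector txt:/path/to/deflector.map
# .htaccess (or the same virtual host block)
RewriteEngine On
RewriteCond %{HTTP_REFERER} !=""
RewriteCond ${deflector:%{HTTP_REFERER}} =-
RewriteRule ^ %{HTTP_REFERER} [R,L]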
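A txt: RewriteMap is an exact lookup: the entire Referer header has to equal a key in the file, character for character. The Python sketch below models the three directives under that assumption; parse_rewrite_map and deflect are hypothetical helper names, and the map contents are examples. A key mapped to "-" is sent back to the referring URL, while a referer that differs only by a trailing slash finds no key at all, which is why the trailing slash matters.

```python
from typing import Dict, Optional

MAP_TEXT = """\
##
## deflector.map
##
http://semalt.com/ -
http://buttons-for-website.com/ -
"""


def parse_rewrite_map(text: str) -> Dict[str, str]:
    """Parse a txt: map: one 'key value' pair per line, '#' lines are comments."""
    mapping: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def deflect(referer: str, mapping: Dict[str, str]) -> Optional[str]:
    """Return the redirect target, or None if the request is served normally."""
    if referer == "":                # RewriteCond %{HTTP_REFERER} !=""
        return None
    if mapping.get(referer) != "-":  # RewriteCond ${deflector:%{HTTP_REFERER}} =-
        return None
    return referer                   # RewriteRule ^ %{HTTP_REFERER} [R,L]


if __name__ == "__main__":
    deflector = parse_rewrite_map(MAP_TEXT)
    print(deflect("http://semalt.com/", deflector))  # exact key: redirected back
    print(deflect("http://semalt.com", deflector))   # no trailing slash: no key, None
```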
I haven’t tried this yet, but I plan to. If you’ve had any experience with deflecting, please tell us about it in the comments below.
Blacklists
Shelli Walsh, of ShellShock UK, recommends using a blacklist of referrers and Regex coupled with commonly used spammy keywords. An example of this is available from Perishable Press.
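As a rough illustration of that combination, the Python sketch below checks a referrer's host (and its parent domains) against a small blacklist, then falls back to a keyword regex for domains the list doesn't know yet. The domains, the keywords and the classify_referrer name are illustrative assumptions, not the Perishable Press list. Keyword rules can also flag legitimate sites (seo also matches seoul, for example), so a keyword hit is worth reviewing before you block it.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Illustrative entries only; a real list would come from a maintained source.
BLACKLIST = {"semalt.com", "buttons-for-website.com", "seoanalyses.com"}
# Illustrative keywords often seen in spam referrer domains.
SPAM_KEYWORDS = re.compile(r"seo|buttons|traffic|monetiz", re.IGNORECASE)


def classify_referrer(referer: str) -> Optional[str]:
    """Return "blacklist", "keyword", or None for a Referer URL."""
    host = (urlsplit(referer).hostname or "").lower()
    if not host:
        return None
    labels = host.split(".")
    # Check the host and each parent domain, so www1.semalt.com hits semalt.com.
    for i in range(len(labels) - 1):
        if ".".join(labels[i:]) in BLACKLIST:
            return "blacklist"
    if SPAM_KEYWORDS.search(host):
        return "keyword"
    return None


if __name__ == "__main__":
    for referer in ["http://www1.semalt.com/", "http://best-seo-offer.com/", "https://www.google.com/"]:
        print(referer, classify_referrer(referer))
```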
The only problem with currently known referrer spam blacklists — at least the ones I found — is that they don’t seem to be kept up-to-date.
WordPress Plugin
For those who don’t have access to their .htaccess file or don’t feel like they have the experience to properly edit it, there’s a WordPress plugin for it. For many webmasters, semalt is the worst offender. That’s why Peadig created the Semalt Blocker for WordPress.
The Semalt Blocker plugin is currently limited to only blocking semalt, but the plugin creator, Alex Moss, has assured me that they’re working on a new version that will allow users to add more sites to block as needed.
Efficient Management of .htaccess
Another annoyance of having to block referrer spam is updating the .htaccess file for all of your sites. Fortunately, there’s a trick that Brian LaFrance of AuthorityLabs shared with me. He uses an umbrella .htaccess file for all of his sites by storing it in the directory that contains all of his site directories. The server reads that .htaccess file before each site’s individual .htaccess file, so the bots are stopped for all sites nested under that directory. One caveat: if a site’s own .htaccess also contains rewrite rules (WordPress’s does), Apache does not apply the parent’s rewrite rules by default; the child file needs RewriteOptions Inherit for them to carry over.
Personally, I like to use unique .htaccess files for each of my sites, but I still like for things to be as efficient as possible. My solution has been to create symbolic links to all of my .htaccess files in one folder. That way I have access to all of them, and then I can quickly open, paste and save…open, paste and save…
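If you want to automate that folder of links, a short Python script can build it. This is a sketch that assumes each site lives in its own directory under one root, with its .htaccess at the top of that directory; link_htaccess_files and the directory names are hypothetical. Each link is named after its site, so opening a link edits that site’s real .htaccess file.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def link_htaccess_files(sites_root: Path, hub: Path) -> List[Path]:
    """Create hub/<site>.htaccess -> sites_root/<site>/.htaccess for each site.

    Sites without their own .htaccess are skipped; existing links are kept.
    """
    hub.mkdir(parents=True, exist_ok=True)
    links: List[Path] = []
    for site_dir in sorted(p for p in sites_root.iterdir() if p.is_dir()):
        target = site_dir / ".htaccess"
        if not target.is_file():
            continue
        link = hub / f"{site_dir.name}.htaccess"
        if not link.is_symlink():
            link.symlink_to(target.resolve())
        links.append(link)
    return links


def demo() -> List[Tuple[str, str]]:
    """Build a throwaway tree (two sites, one with an .htaccess) and link it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "sites"
        (root / "blog").mkdir(parents=True)
        (root / "shop").mkdir()
        (root / "blog" / ".htaccess").write_text("RewriteEngine On\n")
        links = link_htaccess_files(root, Path(tmp) / "htaccess-links")
        # Read through the links before the temporary directory is deleted.
        return [(link.name, link.read_text()) for link in links]


if __name__ == "__main__":
    print(demo())
```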
Here’s a spambot list that’s frequently updated.
Update – March 23, 2015
After I wrote and published this post, two new pieces of information were brought to my attention.
First, Rishi Lakhani would like credit for coming up with the idea behind the Semalt Blocker plugin by Peadig.
@RavenTools @RavenJon aww no link to http://t.co/Hu8YEhYNK9 seeing peadig tool was my idea 😉
— Rishi Lakhani (@rishil) March 23, 2015
He also wrote an excellent post on referrer spam over at Refugeeks that you should check out.
Second, Georgi Georgiev pointed me to his post that analyzes all of the options for blocking referrer spam. He concluded that the best overall solution is to create a custom filter in Google Analytics.
You can create a filter for your sites in Google Analytics by going to Admin and clicking All Filters. Click the New Filter button, then create a Custom Exclude filter on the Campaign Source field. Enter the domains you want to exclude as a regular expression: each domain followed by a dot (domain.), with a pipe (|) separating each additional domain.
darodar.|semalt.|buttons-for-website|blackhatworth|ilovevitaly|prodvigator|cenokos.|ranksonic.|adcash.|simple-share-buttons.|social-buttons.
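If you keep the spam list somewhere else, a few lines of Python can assemble the pipe-separated pattern for you. This sketch escapes each dot (in the pattern above a bare dot matches any character, which is harmless there but looser than intended) and splits long lists across several filters, assuming a 255-character limit on a Google Analytics filter pattern; check the current limit before relying on that number. build_exclude_patterns is a hypothetical name.

```python
import re
from typing import List

# Assumed per-filter pattern limit in Google Analytics; verify in your account.
MAX_PATTERN_LENGTH = 255


def build_exclude_patterns(domains: List[str], limit: int = MAX_PATTERN_LENGTH) -> List[str]:
    """Join domains with '|' into as few filter patterns as fit within `limit`.

    Domains only contain letters, digits, hyphens and dots, so escaping the
    dots is enough to make each entry match literally.
    """
    patterns: List[str] = []
    current: List[str] = []
    for domain in domains:
        piece = domain.strip().replace(".", r"\.")
        if not piece:
            continue  # an empty entry would add an empty alternative that matches everything
        if len(piece) > limit:
            raise ValueError(f"{domain!r} is longer than {limit} characters")
        if current and len("|".join(current + [piece])) > limit:
            patterns.append("|".join(current))
            current = []
        current.append(piece)
    if current:
        patterns.append("|".join(current))
    return patterns


if __name__ == "__main__":
    patterns = build_exclude_patterns(["darodar.com", "semalt.com", "buttons-for-website.com"])
    for pattern in patterns:
        print(pattern)  # darodar\.com|semalt\.com|buttons-for-website\.com
    print(any(re.search(p, "semalt.semalt.com") for p in patterns))  # True
```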
It should look similar to this screenshot:
What about you? How do you block botnets?
Update – April 20, 2015
Matthieu Napoli left a helpful link to a Referrer Spam Blacklist hosted by Piwik on GitHub. Many thanks to Matthieu for sharing that.
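A community list like this can be turned into .htaccess rules instead of being pasted in by hand. The Python sketch below assumes a plain-text list with one domain per line (blank lines and # comments are skipped) and emits the same RewriteCond/RewriteRule pattern used earlier in this post, with dots escaped and [OR] left off the last condition. htaccess_rules is a hypothetical helper, not part of the Piwik project. For a long list, keep in mind that Apache re-reads .htaccess on every request, so the server configuration may be a better home for the output.

```python
from typing import List


def htaccess_rules(list_text: str) -> str:
    """Turn a one-domain-per-line list into referrer-blocking .htaccess rules."""
    domains: List[str] = []
    for line in list_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and line not in domains:
            domains.append(line)
    if not domains:
        return ""
    lines = ["## SITE REFERRER BANNING", "RewriteEngine On"]
    for i, domain in enumerate(domains):
        # Every condition but the last is chained with OR.
        flags = "[NC]" if i == len(domains) - 1 else "[NC,OR]"
        escaped = domain.replace(".", r"\.")
        lines.append(f"RewriteCond %{{HTTP_REFERER}} {escaped} {flags}")
    lines.append("RewriteRule .* - [F]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(htaccess_rules("semalt.com\n\n# duplicates are dropped\ndarodar.com\nsemalt.com\n"))
```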
Update – June 16, 2015
Tom Capper at Distilled discovered another way to filter out referrer spam in Google Analytics. He suggests using a screen resolution exclusion.
Update – October 14, 2015
I’m really impressed with a service called Referrer Spam Blocker that was created by Stijlbreuk. You can add filters to many sites at once and best of all, it’s free!
Thanks Jon! This stuff is getting annoying. Another method I use is SetEnvIfNoCase Referer, which looks like this:
# Apache 2.2-style access control (on Apache 2.4 these lines need mod_access_compat)
SetEnvIfNoCase Referer semalt\.com spammer=yes
Order allow,deny
Allow from all
Deny from env=spammer
Awesome. Will try this. Thanks Linda for referring this page!
I’ve seen some people claim that blocking these sites with htaccess doesn’t always work because they’re not hitting your site just hitting the Google Analytics script with random UA Codes. Is there any truth to that?
While anything is possible, that’s the first I’ve heard of that and I have no idea if that would actually work or not.
We’ve been blocking semalt for months using .htaccess, but new ones like hulfingtonpost (notice the l instead of f), along with all the other crap, still come in. I’m on the side of the random UA codes; it doesn’t help anyone, it just annoys. So we set up filters in Google Analytics and filter those referrers out that way.
Correct, you can’t block a lot of the spammers with htaccess as they are not visiting your website: http://www.analyticsedge.com/2014/12/removing-referral-spam-google-analytics/
But what if they are actually hitting your site? In my Apache logs I can see that some, though not all, of the spam referrers show up. Maybe they are crawling the website for our good, but I haven’t asked them to crawl my site. It’s best to block them in GA as well as through .htaccess.
Try this method – https://productsfeed.wordpress.com/2015/03/26/how-to-stop-spam-referrals-and-filter-them-from-google-analytics-using-htaccess/
We were affected by this. It is called ghost referrer spam, and .htaccess doesn’t block that kind of spam. The only solution so far is to create filters. Because this becomes difficult to manage manually, we have built a tool that does it automatically. It adds filters for known referral spam hosts and automatically detects and filters new referrer spam hosts by correlating referrers from different websites. You can give it a try at http://referrer-spam.help – it’s free.
Thank you for this, Referrer Spam Help! This is really the bane of my online life! I’ll try that. But quite frankly, this is something Google should be doing themselves!
Totally agree. It is striking that we had to come up with this solution ourselves. Google surely has more data, people and money to solve the problem than we have.
Same thing is happening on our multiple websites 🙁 We tried to block these referrals in .htaccess but without success. I just found this answer on the Google Analytics forum.
@Doub – yes, a filter.
No, robots.txt is used to tell good bots that visit your site to avoid certain pages or directories. These aren’t good bots, and 4webmasters.org visits are usually ghost referrals — they are injected directly into Google Analytics tracking server and have never visited your site…so they wouldn’t even know if you had a robots.txt file.
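One way to tell the two kinds of spam apart is to compare the referral sources Google Analytics reports with the referrers in your raw server logs: a source that appears in GA but never in the logs has not touched your server, so .htaccess cannot block it and only a filter will help. The Python sketch below assumes Apache’s combined log format; the function names and the sample log lines are hypothetical.

```python
import re
from typing import Iterable, Set, Tuple
from urllib.parse import urlsplit

# Apache "combined" log format: the referer is the second-to-last quoted field.
COMBINED_LOG = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"'
)


def referrer_hosts_from_log(lines: Iterable[str]) -> Set[str]:
    """Collect the host names that appear in the Referer field of a log."""
    hosts: Set[str] = set()
    for line in lines:
        match = COMBINED_LOG.match(line)
        if match:
            host = urlsplit(match.group("referer")).hostname  # None for "-"
            if host:
                hosts.add(host)
    return hosts


def split_ghost_referrers(ga_sources: Iterable[str], log_hosts: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Split GA referral sources into (seen_in_server_logs, ghost_only)."""
    seen: Set[str] = set()
    ghost: Set[str] = set()
    for source in (s.lower() for s in ga_sources):
        if any(h == source or h.endswith("." + source) for h in log_hosts):
            seen.add(source)
        else:
            ghost.add(source)
    return seen, ghost


# Hypothetical sample data for illustration.
SAMPLE_LOG = [
    '203.0.113.7 - - [23/Mar/2015:10:00:00 +0000] "GET / HTTP/1.1" 200 5120 '
    '"http://semalt.semalt.com/crawler.php" "Mozilla/5.0"',
    '198.51.100.2 - - [23/Mar/2015:10:01:00 +0000] "GET /blog HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    hosts = referrer_hosts_from_log(SAMPLE_LOG)
    print(split_ghost_referrers(["semalt.com", "4webmasters.org"], hosts))
```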
Here is the Definitive Guide to Removing Referral Spam:
http://www.analyticsedge.com/2014/12/removing-referral-spam-google-analytics/
And here is an article published today on SERoundtable.com:
https://www.seroundtable.com/google-analytics-referrer-spam-20394.html
Good post Jon. Semalt was driving me crazy. I was using filters in GA. I’ll axe ’em in the .htaccess file as well.
I used to deal with this at the domain name level, but I have found that 90% of the offending domains come from the same IP. So I have a script that blocks IP addresses that have proven to be annoying. It is work, but it keeps my useless traffic and server hits way down.
For the last couple of days I’ve been getting spam traffic from Russia and China with no referrer, a completely new issue. Has anyone else seen this?
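The script itself isn’t shared, so the following is only a hypothetical Python sketch of the idea: validate a list of offending IP addresses and print an Apache 2.4 block that denies them. It assumes Apache 2.4’s mod_authz_core, where a negated Require has to sit inside <RequireAll> next to a rule that grants access. The addresses below come from the documentation ranges, not from real spammers.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def apache_ip_block(addresses: Iterable[str]) -> str:
    """Build an Apache 2.4 <RequireAll> block that denies the given IPs/ranges.

    Raises ValueError for anything that is not a valid address or CIDR range.
    """
    networks: List[Network] = []
    for raw in addresses:
        raw = raw.strip()
        if not raw:
            continue
        network = ipaddress.ip_network(raw, strict=False)
        if network not in networks:
            networks.append(network)
    lines = ["<RequireAll>", "    Require all granted"]
    for network in networks:
        # Print single addresses without the /32 (or /128) suffix.
        text = str(network.network_address) if network.num_addresses == 1 else str(network)
        lines.append(f"    Require not ip {text}")
    lines.append("</RequireAll>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(apache_ip_block(["203.0.113.7", "198.51.100.0/24", "203.0.113.7"]))
```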
Thanks Jon! I am going to try the deflector today. I assume the deflector.map.txt file should also be placed in the same directory as the .htaccess file?
Must have done something wrong… Error 500. I uploaded both as binary, but to no avail. Will try the referrer banning in the interim
Eric, I got the same error when trying this. It appears it doesn’t work with all hosting configurations (I use WPEngine). However, I was able to contact their support and they set up a “blacklist” for me.
Hey all. Thanks for sharing this. Question: do I add this before or after the “#END WordPress” line?
I would add it before any of the WordPress-related entries.
I found the best way to deal with semalt was to use their removal function here http://semalt.com/project_crawler.php. It has worked so far for my sites
Do not do this.
You’re just confirming that what they’re doing is working for them – in the same way as using the “Unsubscribe” links you get in email spam.
This is a group that is actively expanding a 300,000-victim botnet to continue their spam business. Your request will get you nowhere; check your stats again in a few days…
You may be right, but all I can tell you is that it has worked for all the sites I tried it with.
I agree with Es Cracker. It’s risky.
Wait.
If you’re hosting with WPEngine, neither of these methods will work. You will need to contact their support team and let them know which websites you would like to “Blacklist” – then they will set it up for you.
Others use bots, too, and for some reason googlebot doesn’t annoy anyone. Referrer spam is not what you are speaking of. Read Wikipedia: http://en.wikipedia.org/wiki/Referer_spam. And you mean bots that monitor websites. If someone placed links to your website, it would be spam. After all, it is strange that someone complains of inbound traffic.
Hi Nataliya Khachaturyan,
How many fake accounts do you have?
Logically, no search system could survive without robots. It’s not a reason to panic, especially since present-day analytics systems can easily tell a human from a bot. All bot complaints are pointless. All I can say about Semalt is that they do not breach any rules. Their SEO works, and that matters most. I haven’t heard of any of their clients being hit by a Google filter.
If you don’t trust Google, what is it all about?
Yes, robots are part of life. But other robots don’t spam your analytics.
What do you mean? If Semalt were involved with shady spam tricks, they would have taken measures a long time ago :)
Hey Nataliya Khachaturyan,
How many fake accounts do you have?
How do you implement this if you are on NGINX with no htaccess file?
Check this out for NGINX.
http://eclecticquill.com/2014/12/11/use-nginx-to-block-referrer-spam-from-semalt/
If you’ll forgive the self-link, I wrote a tutorial for blocking in NGINX: http://eclecticquill.com/use-nginx-to-block-referrer-spam
Hope this helps.
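For a rough idea of what such a configuration looks like, a common NGINX pattern is a map on $http_referer that sets a flag, plus an if block that returns 403 when the flag is set. The Python sketch below writes that snippet from a domain list; nginx_referer_block and the $bad_referer variable name are hypothetical, the map block belongs in the http context, and the if block goes inside each server block. It is not necessarily what the linked tutorial does.

```python
from typing import Iterable, List


def nginx_referer_block(domains: Iterable[str]) -> str:
    """Return an NGINX snippet that answers 403 for the listed referrer domains."""
    unique: List[str] = []
    for domain in domains:
        domain = domain.strip().lower()
        if domain and not domain.startswith("#") and domain not in unique:
            unique.append(domain)
    lines = [
        "# http context",
        "map $http_referer $bad_referer {",
        "    default 0;",
    ]
    for domain in unique:
        # "~*" makes the map key a case-insensitive regular expression.
        escaped = domain.replace(".", r"\.")
        lines.append(f'    "~*{escaped}" 1;')
    lines += [
        "}",
        "",
        "# inside each server block",
        "if ($bad_referer) {",
        "    return 403;",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(nginx_referer_block(["semalt.com", "darodar.com"]))
```

In NGINX, an if condition on a variable is false when the value is empty or "0", which is why the map defaults to 0.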
Thanks for this article, very interesting stuff.
I currently use the SetEnvIfNoCase Referer method and it has worked well in the past, but recently I have found that Darodar is somehow sneaking through.
I was interested to read in this thread that they may now be going for the GA code itself and not actually getting anywhere near my sites. This would explain why GA was showing hundreds of Russian referrer visits that did not match the raw AWStats data, which showed minimal Russian traffic.
I initially thought the deflector method you mentioned in your post could work for me, but if the traffic never arrives, I can’t see how I can deflect it back.
Does anyone have any ideas beyond .htaccess and deflecting that could be effective against this problem?
Why doesn’t Google come up with an effective blocking method? Life would be a little bit easier if they did.
.htaccess blocking gets one source of bot traffic, but there are more. Here’s an article that talks to all three types of bot traffic:
http://www.analyticsedge.com/2014/12/removing-referral-spam-google-analytics/
Just want to point out that Bruno Walsh & Lucas Kelly are fake accounts created by Nataliya Khachaturyan, the pseudo-something of semalt… Exactly the same crap and nonsense… Such a shame to be so stupid!
good article
Hi,
The deflector method can’t be used entirely in an .htaccess file; it generated a 500 error! The Apache documentation says:
The RewriteMap directive may not be used in <Directory> sections or .htaccess files. You must declare the map in server or virtualhost context.
Still, good article.
I just had the same problem. Just had to go through and delete the code out of the .htaccess file and removed the deflector file. Back to the drawing board I guess
I tried the deflector method and it worked. Very happy!
Notes:
– RewriteMap goes in the httpd server config file
– RewriteCond and RewriteRule go in the .htaccess file
– Don’t forget the trailing slash at the end of each URL in deflector.map
The htaccess approach doesn’t work for most referrer spam and will be working even less in the very near future. Here is how this spam actually happens, why htaccess doesn’t do anything and a proposed (temporary) solution: http://blog.analytics-toolkit.com/2015/guide-referrer-spam-google-analytics/
You mention using .htaccess, but what about those of us stuck with web.config?
No clue. I only have experience with .htaccess. I guess you’ll just have to Google the equivalent or find someone who knows more about web.config commands.
Thanks Georgi. I’ll check that out.
I have a new GA code that is not yet installed on the new website. A week later, when I was ready to install it, I found that it already had spam referral traffic. So blocking with .htaccess isn’t applicable, since the spam referral traffic lives on Google Analytics’ servers.
Here is an updated list of referrer spam that is contributed by the community: https://github.com/piwik/referrer-spam-blacklist
Thank you Jon, just the thing!
While the deflector strategy is the best approach (but work-intensive for more than one domain), the Google filter seems like the most practical solution. With that approach, I was able to apply the same filter covering about a dozen prolific spammers to ALL the domains I manage in Analytics, using the “Available Views” list that is NOT included in your screenshot above (but should be ;).
Jon!
Man – trying to explain what’s going on to clients and then “take away” traffic by filtering it out, even though it never actually existed, is such a pain in the ass. Creating and updating filters in GA just seems like a futile war of attrition as well. Has anybody asked GA (or any other analytics platform) whether they monitor for this stuff, and why they don’t issue filters or just offer the option of filtering this spammy traffic out of the GA installation in the first place?
Agreed. I’m sure someone has spoken to them. My guess is that they’re avoiding it as long as they can. Fighting spam for analytics is a bit different than it is for their search team. For example, who gets on that list and how does one get off of it. It would be one more distraction that they would have to staff up for, and there may be legal implications for them too.
Or even just a checkbox next to the referrers, “filter out traffic from this source?”, with the obligatory “you can always put it back in – just go to…”. They’re smart enough; I suppose they can probably figure it out. This is one of those things with free products like this, where they don’t really offer support and it’s all DIY, so people use it and think it’s accurate when it just isn’t.
How’s Nashville? We missed you at SearchFest this year – I’m sure I’ll run into you at another show.
Thanks Jon, but I notice fewer sessions from every source, even organic traffic (Google, Yahoo & Bing), after applying the hostname filter. I saw this after comparing both Analytics views (with and without filters).
I can understand direct or referral traffic dropping, but why do we see fewer sessions from organic traffic?
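A hostname filter keeps only the hits whose hostname matches the pattern you give it, so a pattern that misses one of your real hostnames drops that traffic along with the spam, organic visits included. That is one possible explanation for the drop, not a diagnosis. The Python sketch below uses made-up hostnames and assumes Google Analytics matches the pattern anywhere in the hostname, as Python’s re.search does; it shows a pattern written for example.com silently discarding hits recorded under www.example.com and shop.example.com.

```python
import re
from typing import List, Tuple

Hit = Tuple[str, str]  # (hostname, source) as recorded with each hit

# Hypothetical hits for a site served on example.com and its subdomains.
HITS: List[Hit] = [
    ("example.com", "google"),
    ("www.example.com", "google"),
    ("shop.example.com", "bing"),
    ("(not set)", "4webmasters.org"),  # a ghost hit that never loaded the site
]


def kept_by_hostname_filter(hits: List[Hit], include_pattern: str) -> List[Hit]:
    """Keep only hits whose hostname matches the include pattern."""
    rx = re.compile(include_pattern, re.IGNORECASE)
    return [hit for hit in hits if rx.search(hit[0])]


if __name__ == "__main__":
    too_narrow = r"^example\.com$"              # also drops www. and shop. hits
    covers_subdomains = r"(^|\.)example\.com$"  # keeps every real hostname
    print(kept_by_hostname_filter(HITS, too_narrow))
    print(kept_by_hostname_filter(HITS, covers_subdomains))
```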
Should it be domain./ or domain.? I wanted to double-check because the paragraph says to use ./ but the screenshot shows a different format.
domain.
Thanks. I’ll fix it!
I did away with GA in the end and started using other tools; so far Piwik has proven to be very good.
Cool tool here that saves a shton of time importing 70+ domains – http://www.simoahava.com/spamfilter/ – His API was never available, though, so we used his GitHub project to mirror it on our server too – https://www.searchcommander.com/seo-tools/ga-spam-referrer-filter-import/
Has anyone had any success putting deflectors in their .htaccess files? This is a wonderful idea.
I recently saw some of these domains junkin’ up my analytics. Thanks for the recommendation.
Can the Wordfence plugin do the same job as other referrer spam blocking tools? Thanks
This is all really useful and I’ve put a couple of filters in place to block this spam, but there seems to be a new one pop up as fast as you can block them. Perseverance I suppose until Google take action to stop their ability to spam analytics.
Also it seems that the filters I have in place are stopping my legitimate referral traffic, most of which is referred from LinkedIn and articles. Any ideas why?
The two filters I have in place are filtered on ‘Campaign source’ and assigned to the ‘All Web Site Data’ view. I have an unfiltered view in place also, which shows everything – spam and legitimate referrals.
The two filters contain these regex:
darodar.|semalt.|buttons-for-website.com|best-seo-solution.com|buttons-for-your-website.com|duckduckgo.com|best-seo-offer.com|
cmswip01.nottingham.ac.uk|zoominfo.com|mycustomer.com|4webmasters.org|best-seo-offer.com|free-social-buttons.com|
Can anyone suggest help? Thanks
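One likely culprit is visible in the patterns themselves: both lines end with a pipe. A trailing | adds an empty alternative to the regular expression, and an empty pattern matches every string, so an exclude filter built this way can drop every campaign source, LinkedIn included. The Python snippet below demonstrates the effect with Python’s re module, assuming Google Analytics’ regex engine treats an empty alternative the same way (standard regex behavior); excluded and drop_empty_alternatives are hypothetical helper names.

```python
import re

# Shortened version of the filter patterns above, trailing pipe included.
WITH_TRAILING_PIPE = r"darodar.|semalt.|buttons-for-website.com|"


def excluded(pattern: str, source: str) -> bool:
    """True if an exclude filter with this pattern would drop the source."""
    return re.search(pattern, source) is not None


def drop_empty_alternatives(pattern: str) -> str:
    """Remove empty alternatives (a leading, trailing or doubled '|').

    Assumes the pattern contains no escaped or grouped pipes.
    """
    return "|".join(part for part in pattern.split("|") if part)


if __name__ == "__main__":
    cleaned = drop_empty_alternatives(WITH_TRAILING_PIPE)
    print(excluded(WITH_TRAILING_PIPE, "linkedin.com"))  # True: the empty branch matches anything
    print(excluded(cleaned, "linkedin.com"))             # False
    print(excluded(cleaned, "semalt.semalt.com"))        # True
```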
I’d like to just block at my server level, but I think I need to do this with IP addresses. Is there a comprehensive list of IP addresses associated with these referral spammers?
Could you use your hosts.deny file to block these for the entire server? I don’t want to have to update these filters and .htaccess files each time a new one pops up; it is a nightmare. I am not sure what syntax to use for hosts.deny.
You could use hosts.deny, but keep in mind that what you have to add there is the IP address the requests are coming from, not the spamming domain. In other words, adding 4webmasters.org to your hosts file won’t do any good unless the traffic is coming from the same IP that hosts the 4webmasters.org site.
If you want to add everything on a single file I suggest you add the referrals to one server configuration file instead of the .htaccess files. That way there is only one file to maintain.
I didn’t see anyone address getting rid of the historical data from the spam referrals. We were able to hide them with a new segment, but every time we log out, the new segment doesn’t stay selected. Does anyone know what we’re doing wrong, how to set the new segment as the default, or some other way to permanently hide the referral spam in historical data?
I use Spyder Spanker (a WordPress plugin) to block and deflect bots. It works well.
That’s an amazing name for a plugin 🙂
If anyone is interested, I made a gist on GitHub with settings for NGINX and Apache: https://gist.github.com/rolandinsh/52197d4d4feb37dffe9b
Nice Article 🙂
There is something I would like to contribute. Blocking these referrals in the .htaccess file wastes processor and I/O time, because Apache has to read and parse the entire list of entries each time it processes a request for any of the sites on the host.
I would recommend adding it to the server configuration file, for example in Ubuntu there is a file called security.conf which is intended to centralize all the security policies and directives. There I added a line like this for each referral:
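SetEnvIf Referer 4webmasters\.org referral_spam
Then add this to the VHost configuration file (or to the .htaccess if you want; a few lines are not as bad as 180 of them). On Apache 2.4, a negated Require only works inside a <RequireAll> block that also grants access:
<RequireAll>
    Require all granted
    Require not env referral_spam
</RequireAll>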
This also makes things easier to maintain, you can add or remove hosts in a single place.
Thank you Jon, this helps a lot. However, after implementing the first .htaccess code, I still see some of them pop up in my analytics, which is weird since they are indeed blocked via .htaccess. One of the culprits is free-social-buttons, and it seems they are using different servers such as www1, www2, www3 and so on. I’m researching a way to block these via .htaccess as well. Overall, the first code has blocked about 80% of the referral spam that was hitting my sites. 🙂
Blocking referrals that follow a pattern, e.g. www1, www2, www3, is easy since the RewriteCond directive accepts regular expressions; you could block all their subdomains with a regular expression like .*whatever\.com
http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewritecond
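Here is one way to test such a pattern before it goes into .htaccess, sketched in Python under the assumption that re handles this simple pattern the way mod_rewrite’s PCRE engine would. The pattern and sample referrers are illustrative: it matches free-social-buttons.com with or without a subdomain prefix such as www1., and the escaped dot plus the (/|$) ending keep it from matching look-alike hosts such as free-social-buttons.com.example.org.

```python
import re

# Optional subdomain prefix, escaped dot, and a path slash or end of string after .com.
SUBDOMAIN_PATTERN = r"^https?://([^/]+\.)?free-social-buttons\.com(/|$)"


def is_blocked(referer: str) -> bool:
    # IGNORECASE mirrors the [NC] flag on RewriteCond.
    return re.search(SUBDOMAIN_PATTERN, referer, re.IGNORECASE) is not None


if __name__ == "__main__":
    for referer in [
        "http://www1.free-social-buttons.com/",
        "https://www3.free-social-buttons.com/page",
        "http://free-social-buttons.com",
        "http://free-social-buttons.com.example.org/",
        "http://notfree-social-buttons.com/",
    ]:
        print(referer, is_blocked(referer))
```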
Thank you for this article. When I create the filter in Google Analytics, do I apply it to the ‘All Web Site Data’ view?
You could categorize duckduckgo.com as referrer spam too.
I’ve got over 400 rules, and today I’m running new reports and they have been hit by a whole new myriad of bots. This is maddening. I can’t keep up with it, and reporting becomes just a joke. Maybe Raven could allow us to ignore certain lines in a report; at least then I could condense it down, but my first 2 screens of analytics are just garbage and there is no easy way around it. #frustrated
Hi guys,
I’m currently using this in my .htaccess; however, some still seem to get through, particularly “floating-share-buttons”, which managed to rack up 900 sessions this month! Is there anything wrong with the script below?
Thanks for the article !
# Block Russian Referrer Spam
RewriteEngine on
RewriteCond %{HTTP_REFERER} ^https?://.*ilovevitaly\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*ilovevitaly\.ru/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*ilovevitaly\.org/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*ilovevitaly\.info/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*iloveitaly\.ru/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*econom\.co/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*savetubevideo\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*kambasoft\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*buttons-for-website\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*floating-share-buttons\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*semalt\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*4webmasters\.org/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*trafficmonetizer\.org/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*webmonetizer\.net/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*darodar\.com/ [NC]
RewriteRule ^(.*)$ - [F,L]
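Small typos in patterns like these are easy to miss, so it can help to test them outside Apache. The Python sketch below (assuming re treats these simple patterns as PCRE does) compares a pattern with a doubled dot and an http-only anchor against one with an escaped dot and https?; the sample referrers are illustrative. As other commenters point out, spam that never reaches the server won’t be stopped by any of these rules.

```python
import re

# A doubled dot needs two characters between "ilovevitaly" and "ru",
# and "^http://" never matches a referrer that starts with https://.
TYPO_PATTERN = r"^http://.*ilovevitaly..ru/"
FIXED_PATTERN = r"^https?://.*ilovevitaly\.ru/"

SAMPLE_REFERRERS = [
    "http://ilovevitaly.ru/",
    "http://www.ilovevitaly.ru/",
    "https://ilovevitaly.ru/",
]


def matches(pattern: str, referer: str) -> bool:
    # IGNORECASE mirrors the [NC] flag on each RewriteCond.
    return re.search(pattern, referer, re.IGNORECASE) is not None


if __name__ == "__main__":
    for referer in SAMPLE_REFERRERS:
        print(referer, matches(TYPO_PATTERN, referer), matches(FIXED_PATTERN, referer))
```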
Can a Linux/Unix guy confirm whether the recommended regex script (below) processes each packet, even packets that are part of an established TCP session, or just the beginning of the session (somewhere around the TCP handshake)? I am trying to judge what performance impact this script (with three rules) will have on the Apache server, and whether it is lighter on the server to just let it respond, since referral spam seems to be just one hit to the homepage.
## SITE REFERRER BANNING
RewriteCond %{HTTP_REFERER} semalt.com [NC,OR]
RewriteCond %{HTTP_REFERER} buttons-for-website.com [NC,OR]
RewriteCond %{HTTP_REFERER} seoanalyses.com [NC]
RewriteRule .* - [F]
Thanks to this article, I found the piwik-list. I already had a tool running on my web-servers, but only with a small list of my own. I adjusted my tools to automatically generate a combined blacklist file and configure Apache accordingly. The lists are updated each night. At the moment, it’s only my own short list and the piwik one. If you have knowledge of other plaintext lists, please let me know and I’ll include them. If someone wants to contribute, please do. All code is here: https://github.com/bitprocessor/referralspam-block ; Thanks again for this post.
Hey Jon, thanks for sharing that Referrer Spam Block site. Going to test it out now 🙂
Anyone seen spam like this?
I’ve been using the filters in Google Analytics for a while now. I’m on Windows hosting and have blocked it through my web.config file, but that became a hassle and I also didn’t want to make that file huge. Since I get hit by a new one about once a week or so, the Analytics filter has been my best option so far. Constantly doing it is a pain, but it’s easy to do.
Thanks! Great article. I seem to have been hit with this, but all my referral URLs are in fact my own domain. It seems that when I was first hit, the spammer stored my domain externally and now spoofs the requests to look like they’re coming from my own servers. Obviously I can’t block requests from my own site; any ideas how to get around this?
How can we filter it in Raven while generating Analytic reports?
Awesome, but according to Moz, one of “the biggest mistakes people make is trying to block Ghost Spam from the .htaccess file.” They also said “the .htaccess file can only effectively block crawlers such as buttons-for-website.com and a few others since these access your site. Most of the spam can’t be blocked using this method, so there is no other option than using filters to exclude them.”
This article was written in early 2015 and at the end of the post is an update that links to a new post about stopping referrer spam: https://raventools.com/blog/remove-google-analytics-referral-spam/ — I may push the update message and link to the top so it’s less confusing.
Thanks Jon, I got it
I can’t stop the spam on my blog. Thank you so much for this post. It will help me a lot.
That’s really awesome, but my question is: if we filter it through Google Analytics, will it work, and is there any way to do it through Raven Tools? Thanks
It should. The filters you set up in the Admin affect the results in GA that you see for that property. In turn, Raven uses their API to get results for the property, which should be the filtered results from GA.
Great article, Jon!! I was going through this problem, and this article is very informative; it gave me the solution to my problem.
Nice article. thanks
Thanks for sharing such a nice article really helpful