Over The Top SEO

OTT Blog

How to Stop Ghost Spam in Google Analytics

Greg Lucas August 17, 2016


It’s tough staying at the top of the SEO game. And ghost spam isn’t helping.

It seems that as soon as a new technique is discovered and pushed into mainstream practice, a million problems come off the back of it.

The wide use of referral spam through sources such as social buttons and adult sites means that many sites are suffering the added workload of dealing with it.

Data is fantastic. Too much data can be a nightmare.

It’s not all doom and gloom, however. Today we’ll run down what ghost spam and crawler spam are, and how clever use of a single filter can make dealing with these pests simple and stress-free.

So what do you mean by spam?

The main focus we’ll be going over today is ghost spam. The second type that’s good to keep in mind is crawler spam. Let’s do a rundown of each before going into the specifics of how to handle them.

Ghost spam

Ghost spam makes up the majority of spam you’re likely to see incoming to your site or sites.

It’s dubbed “ghost” spam because your site is never actually accessed. This is a key point to keep in mind, as it’s exactly why you can deal with it easily with the right formula.

A little more on the “never accessing your site” point. It seems odd, as one might think the whole point of Google Analytics is tracking site visits.

Ghost spam works through the Measurement Protocol, which lets anyone send data straight to the servers that handle Google Analytics. Combined with randomly generated tracking IDs, this lets a spammer fool the servers into logging a visit that never happened – complete with fake data.
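To make the “never touches your site” point concrete, here’s a minimal sketch of roughly what a Measurement Protocol hit looks like. The tracking ID, referrer, and page here are made-up illustration values, and nothing is actually sent – a spammer would POST a payload like this directly to Google’s collection endpoint, bypassing your server entirely.

```python
from urllib.parse import urlencode

def build_fake_hit(tracking_id: str) -> str:
    """Build the URL-encoded payload a spammer might POST to
    https://www.google-analytics.com/collect. Illustration only."""
    payload = {
        "v": "1",                              # protocol version
        "tid": tracking_id,                    # target property ID, guessed at random
        "cid": "555",                          # arbitrary client ID
        "t": "pageview",                       # hit type
        "dr": "http://spam-domain.example/",   # fake referrer – what shows in your reports
        "dp": "/fake-page",                    # fake page path
    }
    return urlencode(payload)

print(build_fake_hit("UA-123456-1"))
```

Since the hit goes straight to Google’s servers, nothing in your own server logs or server-side blocking will ever catch it – which is why the fix has to happen inside Google Analytics itself.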

Spammers using this technique won’t ever know who they are targeting. What a lovely thought.

Crawlers

We’ll go over crawlers briefly, as comparing how they work makes a useful reference point for ghost spam.

Crawlers are different in that they actually do access the website.

As you might infer from the name, crawler spam works through hordes of spam bots working through every page of your site. They’re nasty in that they flatly ignore the rules (such as robots.txt) set up on sites that usually stop such activity.

When a crawler bot leaves the site, it leaves a record of the visit that looks very similar to a legitimate visit by an actual human being. This makes crawlers hard to identify and filter from your legitimate traffic.

The good news is that there are many lists online detailing the characteristics of most known crawler bots. Taking any one of these and comparing it against suspicious traffic on your site can make shutting these down relatively straightforward.
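As a rough sketch of that comparison step, the check below matches a referrer against a small list of crawler domains. The three domains are well-known examples of referral-spam crawlers, but they’re only illustrative here – the community-maintained lists are far longer and change over time.

```python
# Example crawler domains; real blocklists are much longer and updated often.
KNOWN_CRAWLERS = [
    "semalt.com",
    "buttons-for-website.com",
    "best-seo-offer.com",
]

def is_known_crawler(referrer: str) -> bool:
    """Return True if the referrer contains a known spam-crawler domain."""
    return any(domain in referrer for domain in KNOWN_CRAWLERS)

print(is_known_crawler("http://semalt.com/crawler"))  # True
print(is_known_crawler("https://www.google.com/"))    # False
```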

What do I need to worry about?

We’ve confirmed the types of spam and how they work. The next question is what this means for you in practice.

The obvious thing is your data and the fact that you need to protect it.

Letting spam run rampant through your site data will pollute your analytics and reporting. Fake spam trails can throw off your understanding of your site traffic.

Spam can hit small and medium-sized websites particularly hard. Such sites are usually self-managed, without the professional services of a webmaster or an analyst. And because spam can make up a significant portion of the traffic, it skews reporting even more than it would on a larger website.

Even with a large site, fake visits from ghost spam or crawlers will still throw off your reporting, so this is always something worth dealing with.

Good news – one filter does the trick

Most people add spam referrers to an exclude filter after finding the spam itself. This works, but it’s very manual and time-consuming.

It’s also limited: spammers also send fake direct visits, and these won’t be stopped by a referral filter.

The smart way forward is to make an include filter that only admits real hostnames.

This means you’ll automatically remove any ghost spam regardless of how it shows up – fake direct visits, referrals, keywords, and page views all get picked up.

To set up the filter you just need to follow these four steps.

  1. Navigate to the Reporting tab within Google Analytics
  2. Go to the Audience section
  3. Expand Technology, then click Network
  4. At the top of the report, select Hostname as the primary dimension

This gives you a full list of all host names – including the spam.

You can then make a full list of every valid hostname you come across.

The next step is to make a regular expression.

Don’t worry about adding every subdomain: the expression for your main domain will cover all related subdomains.
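Here’s a rough sketch of what such an expression might look like, using made-up placeholder domains – substitute your own valid hostnames, separated by `|`. Because the match is unanchored, the main domain’s pattern also catches its subdomains.

```python
import re

# Placeholder domains – replace with the valid hostnames from your report.
valid_hostnames = re.compile(r"yourdomain\.com|anotherdomain\.com")

print(bool(valid_hostnames.search("yourdomain.com")))       # True
print(bool(valid_hostnames.search("blog.yourdomain.com")))  # True  (subdomain covered)
print(bool(valid_hostnames.search("spam-domain.xyz")))      # False (would be filtered out)
```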

You’ll then want to make a custom filter with Include selected. Set the filter field to Hostname and put your expression into the filter pattern field.

Once that’s ready you simply save the filter and then apply this filter to any views that you want it to work with.

It’s wonderfully easy – this one filter will work on removing all occurrences of any ghost spam.

The one point to keep in mind is that whenever you add your tracking code to a new service or domain, you need to amend the filter’s expression to include its hostname.

What do I need to keep in mind?

The main point to be aware of is that while your filter will kick the ass of ghost spam, it’s also very sensitive.

The classic error of a single incorrect character can have dire consequences – data excluded by a filter is gone for good. So play it smart: double-check your expression before saving, and if you block crawler bots through your .htaccess file, keep regular backups of it at all times, particularly prior to any editing.

Some users might not feel comfortable editing their .htaccess at all. A simple alternative is to build an expression that matches known crawlers and add it to an exclude filter using Campaign Source.
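That exclude expression can be assembled the same way as the hostname one. The sketch below joins a few example crawler names into a single pattern for the Campaign Source filter field – again, the names are illustrative, and real blocklists are much longer.

```python
import re

# Illustrative crawler names; extend this list from a maintained blocklist.
crawlers = ["semalt", "buttons-for-website", "best-seo-offer"]
expression = "|".join(crawlers)
print(expression)  # semalt|buttons-for-website|best-seo-offer

# The expression matches any campaign source containing one of the names:
print(bool(re.search(expression, "semalt.semalt.com")))  # True
print(bool(re.search(expression, "newsletter-march")))   # False
```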

Once you’ve got these techniques worked out and functioning well you will be in the happy position of not having to worry about ghost spam and crawlers quite so much.

You can then have a little free time to actually take a look at your real data. Luxury!