A River Runs Through It, or, The Elusive User Scope in Google Analytics

Update: Apparently Google has caved in to popular demand and is rolling out a report that allows user-level tracking. I’m sticking to my opinion until I can see that this newfangled thing actually exists and has proven to be of any value. I still don’t buy it.

I spend a lot of time working the Google Analytics tag on Stack Overflow, and one question that pops up regularly is “how do you track individual users with GA”.

Actually, you don’t. Google Analytics is not designed to track individual users, partly because Google cares a lot more about privacy than it’s given credit for, and partly because tracking individual users is not, as such, a particularly useful endeavor. The purpose of Web Analytics is to detect patterns in user behavior, and looking at individual users does not let you do that; aggregated data does. Some of the reports (e.g. demographics) do not even work properly if your user segment is too small, and you have only 50 000 rows in your reports before any further hits are lumped together into a collective “other” row. Not to mention that GA has user flow reports, meandering like so many creeks and brooks and rivulets before they flow into one another to make a mighty stream. One of something does not flow, at most it patters.

Whenever I point that out, people invariably raise one of the following two objections (or sometimes both).

BUT GOOGLE EVEN HAS A USER-ID FEATURE!

It has, I grant you that. But userId is actually a massive misnomer – it seems that somebody at Google thought that “cross device tracking id” would be a bit of a mouthful, even though that’s what the feature actually does. The user id is set for logged-in users so you can recognize recurring visits from different devices. However, by default it is not exposed anywhere in the interface, and you mustn’t use a user id that actually ids a user, as that would be a violation of Google’s terms of service.
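To make the “must not actually id a user” part a little more concrete, here is a minimal sketch in Python of handing out an opaque, random id per account. The helper name `opaque_id_for` and the in-memory dictionary are my own assumptions for illustration and not part of any Google API; in practice the mapping would live in your own database. Whether any given scheme satisfies the terms of service is for Google to decide, not me.

```python
import uuid

# Hypothetical in-memory mapping from your internal account key to an opaque id.
# In a real system this would be a table in your own database, not a dict.
_opaque_ids = {}


def opaque_id_for(account_key):
    """Return a stable, random id for an account (assumed helper, not a GA API)."""
    if account_key not in _opaque_ids:
        # uuid4 is random, so the id itself carries no name, email or other
        # personal data; only your own mapping can tie it back to the account.
        _opaque_ids[account_key] = str(uuid.uuid4())
    return _opaque_ids[account_key]
```

The same account always gets the same id, so visits from different devices can be stitched together, while the value that leaves your system says nothing about who the person is.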

At that point comes the second objection, which is

BUT GOOGLE SAYS I CAN STORE AN ID THAT IDENTIFIES A USER!!

It is true that Google Analytics evangelist Justin Cutroni had an article (in 2011, no less!) that described a way (permissible within the Google TOS) to store a unique id per user (which might be identical to the user id mentioned above). There are caveats, though, the most important of which is that this must not identify a user from within the Google Analytics interface. The idea is to use this data field (implemented as a custom dimension) as a key field in data imports, so that you can tie external data from CRM systems and the like to data collected via Google Analytics.

However, the use case for this is not to better track individual users. It is to have additional data by which you can aggregate data (e.g. by industry vertical, age group, lifetime value etc.). You can use those aggregates to segment your visitors (as you should do, since segmentation is probably the most useful tool in Google Analytics). The only data points that are exposed via the interface on a per-user basis are transaction ids, and even there it hardly makes sense to look at them individually (you cannot really optimize a site for individual users).
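To illustrate what “a key field to aggregate by” means, here is a small Python sketch, not the actual GA data import mechanism. All the rows, keys and vertical names are made up, and `sessions_by_vertical` is a hypothetical helper; the point is only that the key is used to attach an attribute and is then thrown away in favor of the aggregate.

```python
from collections import defaultdict

# Made-up rows as they might come out of an analytics export: (opaque key, sessions).
analytics_rows = [("k1", 3), ("k2", 1), ("k3", 5), ("k4", 2)]

# Made-up CRM attributes keyed by the same opaque key.
crm_vertical = {"k1": "retail", "k2": "finance", "k3": "retail"}


def sessions_by_vertical(rows, verticals):
    """Sum sessions per vertical; keys without CRM data land in "(unknown)"."""
    totals = defaultdict(int)
    for key, sessions in rows:
        # The key is only a join field; the result no longer contains it.
        totals[verticals.get(key, "(unknown)")] += sessions
    return dict(totals)


print(sessions_by_vertical(analytics_rows, crm_vertical))
```

What comes out is a handful of segments with totals, which is something you can actually act on, and not a list of individuals.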

Google Analytics can be used to track individual users the same way a gun can be used to withdraw money from a bank account – it is certainly possible, but it hasn’t been designed for that purpose, and using it that way has some legal ramifications that you should at least be aware of before you start. This concerns both national laws (your mileage will vary here) and Google’s own terms of service.

For example, I live and work in the European Union. The EU itself does not pass any laws; it adopts directives that have to be turned into law by the member states (so the actual implementation might vary from state to state). The most famous of those directives is perhaps the “Directive 2002/58/EC of the European Parliament and of the Council of 12 July 2002 concerning the processing of personal data and the protection of privacy in the electronic communications sector”, which is commonly called the “cookie law” despite the fact that it is not a law, mentions cookies only twice by way of example and has its actual scope and purpose written right into its name.

To comply with the EU directives you have to clear some pretty high hurdles when it comes to storing personally identifiable information (PII). For one, you need the informed consent of the person in question. The keyword is “informed”, i.e. the visitor has to know and understand what you are actually doing with the data. You also need to be able to remove PII from your records at the request of the user, and at that point GA falls flat. A few years back there actually was a major dispute to determine if Google Analytics should be legal at all in Germany (which was settled out of court after Google made a few concessions such as IP anonymization).

Probably to avoid further legal hassle, Google amended its terms of service. They start with the rather comprehensive-sounding sentence:

“You will not (and will not allow any third party to) use the Service to track, collect or upload any data that personally identifies an individual (such as a name, email address or billing information), or other data which can be reasonably linked to such information by Google.”

Apart from those general terms there are additional terms if you want to use advanced features of GA, like demographics or advanced remarketing.

So far so good. The snag is that it is really hard to tell in advance what exactly constitutes PII. E.g. webservers have collected IP addresses more or less from the day they were invented, until the EU decided that they were illegitimate PII. It’s also worth pointing out that the Google TOS prescribe that you get the visitor’s consent if you want to use the User ID feature, despite the fact that the user id is not exposed via the interface and, as pointed out, does not actually identify a user. When in doubt, what is actually allowed can only be determined by a) Google (because they make the TOS for their service) or b) a court of law, and unless you really want to duke this out with either of them I recommend a conservative approach. User scope is not about collecting data on individual users, it’s about collecting aggregated data on users that does not change during recurring sessions.

The good news is, like I’ve said already twice before (and what I tell you three times is true), you do not actually need to track individual users. While even the mightiest stream is made of individual drops, you can neither predict nor change its course by looking at the single bubbly beads (plus you probably would go nuts if you tried). Instead, take your aggregates of data and build your user segments carefully, and soon they will become apparent as the beautiful tributaries of that great user flow as it runs through your website and makes its way down to the majestic ocean of conversion.
