When the whole Web Analytics thing started there was not much experience to draw from when it came to modeling the behavior of online audiences. It was assumed that they behaved somewhat similarly to real-world customers, and that we had a good idea of how to analyse them. This usually looked somewhat like this:
There are certain key assumptions here, the most problematic probably being that users behave rationally, following several discrete steps in which each step motivates the next, until they reach conversion. This might still be true for some high-value items. For the most part, though, the assumption of rational behavior is no longer tenable.
For example: when I go motorbiking in the Alps I carry a cheap mobile, because if I drop a phone into a crevasse it had better be a 100 Euro Android device than an 800 Euro iPhone. When I went to Amazon for a replacement and entered my search terms and price range it returned a list of 1892 phones. There is no way I could make a rational decision between 1892 phones that either have almost the same name and wildly different specs, or almost the same specs and completely different names. In the end I bought a red one.
There is a user story here, but it is not the story of a rational user who is trying to buy at the margins. Instead it is the story of a middle-aged man who increasingly struggles with the impositions modern life forces upon him. Your mileage may vary. But all in all, users do or don't do things for any variety of reasons, and often they do things simply because it seemed like a good idea at the time.
When Google launched Universal Analytics in 2013 they suggested a different way of doing things, which was referred to as the Google Analytics ABC – short for "acquisition, behaviour, conversion". This followed a rather different set of assumptions – reality is the sum of everything that is observed, there are no a priori stories, and while there are many different user stories hidden in the data, you have to find them yourself.
Philosophically this was a much more satisfying approach, since it did not claim to know more about the user than was actually recorded in the data. It did not, however, produce any immediately actionable results, so it did not quite catch on. Still, ideas like this, which put the focus on the actual data instead of on any preconceived notion of how users behave, paved the way for the next big thing in analytics: the idea of attribution.
- In social psychology, attribution is the process by which individuals explain the causes of behavior and events
- In marketing, attribution is the process of identifying a set of user actions that contribute in some manner to a desired outcome, and then assigning a value to each of these events. (Wikipedia)
The idea of attribution originates in psychology. In 1958 the Austrian psychologist Fritz Heider published his seminal work "The Psychology of Interpersonal Relations", which founded the field of attribution theory. Attribution theories basically deal with the question "What do people believe about why things happen?" If we know to what cause a person attributes their current situation, we can – so it was hoped – better understand why they behave the way they do.
When marketing adopted the idea, the focus shifted away from the individual; instead, attribution became the quest for a stimulus that could be applied to a number of people at once in order to increase the rate of the desired outcome. By assigning a value to each of these stimuli, one could shift marketing budgets to increase the chance of conversions. Modern attribution models remove the user even further from consideration: they no longer attribute outcomes to user actions, but to the influence of marketing channels.
This is a first-order Markov graph, an example from the documentation of the R ChannelAttribution package. "A" and "B" are marketing channels; conversion is the desired outcome. The arrows indicate the probability that somebody will move from one state to the next. As you can see, everybody has to go through channel "A" to arrive at a conversion, so it receives full attribution. Only some 30% pass through channel "B", so "B" is somewhat less important.
This kind of probabilistic attribution represents major progress. There is no rhyme or reason, no motive, nothing outside the data as it is stored in your system. From here you can conclude that A performs really well and you should invest more money in that channel, or that B is undervalued and you should shift some of your budget there, or any number of other interesting things. So this might still need work, but the idea is sound.
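To make the A/B example concrete, here is a minimal sketch of the removal-effect idea behind first-order Markov attribution: a channel's worth is the share of conversions that would be lost if that channel were removed from the graph. This is not the ChannelAttribution package's actual API (which is R), and the path counts below are hypothetical numbers mirroring the example above.

```python
from collections import Counter

# Hypothetical converting paths and their frequencies: every
# conversion passes through A, and 30 out of 100 also pass through B.
paths = Counter({
    ("A",): 70,
    ("A", "B"): 30,
})

total_conversions = sum(paths.values())

def conversions_without(channel):
    """Conversions that survive if `channel` is removed from the graph."""
    return sum(n for path, n in paths.items() if channel not in path)

def removal_effect(channel):
    """Share of conversions lost when the channel is removed."""
    return 1 - conversions_without(channel) / total_conversions

effects = {c: removal_effect(c) for c in ("A", "B")}

# Normalise the removal effects so the attributed values
# sum back up to the total number of conversions.
norm = sum(effects.values())
attribution = {c: total_conversions * e / norm for c, e in effects.items()}

print(effects)      # A: 1.0 (every path needs A), B: 0.3
print(attribution)
```

Removing A kills every converting path, so its removal effect is 1.0 and it receives the lion's share of the attributed conversions; removing B only costs the 30% of paths that contain it.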
As per usual, there are caveats. First, doing attribution costs money (either in wages or in tools). Attribution helps with budget allocation, i.e. you can redistribute your advertising budget toward more promising channels. You have to make sure, though, that a mere shift in budget will make you more money than you have just spent on attribution. It might be (as I have said before), especially on a smaller budget, that there are simpler and more cost-effective measures you should start with.
- Despite our best efforts there is a certain fudge factor built in
- Always test multiple conflicting hypotheses to arrive at valid conclusions
- Really cool (but you do not necessarily want to lead with that)
(to be concluded)