Data for customer-journeys analytics

November 2nd, 2017 | Categories: Customer Journey, Lean customer-journey analytics

This is the second post in the lean customer-journey analytics series. If you don’t know what that is, read the introduction to lean customer-journey analytics first. In this post, I will go into more detail about the data you need for customer-journey analytics.


Since customer journeys typically span multiple channels, we would ideally like to have data from all these channels:

  • Web analytics data, since everybody has a website

  • Mobile app analytics data, if you also have an app

  • Ad impressions

  • Data from your email system

  • Data from your customer-support systems: support calls, emails, chats

  • Data from your CRM – meetings, calls, emails

  • Transactional data, i.e. who eventually purchased

In practice, however, it’s advisable to start with just the two or three most important channels. For example, connecting data from your website, your CRM system and your customer-support system can already give you important insights.


It is important to realise that for customer-journey analytics, you will need data on the individual level: you need a log of each event of each individual. Aggregated data, such as the number of hits per page, per device or per country, will not be enough. In general, your datasets should have the following form:

User or device ID    Timestamp              Event description or code
9873fds232           2017-11-02 10:12:03    (description or code of the event)

User or device ID is a unique ID identifying the user or the device they are using. Depending on where the data comes from, this ID will take a different form. For example, in a web analytics dataset it will typically be a unique alphanumeric ID generated by your web analytics software and stored in a cookie on the user’s device. This is an ID from a Google Analytics 360 dataset: 380066991751227408. In a dataset coming from your CRM system, this ID might simply be the first and last name of a customer, their email address or a customer number.

The timestamp field should contain the exact time of the event.

The event description or code field should contain enough information to identify what type of event took place. In web analytics datasets this field will contain page URLs or events fired on pages. In CRM datasets this field should at least tell us what happened: was it a phone call, an email contact or a meeting? But you can also work with more detailed event descriptions: the name of the email campaign that was sent, the type of call or meeting, etc.

These three columns are the bare minimum for customer-journey analytics. If your data contains more information, such as user characteristics or event durations, that’s great: you can add it to your analysis, but you don’t necessarily need it.
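To make the three-column form concrete, here is a minimal sketch in Python. The `Event` class, the `build_journeys` helper and the example rows are all made up for illustration; the only thing taken from the text above is the idea of one row per event with an ID, a timestamp and an event description.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    user_id: str         # user or device ID
    timestamp: datetime  # exact time of the event
    event: str           # event description or code


def build_journeys(events):
    """Group events by user ID, with each user's events in time order."""
    journeys = {}
    for e in sorted(events, key=lambda e: e.timestamp):
        journeys.setdefault(e.user_id, []).append(e.event)
    return journeys


# Made-up example rows, not real data.
events = [
    Event("9873fds232", datetime(2017, 11, 2, 10, 14, 40), "/pricing"),
    Event("9873fds232", datetime(2017, 11, 2, 10, 12, 3), "/home"),
    Event("a11b22c33d", datetime(2017, 11, 2, 11, 0, 0), "/home"),
]

journeys = build_journeys(events)
```

Once every channel is in this shape, the logs can simply be concatenated before grouping, which is why the same three columns are worth enforcing for each data source.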


Web analytics data requires some special attention. Most websites nowadays use Google Analytics Standard as their web analytics solution. While GA Standard is a great tool (and entirely free), it will not give you access to your raw data. This not only makes customer-journey analytics impossible, it will also get you into trouble whenever you have a question that the standard Google Analytics reports don’t answer. You’re entirely dependent on the features Google chooses to provide. If you want access to your raw, individual-level data, consider one of these options:

Paid web analytics solutions. Many paid web analytics solutions offer a way to access the raw data.

Free web analytics solutions. There are a couple of web analytics solutions that will give you your data for (almost) free. They will not necessarily have the features that you’re used to in Google Analytics. Note, however, that you can always install them alongside your GA installation.

  • Piwik is a free and open source web analytics solution. You can install it on your own server and it will collect your data in your own SQL database. If you don’t want to bother with servers yourself, you can also have it hosted in the cloud for little money.
  • Yandex Metrica is a Russian competitor of Google Analytics with a lot of features. Unlike Google, Yandex offers a raw data export option in TSV format. Yandex is also completely free. On their pricing page they explain why: they use your (anonymized) data to improve their flagship product, the Yandex search engine. Make no mistake: Google also uses your data to improve their products, but they’re not as open about it.
  • Open Web Analytics (OWA) is a free, open-source and self-hosted web analytics software. It requires a server and some nerdiness to get it up and running, but some people like it better than Piwik.
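Whichever of these tools you pick, a raw export still has to be turned into the three-column event log. As a rough sketch, this is how a tab-separated export could be read in Python. The column names `client_id`, `date_time` and `url` are assumptions made for this example, not the actual field names of any particular tool, so check your own export first.

```python
import csv
import io
from datetime import datetime

# Hypothetical export: the column names are assumptions for illustration only.
SAMPLE_TSV = (
    "client_id\tdate_time\turl\n"
    "9873fds232\t2017-11-02 10:12:03\t/home\n"
    "9873fds232\t2017-11-02 10:14:40\t/pricing\n"
)


def read_event_log(tsv_file):
    """Turn a tab-separated export into (user_id, timestamp, event) tuples."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return [
        (
            row["client_id"],
            datetime.strptime(row["date_time"], "%Y-%m-%d %H:%M:%S"),
            row["url"],
        )
        for row in reader
    ]


# In practice you would pass an opened file instead of the in-memory sample.
log = read_event_log(io.StringIO(SAMPLE_TSV))
```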

Tools that store your raw data before sending it to GA. 

  • You can use Segment to collect your data and send it both to your GA and your own data warehouse.
  • Clickstreamr does a very similar thing.


Finally, we need to make sure that we can track the same person through different datasets: from web to CRM or other internal systems, for example. There are a number of standard tricks to connect web sessions to the ‘real’ people whose data you store in CRM, support and other internal systems:

Let them log in. Try to convince them to log into your website. The moment they log in, your web analytics software can store their login name or email address together with their cookie identifier. From this moment on you can link all their visits, including the ones from before they logged in. If you let them log in on different devices, you can also link their behavior across all these devices. This is something that web analytics software alone will not give you: it will always treat visits from different devices as different visitors.
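The login trick boils down to a lookup table from cookie IDs to a login name or email address. Here is a minimal sketch of that idea; the function name, the tuple layout and the sample data are hypothetical and only meant to show the mechanics.

```python
def stitch_identities(events, logins):
    """Replace cookie IDs with a person ID wherever a login links the two.

    events: list of (cookie_id, timestamp, event) tuples
    logins: dict mapping cookie_id -> login name or email address
    """
    return [
        (logins.get(cookie_id, cookie_id), timestamp, event)
        for cookie_id, timestamp, event in events
    ]


# Made-up data: one person on two devices, plus one anonymous visitor.
events = [
    ("cookie-laptop", "2017-11-01 09:00:00", "/home"),  # before the login
    ("cookie-laptop", "2017-11-01 09:05:00", "login"),
    ("cookie-phone", "2017-11-02 18:30:00", "/pricing"),
    ("cookie-other", "2017-11-02 19:00:00", "/home"),
]
logins = {
    "cookie-laptop": "anna@example.com",
    "cookie-phone": "anna@example.com",
}

stitched = stitch_identities(events, logins)
```

Because the mapping is applied to the whole log, the visit that happened before the login is attributed to the same person, and both devices collapse into one visitor. Cookies that never logged in keep their cookie ID.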

Free content in exchange for data. If you don’t have any reason to ask people to log in, you can use the free content trick. Make sure that you have interesting articles, white papers or something else that people would like to have. Before they can download the goodies, let them fill out a form and give you at least their email address. Store it together with the cookie data and you’ll be able to link the real person to their web visits.
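Once an email address is stored next to the cookie ID, web visits and CRM records can be combined into one journey. The sketch below assumes the CRM identifies people by email address; `merge_journey` and all the sample rows are invented for illustration.

```python
from datetime import datetime


def merge_journey(web_events, crm_events, cookie_to_email, email):
    """Return one person's events from both systems, ordered by time."""
    web = [
        (ts, "web", ev)
        for cookie, ts, ev in web_events
        if cookie_to_email.get(cookie) == email
    ]
    crm = [(ts, "crm", ev) for who, ts, ev in crm_events if who == email]
    return sorted(web + crm)


# Made-up data: the mapping would come from the download form.
cookie_to_email = {"9873fds232": "anna@example.com"}
web_events = [
    ("9873fds232", datetime(2017, 11, 2, 10, 12, 3), "/whitepaper"),
    ("9873fds232", datetime(2017, 11, 6, 8, 30, 0), "/pricing"),
]
crm_events = [
    ("anna@example.com", datetime(2017, 11, 3, 14, 0, 0), "phone call"),
]

journey = merge_journey(web_events, crm_events, cookie_to_email, "anna@example.com")
```

The result is a single timeline in which the phone call from the CRM sits between the two web visits, which is exactly the cross-channel view that neither system can provide on its own.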

That’s it for today folks. I hope I got you thinking about your data. If you have any questions or need advice, drop us a line. And watch out for the next post in this series.