Nowadays, a conversion is usually preceded by not just one but several interactions with a website or an app.
Attribution determines the role of each touchpoint in driving conversions and assigns credit for sales to interactions in conversion paths.
As Google’s deprecation of Universal Analytics (UA) nears, it’s crucial to understand attribution in Google Analytics 4 (GA4) – including what is new, what is missing, and what the differences mean for search marketers.
(If you are new to attribution, read the Google Analytics help article on attribution first.)
How Google Analytics 4 attribution works
Universal Analytics reports attributed the entire credit for the conversion to the last click. A direct visit is not considered a click, but for the avoidance of doubt, this attribution model was also called the last non-direct click model. Other attribution models were only available in the Model Comparison Tool in the Multi-Channel Funnels (MCF) reports section.
GA4 offers a wider availability of different attribution models, but it depends on the scope of the report – whether it is the user acquisition source, session source or event source.
In Universal Analytics, the source dimensions had session scope only. The MCF reports made it possible to analyze the sources of all sessions on the conversion path. The three scopes of the source dimension in GA4 (user, session, event) are the most important and fundamental change in the attribution area.
This guide will use the term “source” in a broader meaning as any dimension that indicates the origin of a visit, e.g., channel grouping, source, medium, ad content, campaign, ad group, keyword, search term, etc.
Session source
Session-scope attribution, unsurprisingly, determines the source of the session. It is used, among other places, in the Traffic acquisition reports in the Reports section. It works similarly to Universal Analytics in always using the last non-direct click model.
The session source is the source that started the session (e.g., social media referral or organic search result). However, if a direct visit started a session, the session source will be attributed to the source of the previous session (if there was any).
Quick reminder: A direct visit means that Analytics does not know where the user came from because the click does not pass the referrer, gclid, or UTM parameter.
Therefore, exactly as it was in Universal Analytics, the session source will be direct only if Analytics cannot see any other source of visit for the given user within the lookback window. The default lookback window in GA4 is 90 days, while in Universal Analytics, it was six months by default. We will return to the lookback window matter later in this article.
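The fallback rule described above can be sketched in a few lines of Python. This is only an illustration, not GA4's actual implementation: the `Touch` record, the `(direct)` label, and the `session_source` function are hypothetical names, and the sketch assumes the lookback window is measured back from the start of the session.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

DIRECT = "(direct)"  # hypothetical label: no referrer, gclid, or UTM data


@dataclass
class Touch:
    time: datetime
    source: str


def session_source(start: Touch, earlier: list[Touch], lookback_days: int = 90) -> str:
    """Return the source credited to a session under last non-direct click."""
    if start.source != DIRECT:
        return start.source  # the session was started by a known source
    cutoff = start.time - timedelta(days=lookback_days)
    known = [t for t in earlier
             if t.source != DIRECT and cutoff <= t.time < start.time]
    if not known:
        return DIRECT  # nothing else to fall back on within the window
    return max(known, key=lambda t: t.time).source  # most recent known source
```

With the default 90-day window, a direct session that follows an organic search visit from 10 days earlier is credited to organic search, while the same visit from 100 days earlier leaves the session as direct.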
By the way, what is a session?
A Google Analytics session is not the same as a browser session.
In GA4, a session begins when a user visits the website or app and ends after the user’s inactivity for a specified time (30 minutes by default – see this Analytics help article).
Closing the browser window does not end the session. If the browser window is closed, another visit to the website within the time limit would still belong to the same session – unless the browser deletes cookies and browser data after closing the browser window, for example in incognito mode.
In Universal Analytics, when a user re-visits the website from a new source during an existing session, the existing session is terminated, and a new session starts with that new source.
In GA4, it is no longer the case. If a visit from a new source occurs during a session, a new session will not start, and the source of the current session will remain unchanged.
It does not mean that the visit from the new source is ignored. GA4 records the source of this visit, and the event-scope attribution reports (more on that later in this article) will take into account all sources of all sessions. (See this Analytics help article.)
A new visit during an existing session may happen, for example, if a user returns from a payment gateway or a webmail site after password recovery or registration confirmation. In GA4, these visits will not artificially inflate the number of sessions, as in Universal Analytics.
Nevertheless, sources of these visits are so-called unwanted referrals and should be excluded. Visits from excluded referrals are reported as direct visits.
In GA4, these visits are de facto ignored because the session source and the session count remain unchanged. The non-direct attribution modeling in GA4 will assign no credit to this (direct) source (as described later in this article).
In Universal Analytics, the session (regardless of duration) ends at midnight, which is no longer the case in GA4.
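The differences in session counting can be illustrated with a simplified sketch. It assumes a 30-minute timeout and models only the rules mentioned in this section (inactivity, a new source, midnight); the `count_sessions` function and the hit format are hypothetical and do not reproduce either product's real processing.

```python
from datetime import datetime, timedelta
from typing import Optional

SESSION_TIMEOUT = timedelta(minutes=30)  # default inactivity limit


def count_sessions(hits: list[tuple[datetime, Optional[str]]],
                   ua_rules: bool = False) -> int:
    """Count sessions in time-ordered hits for one user.

    Each hit is (timestamp, source); source is None when the hit carries
    no new source information (e.g., an internal page view).
    """
    sessions = 0
    prev_time = None
    current_source = None
    for time, source in hits:
        # Both UA and GA4: a new session starts after the inactivity timeout.
        starts_new = prev_time is None or time - prev_time > SESSION_TIMEOUT
        if ua_rules and prev_time is not None:
            # UA only: midnight and a new source also end the session.
            crossed_midnight = time.date() != prev_time.date()
            source_changed = source is not None and source != current_source
            starts_new = starts_new or crossed_midnight or source_changed
        if starts_new:
            sessions += 1
            current_source = source
        prev_time = time
    return sessions
```

For a visit that starts from a paid click, continues for ten minutes, and then returns from a payment gateway, this sketch counts one session under the GA4 rules and two under the Universal Analytics rules.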
First user source
First user source (source of the first visit) is new to GA4. It shows where the user came from to the website or app for the first time.
It is a part of Google’s new approach to measurement in online marketing, which no longer focuses only on the classic ROAS (revenues vs. costs), but also analyzes the CAC vs. LTV (customer acquisition cost vs. lifetime value).
This approach reflects the app logic: we have to acquire the app user first, and after the app is installed, further marketing efforts engage and monetize the user. However, it makes sense for web traffic as well.
The new customer acquisition goal in Google Ads, available in Performance Max campaigns, also represents a similar approach. In this case, the focus is on the first-time buyer, not the first visit.
In GA4, the first user visit is recorded by the first_visit event for the website or the first_open event for the app. The naming is self-explanatory.
Therefore, the source of the first visit is a user attribute and indicates where this user’s first visit to the website or application came from.
The first visit source is attributed using the last non-direct click model. Of course, this attribution applies only to interactions before the first website visit or the first open of the app (interactions following the first visit or first open are not taken into account).
Once assigned, the source of the first visit remains unchanged – of course, as long as Google Analytics can technically link the user’s activity on the website and in the app with the same user.
The first user source will be reset if the tracking of the user is lost, for example, if the user does not visit the website for a period longer than the Analytics cookie expiration date.
We will return to the Analytics cookie expiration period and other data collection limitations in GA4 later in this article.
Event scope attribution
In GA4, events replaced sessions as the foundation of data collection and reporting. GA4 makes it possible to report attribution using a selected attribution model for any event (not only for conversions).
The model is set in the Attribution Settings of the GA4 property. There are several pre-defined models to choose from (see the screen below).
The default data-driven model can be changed at any time. This change is retroactive (i.e., it will also change the historical data).
A common belief is that Google Analytics 4 no longer uses the last-click attribution model. But is that the case?
In practice, it applies only to customized reports that use event-scope dimensions and metrics, for example, Medium – Conversions.
The default traffic and user acquisition reports use session source and first user source, respectively, and these dimensions use the last click model. It is indicated in the dimension name (e.g., Session – Campaign or First User – Medium).
Remember: source, session source and first user source are three different dimensions where different attribution models apply.
| Scope | Attribution model | Where available |
| --- | --- | --- |
| Session | Last click | E.g., traffic acquisition reports |
| User (first user source) | Last click | E.g., user acquisition report |
| Event | Model set in the GA4 property settings (data-driven by default) | E.g., in the Explore section |
Attribution settings
The attribution model set in the property settings applies to all reports in the property that use event-scoped source dimensions.
There are several attribution models, known from Universal Analytics (described in the earlier mentioned Analytics help article), to choose from. However:
- None of the models assigns value to direct visits unless there is no other choice because there is no other interaction on the path. In other words, they all use the non-direct principle, which was not the case in the Universal Analytics pre-defined attribution models, except for the last non-direct click model.
- The Ads-preferred models assign the entire conversion value to Google Ads interactions if they occur in the funnel. At the moment, there is only one Ads-preferred model available – the last click model, which is the equivalent of the “last Google Ads click” known from Universal Analytics. In the absence of Google Ads interactions in the funnel, this model works like a regular last-click model.
- In addition to clicks, models take into account “engaged views” of YouTube ads, that is, watching the ad for 30 seconds (or until the end if the ad is shorter) and other clicks associated with that ad (see this Google Analytics help article for more details).
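The non-direct principle and the Ads-preferred variant can be illustrated with a small sketch. It is a simplification under stated assumptions: paths are plain lists of channel labels, the labels `(direct)` and `Google Ads` and both function names are hypothetical, and engaged views are not modeled.

```python
DIRECT = "(direct)"        # hypothetical label for direct visits
GOOGLE_ADS = "Google Ads"  # hypothetical label for Google Ads interactions


def last_click_credit(path: list[str], ads_preferred: bool = False) -> dict[str, float]:
    """Assign the whole conversion to one channel, skipping direct visits."""
    non_direct = [c for c in path if c != DIRECT]
    if not non_direct:
        return {DIRECT: 1.0}  # direct gets credit only when nothing else exists
    if ads_preferred and GOOGLE_ADS in non_direct:
        return {GOOGLE_ADS: 1.0}  # Ads-preferred: Google Ads takes all credit
    return {non_direct[-1]: 1.0}


def linear_credit(path: list[str]) -> dict[str, float]:
    """Split the conversion equally across non-direct interactions."""
    non_direct = [c for c in path if c != DIRECT]
    if not non_direct:
        return {DIRECT: 1.0}
    share = 1.0 / len(non_direct)
    credit: dict[str, float] = {}
    for channel in non_direct:
        credit[channel] = credit.get(channel, 0.0) + share
    return credit
```

For the path Google Ads, Organic Search, direct, the regular last-click sketch credits Organic Search, while the Ads-preferred variant credits Google Ads. In both cases the closing direct visit receives nothing.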
Again, a change of the attribution model settings works retroactively (i.e., it applies to the historical data before the change). Saved explorations will be recalculated when viewing them.
Lookback window
Google Analytics property settings determine the length of the lookback window. The lookback window determines how far back in time a touchpoint is eligible for attribution credit. The default lookback window is 90 days, but you can change it to 60 or 30 days.
According to Analytics documentation, the lookback window settings apply to all attribution models and all conversion types in Google Analytics 4 (i.e., it also applies to session-level attribution and attribution model comparisons).
The lookback window of the first user source has a separate setting (30 days by default, and it can be changed to 7 days). Are you wondering why it is defined differently?
Well, first of all, it is worth considering why there is any lookback window for the first visit at all.
Moreover, why are we talking about the first user attribution model, which is always the last (non-direct) click?
After all, GA4 knows the source of the first visit when this visit happens. As it is the first visit, there are no previous visits, and thus no other sources to consider.
So, what is the point of looking deeper in time than the first interaction with a website or app?
The answer is Google Signals. If this option is enabled for the GA4 property in the Data Collection settings, GA4 will enrich the data collected by the tracking code with, among others, information known by Google about logged-in users.
For example, Google may know that the user had an engaged interaction with our YouTube ad on a different device before the first visit.
Similarly, the user may use the app for the first time (first_open) during a direct session, but the install itself may result from a mobile app install campaign in Google Ads, clicked a few days earlier.
Therefore, if the source of the first visit session is unknown (it is a direct visit), Google Analytics may try to assign the source of the first visit to the earlier known interaction if it occurred during the lookback window period.
In other words, thanks to Google Signals, GA4 may record ad interactions before the first user visit.
Lookback window changes do not work retroactively. This means that they only apply from the moment of the change.
The engaged views of YouTube ads, however, always have a three-day lookback window, regardless of the property settings.
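As a rough illustration of how these windows combine, the sketch below checks whether a touchpoint is still eligible for credit. The `is_eligible` function and its parameters are hypothetical. The defaults mirror the values given above: 30 days for the first user source, 90 days for everything else, and a fixed three days for engaged views.

```python
ENGAGED_VIEW_WINDOW_DAYS = 3  # fixed, regardless of property settings


def is_eligible(days_before: float, *, first_user: bool = False,
                engaged_view: bool = False,
                acquisition_window: int = 30, other_window: int = 90) -> bool:
    """Return True if a touchpoint this many days back can still get credit."""
    if engaged_view:
        window = ENGAGED_VIEW_WINDOW_DAYS
    elif first_user:
        window = acquisition_window  # first_visit / first_open attribution
    else:
        window = other_window        # all other conversion events
    return 0 <= days_before <= window
```

Under these assumptions, a click 45 days before a purchase is eligible, the same click is too old to count toward the first user source, and an engaged view older than three days is never eligible.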
Bye to cookie logic?
It is a nuance but worth noting. Universal Analytics's default lookback window for the acquisition reports was six months, and any change to this period was also non-retroactive.
Such a change, however, did not apply to conversions but to interactions that had taken place after the change. It reflected the logic of the _utmz cookie, which was responsible for storing the source information.
Its expiration time was set when the cookie was created or updated (i.e., upon a visit from a given source). Universal Analytics no longer uses the _utmz cookie (it was used in earlier versions), but the logic was maintained for data consistency.
For example, changing the lookback window in Universal Analytics from 30 to 90 days did not immediately include interactions from 90 days ago in the acquisition reports for the visits since the date of the change because the virtual "source cookie" for interactions older than 30 days had already "expired."
There was a transition period (in this example, 90 days), after which all conversions were fully reported under the new lookback window.
Google Analytics 4 uses a different data model, with no continuity with the UA data. Google could therefore have broken with this past and stopped using the cookie logic.
For example, it could have applied changes to all conversions that have taken place since the change, as is now the case in Google Ads. Interpreting the data would then be much easier. Google could have done this, but it did not.
In GA4, the change applies to interactions still in the lookback window.
For example, if the lookback window is increased from 30 to 90 days, the conversions will not immediately be reported in the new 90-day lookback window. The change will be fully reflected in the reports 60 days after the date of the change (the interactions from the initial 30-day lookback window will be remembered).
Reducing the lookback window (e.g., from 90 to 30 days) will apply the change immediately (i.e., all conversions will be reported in the shorter 30-day window).
Yes, it sounds exotic. Fortunately, in practice, the analysts do not change the lookback window often.
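The transition described above comes down to simple arithmetic. The sketch below is one interpretation of the behavior described in this section, not a documented formula, and `effective_lookback` is a hypothetical helper.

```python
def effective_lookback(old_days: int, new_days: int, days_since_change: int) -> int:
    """Lookback window (in days) actually reflected in reports after a change."""
    if new_days <= old_days:
        return new_days  # shortening the window applies immediately
    # Lengthening: interactions older than the old window at the time of the
    # change are already lost, so the window grows by one day per day elapsed.
    return min(new_days, old_days + days_since_change)
```

After a change from 30 to 90 days, the sketch returns 30 on the day of the change, 75 after 45 days, and the full 90 once 60 days have passed.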
Cookie expiration and data retention
The Google Analytics 4 cookie has a standard expiration time of 24 months, but it can be changed to a period between one hour and 25 months (or the cookie may be set as a session cookie that expires when the browser session ends).
Subsequent visits may renew this time limit. This is the period in which Analytics will be able to recognize a returning user and remember the source of the first visit (see this GA4 help article).
However, it does not automatically mean that GA4 will "remember" user data that long.
In addition to the cookie expiration, we also have to deal with the GA4 data retention period. It is set by default to only two months, but you can (and basically, you should) change this setting to 14 months. (In the paid version, Google Analytics 360, it can be up to 50 months.)
After this time, Google deletes user-level data from Analytics servers. To keep this data, you must export it to BigQuery (see this GA4 help article).
It means that reports in the Explore section can only be made within the data retention period (please note that in the Explore section, you cannot select a date range beyond this period).
These restrictions do not apply to standard reports in the Reports section that use aggregated data. GA4 will store this data "forever."
In the unpaid version of GA4, the first user source data are deleted after 14 months of inactivity. After that, this user will be recorded as a new user.
Therefore, there is no point in, for example, changing the cookie expiration time from default 24 months to a longer period, unless you use Google Analytics 360.
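The interplay of the two limits can be expressed as a one-line rule: the user is remembered only as long as both the cookie and the user-level data survive. The helper below is a hypothetical illustration of that reasoning, with both periods given in months of inactivity.

```python
def months_user_is_remembered(cookie_months: int, retention_months: int) -> int:
    """Longest period of inactivity after which a user is still recognized."""
    # Whichever expires first (the cookie or the stored user-level data)
    # ends the ability to link a returning visitor to the first user source.
    return min(cookie_months, retention_months)
```

With the standard version's 14-month retention, both a 24-month and a 25-month cookie give the same result of 14 months, which is why extending the cookie lifetime only pays off with the longer retention available in Google Analytics 360.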
Conversion export to Google Ads
Exporting conversions to Google Ads is often used as an alternative to the native Google Ads conversion tracking, as the fastest and most convenient way to implement conversion tracking in Google Ads.
However, this time-saving seems illusory in the era of Google Tag Manager. Moreover, this solution has many disadvantages.
There are several arguments against using imported conversions from Google Analytics to optimize Google Ads. It:
- Reduces the number of conversions observed in Google Ads.
- Uses an unusual attribution model (the GA4 last non-direct click across all channels, regardless of the property settings).
- Is vulnerable to unforeseen Google Analytics configuration and link tagging errors, such as unwanted referrals or redundant UTM parameters.
Therefore, while importing conversions from Analytics may provide interesting data that cannot be collected in Google Ads, using them as goals for optimizing Google Ads campaigns may not be optimal.
If you import conversions from GA4 to Google Ads, regardless of the GA4 attribution settings, the conversions will be imported using the GA4 last non-direct click model.
This means you will only import conversions whose Google Ads source has not been overwritten by subsequent clicks (e.g., organic search results or social media ads).
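The effect on the number of conversions visible in Google Ads can be sketched as follows. The path format, the `google / cpc` label standing for a Google Ads click, and the `is_imported` function are assumptions made for illustration only.

```python
DIRECT = "(direct)"
GOOGLE_ADS = "google / cpc"  # hypothetical label for a Google Ads click


def is_imported(path: list[str]) -> bool:
    """True if the conversion would be credited to Google Ads on import.

    Under last non-direct click, only the final non-direct interaction
    counts, so any later non-Ads click overwrites the Google Ads click.
    """
    non_direct = [s for s in path if s != DIRECT]
    return bool(non_direct) and non_direct[-1] == GOOGLE_ADS


paths = [
    [GOOGLE_ADS, DIRECT],              # imported: direct does not overwrite
    [GOOGLE_ADS, "google / organic"],  # not imported: overwritten by organic
    ["facebook / paid", GOOGLE_ADS],   # imported: Google Ads was last
]
imported = sum(is_imported(p) for p in paths)
```

In this example, all three paths contain a Google Ads click, but only two conversions would reach Google Ads.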
Model comparison tool
Regardless of the property-level attribution settings, Google Analytics allows comparisons of different attribution models in the Advertising section.
Currently, the available models are the same as those available in the property settings, and it is impossible to create custom models.
Interestingly, GA4 allows reporting in two conversion attribution time methods – interaction time and conversion time (only the latter option was available in Universal Analytics).
The interaction time method is typical for advertising systems, where conversions are attributed to clicks and, thus – costs. It allows a correct match between costs and revenues.
Otherwise, the reports might include conversions after the end of the campaign, in a period when there is no ad spend.
On the other hand, the interaction time method may cause the total number of conversions to change depending on the attribution model, as different models may attribute conversions or their fractions to clicks outside the reporting period.
Moreover, the conversion count and revenue for a given reporting period may grow over time until the lookback window closes.
In other words, we may observe more conversions for the recent period if we look at the same report in the future – which is not the case when conversions are reported in the conversion time.
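The difference between the two methods is easiest to see on a conversion whose click and purchase fall into different months. The sketch below assumes last-click attribution for simplicity; the record format and the `conversions_by_month` function are hypothetical.

```python
from collections import Counter
from datetime import date


def conversions_by_month(records: list[tuple[date, date]], method: str) -> dict[str, int]:
    """Count conversions per month.

    Each record is (interaction_date, conversion_date). With
    method="interaction" the conversion is reported on the day of the
    credited click; with method="conversion", on the day it happened.
    """
    if method not in ("interaction", "conversion"):
        raise ValueError("method must be 'interaction' or 'conversion'")
    index = 0 if method == "interaction" else 1
    counts: Counter[str] = Counter()
    for record in records:
        counts[record[index].strftime("%Y-%m")] += 1
    return dict(counts)


records = [
    (date(2023, 1, 28), date(2023, 2, 3)),   # click in January, purchase in February
    (date(2023, 2, 10), date(2023, 2, 10)),  # click and purchase on the same day
]
```

In interaction time, January receives one conversion even though the purchase happened in February, so the January figure would have been lower if the report had been viewed on January 31.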
Both approaches have advantages and disadvantages, so it is good that we can now use both.
Conversion paths report
Compared to Universal Analytics, the GA4 conversion paths report is enriched with additional data: time to conversion and the number of interactions for a given path.
It partly compensates for the lack of time lag and path length reports, which were separate reports in Universal Analytics.
The ability to choose an attribution model for this report may be surprising at first sight.
The attribution model does not affect conversion paths. They remain the same, and their length and time to conversion do not change.
In GA4, the path visualization also includes the fraction of conversion assigned to a given interaction or their series in the selected attribution model.
In the last click model, the last interaction always has a 100% share in the conversion, but in the other models, the distribution will be different.
This feature also allows a better understanding of how the data-driven model worked for the interactions in this report.
Additional bar graphs are placed above the funnel report, visualizing how the selected attribution model assigned a value to channels at the beginning, middle and end of the funnel.
The early touchpoints are the first 25% of the interactions along the path, while the late touchpoints include the last 25%. The middle touchpoints are the remaining 50% of the interactions.
If you feel that the distribution between early, middle, and late touchpoints does not look as expected for the multi-touch models, please note that if there are only two interactions, there is one early, one late, and no middle interactions.
If there is only one interaction, for the multi-touch models, it will be reported as late interaction – which distorts these reports the most.
Probably, it would be better if the only interaction was considered as 33.3% early, 33.3% middle, and 33.3% late interaction.
Thus, the attribution model will only affect the bar charts at the top of the report and the percentages shown in the funnel visualization.
The table figures (funnel interactions, conversions, revenue, funnel length, and time to conversion) will remain the same, regardless of the attribution model.
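The early, middle, and late split described above can be reproduced with a short sketch. The rounding rule is an assumption: Google's exact rule is not stated here, so the sketch rounds each 25% share up and gives priority to late touchpoints, which matches the one-interaction and two-interaction cases mentioned above. The `bucket_touchpoints` function is hypothetical.

```python
import math


def bucket_touchpoints(n: int) -> dict[str, int]:
    """Split a path of n interactions into early, middle, and late counts."""
    if n < 0:
        raise ValueError("n must not be negative")
    quarter = math.ceil(0.25 * n)       # assumed rounding: 25% rounded up
    late = quarter                      # late touchpoints are assigned first
    early = min(quarter, n - late)      # early takes what remains, up to 25%
    middle = n - early - late           # everything in between
    return {"early": early, "middle": middle, "late": late}
```

Under this assumption, a single interaction is counted as late only, two interactions give one early and one late touchpoint, and an eight-step path splits into two early, four middle, and two late touchpoints.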
By default, the conversion paths and model comparison reports include all conversions in the GA4 property. Therefore, it is worth remembering to select the desired conversion first.
Use of scopes in the reports
Again, the source dimensions in GA4 can have one of three scopes: session, user, and event.
- In the case of the event scope, the attribution model specified in the property attribution settings is used.
- The session source (session scope) is assigned to the last non-direct interaction at the session start and remains unchanged for a given session, even if there is a visit from another source during the session. It's the "first source" of the session, although assigned in the last-click model.
- Similarly, the first user source (user scope) is assigned to the last non-direct interaction before the first visit and remains unchanged.
In Google Analytics, all dimensions and metrics operate within their own scope. For example, the Landing page dimension has the session scope, and the Page dimension has the event scope.
Although technically possible, using dimensions and metrics of different scopes can sometimes lead to confusing or difficult-to-interpret reports.
For example, the Page dimension should be matched with Page views, not Sessions. If we combine Pages with Sessions, Universal Analytics will show session numbers similar to those in the Landing page vs. Sessions report.
In GA4, this will be the number of sessions during which a given Page has been visited, and therefore, the sum of sessions for all Pages will be greater than the total number of Sessions.
But if you think about it, there is little point in making such reports – therefore, the uncertain interpretation of these numbers should not worry us too much.
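A small example shows why the GA4 figures add up to more than the total number of sessions. The data and both function names are made up for illustration; each inner list holds the pages viewed in one session, in order.

```python
from collections import Counter


def sessions_by_page(sessions: list[list[str]]) -> Counter:
    """GA4-style: count each session once for every page it contains."""
    counts: Counter = Counter()
    for pages in sessions:
        for page in set(pages):  # a page counts once per session
            counts[page] += 1
    return counts


def sessions_by_landing_page(sessions: list[list[str]]) -> Counter:
    """UA-style: count each session only for its first page."""
    return Counter(pages[0] for pages in sessions if pages)


sessions = [["/", "/pricing"], ["/blog", "/pricing"], ["/"]]
```

Here the three sessions produce a per-page total of five in the GA4-style count, while the landing-page count still sums to three.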
However, some reports using dimensions and metrics of different scopes will make sense. For example, for source dimensions in GA4:
- The number of events (event scope) paired with the First user source dimension (user scope) shows how many events were generated by users whose first visit was from a given source.
- The number of events (event scope) paired with the session source dimension (session scope) shows how many events were generated by users during sessions with a given source.
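The two pairings listed above can be illustrated with a few made-up events. Each event carries the first user source of its user and the source of its session; the field names and the `events_by` helper are hypothetical and do not correspond to any GA4 export schema.

```python
from collections import Counter


def events_by(events: list[dict[str, str]], dimension: str) -> Counter:
    """Count events grouped by a user-scoped or session-scoped dimension."""
    return Counter(event[dimension] for event in events)


# One user acquired through paid search who later returns from a newsletter.
events = [
    {"name": "page_view", "first_user_source": "google / cpc", "session_source": "google / cpc"},
    {"name": "page_view", "first_user_source": "google / cpc", "session_source": "newsletter / email"},
    {"name": "purchase", "first_user_source": "google / cpc", "session_source": "newsletter / email"},
]
```

Grouped by first user source, all three events belong to paid search. Grouped by session source, one belongs to paid search and two to the newsletter.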
The GA4 documentation does not indicate how to interpret the number of sessions or users matched with event-scoped dimensions. Such explorations, although possible, often contain many (not set) values.
Creating such reports therefore makes little sense. (See the previously mentioned GA4 help article on scopes.)
Modeled data
Finally, it is worth emphasizing the fundamental change in Google Analytics 4, where reports include data collected by the tracking code enriched with modeled data.
The modeled data uses information collected in the cookieless consent mode for users who have not given consent to tracking and Google Signals data for users logged in to Google. This data is fragmentary, but Google can fill in the missing data using extrapolations and mathematical modeling.
Thanks to Google Signals, in GA4, we can see an approximate but more complete picture of the user's journey.
For example, suppose Universal Analytics recorded an iPhone user who visited the website from a YouTube ad using Safari and never returned.
It also recorded a conversion made by a seemingly different user who came from a direct visit in Chrome on Windows.
Google knows that both visits belong to the same user because this user was logged into Gmail and YouTube.
This is how Google Analytics 4, using Signals, can model the cross-device users' behavior. It makes the reported number of users more real (reduces it) and improves the attribution accuracy.
In the example above, the conversion from the direct session can be correctly attributed to the YouTube ad.
Not all users are always logged into Google – many do not even have a Google account.
Therefore, to make the picture more complete, Google Analytics will assume that users who are not logged in behave similarly.
Consequently, GA4 sometimes will supplement the missing sources (e.g., assign certain sources to conversions that were previously assigned to direct).
The behavior of users who have not given consent to tracking is estimated similarly.
Analytics knows the number of page views and conversions from non-consented users, so it can model how many users generated these page views and conservatively attribute the conversions to sources.
Enriching Analytics data with Google Signals may take up to a week. Therefore, the recent data may change in the future.
Please note that we also dealt with delays in Universal Analytics, where most reports could have delays of up to 48 hours.
Various privacy-oriented technology solutions, such as PCM by Apple or similar solutions proposed by Google (the Privacy Sandbox), randomly delay conversion reporting by 24-48 hours.
Therefore, we must get used to the fact that the full view of analytical data will only be available after some time.
In GA4, we can also enhance the reports using first-party data, namely the User-ID.
This feature was also available in Universal Analytics, but the separate "User-ID View" included only the "logged-in" sessions with a User-ID and, honestly, wasn't that useful.
GA4 reports combine the User-ID data with the Client-ID (the Analytics cookie identifier) and Google Signals, which makes the data more complete, especially in the cross-device aspect and LTV measurement.
The complexity of these processes may cause greater or lesser discrepancies between the data in different reports.
We should get used to it, but hopefully, as GA4 outgrows its teething problems, these discrepancies will become less and less significant.
It is worth remembering that Google Analytics is not accounting software.
Its objective is not to record every event with 100% precision but to indicate trends and support decision-making – for which approximate data is sufficient.
Author's note: This article was written using Google help articles, answers given by Analytics support and results from my experiments.
The post Your guide to Google Analytics 4 attribution appeared first on Search Engine Land.