by William Woods • 9 May 2025
The Data Dilemma We Face

Digital marketing attribution and web analytics have never been more powerful, or more frustrating. As privacy rules tighten and cookie tracking gets sidelined, those of us working with analytics are left squinting at our dashboards wondering, “where did half my tracking data go?”

The main issue? Missing channel attribution. When users opt out of tracking, the ‘where-did-they-come-from’ bit disappears. Was it Email, Organic, Paid Social? No idea. And that missing piece can break your entire attribution model, leading you to overfund underperformers or overlook your most effective channels.

Google’s Consent Mode offers some help, filling gaps using statistical modelling. But it’s a broad-brush fix. Sometimes you need precision, and that’s where custom machine learning for marketing attribution comes in. It provides a smarter, tailored way to reconstruct the full user journey, even when traditional tracking fails.

The Shrinking Data Window

Let’s talk scale. Your starting point might be 100% of user traffic, but as tracking restrictions kick in, the amount of observable data shrinks rapidly:

- GDPR, CCPA, and similar regulations reduce visibility by around 20%, leaving you with just 80%.
- Adblockers knock that down further to 64%.
- Apple ITP, Firefox, and other privacy-first browsers can drop you to 45%.
- By the time Chrome’s expected 2025 updates arrive, you might only see 19% of your original traffic.

The consequences are serious: unreliable KPIs in analytics tools, difficulty attributing ROI, weakening retargeting performance, and the erosion of data-driven marketing altogether.

Teaching a Model to Connect the Dots

What if you could train a model to learn from user journeys where the channel is known, then use that knowledge to predict the missing bits in journeys where the channel is blank? That’s the idea behind this project. I created a machine learning model that learns to recognise patterns in both the summary of a journey and the step-by-step flow of events. Think of it like training a detective: it spots patterns in known cases and uses them to solve new mysteries.

The model doesn’t rely on statistical averages like other approaches; it learns patterns across user behaviour, campaign metadata, and temporal sequences. That said, it reflects the distribution of channels seen in training, so more common channels will naturally have stronger learned representations.

Feeding the Model

Everything starts with BigQuery. Specifically, I'm working with Google Analytics 4 (GA4) data exported to BigQuery. The GA4 BigQuery export contains detailed event-level data from your website or app without relying on cookies for tracking.

But what makes this data particularly powerful for modelling isn't just the standard GA4 parameters, it's the custom dimensions that businesses can define and pass with each event. For example, an e-commerce site might pass custom dimensions for product price brackets, while a content site might track content topics or reading-time thresholds. When these custom dimensions are incorporated into the model alongside standard GA4 parameters, they create more accurate channel predictions by adding business context to behavioural signals.

I group those events by user and line them up in the order they happened, as sketched below.
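To make that extraction step concrete, here is a minimal sketch of pulling events from the GA4 export and grouping them into per-user journeys. It assumes the standard GA4 BigQuery export schema; the project, dataset, date range, and the `campaign` parameter are placeholders rather than the exact names used in this project.

```python
# Minimal sketch: pull GA4 export events and group them into ordered
# per-user journeys. Table name and selected fields are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

QUERY = """
SELECT
  user_pseudo_id,
  event_timestamp,
  event_name,
  traffic_source.medium AS channel,          -- label when present, NULL when consent is missing
  device.category AS device_category,
  (SELECT value.string_value
     FROM UNNEST(event_params)
     WHERE key = 'campaign') AS campaign     -- example event parameter
FROM `my-project.analytics_123456.events_*`  -- placeholder project/dataset
WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
ORDER BY user_pseudo_id, event_timestamp
"""

rows = client.query(QUERY).result()

# One ordered list of event dicts per user.
journeys = {}
for row in rows:
    journeys.setdefault(row.user_pseudo_id, []).append(dict(row))
```

Any custom dimensions your site sends would be unpacked from `event_params` in the same way and carried along into the journey dicts.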
For each journey, I create two views:

- Aggregated features: a summary snapshot of behaviour across the journey.
- Sequential features: the journey in full, step by step, to catch patterns over time.

I convert all of this into dense numerical arrays using a handy tool called ‘DictVectorizer’, which translates a mix of categorical and numerical features into a standardised format that the model can process. This effectively turns complex user journey data into a structured numerical matrix suitable for training. By using both the standard GA4 export and your unique custom dimensions, the model learns the specific patterns of your business and customers, not just generic browsing behaviours.

Under the Bonnet

Now to dive a bit into the technical details. The model has two parallel branches. An aggregated branch captures high-level frequency signals (e.g. how often a user interacted with a campaign or used a specific device), while the sequential branch preserves event order to pick up temporal dependencies (e.g. campaign -> browse -> purchase).

- Aggregated features branch: goes through a Dense (fully connected) layer with 128 neurons and a ReLU activation. This distils the whole journey into a kind of behaviour summary.
- Sequential features branch: starts with a Masking layer to skip over padded steps, then feeds into an LSTM (Long Short-Term Memory) layer with 128 units. LSTMs are brilliant at learning from sequences, perfect for time-based user journeys.

I then combine the outputs of both branches with a Concatenate layer and send them through a final Dense layer with a softmax activation, which outputs a probability for each channel; the highest-probability channel becomes the prediction. A sketch of the full pipeline, from vectorisation to the two-branch model, follows below.
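Here is a minimal sketch of that pipeline using scikit-learn and Keras. The feature keys, sequence length, per-event feature count, and channel count are illustrative placeholders; the layer sizes and structure follow the description above.

```python
# Minimal sketch: vectorise journeys and build the two-branch model
# described above. Feature keys and dimensions are illustrative.
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from tensorflow.keras.layers import Input, Dense, Masking, LSTM, Concatenate
from tensorflow.keras.models import Model

MAX_STEPS = 20      # pad/truncate every journey to this many events (assumed)
N_CHANNELS = 6      # e.g. Email, Organic, Paid Social, Direct, Referral, Display

# 1) Aggregated view: one summary dict per journey -> dense numerical matrix.
agg_dicts = [
    {"n_events": 14, "device=mobile": 1, "campaign=spring_sale": 3},
    {"n_events": 3, "device=desktop": 1},
]
vectorizer = DictVectorizer(sparse=False)
X_agg = vectorizer.fit_transform(agg_dicts)        # shape: (n_journeys, n_agg_features)

# 2) Sequential view: per-event feature vectors, zero-padded to MAX_STEPS.
n_seq_features = 8                                 # placeholder per-event feature count
X_seq = np.zeros((len(agg_dicts), MAX_STEPS, n_seq_features))

# 3) Two parallel branches.
agg_in = Input(shape=(X_agg.shape[1],), name="aggregated")
agg_branch = Dense(128, activation="relu")(agg_in)  # behaviour summary

seq_in = Input(shape=(MAX_STEPS, n_seq_features), name="sequential")
masked = Masking(mask_value=0.0)(seq_in)            # skip padded steps
seq_branch = LSTM(128)(masked)                      # temporal patterns

# 4) Merge and classify into channels.
merged = Concatenate()([agg_branch, seq_branch])
out = Dense(N_CHANNELS, activation="softmax")(merged)

model = Model(inputs=[agg_in, seq_in], outputs=out)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

In a setup like this, training would use the journeys where the channel label is known, and journeys with missing attribution would be scored with `model.predict`, with the highest-probability channel filling the gap.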