Release: Client-Side dataLayer Validation Library

If you're a web analyst, you SHOULD be worried about data quality. We want to have data, but we need that data to be correct: there's no point in having tons of corrupted data that we won't be able to use. For this, we need to monitor our implementations regularly to make sure our data is always being reported properly.

Since we are all web analysts, I'm pretty sure we rely on a flawless dataLayer (do we? :) ). But I'm also sure that at least once in the past our precious data-holding variable (aka "dataLayer") ended up messed up for whatever reason (a refactor that broke some keys, some JS code that overrode a value).

It's true that there are plenty of solutions out there for monitoring our implementation setups, like ObservePoint or Hub Scan, which do a really great job, but they may have too many features that we won't use at all.

In this post we'll talk about building a dataLayer model monitoring tool (yep, Simo and I already published an automated testing tool some time ago: https://www.simoahava.com/analytics/automated-tests-for-google-tag-managers-datalayer/). But that one requires some IT infrastructure, the checks run only once a day, and everything is tested under just one JS engine.

So this time my proposal is to do the monitoring client-side. This gives us full sampling of how the dataLayer is being reported for real users, from almost any browser, connection and location!

The advantages are:

We'll get alerts in real time
No need for extra servers or infrastructure (we can use GA)
We'll have data for all kinds of browsers and users
Easy setup

For this we'll use a little validation library I wrote some years ago, which takes care of validating the current dataLayer model against a predefined set of dataLayer schema rules.

Let's start by adding a tag with the library in GTM. We just need to copy and paste the following snippet; there's no need to change anything in it.

The code has moved to GitHub; get the snippet from the following URL:
https://raw.githubusercontent.com/david-vallejo-com/cs-datalayer-monitor/master/build/bundle.min.js

The next thing to do is define our dataLayer validation schema, which looks like this:

var mySchema = {
    "page": {
        "author": "required"
    },
    "user": {
        "id": "integer",
        "name": "contains:David"
    }
};

We'll need to define each key we want to monitor, and the validation rule we want to apply.
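To make the schema format concrete, here's a minimal sketch (not the library's code) of how a nested schema can be flattened into dot paths like page.author, plus a lookup that tells a missing key apart from one that exists but is empty. flattenSchema and lookupPath are hypothetical names used only for illustration.

```javascript
// Hypothetical helper, not part of the library: turn a nested schema such as
// { page: { author: "required" } } into [["page.author", "required"], ...].
function flattenSchema(schema, prefix = "") {
  const entries = [];
  for (const [key, rule] of Object.entries(schema)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (rule !== null && typeof rule === "object") {
      // Nested object: recurse, extending the dot path.
      entries.push(...flattenSchema(rule, path));
    } else {
      // Leaf: a rule string such as "integer" or "contains:David".
      entries.push([path, rule]);
    }
  }
  return entries;
}

// Hypothetical helper: resolve a dot path in a dataLayer model, distinguishing
// "key missing" from "key present but empty" (present vs required rules).
function lookupPath(model, path) {
  let current = model;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object" || !(key in current)) {
      return { present: false, value: undefined };
    }
    current = current[key];
  }
  return { present: true, value: current };
}

const mySchema = {
  page: { author: "required" },
  user: { id: "integer", name: "contains:David" },
};

console.log(JSON.stringify(flattenSchema(mySchema)));
// [["page.author","required"],["user.id","integer"],["user.name","contains:David"]]
console.log(JSON.stringify(lookupPath({ user: { id: 1 } }, "user.id")));
// {"present":true,"value":1}
```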

Right now the following rules are available:

present (checks that the key exists, even if it's set to null, undefined or an empty string)
required (the key must have a value)
contains (case-sensitive)
notcontains (case-sensitive)
regex (case-insensitive)
integer
float
string
object
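Here's one plausible reading of those rules as code. It's a hypothetical checkRule function, not the library's implementation: the "name:argument" parsing follows the "contains:David" example in the schema, while the behaviour of float (any finite number here) and of contains/notcontains on non-string values are my own assumptions.

```javascript
// Hypothetical rule evaluator, NOT the library's implementation.
// `present`: whether the key exists; `value`: its value (may be undefined).
function checkRule(present, value, rule) {
  // Split "contains:David" into name "contains" and argument "David".
  // Only the first ":" separates, so arguments may contain colons.
  const sep = rule.indexOf(":");
  const name = sep === -1 ? rule : rule.slice(0, sep);
  const arg = sep === -1 ? "" : rule.slice(sep + 1);

  switch (name) {
    case "present":
      return present; // null, undefined or "" still count as present
    case "required":
      return present && value !== null && value !== undefined && value !== "";
    case "contains":
      return typeof value === "string" && value.includes(arg); // case-sensitive
    case "notcontains":
      return typeof value === "string" && !value.includes(arg); // case-sensitive
    case "regex":
      return typeof value === "string" && new RegExp(arg, "i").test(value); // case-insensitive
    case "integer":
      return Number.isInteger(value);
    case "float":
      return typeof value === "number" && Number.isFinite(value); // assumption: any finite number
    case "string":
      return typeof value === "string";
    case "object":
      return value !== null && typeof value === "object";
    default:
      throw new Error(`Unknown rule: ${name}`);
  }
}

console.log(checkRule(true, "david", "contains:David")); // false (case-sensitive)
console.log(checkRule(true, "david", "regex:^DAVID$")); // true (case-insensitive)
console.log(checkRule(true, null, "present")); // true
console.log(checkRule(true, null, "required")); // false
```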

If the post gets enough attention, I'm planning on adding support for comparisons (<, =, >) and chained checks.

Then we'll need to call the validation library:

var validationResults = window._validateDataLayer(myDatalayer, mySchema, false);

The library returns an array with the errors found in the dataLayer, if any. From there we could, for example, fire an event to GA or notify any other tool we want.
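If the errors are going to GA as an event, they usually need to be squeezed into one string first. A minimal sketch follows: errorsToLabel is a hypothetical helper, the 500-character default is an arbitrary cap to adjust for your tool, and since the exact shape of each error isn't described here, non-string entries are simply JSON-serialized. The error messages in the example are made up.

```javascript
// Hypothetical helper (not part of the library): join validation errors into a
// single string that fits in one analytics event field.
function errorsToLabel(errors, maxLength = 500) {
  if (!Array.isArray(errors) || errors.length === 0) {
    return "";
  }
  // Assumption: errors may be strings or objects; objects are JSON-serialized.
  const label = errors
    .map((error) => (typeof error === "string" ? error : JSON.stringify(error)))
    .join(" | ");
  if (label.length <= maxLength) {
    return label;
  }
  // Too long: cut it and append "..." so the truncation is visible,
  // keeping the total length at maxLength.
  if (maxLength <= 3) {
    return label.slice(0, maxLength);
  }
  return label.slice(0, maxLength - 3) + "...";
}

console.log(errorsToLabel(["page.author is missing", "user.name failed contains:David"]));
// page.author is missing | user.name failed contains:David
console.log(errorsToLabel(["abcdefghij"], 8)); // abcde...
```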

Below is a small snippet that validates a dataLayer model against the schema and, if any errors are found, pushes them to the GTM dataLayer; from that point on we can do whatever we want with the info.

<script>
/*
[REPLACE WITH THE CODE FROM GITHUB]
*/

// Everything below goes after the library code above.

// Sample model: it has no page.author, and 'david' fails the
// case-sensitive "contains:David" check, so errors are expected.
var myDatalayer = { user: { id: 1, name: 'david' } };
var mySchema = {
    "page": {
        "author": "required"
    },
    "user": {
        "id": "integer",
        "name": "contains:David"
    }
};
var validationResults = window._validateDataLayer(myDatalayer, mySchema, false);

// Only notify GTM when the validator actually returned errors.
if (validationResults && validationResults.length > 0) {
    dataLayer.push({
        event: 'dataLayer-validation-errors',
        errors: validationResults
    });
}
</script>

You may be thinking that this could lead to hundreds of pushes being sent to GTM, but the library sets a throttling cookie so it sends only one event per browser session; that way we'll be notified once per user session instead of on every page view.
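The throttling idea can be sketched like this. The cookie name _dlv_reported is hypothetical (the library's real cookie may be named and formatted differently); the point is that a cookie without Expires or Max-Age is a session cookie, so the flag normally disappears when the browser session ends.

```javascript
// Hypothetical cookie name; the library's actual throttling cookie may differ.
const THROTTLE_COOKIE = "_dlv_reported";

// True if a cookie header string (such as document.cookie) already carries
// the "reported" flag for this session.
function alreadyReported(cookieString) {
  return cookieString
    .split(";")
    .map((part) => part.trim())
    .some((part) => part === THROTTLE_COOKIE + "=1");
}

// Value to assign to document.cookie after the first report. No Expires or
// Max-Age attribute, so the browser treats it as a session cookie.
function throttleCookie() {
  return THROTTLE_COOKIE + "=1; path=/";
}

// In a browser, the check would wrap the push roughly like this:
//   if (errors.length > 0 && !alreadyReported(document.cookie)) {
//     dataLayer.push({ event: "dataLayer-validation-errors", errors: errors });
//     document.cookie = throttleCookie();
//   }

console.log(alreadyReported("foo=bar; _dlv_reported=1")); // true
console.log(alreadyReported("foo=bar")); // false
```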

This should work with any TMS other than GTM too, but for GTM there's one extra thing to keep in mind: WHEN we fire the validation tool.
We need to decide at which execution/event point we want to check the data. I suggest firing the validation on the Window Load event (gtm.load), since by then the dataLayer model will almost always be fully loaded. But if our setup relies on All Pages (gtm.js) and we expect the data to be available on that event (or on any custom one), we can set our monitoring tag's trigger to that same event!
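On a TMS without GTM-style triggers, the same timing idea can be approximated by waiting for a specific event in the dataLayer array. This sketch assumes a GTM-style dataLayer (an array of pushed objects with an event key); onDataLayerEvent is a hypothetical helper, and if another script replaces dataLayer.push after it's installed, the wrapper can be bypassed.

```javascript
// Hypothetical helper, not the library's API: run `callback` once
// `eventName` (e.g. "gtm.load") is in a GTM-style dataLayer array,
// whether it was pushed before or after this call.
function onDataLayerEvent(dataLayer, eventName, callback) {
  let done = false;
  const fire = () => {
    if (!done) {
      done = true;
      callback();
    }
  };

  // Case 1: the event was already pushed.
  if (dataLayer.some((item) => item && item.event === eventName)) {
    fire();
    return;
  }

  // Case 2: wrap push so we notice the event when it arrives.
  const originalPush = dataLayer.push;
  dataLayer.push = function (...items) {
    const result = originalPush.apply(this, items);
    if (items.some((item) => item && item.event === eventName)) {
      fire();
    }
    return result;
  };
}

const demoLayer = [];
onDataLayerEvent(demoLayer, "gtm.load", () => console.log("validate now"));
demoLayer.push({ event: "gtm.js" }); // nothing yet
demoLayer.push({ event: "gtm.load" }); // logs "validate now"
```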

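The new version includes a dataLayer-grabbing helper: pass 'gtm' as the first argument instead of a dataLayer object and the library grabs the current GTM dataLayer model for you, so you only need the following: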

<script>
/*
[REPLACE WITH THE CODE FROM GITHUB]
*/

var mySchema = {
    "page": { "author": "required" },
    "user": {
        "id": "integer",
        "name": "contains:David"
    }
};
var validationResults = window._validateDataLayer('gtm', mySchema, false);
if (validationResults && validationResults.length > 0) {
    dataLayer.push({
        event: 'dataLayer-validation-errors',
        errors: validationResults
    });
}
</script>
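For the curious, here's a simplified sketch of what a "current dataLayer model" means: every pushed object is applied in order, with plain objects merged recursively and any other value overwriting. buildModel is a hypothetical helper, not the library's 'gtm' helper, and GTM's own data model has additional rules (for example for arrays and command-style pushes), so treat it as an approximation.

```javascript
// Simplified approximation: not GTM's exact algorithm, not the library's helper.
function isPlainObject(value) {
  return Object.prototype.toString.call(value) === "[object Object]";
}

// Merge `source` into `target`: plain objects merge key by key, recursively;
// any other value (string, number, array, null...) overwrites.
function mergeInto(target, source) {
  for (const [key, value] of Object.entries(source)) {
    if (isPlainObject(value) && isPlainObject(target[key])) {
      mergeInto(target[key], value);
    } else if (isPlainObject(value)) {
      // Copy so later merges never mutate the object that was pushed.
      target[key] = mergeInto({}, value);
    } else {
      target[key] = value;
    }
  }
  return target;
}

// Fold every pushed object into one model, in push order.
// Non-plain-object entries (e.g. gtag's arguments objects) are skipped.
function buildModel(dataLayer) {
  const model = {};
  for (const item of dataLayer) {
    if (isPlainObject(item)) {
      mergeInto(model, item);
    }
  }
  return model;
}

const sampleLayer = [
  { page: { author: "David" } },
  { event: "gtm.js" },
  { user: { id: 1 } },
  { user: { name: "David" } },
];
console.log(JSON.stringify(buildModel(sampleLayer)));
// {"page":{"author":"David"},"event":"gtm.js","user":{"id":1,"name":"David"}}
```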

Let me know your thoughts and improvement ideas!

Official Library Repository: https://github.com/david-vallejo-com/cs-datalayer-monitor