Skip to content

Indepth

Cubed is a data collection platform built for website traffic. Below is the table design and how the data is structured.

We treat each user on site as a Visitor who has many Visit(s). This is important to differentiate, and you could think of a Visit as a session. We generate a unique 32 character token for each new Visitor and for each of their subsequent Visit(s). The token is stored in a cookie on the user's browser, while we store it in the DB. We drop 2 cookies to facilitate this vscr_vid (visitor id) and vscr_sid (session id). The vscr_vid cookie attempts to live forever (2 years max), while the vscr_sid will expire after 30mins of inactivity, or when the browser is closed. When someone arrives on site our Cubed tag can fire many pieces of data including: where the visitor came from via document.referer the page the tag fired on window.location.href cubed events - as defined in the account config section dash.withcubed.com

While processing this request we can read the headers and gain some extra information including: Ip address - for geo-location purposes OS details - name, version, etc.. Browser details - name, version, etc...

Once we've got this data validated and ready to be inserted we first check if the document.referer string matches any patterns that the user has created in the account config section (or the default Cubed patterns). All patterns are a simple regex string and must belong to 1 "Channel" (which in our system can be known as a "referer" due to legacy code/table names). For example we have a default pattern of "google.com", that belongs to the channel SEO. If a Visitor arrives on site and their document.referer is "https://www.google.com" - meaning they clicked a link from google's search page, then we would pattern match that and thus assign that pattern's Id to that visit. With this in mind you can now see how we can build a picture of a Visitor, arriving on site from different places across many days/weeks/months. It is not impossible for a Visitor to have a handful of Visits that first arrived via SEO, then Email, then Direct.

Finally to calculate "attribution" we need an "end point" in a Visitor's journey. What we call in the frontend a "Goal". Similar to "channel" and "referer" being inter-changeable, "Goal" and "Product" can be due to legacy code/table names. A Goal is basically a collection of "events" where one is flagged as a "sale" event. When processing all our data we use this event as a marker to say "the visitor's journey has finished, lets calculate how important all the visits were leading up to it".

Everything described above is stored in our "raw" tables all prefixed as attrib_. We started as an "attribution platform" and so we our naming convention follows suit. The "raw" tables also includes the configuration tables too. As mentioned above we can fire "cubed events" if they are defined in the account config section.

Config Tables

To configure a Cubed account we need Channels for Patterns to belong to, we need Events to fire, and we need to have a Goal with a designated "sale event".

table name description unique key(s) foreign key(s) extra details
attrib_referer The names of all the account's Referers/Channels/Buckets.
attrib_pattern Regex patterns and the channels that they belong to referer_id to attrib_referer.id
attrib_event All created events for the account
attrib_product Goals to hold events against
attrib_product_events A link of event(s) to Goals with 1 (or more) flagged as a Sale event_id to attrib_event.id, product_id to attrib_product.id

Raw data collection tables

Now lets look at the raw tables we insert into during data collection, including visitor with visits, the pages they saw, and the events they fired - along with any extra pieces of information.

table name description unique key(s) foreign key(s) extra details
attrib_visitor The top most data point for all visitor data token
attrib_visit The main data point we process and aggregate to, must belong to a visitor token visitor_id to attrib_visitor.id, pattern_id to attrib_pattern.id
attrib_visit_details Holds all extra info we get from the request headers including country, city, browser, and OS etc.. visit_id to attrib_visit.id
attrib_event_item Store event(s) to a visit visit_id to attrib_visit.id, event_id to attrib_event.id This can also store revenue and transaction_id for any event fired
attrib_event_item_details This will hold extra pieces of information for the fired event including name, category, sku. event_item_id to attrib_event_item.id
attrib_visitor_details This is a high level piece of data that allows us to join cubed data with external data via client_customer_id client_customer_id, client_customer_type visitor_id to attrib_visitor.id
attrib_page Store the raw URL collected, and it's separate components (host, port etc...) visit_id to attrib_visit.id
attrib_label Labels are a way to colour in a visit with bespoke meta data. We allow mutliple "types" - as dictated by the columns visit_id to attrib_visit.id, page_id to attrib_page.id, event_item_id to attrib_event_item.id, event_item_details_id to attrib_event_item.id

Validation

If there is an issue with the data collection (whether its the Config, the tag, or on site) we will return a message to the browser and insert a validation message. This uses 4 tables to hold the data including message, and that message's page and details (exact same as visit page and visit detail - see above.)

table name description unique key(s) foreign key(s) extra details
attrib_validation Base table to store a validation Id against a visitor and visit visit_id to attrib_visit.id, visitor_id to attrib_visitor.id
attrib_validation_message This holds the main details including name, message, and value validation_id to attrib_validation.id
attrib_validation_page Store a stripped down verison of attrib_page for the validation item validation_id to attrib_validation.id
attrib_validation_details Store a stripped down version of attrib_visit_details for the validation item validation_id to attrib_validation.id

The idea here is you can check for validation items for any client and check for trends. Are more validation messages coming from a certain page? Or perhaps a specific browser/device. This is where you can find that information out.

Syncs

We are now tracking a Visitor on a website on their device. The next question is: what if that same Visitor accesses the website via another device? Some companies attempt to do "finger printing" by comparing device details+ip+user behaviour, but we have opted for a simple and gauranteed method.

The Cubed tag can fire a sync_token which we store per Visitor. A sync_token is an identifiable piece of information as specified by the client. Usually this is something like a username, email address, account id or customer id.

Once we have a sync we check every request against our sync table, and if we find a matching token we now know that Visitor is actually another Visitor and we nove those Visit(s) over.

All sync_tokens are hashed at the point of collection and thus we NEVER insert the actual sync_token. Though this can make it difficult to debug sync issues, this is a security and privacy protocol that we take seriously.

table name description unique key(s) foreign key(s) extra details
attrib_sync Store the hashed sync token against a visit token visit_id to attrib_visit.id
attrib_sync_point The point where the sync was created sync_id to attrib_sync.id
attrib_sync_map Quick lookup table of current visitor to previous visitor id old_visitor_id old_visitor_id to attrib_visitor.id, new_visitor_id to attrib_visitor.id
attrib_sync_visit Link a sync_point.id with an attrib_visit.id point_id to attrib_sync_point.id, visit_id to attrib_visit.id

Raw tables post crons

Below are tables that are very close to the raw data collection tables, but are currently updated from the crons.

table name description unique key(s) foreign key(s) extra details
attrib_path Keep a row per URL, using a hashing of that string as the "token". token This table is currently inserted into during the crons, but can/will be moved to the DCS
attrib_page_item Store a reference to each path seen by each visit and their original attrib_page page_id visit_id to attrib_visit.id, path_id to attrib_path.id, page_id to attrib_page.id This table is currently inserted into during the crons, but can/will be moved to the DCS