
CubedReportResource

backend/api/utils.py

This is the main class that all reports inherit from. We built it as a way to handle group by generically, along with different types of aggregation.

This section will outline the main functions this class uses to return data. get_list is probably the most important function as it's where we begin the whole journey, and apply_filters is where we generate the query sets.

Note

This whole class exists only to provide group by aggregation. You don't always need to, but you should pass group_by=field_name for it to work. If you do NOT need aggregation, use the standard Tastypie ModelResource class instead.

Query params we care about:

  • totals - boolean, should we return a dictionary of totals
  • graph - boolean, return the data at a day/time level
  • group_by - string, comma separated list of dimensions to group by before aggregating
  • having - string, key=value, comma separated list of metrics to filter by post aggregation
  • graph_group_by - string, comma separated list of date based fields to group the data by before aggregating
  • graph_group_by_type - string, field_name + granularity. For example sales_date__hourly
  • graph_order_by - string, rarely used, but if you wanted to order a graph's data by date but in reverse you could pass graph_order_by=-sales_date.
  • selected_fields - string, comma separated fields. If specified these will be the only fields that the resource will return in a response.
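
As a rough illustration of how these params combine, the sketch below builds a query string for a table data request. The helper build_report_query and the field names product, country and sales are made up for the example; only the param names and their comma separated / key=value shapes come from the list above.

```python
from urllib.parse import parse_qs, urlencode


def build_report_query(group_by, having=None, totals=False, graph=False):
    """Build a query string from the documented params (hypothetical helper)."""
    params = {
        "group_by": ",".join(group_by),
        # Booleans are sent as the literal strings true/false.
        "totals": "true" if totals else "false",
        "graph": "true" if graph else "false",
    }
    if having:
        # having is a comma separated list of key=value pairs.
        params["having"] = ",".join(f"{key}={value}" for key, value in having.items())
    return urlencode(params)


# Hypothetical field names: product, country, sales.
query = build_report_query(["product", "country"], having={"sales": 100})
decoded = parse_qs(query)
print(decoded["group_by"])  # the comma separated list survives URL encoding
```

The same request with totals=True would be the second call the frontend makes to fetch the table's totals.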

init

Each resource that inherits this class initializes its own instance. We first set some default properties/flags that we can use later, and/or modify as needed.

# if we should run both querysets
self.return_totals_and_main = False

# flagged True during request cycle if we should only return Totals
self.get_totals = False

# dictionary to hold the totals generated by django's queryset
self.totals = {}

# flagged True during request cycle if we are fetching graph data
self.get_graph = False

# hold fields for group_by - used later in request cycle
self.group_by_fields = []

self.amc = AttribMathCollection(BaseExpressions)
for expression in self.custom_expressions:
    logger.debug(f"Add expression '<{expression}>' to AMC.")
    self.amc.add_expressions(expression)

return_totals_and_main is class level and should be set on the resource that inherits this class. Some reports can return both, where it makes sense to do so. We turned this off by default when we split the frontend API queries out so each data set could be fetched on its own.

get_totals is set based on the query param passed up totals={true|false}. This is usually false when fetching the table body data, and then true when fetching the table's "totals".

self.totals = {} is a dictionary that we populate. It is partly a leftover from how we used to return both the table body data and totals together. The response would look like this:

{
    // tastypie meta details for pagination and other bits
    "meta": {},

    // the main tastypie response of data - usually table data
    "objects": [],

    // our custom addition, of key:value for each field returned
    "totals": {}
}

self.group_by_fields is set and populated as soon as we intercept the query params. It is initialized in init because other functions need to check it, and we're not guaranteed it will have been populated; it might be empty.

self.amc = AttribMathCollection(BaseExpressions) this is where we initialize and set the base Expressions for this report resource. Each report gets its own base expressions which they can override if needed.

for expression in self.custom_expressions: - we take the class array variable and initialize each ExpressionCollection and add them to this report resource to use. Most reports will not have custom_expressions set.

additional_metrics

This is a python @property on the class and allows us to quickly access/read a metric property on the report. This can be useful when we're adding/removing additional_metrics during the cycle.

wrap_view

This is Tastypie's lowest function in the ModelResource and where they ask you to step in if you want to add custom logic. I'm not a big fan of this as you have to copy over a bunch of base logic - like we've done here. The important change we've made here is to init() a new version of this class for every request. This is done as we're adding/removing fields dynamically based on the request made. This does mean we've more or less given up the ability to cache easily, however caching here was always going to be very tricky as every single request could (and would) be fetching different data sets. It would not be impossible to add, but would always have required a custom cache class - which Tastypie does support.

get_list

This is the main entry point to our Tastypie flow. It's where we read the query params and start preparing the class to deal with either: table data, totals, or graph data. The main query params we care about here are totals={true|false} and graph={true|false}; they change the querysets we build, and where the flow of this cycle will go.

The main steps here are to call the internal Tastypie function obj_get_list which will call Tastypie's apply_filters, which we've repurposed for our own use. Once this has run we either format/prep the fields to be returned, or apply_sorting, followed by pagination, and then call full_dehydrate on the bundles.

Finally we return the result of create_response, which creates an HttpResponse with our serialized, and formatted, data.

create_totals_fields

This prepares the totals dictionary to have the correct key:value pairing. It does this by creating a custom queryset for each field and adding a "t_" prefix to the field names so Django can differentiate between the aggregate fields here and the fields we annotated in create_queryset(). The "t_" prefix is removed in get_list when we clean the totals.
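
The prefix round trip can be pictured with plain dictionaries. This is only a sketch of the naming scheme, not the real queryset code: prefix_total_fields and clean_totals are hypothetical helper names, and the field names are made up.

```python
TOTALS_PREFIX = "t_"


def prefix_total_fields(field_names):
    """Map each field name to the prefixed alias used for the totals aggregate."""
    return {name: f"{TOTALS_PREFIX}{name}" for name in field_names}


def clean_totals(raw_totals):
    """Strip the prefix again so the response uses the original field names."""
    cleaned = {}
    for key, value in raw_totals.items():
        if key.startswith(TOTALS_PREFIX):
            key = key[len(TOTALS_PREFIX):]
        cleaned[key] = value
    return cleaned


aliases = prefix_total_fields(["sales", "visits"])
# Pretend the database returned these aggregate values under the aliases.
raw = {aliases["sales"]: 110, aliases["visits"]: 7}
totals = clean_totals(raw)
print(totals)
```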

apply_filters

This is where we use the params from get_list and build our Django querysets. Following the SQL output examples above, the list of function calls (and if checks) here generates the "base query set" (used for table data), followed by either the "totals query set" or the "graph query set". Both use the "base query set" as a nested query and aggregate it in different ways.

First we call create_additional_fields which will check all additional metrics we want to aggregate and make sure those "exist" on the model, so we can interact with them using Tastypie's default filter capabilities (ie field_name__gte=100).

Next we call create_queryset which generates the "base query set". This passes in request and applicable_filters, which allows us to use Tastypie's out of the box filtering capabilities.

Then we check if we have passed up any having= query params. These have to be treated differently as they are to be applied on the final aggregated data set.
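
A minimal sketch of splitting a having= value into separate filters, assuming the comma separated key=value shape described in the query params list. parse_having is a hypothetical name, and the real code may validate or convert the values differently.

```python
def parse_having(raw):
    """Split 'key=value,key=value' into a dict of post-aggregation filters."""
    filters = {}
    for part in raw.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate trailing or doubled commas
        key, separator, value = part.partition("=")
        if not separator:
            raise ValueError(f"expected key=value, got {part!r}")
        filters[key.strip()] = value.strip()
    return filters


# Hypothetical metric names.
having_filters = parse_having("sales=100, visits=5")
print(having_filters)
```

The values stay as strings here; converting them to numbers is left out because the source does not say where that happens.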

Finally we begin to build the aggregation parts of the query set. First we build the group by part, and if we have passed graph_group_by then we need to prepare to use that too. This then involves calling create_custom_fields which returns a dictionary of all fields we want to use (base + additional) and the chosen group by fields. Now that we have everything needed to create an aggregated, group by, query set with any additional "having" filters, we call create_groupby_queryset. This is maybe the most readable part for someone who is comfortable with Django's ORM.
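
The real work is done with Django querysets, but the semantics of "group by, aggregate, then apply having" can be shown in plain Python. Everything below is an illustration under assumptions: group_and_aggregate is a made-up function, the metrics are simply summed, and having is treated as a minimum threshold per metric.

```python
from collections import defaultdict


def group_and_aggregate(rows, group_by, metrics, having=None):
    """Sum metrics per group, then drop groups that fail the having thresholds."""
    grouped = defaultdict(lambda: dict.fromkeys(metrics, 0))
    for row in rows:
        key = tuple(row[field] for field in group_by)
        for metric in metrics:
            grouped[key][metric] += row[metric]

    result = []
    for key, sums in grouped.items():
        # having is applied to the aggregated values, not to the raw rows.
        if having and any(sums[name] < minimum for name, minimum in having.items()):
            continue
        result.append({**dict(zip(group_by, key)), **sums})
    return result


rows = [
    {"country": "UK", "sales": 60},
    {"country": "UK", "sales": 50},
    {"country": "FR", "sales": 40},
]
report = group_and_aggregate(rows, ["country"], ["sales"], having={"sales": 100})
print(report)
```

Neither UK row reaches 100 on its own, yet the UK group is kept because the filter sees the aggregated total. That is why having filters cannot be applied alongside the normal Tastypie filters.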

The last steps here take the base query set and either convert it into a "graph query set", or a "totals query set".

apply_sorting + apply_ordering

This makes sure we're ordering by the chosen value in the query params. Because we can have dynamically added fields, we need to tell Tastypie and Django it's ok and then build that part of the queryset.

inner_to_straight

Custom function to force all joins to be "straight" where it makes sense. This is to bypass the SQL engine's optimiser, which we've had issues with previously. The way we build our querysets usually means we're ok to ignore the engine and join on the indexes we want, in the order we've chosen.
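
To give a feel for the idea, the sketch below rewrites INNER JOIN to MySQL's STRAIGHT_JOIN in a raw SQL string. It assumes a MySQL backend, and the function name inner_to_straight_sql is hypothetical; the actual inner_to_straight may work on the queryset rather than on SQL text, and it decides "where it makes sense" in ways not shown here.

```python
import re

# Matches INNER JOIN regardless of case or the amount of whitespace between words.
_INNER_JOIN = re.compile(r"\bINNER\s+JOIN\b", re.IGNORECASE)


def inner_to_straight_sql(sql):
    """Replace every INNER JOIN with STRAIGHT_JOIN, leaving other joins alone."""
    return _INNER_JOIN.sub("STRAIGHT_JOIN", sql)


# Hypothetical tables a, b and c.
original = (
    "SELECT * FROM a INNER JOIN b ON a.id = b.a_id "
    "LEFT JOIN c ON c.id = b.c_id"
)
rewritten = inner_to_straight_sql(original)
print(rewritten)
```

STRAIGHT_JOIN tells MySQL to read the left table before the right one, so the join order written in the query is the order that gets executed.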

full_dehydrate

Here we format all rows of data by calling format_row which in turn calls format_field, which will call format on each field. See backend/api/fields.py for each field's format() functionality.
The idea here is that if we're using a standard field, for example bool, we make sure we're returning it correctly formatted. If there's anything missing, format_field will attempt to make sure the value is pythonic in some way, and format_row will make sure we're not returning None.
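
A simplified picture of that chain, using plain functions instead of the resource's methods. The Decimal to float conversion and the default of 0 for None are assumptions made for the example; the source does not say what the real functions substitute.

```python
from decimal import Decimal


def format_field(value):
    """Coerce a raw database value into a plain Python type (assumed behaviour)."""
    if isinstance(value, Decimal):
        return float(value)
    return value


def format_row(row, default=0):
    """Format every field in a row and make sure no value is left as None."""
    formatted = {}
    for name, value in row.items():
        value = format_field(value)
        formatted[name] = default if value is None else value
    return formatted


# Hypothetical row as it might come back from an aggregated queryset.
row = {"country": "UK", "sales": Decimal("1.5"), "visits": None}
formatted_row = format_row(row)
print(formatted_row)
```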

format_field_total

Called when we're about to format and return the totals data. This is custom and done separately from the complete Tastypie flow, so we don't hit full_dehydrate as normal. It follows the same principles and attempts to call format_total on each field.