CubedReportResource¶
backend/api/utils.py
This is the main class that all reports inherit. We built this class as a way to generically handle group by along with different types of aggregation.
This section will outline the main functions this class uses to return data. get_list
is probably the most important function as it's where we begin the whole journey, and apply_filters
is where we generate the query sets.
Note
This whole class exists only to provide group by
aggregation. You don't always need to, but you should pass group_by=field_name..
for it to work.
If you do NOT need aggregation, then use the standard Tastypie ModelResource class.
Query params we care about:
totals - boolean, should we return a dictionary of totals
graph - boolean, return the data at a day/time level
group_by - string, comma separated list of dimensions to group by before aggregating
having - string, key=value, comma separated list of metrics to filter by post aggregation
graph_group_by - string, comma separated list of date based fields to group the data by before aggregating
graph_group_by_type - string, field_name + granularity. For example sales_date__hourly
graph_order_by - string, rarely used, but if you wanted to order a graph's data by date but in reverse you could pass graph_order_by=-sales_date.
selected_fields - string, comma separated fields. If specified these will be the only fields that the resource will return in a response.
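As a rough illustration of how these params combine, here is a minimal sketch that builds the query strings for a table, totals and graph request. The endpoint path and the field names product and region are hypothetical; only the parameter names and the sales_date examples come from the list above. Whether a graph request also needs group_by is not covered here.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
BASE_URL = "/api/report/example/"


def build_report_url(**params):
    """Return BASE_URL plus the given query params.

    Booleans are sent as "true"/"false" and lists as comma separated strings,
    matching the parameter descriptions above.
    """
    cleaned = {}
    for key, value in params.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            value = ",".join(value)
        cleaned[key] = value
    return f"{BASE_URL}?{urlencode(cleaned)}"


# Table body data, grouped by two (hypothetical) dimensions.
table_url = build_report_url(group_by=["product", "region"], totals=False)

# The matching totals request only flips the totals flag.
totals_url = build_report_url(group_by=["product", "region"], totals=True)

# Graph data, grouped hourly and ordered by date in reverse.
graph_url = build_report_url(
    graph=True,
    graph_group_by=["sales_date"],
    graph_group_by_type="sales_date__hourly",
    graph_order_by="-sales_date",
)
```

Note that urlencode percent-encodes the commas; they decode back to a plain comma separated string on the server side.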
init¶
Each resource that inherits this class initializes its own instance. We first set some default properties/flags that we can use later, and/or modify as needed.
# if we should run both querysets
self.return_totals_and_main = False
# flagged True during request cycle if we should only return Totals
self.get_totals = False
# dictionary to hold the totals generated by django's queryset
self.totals = {}
# flagged True during request cycle if we are fetching graph data
self.get_graph = False
# hold fields for group_by - used later in request cycle
self.group_by_fields = []
self.amc = AttribMathCollection(BaseExpressions)
for expression in self.custom_expressions:
    logger.debug(f"Add expression '<{expression}>' to AMC.")
    self.amc.add_expressions(expression)
return_totals_and_main
is class level and should be set on the resource that is inheriting this class.
Some reports can return both, and it makes sense for them to do so. We turned this off by default
as we split the frontend API queries out so each data set could be fetched on its own.
get_totals
is set based on the query param passed up totals={true|false}
. This is usually false
when fetching the table body data, and then true
when fetching the table's "totals".
self.totals = {}
this is a dictionary that we populate. It is partly a leftover from how we used to return both the table body data and totals together. The response would look like this:
{
// tastypie meta details for pagination and other bits
"meta": {},
// the main tastypie response of data - usually table data
"objects": [],
// our custom addition, of key:value for each field returned
"totals": {}
}
self.group_by_fields
we set and populate this as soon as we intercept the query params. This is set on init
as we need to check it in other functions and we're not guaranteed it will be populated; it might be empty.
self.amc = AttribMathCollection(BaseExpressions)
this is where we initialize and set the base Expressions
for this report resource. Each report gets its own base expressions which they can override if needed.
for expression in self.custom_expressions:
- we take the class array variable and initialize each ExpressionCollection
and add them to this report resource to use.
Most reports will not have custom_expressions
set.
additional_metrics¶
This is a Python @property
on the class and allows us to quickly access/read a metric property on the report. This can be useful when we're adding/removing additional_metrics during the cycle.
wrap_view¶
This is Tastypie's lowest-level function in the ModelResource
and where they ask you to step in if you want to add custom logic. I'm not a big fan of this as you have to copy over a bunch of base logic - like we've done here.
The important change we've made here is to init() a new instance of this class for every request. This is done as we're adding/removing fields dynamically based on the request made. This does mean we've largely given up the ability to easily cache; however, caching here was always going to be very tricky as every single request could (and would) be fetching different data sets. It would not be impossible to add, but it would always have required a custom cache class, which Tastypie does support.
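To show why a fresh instance per request matters, here is a minimal sketch using a stand-in class rather than the real Tastypie resource. SketchResource and its simplified get_list are hypothetical; the only idea taken from the text above is that wrap_view builds a new instance of the same class before handling each request.

```python
class SketchResource:
    """Stand-in for a report resource; this is not the Tastypie class."""

    def __init__(self):
        # Per-request state, reset every time a new instance is built.
        self.get_totals = False
        self.group_by_fields = []

    def get_list(self, params):
        # Mutates instance state based on the request, like the real flow does.
        self.get_totals = params.get("totals") == "true"
        self.group_by_fields = [
            name for name in params.get("group_by", "").split(",") if name
        ]
        return {"totals": self.get_totals, "group_by": list(self.group_by_fields)}

    def wrap_view(self, view):
        def wrapper(params):
            # Build a fresh instance for this request so that state set by an
            # earlier request cannot leak into this one.
            fresh = self.__class__()
            return getattr(fresh, view)(params)

        return wrapper


resource = SketchResource()
handler = resource.wrap_view("get_list")
first = handler({"totals": "true", "group_by": "product,region"})
second = handler({})
```

The second call starts from clean defaults even though the first call set flags and group by fields, and the long-lived resource object itself is never mutated.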
get_list¶
This is the main entry point to our Tastypie flow. It's where we read the query params and start preparing the class to deal with either: table data, totals, or graph data.
The main query params we care about here are totals: {true|false} and graph: {true|false}. They change the querysets we build, and where the flow of this cycle goes.
The main steps here are to call the internal Tastypie function obj_get_list
which will call Tastypie's apply_filters
, which we've repurposed for our own use.
Once this has run we either format/prep the fields to be returned, or apply_sorting
, followed by pagination
, and then call full_dehydrate
on the bundles
.
Finally we return the result of calling create_response, which will create an HttpResponse with our serialized, and formatted, data.
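The first step of that flow, reading the flags, can be sketched roughly as below. The function names are hypothetical, and the precedence (graph checked before totals) is an assumption for illustration; the real get_list may order these checks differently.

```python
def parse_flag(params, name):
    """Interpret a query param such as totals=true as a boolean."""
    return str(params.get(name, "false")).lower() == "true"


def read_request_flags(params):
    """Work out which data set a request wants and which fields to group by."""
    if parse_flag(params, "graph"):
        mode = "graph"
    elif parse_flag(params, "totals"):
        mode = "totals"
    else:
        mode = "table"
    group_by_fields = [
        name.strip() for name in params.get("group_by", "").split(",") if name.strip()
    ]
    return {"mode": mode, "group_by_fields": group_by_fields}


table_request = read_request_flags({"group_by": "product, region"})
totals_request = read_request_flags({"group_by": "product", "totals": "true"})
graph_request = read_request_flags({"graph": "true"})
```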
create_totals_fields¶
This prepares the dictionary totals
to have the correct key:value
pairing. It does this by creating a custom queryset for each field, adding a "t_" prefix to the field names so Django can differentiate between the aggregate fields here and the fields we annotated in create_queryset()
. The "t_" will be removed in get_list
when we clean the totals.
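The prefix round trip can be sketched in plain Python as follows. The helper names and the example field names are hypothetical; only the "t_" prefix and the idea of stripping it again when cleaning the totals come from the description above.

```python
TOTALS_PREFIX = "t_"


def prefix_totals_fields(field_names):
    """Map each prefixed aggregate name back to the field it totals."""
    return {f"{TOTALS_PREFIX}{name}": name for name in field_names}


def clean_totals(raw_totals):
    """Strip the prefix from the aggregate result so keys match the field names."""
    cleaned = {}
    for key, value in raw_totals.items():
        if key.startswith(TOTALS_PREFIX):
            key = key[len(TOTALS_PREFIX):]
        cleaned[key] = value
    return cleaned


aliases = prefix_totals_fields(["revenue", "visits"])
# Pretend this dict came back from the aggregate query.
totals = clean_totals({"t_revenue": 140, "t_visits": 12})
```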
apply_filters¶
This is where we use the params from get_list
and build our Django querysets. Following the SQL output examples above, the list of function calls (and if checks) here generates the "base query set" (used for table data), followed by either the "totals query set" or the "graph query set", both of which use the "base query set" as a nested query and aggregate it in different ways.
First we call create_additional_fields
which checks all additional metrics we want to aggregate and makes sure they "exist" on the model, so we can interact with them using Tastypie's default filter capabilities (e.g. field_name__gte=100).
Next we call create_queryset
which generates the "base query set". This passes in request
and applicable_filters
- which allows us to use Tastypie's out of the box filtering capabilities.
Then we check if we have passed up any having=
query params. These have to be treated differently as they are to be applied on the final aggregated data set.
Finally we begin to build the aggregation parts of the query set. First we build the group by
part, and if we have passed graph_group_by
then we need to prepare to use that too.
This then involves calling create_custom_fields
which returns a dictionary of all fields we want to use (base + additional) and the chosen group by fields. Now that we have everything needed to create an aggregated, group by, query set with any additional "having" filters, we call create_groupby_queryset
. This is maybe the most readable part for someone who is comfortable with Django's ORM.
The last steps here take the base query set and either convert it into a "graph query set", or a "totals query set".
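The order of operations matters here: group, aggregate, and only then apply the "having" filters. Below is a minimal pure-Python sketch of that order, without Django. All function names and the sample rows are hypothetical, and treating each having value as a minimum threshold on a summed metric is an assumption for illustration; the real class builds the equivalent with Django querysets.

```python
from collections import defaultdict


def parse_having(raw):
    """Split a having string such as "revenue=100,visits=2" into a dict of numbers."""
    having = {}
    for part in raw.split(","):
        if not part.strip():
            continue
        key, _, value = part.partition("=")
        having[key.strip()] = float(value)
    return having


def group_and_sum(rows, group_by, metrics):
    """Group rows by the group_by fields and sum each metric within a group."""
    groups = defaultdict(lambda: dict.fromkeys(metrics, 0))
    for row in rows:
        key = tuple(row[field] for field in group_by)
        for metric in metrics:
            groups[key][metric] += row[metric]
    return [dict(zip(group_by, key), **sums) for key, sums in groups.items()]


def apply_having(grouped_rows, having):
    """Keep groups whose aggregated metrics meet every threshold (assumed >=)."""
    return [
        row
        for row in grouped_rows
        if all(row[metric] >= minimum for metric, minimum in having.items())
    ]


rows = [
    {"product": "a", "revenue": 60},
    {"product": "a", "revenue": 50},
    {"product": "b", "revenue": 30},
]
grouped = group_and_sum(rows, ["product"], ["revenue"])
filtered = apply_having(grouped, parse_having("revenue=100"))
```

No single row has revenue of 100 or more, so the same filter applied before aggregation would return nothing. Applied after grouping, product "a" passes with a summed revenue of 110.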
apply_sorting + apply_ordering¶
This makes sure we're ordering by the chosen value in the query params. Because we can have dynamically added fields, we need to tell Tastypie and Django it's ok and then build that part of the queryset.
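A minimal sketch of the idea, outside of Tastypie: the set of sortable fields has to include the dynamically added ones, and a leading "-" reverses the order. The function name and the sample fields are hypothetical.

```python
def sort_rows(rows, order_by, allowed_fields):
    """Sort rows by order_by, where a leading "-" means descending."""
    descending = order_by.startswith("-")
    field = order_by[1:] if descending else order_by
    if field not in allowed_fields:
        # Dynamically added fields must be registered here before sorting on them.
        raise ValueError(f"Cannot order by unknown field: {field}")
    return sorted(rows, key=lambda row: row[field], reverse=descending)


base_fields = {"product"}
dynamic_fields = {"revenue"}  # e.g. an additional metric added for this request
rows = [
    {"product": "a", "revenue": 10},
    {"product": "b", "revenue": 30},
    {"product": "c", "revenue": 20},
]
ordered = sort_rows(rows, "-revenue", base_fields | dynamic_fields)
```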
inner_to_straight¶
Custom function to force all joins to be "straight" where it makes sense. This is to bypass the SQL engine's query optimiser, which we've had issues with previously. The way we build our querysets usually means we're OK to ignore the optimiser and join on the indexes we want, in the order we've chosen.
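As a rough idea of what such a rewrite could look like, here is a sketch that swaps INNER JOIN for MySQL's STRAIGHT_JOIN in a raw SQL string. This assumes a MySQL-compatible database and a plain text substitution; the real function is selective ("where it makes sense") and may work quite differently. The function name and the sample SQL are hypothetical.

```python
import re

# Matches INNER JOIN regardless of case or the amount of whitespace between words.
_INNER_JOIN = re.compile(r"\bINNER\s+JOIN\b", flags=re.IGNORECASE)


def inner_to_straight_sketch(sql):
    """Replace every INNER JOIN with STRAIGHT_JOIN, leaving other joins alone."""
    return _INNER_JOIN.sub("STRAIGHT_JOIN", sql)


original = (
    "SELECT * FROM sales "
    "INNER JOIN product ON sales.product_id = product.id "
    "LEFT JOIN region ON sales.region_id = region.id"
)
rewritten = inner_to_straight_sketch(original)
```

With STRAIGHT_JOIN, MySQL reads the left table before the right one, so the join order written in the query is the order that is executed.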
full_dehydrate¶
Here we format all rows of data by calling format_row
which in turn calls format_field
, which will call format
on each field. See backend/api/fields.py
for each field's format()
functionality.
The idea here is if we're using a standard field, for example bool
, make sure we're returning it correctly formatted, and if there's anything missing format_field
will attempt to make sure it's pythonic
in some way, and format_row
will make sure we're not returning None
.
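The chain of calls can be sketched like this. BoolField is a hypothetical stand-in for the fields in backend/api/fields.py, the Decimal to float fallback is only one example of making a value "pythonic", and replacing None with 0 is an assumption; the real default may differ.

```python
from decimal import Decimal


class BoolField:
    """Hypothetical field with its own format(), standing in for a real field."""

    def format(self, value):
        return bool(value)


def format_field(field, value):
    """Use the field's own format() if there is one, else fall back to a safe type."""
    if field is not None and hasattr(field, "format"):
        return field.format(value)
    if isinstance(value, Decimal):
        # Assumed fallback: convert database decimals to plain floats.
        return float(value)
    return value


def format_row(row, fields, default=0):
    """Format every value in a row and make sure no None is returned."""
    formatted = {}
    for name, value in row.items():
        value = format_field(fields.get(name), value)
        formatted[name] = default if value is None else value
    return formatted


fields = {"active": BoolField()}
row = {"active": 1, "revenue": Decimal("1.5"), "visits": None}
formatted = format_row(row, fields)
```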
format_field_total¶
Called when we're about to format and return the totals data. This is custom and done separately from the complete Tastypie flow, so we don't hit full_dehydrate
as normal. It follows the same principles and attempts to call format_total
on each field.