
Further Explanation

To understand what our CubedReportResource class is doing, it's best to understand the underlying data and how the aggregation works.

Let's say we have the following Model:

class ReportMyData(models.Model):
    date = models.DateTimeField(db_index=True)
    referer = models.ForeignKey('Referer', on_delete=models.PROTECT)
    visits = models.IntegerField()
    sales = models.IntegerField()
    revenue = models.DecimalField(max_digits=10, decimal_places=2)

    class Meta:
        db_table = 'report_my_table'
        unique_together = (('date', 'referer'),)

And now in our DB we have a table of data like this:

| date | referer_id | visits | sales | revenue |
| --- | --- | --- | --- | --- |
| 01/01/2023 | 1 | 4 | 3 | 100 |
| 01/01/2023 | 2 | 4 | 2 | 50 |
| 01/01/2023 | 3 | 4 | 1 | 40 |
| 02/01/2023 | 1 | 8 | 5 | 75 |
| 02/01/2023 | 2 | 8 | 4 | 80 |
| 02/01/2023 | 3 | 8 | 2 | 35 |
| 03/01/2023 | 1 | 2 | 7 | 55 |
| 03/01/2023 | 2 | 2 | 3 | 25 |
| 03/01/2023 | 3 | 2 | 2 | 15 |
| 04/01/2023 | 1 | 6 | 4 | 35 |
| 04/01/2023 | 2 | 6 | 1 | 10 |
| 04/01/2023 | 3 | 6 | 0 | 0 |
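To make the aggregation concrete, here's a small standalone Python sketch (not part of CubedReportResource) that reproduces the table above in memory and groups it by referer, mirroring what the generated `group by` SQL does in the examples below:

```python
from collections import defaultdict

# The sample rows above: (date, referer_id, visits, sales, revenue)
ROWS = [
    ("2023-01-01", 1, 4, 3, 100), ("2023-01-01", 2, 4, 2, 50), ("2023-01-01", 3, 4, 1, 40),
    ("2023-01-02", 1, 8, 5, 75),  ("2023-01-02", 2, 8, 4, 80), ("2023-01-02", 3, 8, 2, 35),
    ("2023-01-03", 1, 2, 7, 55),  ("2023-01-03", 2, 2, 3, 25), ("2023-01-03", 3, 2, 2, 15),
    ("2023-01-04", 1, 6, 4, 35),  ("2023-01-04", 2, 6, 1, 10), ("2023-01-04", 3, 6, 0, 0),
]

def group_by_referer(rows):
    """Sum visits/sales/revenue per referer_id, like `group by referer`."""
    totals = defaultdict(lambda: {"visits": 0, "sales": 0, "revenue": 0})
    for _date, referer_id, visits, sales, revenue in rows:
        t = totals[referer_id]
        t["visits"] += visits
        t["sales"] += sales
        t["revenue"] += revenue
    return dict(totals)

print(group_by_referer(ROWS)[1])  # {'visits': 20, 'sales': 19, 'revenue': 265}
```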

Let's now create a Tastypie resource using our custom CubedReportResource:

class MyDataReportResource(CubedReportResource):
    referer = fields.ForeignKey(RefererLiteResource, 'referer', full=True)

    class Meta(ReportMeta):
        queryset = ReportMyData.objects.select_related('referer')
        resource_name = "report-my-data"
        config_fields = BaseFields(show_totals=True, full_ordering=True)

        config_fields.install('referer', DictField(), filtering=ALL_WITH_RELATIONS, groupby=True)
        config_fields.install('date', DateField(), filtering=ALL_WITH_RELATIONS, groupby=True)
        config_fields.install('visits', IntegerField())
        config_fields.install('sales', IntegerField())
        config_fields.install('revenue', MoneyField())

The important bits to note here: we use `install` for the base metrics on the table. If we wanted to add a custom metric we'd do so via `install_additional_metric`. Both functions take a Cubed Field type so we know how to format the returned values.
Also note that the two dimension fields on the table are flagged with `groupby=True`, as these are the only columns we want to be able to group by.
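As a rough mental model only — the class and method below are a hypothetical sketch, not the real BaseFields implementation — `install` can be pictured as a registry mapping each field name to its formatter type and options:

```python
# Hypothetical sketch of a field registry like BaseFields; the names and
# signatures here are illustrative, not the real CubedReportResource API.
class FieldRegistry:
    def __init__(self):
        self.fields = {}

    def install(self, name, field_type, filtering=None, groupby=False):
        # Record the field's formatter and its per-field options.
        self.fields[name] = {
            "type": field_type,
            "filtering": filtering,
            "groupby": groupby,
        }

    def groupby_fields(self):
        # Only dimensions flagged groupby=True may appear in group_by=...
        return [name for name, f in self.fields.items() if f["groupby"]]

registry = FieldRegistry()
registry.install("referer", "DictField", filtering="ALL_WITH_RELATIONS", groupby=True)
registry.install("date", "DateField", filtering="ALL_WITH_RELATIONS", groupby=True)
registry.install("visits", "IntegerField")
print(registry.groupby_fields())  # ['referer', 'date']
```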

Note

You can inherit Meta from BaseReportMCSMeta, which comes with a few of the "base" fields already defined and ready to be used on config_fields.


There are 3 main data sets we can return. Here they are with their matching query params:

| Data | Query params | Uses group_by |
| --- | --- | --- |
| Table | totals=false&graph=false | yes |
| Totals | totals=true&graph=false | no |
| Graph | totals=false&graph=true | yes |
| None | totals=true&graph=true | n/a |
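The table above boils down to a simple dispatch on the two flags. A minimal sketch (the function name is illustrative, not part of the real codebase):

```python
def dataset_for(totals, graph):
    """Map the totals/graph query params to the data set that gets built."""
    if totals and graph:
        return "none"    # both flags set: nothing to return
    if totals:
        return "totals"  # complete totals, no group by needed
    if graph:
        return "graph"   # grouped by the graph dimension
    return "table"       # grouped by the requested dimension

print(dataset_for(False, False))  # table
```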

"Totals" does not need to `group by`` a dimension as we're going to get the complete totals of the entire aggregated data set.

Examples

Let's now look at some examples based on the query params sent up, and see what SQL gets generated and thus what data is returned.

Table Data

/report-my-data/?account=1&date__gte=2023-01-01&date__lte=2023-01-04&referer__in=1,2,3&group_by=referer

select 
    b.name, 
    sum(a.sales) as `sales`,
    sum(a.revenue) as `revenue`,
    sum(a.visits) as `visits`    
from report_my_table a
join attrib_referer b on a.referer_id = b.id
where a.date between '2023-01-01 00:00:00' and '2023-01-04 23:59:59'
group by b.name

This will return to the frontend:

| Referer | Sales | Revenue | Visits |
| --- | --- | --- | --- |
| First | 19 | 265 | 20 |
| Second | 10 | 165 | 20 |
| Third | 5 | 90 | 20 |

Table Totals

/report-my-data/?account=1&date__gte=2023-01-01&date__lte=2023-01-04&referer__in=1,2,3&totals=true

select 
    sum(x.sales),
    sum(x.revenue),
    sum(x.visits)
from (
    select 
        b.name, 
        sum(a.sales) as `sales`,
        sum(a.revenue) as `revenue`,
        sum(a.visits) as `visits`   
    from report_my_table a
    join attrib_referer b on a.referer_id = b.id
    where a.date between '2023-01-01 00:00:00' and '2023-01-04 23:59:59'
    group by b.name
) x

This will return to the frontend:

| Sales | Revenue | Visits |
| --- | --- | --- |
| 34 | 520 | 60 |
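Because summing the grouped sums is the same as summing the raw rows, the totals result can be sanity-checked with a few lines of standalone Python over the sample rows from the model table above:

```python
# Sample rows from the table above: (date, referer_id, visits, sales, revenue)
ROWS = [
    ("2023-01-01", 1, 4, 3, 100), ("2023-01-01", 2, 4, 2, 50), ("2023-01-01", 3, 4, 1, 40),
    ("2023-01-02", 1, 8, 5, 75),  ("2023-01-02", 2, 8, 4, 80), ("2023-01-02", 3, 8, 2, 35),
    ("2023-01-03", 1, 2, 7, 55),  ("2023-01-03", 2, 2, 3, 25), ("2023-01-03", 3, 2, 2, 15),
    ("2023-01-04", 1, 6, 4, 35),  ("2023-01-04", 2, 6, 1, 10), ("2023-01-04", 3, 6, 0, 0),
]

# Totals across the whole aggregated data set, like the wrapped select above.
visits = sum(r[2] for r in ROWS)
sales = sum(r[3] for r in ROWS)
revenue = sum(r[4] for r in ROWS)
print(sales, revenue, visits)  # 34 520 60
```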

Graph data

/report-my-data/?account=1&date__gte=2023-01-01&date__lte=2023-01-04&referer__in=1,2,3&graph=true&graph_group_by=date&graph_group_by_type=date__day&graph_metrics=sales

select 
    date(x.date),
    sum(x.sales)
from (
    select 
        date(a.date) as `date`,
        b.name,
        sum(a.sales) as `sales`,
        sum(a.revenue) as `revenue`,
        sum(a.visits) as `visits`   
    from report_my_table a
    join attrib_referer b on a.referer_id = b.id
    where a.date between '2023-01-01 00:00:00' and '2023-01-04 23:59:59'
    group by date(a.date), b.name
) x
group by date(x.date)

This will return:

| Date | Sales |
| --- | --- |
| 01/01/2023 | 6 |
| 02/01/2023 | 11 |
| 03/01/2023 | 12 |
| 04/01/2023 | 5 |
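The same result can be reproduced in plain Python by grouping the sample rows on the date column (a standalone sketch, not the real query builder):

```python
from collections import defaultdict

# Sample rows from the model table: (date, referer_id, visits, sales, revenue)
ROWS = [
    ("2023-01-01", 1, 4, 3, 100), ("2023-01-01", 2, 4, 2, 50), ("2023-01-01", 3, 4, 1, 40),
    ("2023-01-02", 1, 8, 5, 75),  ("2023-01-02", 2, 8, 4, 80), ("2023-01-02", 3, 8, 2, 35),
    ("2023-01-03", 1, 2, 7, 55),  ("2023-01-03", 2, 2, 3, 25), ("2023-01-03", 3, 2, 2, 15),
    ("2023-01-04", 1, 6, 4, 35),  ("2023-01-04", 2, 6, 1, 10), ("2023-01-04", 3, 6, 0, 0),
]

def sales_by_day(rows):
    """Sum sales per day, like `group by date(x.date)` with graph_metrics=sales."""
    out = defaultdict(int)
    for date, _referer_id, _visits, sales, _revenue in rows:
        out[date] += sales
    return dict(out)

print(sales_by_day(ROWS)["2023-01-03"])  # 12
```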

Having

The reason we have to build the "base query set" and then wrap it in another select is to allow users to filter on the data they see in the front end. Given the examples above, let's say we're looking at this table of data:

| Referer | Sales | Revenue | Visits |
| --- | --- | --- | --- |
| First | 19 | 265 | 20 |
| Second | 10 | 165 | 20 |
| Third | 5 | 90 | 20 |

If we were to "filter" on "sales" with "sales > 5", then without the nested subquery we would generate this:

select 
    b.name, 
    sum(a.sales) as `sales`,
    sum(a.revenue) as `revenue`,
    sum(a.visits) as `visits`    
from report_my_table a
join attrib_referer b on a.referer_id = b.id
where a.date between '2023-01-01 00:00:00' and '2023-01-04 23:59:59'
and a.sales > 5
group by b.name

Which would return:

| Referer | Sales | Revenue | Visits |
| --- | --- | --- | --- |
| First | 7 | 55 | 2 |

This is because we're filtering on the individual rows in the table, rather than on the aggregated data our user has actually seen. However, because we apply the having filtering to our outer query, we instead get this:

select
    x.name,
    sum(x.sales),
    sum(x.revenue),
    sum(x.visits)
from (
    select 
        b.name, 
        sum(a.sales) as `sales`,
        sum(a.revenue) as `revenue`,
        sum(a.visits) as `visits`    
    from report_my_table a
    join attrib_referer b on a.referer_id = b.id
    where a.date between '2023-01-01 00:00:00' and '2023-01-04 23:59:59'
    group by b.name
) x
having sum(x.sales) > 5

Which will return:

| Referer | Sales | Revenue | Visits |
| --- | --- | --- | --- |
| First | 19 | 265 | 20 |
| Second | 10 | 165 | 20 |
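The where-versus-having difference can be reproduced with the sample rows in standalone Python (the function names are illustrative):

```python
# Sample rows from the model table: (date, referer_id, visits, sales, revenue)
ROWS = [
    ("2023-01-01", 1, 4, 3, 100), ("2023-01-01", 2, 4, 2, 50), ("2023-01-01", 3, 4, 1, 40),
    ("2023-01-02", 1, 8, 5, 75),  ("2023-01-02", 2, 8, 4, 80), ("2023-01-02", 3, 8, 2, 35),
    ("2023-01-03", 1, 2, 7, 55),  ("2023-01-03", 2, 2, 3, 25), ("2023-01-03", 3, 2, 2, 15),
    ("2023-01-04", 1, 6, 4, 35),  ("2023-01-04", 2, 6, 1, 10), ("2023-01-04", 3, 6, 0, 0),
]

def aggregate(rows):
    """Group rows by referer_id, summing (visits, sales, revenue)."""
    grouped = {}
    for _date, referer_id, visits, sales, revenue in rows:
        v, s, r = grouped.get(referer_id, (0, 0, 0))
        grouped[referer_id] = (v + visits, s + sales, r + revenue)
    return grouped

def where_then_group(rows, min_sales):
    # Wrong: filter the raw rows first, like `where a.sales > 5`.
    return aggregate([r for r in rows if r[3] > min_sales])

def group_then_having(rows, min_sales):
    # Right: aggregate first, then filter, like `having sum(x.sales) > 5`.
    return {k: t for k, t in aggregate(rows).items() if t[1] > min_sales}

print(where_then_group(ROWS, 5))   # {1: (2, 7, 55)} -- only one raw row survives
print(group_then_having(ROWS, 5))  # {1: (20, 19, 265), 2: (20, 10, 165)}
```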

The queries generated for table, totals, and graph all apply having in the same way, so they always reflect the same, correctly filtered data set.