Further Explanation¶
To understand what our `CubedReportResource` class is doing, it's best to understand the underlying data and how aggregation works.
Let's say we have the following Model:
```python
class ReportMyData(models.Model):
    date = models.DateTimeField(db_index=True)
    referer = models.ForeignKey('Referer', on_delete=models.PROTECT)
    visits = models.IntegerField()
    sales = models.IntegerField()
    revenue = models.DecimalField(max_digits=10, decimal_places=2)

    class Meta:
        db_table = 'report_my_table'
        unique_together = (('date', 'referer'),)
```
And now in our DB we have a table of data like this:
| date | referer_id | visits | sales | revenue |
|---|---|---|---|---|
| 01/01/2023 | 1 | 4 | 3 | 100 |
| 01/01/2023 | 2 | 4 | 2 | 50 |
| 01/01/2023 | 3 | 4 | 1 | 40 |
| 02/01/2023 | 1 | 8 | 5 | 75 |
| 02/01/2023 | 2 | 8 | 4 | 80 |
| 02/01/2023 | 3 | 8 | 2 | 35 |
| 03/01/2023 | 1 | 2 | 7 | 55 |
| 03/01/2023 | 2 | 2 | 3 | 25 |
| 03/01/2023 | 3 | 2 | 2 | 15 |
| 04/01/2023 | 1 | 6 | 4 | 35 |
| 04/01/2023 | 2 | 6 | 1 | 10 |
| 04/01/2023 | 3 | 6 | 0 | 0 |
Let's now create a Tastypie resource using our custom `CubedReportResource`:

```python
class MyDataReportResource(CubedReportResource):
    referer = fields.ForeignKey(RefererLiteResource, 'referer', full=True)

    class Meta(ReportMeta):
        queryset = ReportMyData.objects.prefetch_related('referer')
        resource_name = "report-my-data"
        config_fields = BaseFields(show_totals=True, full_ordering=True)
        config_fields.install('referer', DictField(), filtering=ALL_WITH_RELATIONS, groupby=True)
        config_fields.install('date', DateField(), filtering=ALL_WITH_RELATIONS, groupby=True)
        config_fields.install('visits', IntegerField())
        config_fields.install('sales', IntegerField())
        config_fields.install('revenue', MoneyField())
```
The important bits to note here: we use `install` for the base metrics on the table. If we wanted to add a custom metric we'd do so via `install_additional_metric`. Both functions are passed a Cubed `Field` type so we know how to format the returned values.
Also note that the 2 dimension fields on the table are flagged with `groupby=True`, as we want to be able to group by only these columns.
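For illustration, a custom metric might be added like this. This is a hypothetical sketch: the signature of `install_additional_metric` isn't shown in this document, so it's assumed here to mirror `install`, and `average_order_value` is a made-up metric name:

```python
# Hypothetical -- assumes install_additional_metric mirrors install's signature;
# the real API may differ.
config_fields.install_additional_metric('average_order_value', MoneyField())
```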
Note
You can inherit `Meta` from `BaseReportMCSMeta`, which comes with a few of the "base" fields already defined and ready to be used on `config_fields`.
There are 3 main data sets we can return; here they are with their matching query params:

| Data | Query params | Uses group_by |
|---|---|---|
| Table | `totals=false&graph=false` | yes |
| Totals | `totals=true&graph=false` | no |
| Graph | `totals=false&graph=true` | yes |
| None | `totals=true&graph=true` | n/a |
"Totals" does not need to `group by` a dimension, as we're going to get the complete totals of the entire aggregated data set.
Examples¶
Let's now look at some examples based on the query params sent up, and see what SQL would be generated and thus what data would be returned.
Table Data¶
```
/report-my-data/?account=1&date__gte=2023-01-01&date__lte=2023-01-04&referer__in=1,2,3&group_by=referer
```

```sql
select
    b.name,
    sum(a.sales) as `sales`,
    sum(a.revenue) as `revenue`,
    sum(a.visits) as `visits`
from report_my_table a
join attrib_referer b on a.referer_id = b.id
where a.date between '2023-01-01 00:00:00' and '2023-01-04 23:59:59'
group by b.name
```
This will return to the frontend:
| Referer | Sales | Revenue | Visits |
|---|---|---|---|
| First | 19 | 265 | 20 |
| Second | 10 | 165 | 20 |
| Third | 5 | 90 | 20 |
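The aggregation itself is easy to verify outside Django. Here is a minimal sketch using `sqlite3` (standing in for MySQL; the date filter is dropped since every sample row falls inside the range) that loads the sample rows and runs the same grouped query:

```python
import sqlite3

# Build the sample tables in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table report_my_table (date text, referer_id int, visits int, sales int, revenue int);
    create table attrib_referer (id int, name text);
""")
conn.executemany("insert into attrib_referer values (?, ?)",
                 [(1, "First"), (2, "Second"), (3, "Third")])
rows = [
    ("2023-01-01", 1, 4, 3, 100), ("2023-01-01", 2, 4, 2, 50), ("2023-01-01", 3, 4, 1, 40),
    ("2023-01-02", 1, 8, 5, 75),  ("2023-01-02", 2, 8, 4, 80), ("2023-01-02", 3, 8, 2, 35),
    ("2023-01-03", 1, 2, 7, 55),  ("2023-01-03", 2, 2, 3, 25), ("2023-01-03", 3, 2, 2, 15),
    ("2023-01-04", 1, 6, 4, 35),  ("2023-01-04", 2, 6, 1, 10), ("2023-01-04", 3, 6, 0, 0),
]
conn.executemany("insert into report_my_table values (?, ?, ?, ?, ?)", rows)

# Same shape as the generated query: sum the metrics per referer name.
table_data = conn.execute("""
    select b.name, sum(a.sales), sum(a.revenue), sum(a.visits)
    from report_my_table a
    join attrib_referer b on a.referer_id = b.id
    group by b.name
    order by b.name
""").fetchall()
print(table_data)
# [('First', 19, 265, 20), ('Second', 10, 165, 20), ('Third', 5, 90, 20)]
```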
Table Totals¶
```
/report-my-data/?account=1&date__gte=2023-01-01&date__lte=2023-01-04&referer__in=1,2,3&totals=true
```

```sql
select
    sum(x.sales),
    sum(x.revenue),
    sum(x.visits)
from (
    select
        b.name,
        sum(a.sales) as `sales`,
        sum(a.revenue) as `revenue`,
        sum(a.visits) as `visits`
    from report_my_table a
    join attrib_referer b on a.referer_id = b.id
    where a.date between '2023-01-01 00:00:00' and '2023-01-04 23:59:59'
    group by b.name
) x
```
This will return to the frontend:
| Sales | Revenue | Visits |
|---|---|---|
| 34 | 520 | 60 |
Graph Data¶
```
/report-my-data/?account=1&date__gte=2023-01-01&date__lte=2023-01-04&referer__in=1,2,3&graph=true&graph_group_by=date&graph_group_by_type=date__day&graph_metrics=sales
```

```sql
select
    date(x.date),
    sum(x.sales)
from (
    select
        date(a.date) as `date`,
        b.name,
        sum(a.sales) as `sales`,
        sum(a.revenue) as `revenue`,
        sum(a.visits) as `visits`
    from report_my_table a
    join attrib_referer b on a.referer_id = b.id
    where a.date between '2023-01-01 00:00:00' and '2023-01-04 23:59:59'
    group by date(a.date), b.name
) x
group by date(x.date)
```
This will return:
| Date | Sales |
|---|---|
| 01/01/2023 | 6 |
| 02/01/2023 | 11 |
| 03/01/2023 | 12 |
| 04/01/2023 | 5 |
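The two-level grouping can be checked the same way. A minimal `sqlite3` sketch (standing in for MySQL; the referer join is omitted since only sales per day are needed, and the inner query groups per (date, referer) just as the generated SQL does):

```python
import sqlite3

# Load the sample rows into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("create table report_my_table (date text, referer_id int, visits int, sales int, revenue int)")
rows = [
    ("2023-01-01", 1, 4, 3, 100), ("2023-01-01", 2, 4, 2, 50), ("2023-01-01", 3, 4, 1, 40),
    ("2023-01-02", 1, 8, 5, 75),  ("2023-01-02", 2, 8, 4, 80), ("2023-01-02", 3, 8, 2, 35),
    ("2023-01-03", 1, 2, 7, 55),  ("2023-01-03", 2, 2, 3, 25), ("2023-01-03", 3, 2, 2, 15),
    ("2023-01-04", 1, 6, 4, 35),  ("2023-01-04", 2, 6, 1, 10), ("2023-01-04", 3, 6, 0, 0),
]
conn.executemany("insert into report_my_table values (?, ?, ?, ?, ?)", rows)

# Inner query aggregates per (date, referer); outer query re-groups by day only.
graph_data = conn.execute("""
    select x.date, sum(x.sales)
    from (
        select a.date as date, a.referer_id, sum(a.sales) as sales
        from report_my_table a
        group by a.date, a.referer_id
    ) x
    group by x.date
    order by x.date
""").fetchall()
print(graph_data)
# [('2023-01-01', 6), ('2023-01-02', 11), ('2023-01-03', 12), ('2023-01-04', 5)]
```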
Having¶
The reason we have to build the "base query set" and then wrap it in another select is to allow users to filter on the data they see in the front end. Given the examples above, let's say we're looking at this table of data:
| Referer | Sales | Revenue | Visits |
|---|---|---|---|
| First | 19 | 265 | 20 |
| Second | 10 | 165 | 20 |
| Third | 5 | 90 | 20 |
If we were to filter on sales where `sales > 5`, then without the nested subquery we would generate this:

```sql
select
    b.name,
    sum(a.sales) as `sales`,
    sum(a.revenue) as `revenue`,
    sum(a.visits) as `visits`
from report_my_table a
join attrib_referer b on a.referer_id = b.id
where a.date between '2023-01-01 00:00:00' and '2023-01-04 23:59:59'
    and a.sales > 5
group by b.name
```
Which would return:
| Referer | Sales | Revenue | Visits |
|---|---|---|---|
| First | 7 | 55 | 2 |
This is because we're filtering on the individual rows in the table, rather than on the aggregated data our user has actually seen.
However, because we apply the `having` filtering to our outer query, we instead get this:
```sql
select
    x.name,
    sum(x.sales),
    sum(x.revenue),
    sum(x.visits)
from (
    select
        b.name,
        sum(a.sales) as `sales`,
        sum(a.revenue) as `revenue`,
        sum(a.visits) as `visits`
    from report_my_table a
    join attrib_referer b on a.referer_id = b.id
    where a.date between '2023-01-01 00:00:00' and '2023-01-04 23:59:59'
    group by b.name
) x
group by x.name
having sum(x.sales) > 5
```
Which will return:
| Referer | Sales | Revenue | Visits |
|---|---|---|---|
| First | 19 | 265 | 20 |
| Second | 10 | 165 | 20 |
Each query generated for table, totals, and graph will apply `having` in the same way, so they will all always reflect the correct, aggregated data set.
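The difference between the two approaches is easy to demonstrate end to end. A self-contained `sqlite3` sketch (standing in for MySQL; an `order by` is added so the result order is deterministic, and the date filter is dropped since all sample rows fall inside the range):

```python
import sqlite3

# Sample data from the examples above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table report_my_table (date text, referer_id int, visits int, sales int, revenue int);
    create table attrib_referer (id int, name text);
""")
conn.executemany("insert into attrib_referer values (?, ?)",
                 [(1, "First"), (2, "Second"), (3, "Third")])
rows = [
    ("2023-01-01", 1, 4, 3, 100), ("2023-01-01", 2, 4, 2, 50), ("2023-01-01", 3, 4, 1, 40),
    ("2023-01-02", 1, 8, 5, 75),  ("2023-01-02", 2, 8, 4, 80), ("2023-01-02", 3, 8, 2, 35),
    ("2023-01-03", 1, 2, 7, 55),  ("2023-01-03", 2, 2, 3, 25), ("2023-01-03", 3, 2, 2, 15),
    ("2023-01-04", 1, 6, 4, 35),  ("2023-01-04", 2, 6, 1, 10), ("2023-01-04", 3, 6, 0, 0),
]
conn.executemany("insert into report_my_table values (?, ?, ?, ?, ?)", rows)

# Filtering the raw rows: only the single row with sales > 5 survives,
# so the aggregates no longer match what the user saw.
where_data = conn.execute("""
    select b.name, sum(a.sales), sum(a.revenue), sum(a.visits)
    from report_my_table a
    join attrib_referer b on a.referer_id = b.id
    where a.sales > 5
    group by b.name
""").fetchall()
print(where_data)
# [('First', 7, 55, 2)]

# Filtering the aggregated subquery with having: matches the table the user saw.
having_data = conn.execute("""
    select x.name, sum(x.sales), sum(x.revenue), sum(x.visits)
    from (
        select b.name, sum(a.sales) as sales, sum(a.revenue) as revenue, sum(a.visits) as visits
        from report_my_table a
        join attrib_referer b on a.referer_id = b.id
        group by b.name
    ) x
    group by x.name
    having sum(x.sales) > 5
    order by x.name
""").fetchall()
print(having_data)
# [('First', 19, 265, 20), ('Second', 10, 165, 20)]
```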