Missing Data
To check for data issues, either look up the account on dash.withcubed.com, or log into control.withcubed.com/admin and then go to the main page, control.withcubed.com/. The main page shows every account, with a data point for each day that had a successful cron job.
If an account has had no data for more than one day, its cron has failed. Our crons do not run fully if the previous day's run didn't finish, so a single failure can leave several days missing.
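To work out exactly which days need re-running, a small shell sketch like the one below can list every day after the last successful data point. It assumes GNU date (standard on Linux) and that you read the last successful day off the Control page; missing_days is a hypothetical helper, not part of our codebase. Its output is comma-separated with no spaces, which matches the comma-separated format of the --date_list argument described under Re-run Commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every day after LAST_OK up to and including
# THROUGH, as a comma-separated list (YYYY-MM-DD, no spaces).
# Assumes GNU date for the -d option.
missing_days() {
  local last_ok="$1"   # last day with a data point on Control
  local through="$2"   # last day you expect data for
  local day out=""
  day=$(date -d "$last_ok + 1 day" +%F)
  # Compare as YYYYMMDD integers; stop once we pass THROUGH.
  while (( ${day//-/} <= ${through//-/} )); do
    out+="${out:+,}$day"
    day=$(date -d "$day + 1 day" +%F)
  done
  printf '%s\n' "$out"
}

missing_days 2021-08-13 2021-08-16
# 2021-08-14,2021-08-15,2021-08-16
```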
Note
Some accounts in this list are on the live DB but exist only to test different parts of our system, so they may show "blank" more often (as they have no data).
These are: Adobe Test, BGB, Cubed AI, Cubed Brand Website, Facebook Dev, PyDCS Test
Example
Check Control
In the image below, hovering over the last day that ran successfully shows where the data stops.
This means the cron(s) did not reach the update_report_account_usage command, which runs after the report command block.
Check the Database
Log into the live RDS instance and switch to the attrib database. We now need to find when and where a command failed, and what caused it.
Cron Parent Command
To see how far the cron got, run the following SQL:
use attrib;
select
d.token,
a.id, a.name, a.created, e.running, e.updated,
b.message, b.exception, b.stacktrace, e.duration
from attrib_command_item a
left join attrib_command_exception b on b.command_item_id = a.id
join attrib_command_account c on c.command_item_id = a.id
join attrib_account d on c.account_id = d.id
join attrib_command_state e on e.command_item_id = a.id
where a.created between '2021-08-14 00:00:00' and '2021-08-14 23:59:59'  -- the day the cron ran
and d.token in ('c-a-yard-uk')  -- the affected account token(s)
and a.name = 'Batch agg and pct update commands';
This returns the top-level parent command for this account on that day's cron run.
Note
If you are missing data for the 1st of the month, the command that failed started on the 2nd, because each cron run processes the previous day's data. Filter a.created on the 2nd.
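Because update_agg_pct_batch defaults to running for yesterday, the created filter in the parent-command query should use the day after the one with missing data. The sketch below, assuming GNU date, prints that filter for a given missing day; cron_created_range is a hypothetical helper, not part of our codebase.

```shell
# Hypothetical helper: given a day with missing data, print the
# attrib_command_item.created filter for the cron run that processed it
# (the following day). Assumes GNU date.
cron_created_range() {
  local missing_day="$1"   # YYYY-MM-DD with no data point
  local run_day
  run_day=$(date -d "$missing_day + 1 day" +%F)
  printf "a.created between '%s 00:00:00' and '%s 23:59:59'\n" "$run_day" "$run_day"
}

cron_created_range 2021-08-01
# a.created between '2021-08-02 00:00:00' and '2021-08-02 23:59:59'
```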
From the parent command's row, the most useful columns are id, created, duration, and exception (whether the command raised one).
With the parent command's id, open https://control.withcubed.com/command/{command-id}. This page shows the standard output for every command that ran, and any error that was raised.
The same URL works for any command id you find: it shows whether that command spawned other commands, along with all of its print logs.
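When the SQL returns several command ids, a tiny sketch like this prints the Control page for each one so you can open them in turn. command_urls is a hypothetical helper; it only fills the id into the URL pattern above and assumes nothing about the id format.

```shell
# Hypothetical helper: print the Control page URL for each command id
# passed as an argument.
command_urls() {
  local id
  for id in "$@"; do
    printf 'https://control.withcubed.com/command/%s\n' "$id"
  done
}

command_urls 123456 123457   # example ids only
```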
If there was an error, you can now move on to Re-run Commands.
Cron Children Commands
The SQL below lists every command that ran other than the parent command:
use attrib;
select
d.token,
a.id, a.name, a.created, e.running, e.updated,
b.message, b.exception, b.stacktrace, e.duration
from attrib_command_item a
left join attrib_command_exception b on b.command_item_id = a.id
join attrib_command_account c on c.command_item_id = a.id
join attrib_account d on c.account_id = d.id
join attrib_command_state e on e.command_item_id = a.id
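where a.created between '2021-08-14 00:00:00' and '2021-08-14 23:59:59'  -- the day the cron ran
and d.token in ('c-a-yard-uk')  -- the affected account token(s)
and a.name <> 'Batch agg and pct update commands'
order by a.created;  -- oldest first, so the failing command is last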
This lists each cron command individually. If there was an error, it should be the last row, with exception and stacktrace populated.
If commands suddenly stop and there is no exception, the box has most likely run out of memory and killed everything. In this case a handful of accounts will usually have been affected. You can now jump to the next step.
A dead giveaway for this out-of-memory scenario is a command whose running flag is True even though you know it is not actually running. You can confirm this in one of two ways: check the DB to see whether anything is running, or connect to the control EC2 and check htop (see the Htop section below).
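To confirm an out-of-memory kill from the kernel's side as well, you can search the kernel log, which records each process the OOM killer terminates. The sketch below is a hypothetical filter for that log; the exact message wording varies between kernel versions, so it matches both "Out of memory" and "Killed process" lines. It assumes you can read the kernel log on the control EC2 (usually with sudo).

```shell
# Hypothetical filter: keep only OOM-killer lines from kernel log text on
# stdin. Newer kernels log "Out of memory: Killed process ...", older ones a
# separate "Killed process ..." line, so both are matched case-insensitively.
oom_kills() {
  grep -Ei 'out of memory|killed process'
}

# Usage on the control EC2 (reading the kernel log usually needs sudo):
#   sudo dmesg -T | oom_kills
#   sudo journalctl -k --since "2021-08-14" | oom_kills
```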
Re-run Commands
Once you have found the command that failed, look at the update_agg_pct_batch command file and find the command block it belongs to.
Next, log into the control EC2 and create a screen session to run your command in. This is important: if there is a network issue and you lose the connection to the box, the command keeps running.
The main screen commands are:
- screen -ls lists all screens. Sessions marked (Detached) are probably not in use.
- screen -S {name} creates a new screen with that name.
- screen -r {name} reconnects to a named screen.
- screen -d -r {name} detaches whoever is currently on the named screen, then reconnects you to it.
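As an alternative to typing the re-run inside an interactive screen, the sketch below starts it in a named, detached screen session in one step. rerun_in_screen is a hypothetical helper, not part of our codebase; it assumes the manage.py path used below, and that you attach to the session (screen -r {name}) if sudo prompts for a password. Setting DRY_RUN=1 prints what it would run instead.

```shell
# Hypothetical helper: run update_agg_pct_batch with the given arguments in a
# detached, named screen session. Set DRY_RUN=1 to print instead of run.
rerun_in_screen() {
  local name="$1"; shift   # session name, e.g. rerun-c-a-yard-uk
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf "screen '%s' would run: sudo python3 manage.py update_agg_pct_batch" "$name"
    (( $# )) && printf ' %s' "$@"
    printf '\n'
    return 0
  fi
  # -d -m starts the session already detached and -S names it. The remaining
  # arguments reach manage.py through "$@" inside bash -c, and "exec bash"
  # keeps the session open afterwards so you can read the output.
  screen -dmS "$name" bash -c \
    'cd /srv/attrib-backend/backend && sudo python3 manage.py update_agg_pct_batch "$@"; exec bash' \
    "$name" "$@"
}

# Example (attach with: screen -r rerun-c-a-yard-uk, detach with Ctrl-a then d):
#   rerun_in_screen rerun-c-a-yard-uk --account_token="c-a-yard-uk" --date_list="2021-08-14"
```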
Once you are in a screen, navigate to the Django project folder that contains manage.py:
cd /srv/attrib-backend/backend
Then run the cron command:
sudo python3 manage.py update_agg_pct_batch
See the command file for all the arguments that can be passed in. The main ones are:
Parameter | Description | Default | Example |
---|---|---|---|
account_token | A single account token, as found in the DB. | all | --account_token="c-a-client-uk" |
account_list | A comma-separated list of account tokens (no spaces). | none | --account_list="c-a-client-uk,c-a-client-de,c-a-client-fr" |
startdate | The date to start running the command from, in YYYY-MM-DD HH:MM:SS format. | yesterday 00:00:00 | --startdate="2021-06-01 00:00:00" |
enddate | The date to stop running the command at, in YYYY-MM-DD HH:MM:SS format. | yesterday 23:59:59 | --enddate="2021-06-01 23:59:59" |
date_list | A comma-separated list of dates to run for, in YYYY-MM-DD format. This can be a quicker way to re-run a single day in the past. | none | --date_list="2021-06-01, 2021-06-10" |
parallel | Force the commands to run in parallel rather than serially. | none | --parallel |
daily | Force the commands to run day by day rather than over the whole period specified at once. | none | --daily |
ignore_post_pre | Skip the pre and post command blocks. | False | --ignore_post_pre |
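For example, to re-run two accounts day by day, in parallel, from 2021-06-01 00:00:00 to 2021-06-03 00:00:00:
sudo python3 manage.py update_agg_pct_batch --startdate="2021-06-01 00:00:00" --enddate="2021-06-03 00:00:00" --account_list="client1,client2" --parallel --daily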
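Because a typo in a date or account list only shows up once a long run is under way, a quick format check before starting can save a wasted re-run. The sketch below validates arguments against the formats in the table; check_rerun_args is a hypothetical helper, not part of the command itself. It only checks formats (not whether the accounts or dates exist), accepts an optional space after commas in --date_list to match the table's example, and ignores flags it does not know about.

```shell
# Hypothetical pre-flight check: validate update_agg_pct_batch arguments
# against the formats in the table. Checks formats only.
check_rerun_args() {
  local arg value
  local dt='^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}$'      # YYYY-MM-DD HH:MM:SS
  local dl='^[0-9]{4}-[0-9]{2}-[0-9]{2}(, ?[0-9]{4}-[0-9]{2}-[0-9]{2})*$' # YYYY-MM-DD,...
  local al='^[^ ,]+(,[^ ,]+)*$'                                          # tokens, commas, no spaces
  for arg in "$@"; do
    value="${arg#*=}"   # the part after the first "="
    case "$arg" in
      --startdate=*|--enddate=*)
        [[ "$value" =~ $dt ]] || { echo "bad date/time: $arg" >&2; return 1; } ;;
      --date_list=*)
        [[ "$value" =~ $dl ]] || { echo "bad date list: $arg" >&2; return 1; } ;;
      --account_list=*)
        [[ "$value" =~ $al ]] || { echo "bad account list: $arg" >&2; return 1; } ;;
    esac
  done
  return 0
}

# Usage: only start the re-run if the arguments look right.
#   check_rerun_args --startdate="2021-06-01 00:00:00" --account_list="client1,client2" \
#     && sudo python3 manage.py update_agg_pct_batch --startdate="2021-06-01 00:00:00" --account_list="client1,client2"
```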
Note
Only pass --ignore_post_pre if you know the cron completed the pre command block successfully.
This is usually the case, but it is important that the pre block DID run before you try again.
Htop
Htop is a terminal equivalent of Windows Task Manager. Once it is open, press F4 (Filter) and type "manage.py" or "python" to filter out the noise and show only our commands.
Press F5 to toggle tree view, which can help show which commands are children of another.