Debugging Command

Enable/Disable Accounts

The COMMAND_ALLOWED_ACCOUNTS environment variable controls which accounts are enabled. It is a comma-separated list of account tokens, e.g.

COMMAND_ALLOWED_ACCOUNTS=c-a-yard-uk,c-a-visscore

This variable is configured in Bitbucket under the deployment settings.

Warning

You must set the same value for both Production-Command and Production-Control, as both use this setting to decide whether to run commands for an account.

You must also redeploy both of these environments after making a change.
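Before deploying, it can help to check that every token in the list matches an account the Command database knows about (connection details are in the next section). This is a sketch that assumes `command_account.token` holds the same tokens as COMMAND_ALLOWED_ACCOUNTS, as the Recent Activity query below suggests; a token that comes back with no matching id is worth double-checking for typos.

```sql
-- Tokens copied from COMMAND_ALLOWED_ACCOUNTS (the example value above).
-- Assumes command_account.token uses the same format as the environment variable.
SELECT t.token, c.id AS command_account_id
FROM (
    SELECT 'c-a-yard-uk' AS token
    UNION ALL SELECT 'c-a-visscore'
) t
LEFT JOIN command_account c ON c.token = t.token;
-- A NULL command_account_id means no account with that token was found.
```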

Connecting to the Command database

The command system runs on its own dedicated database.

Environment	URI
Production	cubed-command.cevhomzj8can.eu-west-1.rds.amazonaws.com
Staging	cubed-command-staging.cevhomzj8can.eu-west-1.rds.amazonaws.com

If your database user was created by Terraform, you can connect with your usual username and password. Otherwise, you can find the root credentials in AWS Secrets Manager under Staging/RDS/Command or Prod/RDS/Command.
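Some queries on this page run against the Command database and others against the cubed-config databases, so it is worth confirming where a session is connected before running them. A minimal check, assuming a MySQL-compatible engine (which the DATE_FORMAT and GROUP_CONCAT calls below suggest) and relying only on table names used on this page:

```sql
-- Which schema and user is this session using?
SELECT DATABASE() AS current_schema, CURRENT_USER() AS connected_as;

-- On the Command database this should list tables such as command_run and command_slot.
-- '\_' matches a literal underscore in a LIKE pattern.
SHOW TABLES LIKE 'command\_%';
```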

Debugging Queries

Recent Activity

SELECT
b.name as definition_name,
c.token as account,
CONCAT(a.start_date, " to ", a.end_date) as date_range,
(
    SELECT
    GROUP_CONCAT(
        CASE
            WHEN (x.status = 0) THEN "Waiting"
            WHEN (x.status = 1) THEN "Running"
            WHEN (x.status = 2) THEN "Timeout"
            WHEN (x.status = 3) THEN "Failed"
            WHEN (x.status = 4) THEN "Success"
            ELSE "Unknown"
        END
        ORDER BY x.waiting_at ASC
    )
    FROM command_run x
    WHERE x.slot_id = a.id
) AS timeline,
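(
    SELECT
    SUM(
        CASE
        WHEN (y.status = 2) THEN 1
        WHEN (y.status = 3) THEN 1
        ELSE 0
        END
    )
    -- LIMIT must apply to the runs, not to the single SUM row,
    -- so take the 10 most recent runs in a derived table first.
    -- (Outer references such as a.id inside a derived table need MySQL 8.0.14+.)
    FROM (
        SELECT x.status
        FROM command_run x
        WHERE x.slot_id = a.id
        ORDER BY x.waiting_at DESC
        LIMIT 10
    ) y
) = 10 as is_dead,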
(
    SELECT
    (
        CASE
        WHEN (x.status = 0) THEN "Waiting"
        WHEN (x.status = 1) THEN "Running"
        WHEN (x.status = 2) THEN "Timeout"
        WHEN (x.status = 3) THEN "Failed"
        WHEN (x.status = 4) THEN "Success"
        ELSE "Unknown"
        END
    )
    FROM command_run x
    WHERE x.slot_id = a.id
    ORDER BY x.waiting_at DESC
    LIMIT 1
) as last_status,
(
    SELECT
    COALESCE(
        success_at,
        failed_at,
        timeout_at,
        running_at,
        waiting_at
    )
    FROM command_run x
    WHERE x.slot_id = a.id
    ORDER BY x.waiting_at DESC
    LIMIT 1
) as last_changed_at,
(
    SELECT x.id
    FROM command_run x
    WHERE x.slot_id = a.id
    ORDER BY x.waiting_at DESC
    LIMIT 1
) as last_command_run_id
FROM command_slot a
JOIN command_definition b ON a.definition_id = b.id
LEFT JOIN command_account c ON a.account_id = c.id
ORDER BY last_changed_at DESC
LIMIT 10

Filtering by account

To filter by account, add a WHERE clause like the following to the query above. It must go after the last LEFT JOIN line and before ORDER BY, not at the very end of the query:

WHERE c.token = "c-a-yardstore-uk"

Filtering by status

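To filter by status, add a WHERE clause like the following in the same place (after the joins, before ORDER BY). If you are also filtering by account, join the two conditions with AND instead of repeating WHERE: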

WHERE (
    SELECT x.status
    FROM command_run x
    WHERE x.slot_id = a.id
    ORDER BY x.waiting_at DESC
    LIMIT 1
) = @CommandRunStatus

Where @CommandRunStatus is a user variable holding one of the values below:

Friendly Name	EnumValue	Description
Waiting	0	Run is waiting to be executed
Running	1	Run is being executed
Timeout	2	Run did not report that it had succeeded or failed before the timeout
Failed	3	Run reported that it failed
Success	4	Run reported that it finished successfully
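The account and status filters can be combined. The sketch below is a condensed version that keeps only the slot columns instead of the full Recent Activity SELECT list, to show where the WHERE clause goes and how the two conditions are joined with AND. The account token and status value are just examples.

```sql
-- 3 = Failed; any EnumValue from the table above works.
SET @CommandRunStatus = 3;

SELECT b.name AS definition_name, c.token AS account, a.start_date, a.end_date
FROM command_slot a
JOIN command_definition b ON a.definition_id = b.id
LEFT JOIN command_account c ON a.account_id = c.id
-- Filters go after the joins and before ORDER BY / LIMIT.
WHERE c.token = 'c-a-yardstore-uk'
AND (
    -- Status of the most recent run for this slot
    SELECT x.status
    FROM command_run x
    WHERE x.slot_id = a.id
    ORDER BY x.waiting_at DESC
    LIMIT 1
) = @CommandRunStatus
ORDER BY a.start_date DESC
LIMIT 10;
```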

Filtering by date

To see which commands should run on a particular date, remove the `LIMIT 10` and add a WHERE clause similar to the one below (again before the ORDER BY):

WHERE (NOT a.start_date >= '2022-03-03 23:59:59')
AND (NOT a.end_date <= '2022-03-03 00:00:00')
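Together, these two conditions keep every slot whose range overlaps the chosen day at all, including slots that started the previous day. To see this, the same conditions can be evaluated against a few made-up slots. The rows below are hypothetical and only borrow the timestamp format of `command_slot`.

```sql
-- Hypothetical slots; overlaps_day is 1 when the date filter above would keep the slot.
SELECT s.label,
       (NOT s.start_date >= '2022-03-03 23:59:59')
       AND (NOT s.end_date <= '2022-03-03 00:00:00') AS overlaps_day
FROM (
    SELECT 'starts the day before, ends on the day' AS label,
           '2022-03-02 00:30:49' AS start_date, '2022-03-03 00:30:48' AS end_date
    UNION ALL SELECT 'entirely within the day', '2022-03-03 06:00:00', '2022-03-03 07:00:00'
    UNION ALL SELECT 'ends exactly at midnight', '2022-03-02 00:00:00', '2022-03-03 00:00:00'
    UNION ALL SELECT 'starts the next day', '2022-03-04 00:30:00', '2022-03-05 00:30:00'
) s;
```

This returns 1, 1, 0, 0 for the four rows. The third row shows why the end bound is 00:00:00: a slot that ends exactly at midnight belongs to the previous day and is excluded.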

A common case is checking what happened to slots that were supposed to run yesterday:

WHERE (NOT a.start_date >= DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 1 DAY), '%Y-%m-%d 23:59:59'))
AND (NOT a.end_date <= DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 1 DAY), '%Y-%m-%d 00:00:00'))

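Keep the hours/minutes/seconds exactly as shown: together, the two conditions select every slot whose range overlaps the chosen day, i.e. it starts before 23:59:59 and ends after 00:00:00 on that day.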

After running the query, you should see a result with the following columns:

Column Name	Description	Example
definition_name	Definition name (see `command_definition` table). Usually the name of a cron.	`update_fabric_seogd_traffic`
account	The account token the command was executed for	`c-a-yardstore-uk`
date_range	The time slot the command was executed for, see `command_slot`	`2022-03-02 00:30:49 to 2022-03-03 00:30:48`
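timeline	The history of run statuses for this time slot, oldest first	`Failed,Timeout,Failed,Failed,Success`
is_dead	1 if each of the 10 most recent runs for this slot timed out or failed, otherwise 0	`0`
last_status	The status of the last `command_run` for this slot	`Success`
last_changed_at	The most recent state-change timestamp of the last `command_run` for this slot	`2022-03-03 03:10:14`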
last_command_run_id	The id of the last `command_run` for this slot	`b6afe06ef11146558fd6969df75ad999`
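To see the individual runs behind the timeline column, including when each one changed state, you can list every `command_run` in the same slot as a given last_command_run_id. This sketch uses only the `command_run` columns already referenced in the Recent Activity query; the id is the example value from the table above.

```sql
-- Replace with the last_command_run_id you are investigating.
SET @CommandRunId = 'b6afe06ef11146558fd6969df75ad999';

SELECT
    x.id,
    CASE x.status
        WHEN 0 THEN 'Waiting'
        WHEN 1 THEN 'Running'
        WHEN 2 THEN 'Timeout'
        WHEN 3 THEN 'Failed'
        WHEN 4 THEN 'Success'
        ELSE 'Unknown'
    END AS status,
    x.waiting_at, x.running_at, x.timeout_at, x.failed_at, x.success_at
FROM command_run x
WHERE x.slot_id = (
    -- The slot that the given run belongs to
    SELECT y.slot_id FROM command_run y WHERE y.id = @CommandRunId
)
ORDER BY x.waiting_at ASC;
```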

Debugging failures

If you have found a failed command and know its command_run_id (the last_command_run_id column above), you can find its logs via:

SELECT e.name, a.text, a.created 
FROM attrib_command_log a
JOIN attrib_command_log_lookup b ON a.id = b.command_log_id
JOIN attrib_command_item c ON b.command_item_id = c.id
JOIN attrib_command_run_lookup d ON d.command_item_id = c.id
JOIN attrib_log_type e ON a.print_type_id = e.id
WHERE d.command_run_id = @CommandRunId
ORDER BY a.order ASC

You can find the stack trace via:

SELECT d.failed, d.running, c.exception, c.stacktrace
FROM attrib_command_item a
JOIN attrib_command_run_lookup b ON b.command_item_id = a.id
LEFT JOIN attrib_command_exception c ON c.command_item_id = a.id
JOIN attrib_command_state d ON d.command_item_id = a.id
WHERE b.command_run_id =  @CommandRunId

Note

Run these two queries on the cubed-config databases, not on the Command database.
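For long logs, a per-type count can show at a glance what a run logged before you read it line by line. This sketch reuses the joins from the log query above and, like it, runs on the cubed-config databases; the type names come from `attrib_log_type` and are not listed here.

```sql
-- Run on the cubed-config database; replace the id with the run you are investigating.
SET @CommandRunId = 'b6afe06ef11146558fd6969df75ad999';

SELECT e.name AS log_type,
       COUNT(*) AS log_lines,
       MIN(a.created) AS first_at,
       MAX(a.created) AS last_at
FROM attrib_command_log a
JOIN attrib_command_log_lookup b ON a.id = b.command_log_id
JOIN attrib_command_item c ON b.command_item_id = c.id
JOIN attrib_command_run_lookup d ON d.command_item_id = c.id
JOIN attrib_log_type e ON a.print_type_id = e.id
WHERE d.command_run_id = @CommandRunId
GROUP BY e.name
ORDER BY log_lines DESC;
```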

Debugging Agents/Services

Currently everything related to Command runs on a single EC2 instance per environment:

Environment	URI
Production	command-control.withcubed.com
Staging	command-control.staging.withcubed.com

There are three services that run continuously on these instances:

Service Name	Description
command-agent-engine	The brains of the operation, in charge of updating all the tables (`command_definition`, `command_slot`, etc.) and scheduling the execution of commands
command-agent-cron	In charge of actually executing "cron" style commands. It runs commands in parallel, with up to 5 running at a time (although this is configurable)
command-agent-metrics	Publishes metrics to Datadog

These agents are set up as standard systemd services, so they can be debugged with the usual tools (systemctl, journalctl).
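Alongside systemctl and journalctl, the Command database can give a rough signal of which agent is struggling. This is a heuristic sketch, not an established check, and it assumes runs are created as Waiting when the engine schedules them: if command-agent-engine has stopped scheduling, the newest waiting_at should stop advancing; if command-agent-cron is stuck, runs would pile up as Waiting while few are Running (by default at most 5 at a time).

```sql
-- Run on the Command database.
-- Newest scheduled run: should keep advancing while command-agent-engine is healthy.
SELECT MAX(x.waiting_at) AS newest_waiting_at
FROM command_run x;

-- Runs currently Waiting (0) or Running (1), with the oldest waiting_at in each group.
SELECT
    CASE x.status WHEN 0 THEN 'Waiting' WHEN 1 THEN 'Running' END AS status,
    COUNT(*) AS runs,
    MIN(x.waiting_at) AS oldest_waiting_at
FROM command_run x
WHERE x.status IN (0, 1)
GROUP BY x.status;
```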