Command Definition

The definition is the central part of the Conductor system. It abstractly defines a piece of work to be executed, using fields that specify how often it should run (frequency), what should trigger it to start (start policy), how it should be retried (retry policy), and whether it should run for multiple accounts (matrix).

The engine later uses the definition to build a list of slots and, from those, runs.

Database Structure

| Label | Type | Description | Notes |
| --- | --- | --- | --- |
| id | UUID | | |
| name | String | A friendly name for what the definition represents | ⚠ When the source is "Cron", this must be the name of the Django command to run (e.g. update_agg_sales) |
| source | CommandDefinitionSource | See Source | |
| source_token | String | A token used by the engine to determine which automatically added definitions should be added or removed, based on the source | |
| frequency | CommandDefinitionFrequency | See Frequency | |
| start_policy | CommandDefinitionStartPolicy | See Start Policy | |
| retry_policy | CommandDefinitionRetryPolicy | See Retry Policy | |
| matrix | CommandDefinitionMatrix | See Matrix | |
| starts_on | DateTime | Slots are created for the definition from starts_on onwards | |
| created | DateTime | When this definition was created, used for debugging purposes | |
| active | Boolean | Whether this definition is active | ⚠ Ignored by the engine |
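
As a rough illustration of the structure above, the fields can be mirrored in plain Python. This is a minimal sketch, not the actual Conductor model: the enum values are taken from the tables in this document (only the implemented start and retry policies are listed), while the dataclass itself, the field defaults, and the example source_token value are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import IntEnum
from uuid import UUID, uuid4


class CommandDefinitionSource(IntEnum):
    Manual = 0
    Cron = 1


class CommandDefinitionFrequency(IntEnum):
    Hourly = 0
    Daily = 1
    Weekly = 2
    Monthly = 3
    OneOff = 4


class CommandDefinitionStartPolicy(IntEnum):
    AfterSuccess = 0  # the only start policy documented as implemented


class CommandDefinitionRetryPolicy(IntEnum):
    Standard = 0  # the only retry policy documented as implemented


class CommandDefinitionMatrix(IntEnum):
    Default = 0
    Account = 1


@dataclass
class CommandDefinition:
    """Hypothetical in-memory mirror of the database structure."""

    name: str
    source: CommandDefinitionSource
    source_token: str
    frequency: CommandDefinitionFrequency
    starts_on: datetime
    # The defaults below are assumptions, not documented behaviour.
    start_policy: CommandDefinitionStartPolicy = CommandDefinitionStartPolicy.AfterSuccess
    retry_policy: CommandDefinitionRetryPolicy = CommandDefinitionRetryPolicy.Standard
    matrix: CommandDefinitionMatrix = CommandDefinitionMatrix.Default
    id: UUID = field(default_factory=uuid4)
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    active: bool = True


# A "Cron" definition: the name must be the Django command to run.
definition = CommandDefinition(
    name="update_agg_sales",
    source=CommandDefinitionSource.Cron,
    source_token="example-token",  # hypothetical value
    frequency=CommandDefinitionFrequency.Daily,
    starts_on=datetime(2024, 1, 1, tzinfo=timezone.utc),
    matrix=CommandDefinitionMatrix.Account,
)
```

Using IntEnum keeps each member equal to its database value, so a stored integer can be converted back with, for example, CommandDefinitionSource(1).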

Source

| Type | Database Value | Description | Notes |
| --- | --- | --- | --- |
| Manual | 0 | Command has been created outside of any automated process | |
| Cron | 1 | Command has been created based on the existing agg_pct_batch cron list | |

Future sources

An agent can use the source field to filter runs down to those it has been specifically written to execute.

Currently that only includes "Cron" (aka Django management) commands. Some ideas for future sources might be:

| Source | Agent |
| --- | --- |
| StoredProcedure | Run by an agent specifically tailored to running stored procedures |
| MLCommand | Run by an agent on the R/ML servers, rather than via SSH/Fabric |
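
The source-based filtering described above could look something like the following. This is only a sketch: the Run class and the runs_for_source helper are hypothetical names introduced here, and the real agents may select their runs quite differently (for example, in a database query).

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, List


class CommandDefinitionSource(IntEnum):
    Manual = 0
    Cron = 1


@dataclass
class Run:
    """Hypothetical run record carrying the source of its definition."""

    definition_name: str
    source: CommandDefinitionSource


def runs_for_source(runs: Iterable[Run], source: CommandDefinitionSource) -> List[Run]:
    """Keep only the runs that an agent written for `source` should execute."""
    return [run for run in runs if run.source == source]


pending = [
    Run("update_agg_sales", CommandDefinitionSource.Cron),
    Run("one_off_backfill", CommandDefinitionSource.Manual),  # hypothetical name
]
cron_runs = runs_for_source(pending, CommandDefinitionSource.Cron)
```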

One issue with the current implementation is that there is nowhere to store metadata related to executing a particular source.

Currently the name of the Django management command (e.g. update_agg_sales) is stored as the name of the definition; it would be more suitable to store this in a separate table.

This approach could then be extended to new sources, for example with a table storing which parameters should be passed to a stored procedure.

Frequency

A command definition can be configured to run at different frequencies. For example, you might want a command to run only once a month. Changing the frequency changes the start_date and end_date of the generated slots.

| Type | Database Value | Description | Notes |
| --- | --- | --- | --- |
| Hourly | 0 | Runs once an hour | |
| Daily | 1 | Runs once a day | ℹ The most used frequency, used by the crons |
| Weekly | 2 | Runs once a week | |
| Monthly | 3 | Runs once a month | |
| OneOff | 4 | Only runs once | ⚠ This frequency has not been implemented in the BasicEngine |
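
To make the effect of frequency on a slot's start_date and end_date concrete, here is a minimal sketch. It assumes that slots are contiguous windows anchored at starts_on, which this document does not state explicitly; slot_window, add_months and FIXED_STEPS are hypothetical names, and the BasicEngine may compute its windows differently (for example, aligned to calendar boundaries).

```python
import calendar
from datetime import datetime, timedelta
from enum import IntEnum
from typing import Tuple


class CommandDefinitionFrequency(IntEnum):
    Hourly = 0
    Daily = 1
    Weekly = 2
    Monthly = 3
    OneOff = 4


# Fixed-length frequencies map directly onto a timedelta.
FIXED_STEPS = {
    CommandDefinitionFrequency.Hourly: timedelta(hours=1),
    CommandDefinitionFrequency.Daily: timedelta(days=1),
    CommandDefinitionFrequency.Weekly: timedelta(weeks=1),
}


def add_months(moment: datetime, months: int) -> datetime:
    """Shift by whole calendar months, clamping the day to the month length."""
    month_index = moment.month - 1 + months
    year = moment.year + month_index // 12
    month = month_index % 12 + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def slot_window(
    starts_on: datetime, frequency: CommandDefinitionFrequency, index: int
) -> Tuple[datetime, datetime]:
    """Return (start_date, end_date) of the index-th slot, counting from 0."""
    if frequency is CommandDefinitionFrequency.Monthly:
        # Always offset from starts_on so day clamping does not accumulate.
        return add_months(starts_on, index), add_months(starts_on, index + 1)
    if frequency in FIXED_STEPS:
        step = FIXED_STEPS[frequency]
        start = starts_on + step * index
        return start, start + step
    # Mirrors the note above: OneOff is not implemented in the BasicEngine.
    raise NotImplementedError(f"{frequency.name} slots are not supported")
```

For a definition starting on 31 January 2024, slot_window(datetime(2024, 1, 31), CommandDefinitionFrequency.Daily, 2) covers 2 February to 3 February, while the Monthly slot at index 1 covers 29 February to 31 March.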

Start Policy

A start policy defines under what conditions a slot should be run. This logic is implemented within the engine, and is otherwise unconfigurable.

| Type | Database Value | Description | Notes |
| --- | --- | --- | --- |
| AfterSuccess | 0 | Runs when every parent command has reported a successful run, or straight away if there are no parents | |
| AfterSuccessAndReruns | 1 | Runs when every parent command has reported a successful run, and again whenever any parent command has a successful run after the initial run, or straight away if there are no parents | ⚠ Not implemented |
| AfterFailure | 2 | Runs when every parent command has reported a failed run, or straight away if there are no parents | ⚠ Not implemented |
| AfterFirstSuccess | 3 | Runs the first time any parent command reports a successful run, and never again | ⚠ Not implemented |
| AfterFirstFailure | 4 | Runs the first time any parent command reports a failed run, and never again | ⚠ Not implemented |

Warning

Only AfterSuccess is implemented, so it is the only start policy in use. The others are examples of start policies that could be added, and more complicated ones are also possible (for example, only run after the previous day's slot has finished).
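
The AfterSuccess rule can be sketched as a single check over the parents' statuses. This is an illustration under assumptions rather than the engine's actual code: RunStatus and can_start_after_success are hypothetical names, and how the engine really tracks parent results is not described here.

```python
from enum import Enum
from typing import Iterable


class RunStatus(Enum):
    """Hypothetical status values for the latest run of a parent command."""

    PENDING = "pending"
    SUCCESS = "success"
    FAILURE = "failure"


def can_start_after_success(parent_statuses: Iterable[RunStatus]) -> bool:
    """AfterSuccess: every parent succeeded, or there are no parents at all."""
    # all() over an empty iterable is True, which covers the no-parents case.
    return all(status is RunStatus.SUCCESS for status in parent_statuses)
```

With no parents the check passes immediately; a single pending or failed parent blocks the slot.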

Retry Policy

A retry policy defines under what conditions a slot should be retried. This logic is implemented within the engine, and is otherwise unconfigurable.

| Type | Database Value | Description | Notes |
| --- | --- | --- | --- |
| Standard | 0 | Retries a slot up to 10 times, separated by 10 minute intervals | |
| StandardExtended | 1 | Retries a slot up to 20 times, separated by 10 minute intervals | ⚠ Not implemented |

Warning

Only Standard is implemented, so it is the only retry policy in use.

Examples of other retry policies might be:

  • Retry using an exponential backoff rather than a 10 minute delay
  • Retry continuously up to a time period (1 hour, 1 day) rather than for a fixed number of times
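
The Standard policy, and the exponential backoff idea from the list above, can be sketched as follows. Several things here are assumptions: "up to 10 times" is read as 10 retries after the initial attempt, the function names are hypothetical, and the backoff base and cap are arbitrary illustrative values, not proposed settings.

```python
from datetime import datetime, timedelta
from typing import Optional

MAX_RETRIES = 10  # Standard: retry up to 10 times
RETRY_INTERVAL = timedelta(minutes=10)  # Standard: 10 minutes between retries


def next_retry_at(retries_so_far: int, last_failure: datetime) -> Optional[datetime]:
    """Return when the slot should be retried, or None once retries are used up."""
    if retries_so_far >= MAX_RETRIES:
        return None
    return last_failure + RETRY_INTERVAL


def exponential_delay(
    retries_so_far: int,
    base: timedelta = timedelta(minutes=1),  # illustrative value
    cap: timedelta = timedelta(hours=1),  # illustrative value
) -> timedelta:
    """Hypothetical alternative: double the delay on each retry, up to a cap."""
    return min(base * 2 ** retries_so_far, cap)
```

Under these assumptions the fixed policy always waits 10 minutes, whereas the backoff variant waits 1, 2, 4, 8 minutes and so on until it reaches the one-hour cap.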

Matrix

The matrix defines whether, and how, slots are duplicated for a single definition. For example, a single definition can create one slot for each CUBED account.

| Type | Database Value | Description |
| --- | --- | --- |
| Default | 0 | Slots should not be duplicated based on any parameters |
| Account | 1 | Slots should be duplicated once for every account |
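
Matrix expansion can be pictured as turning one definition into a list of slot targets. The sketch below is an assumption about shape only: expand_matrix is a hypothetical helper, accounts are represented by plain integer ids, and None stands for a slot that is not tied to any account.

```python
from enum import IntEnum
from typing import List, Optional, Sequence


class CommandDefinitionMatrix(IntEnum):
    Default = 0
    Account = 1


def expand_matrix(
    matrix: CommandDefinitionMatrix, account_ids: Sequence[int]
) -> List[Optional[int]]:
    """Return one entry per slot to create for a single definition."""
    if matrix is CommandDefinitionMatrix.Account:
        # One slot per account.
        return list(account_ids)
    # Default: a single slot, not duplicated by any parameter.
    return [None]
```

A Keyword member, as discussed below, would add another branch returning one entry per keyword, which hints at why the slot count (and the performance cost) could grow quickly.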

Keywords

Keyword was intended to be another matrix enum member, i.e. slots would be duplicated for every keyword stored within CUBED. Performance would likely have to improve drastically to support this.