
Overview

Introduction

The Data Processing Servers (DPS) are based on the Data Collection Server (DCS) code base, so they follow the same concept: a URL entry point that returns a success response with extra information. Each service has its own schema validation class that defines how it should interpret the incoming request. Once validation is successful, the object is placed into the buffer.
As with the DCS, once the server is running it has a few functions that run on every loop. These are used to set up any base configuration (the Accounts class, etc.). After these functions, the worker class starts pulling objects from the buffer.
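
As a rough illustration of that flow, the sketch below validates an incoming request against a per-service schema and places the resulting object in a gevent buffer. The names here (VisitorSchema, handle_request) and the hand-rolled validation are assumptions for illustration; only the overall shape (validate, then buffer) comes from the description above.

```python
# Hypothetical sketch of a DPS entry point: validate, then buffer.
import json

from gevent.queue import Queue

buffer = Queue()

class VisitorSchema:
    """Hypothetical per-service schema: the fields a request must carry."""
    required_fields = ("account_id", "visitor_id")

    def validate(self, payload):
        missing = [f for f in self.required_fields if f not in payload]
        if missing:
            raise ValueError("missing fields: %s" % ", ".join(missing))
        return payload

def handle_request(raw_body):
    """URL entry point: returns success plus extra info, queueing valid objects."""
    try:
        obj = VisitorSchema().validate(json.loads(raw_body))
    except ValueError as exc:  # also covers JSON decode errors
        return {"success": False, "error": str(exc)}
    buffer.put(obj)
    return {"success": True, "queued": buffer.qsize()}
```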

Structure

The project is structured so that each service/application inherits from the "base" app. The base app is simply called "dps" and holds a base class for any Cubed service to be built from. This includes a BaseServer, which is the main entry point for an application; some BaseWorker classes, which the server uses to handle items put into its buffer(s) (see below); and an Accounts class, which holds an array of Cubed accounts and tracks whether each should be updated within a loop.
This all follows the same principles and designs as the DCS, to the point where my intention was that the dps base application here would have all Cubed services extend from it and add whatever they need. For example, the DCS project could be moved over and called Collector.
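
A condensed sketch of that structure follows. Only the names BaseServer, BaseWorker, Accounts, and Collector come from the text above; the internals are assumptions.

```python
from gevent.queue import Queue

class Accounts:
    """Holds an array of Cubed accounts and whether each should be updated this loop."""
    def __init__(self):
        self.accounts = []  # e.g. [{"id": "...", "update": True}, ...]

class BaseWorker:
    """Pulls objects from a server buffer and processes them."""
    def __init__(self, buffer):
        self.buffer = buffer

    def run(self):
        while True:
            self.process(self.buffer.get())  # blocks until an object arrives

    def process(self, item):
        raise NotImplementedError

class BaseServer:
    """Main entry point: owns the buffer(s), the Accounts, and the workers."""
    def __init__(self):
        self.buffer = Queue()
        self.accounts = Accounts()
        self.workers = [BaseWorker(self.buffer)]

# A concrete service extends the base app, e.g. the DCS moved over as Collector:
class Collector(BaseServer):
    pass
```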

Buffer Types

The Cubed DPS offers two ways to handle queued data. The first is gevent.queue (which is also used in the DCS), and the second is AWS SQS (currently used only by the Visscore application).
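
To illustrate, here is a hedged sketch of the two options behind a common put/get interface. The gevent side mirrors what is described above; the SQS side uses boto3, which is an assumption, since the text only says that Visscore uses SQS, not how it is wired up.

```python
import json

import boto3
from gevent.queue import Queue

class GeventBuffer:
    """In-process buffer backed by gevent.queue, as used in the DCS."""
    def __init__(self):
        self._queue = Queue()

    def put(self, item):
        self._queue.put(item)

    def get(self):
        return self._queue.get()

class SQSBuffer:
    """Illustrative AWS SQS buffer; the boto3 wiring is an assumption."""
    def __init__(self, queue_url):
        self._sqs = boto3.client("sqs")
        self._url = queue_url

    def put(self, item):
        self._sqs.send_message(QueueUrl=self._url, MessageBody=json.dumps(item))

    def get(self):
        resp = self._sqs.receive_message(QueueUrl=self._url, MaxNumberOfMessages=1)
        for msg in resp.get("Messages", []):
            self._sqs.delete_message(QueueUrl=self._url,
                                     ReceiptHandle=msg["ReceiptHandle"])
            return json.loads(msg["Body"])
        return None  # queue was empty
```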

Metrics

Metrics can be added per function within each of those processes, with a few specific exceptions. This is done in the MetricGroup class, found in metrics.py. The metric we use is timing, a custom-defined class that adds four metrics to each function: min, max, avg and count. These are hopefully self-explanatory, but for completeness: min is the shortest time a function has taken to run; max is the longest; avg is the average time the function takes to run over the lifetime of this PyDPS app; and count is the number of times the function has run, again over the lifetime of the app. Lifetimes reset when a box is power-cycled for whatever reason (e.g. a deployment).
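
As a sketch of how that bookkeeping could work (the real class lives in metrics.py; the implementation below is an assumption that matches the four reported values, all held in memory so they reset when the process restarts):

```python
import time

class Timing:
    """Illustrative timing metric: tracks min, max, avg and count per function."""
    def __init__(self):
        self.min = None
        self.max = None
        self.count = 0
        self._total = 0.0

    @property
    def avg(self):
        return self._total / self.count if self.count else 0.0

    def record(self, elapsed):
        self.count += 1
        self._total += elapsed
        self.min = elapsed if self.min is None else min(self.min, elapsed)
        self.max = elapsed if self.max is None else max(self.max, elapsed)

    def time(self, func):
        """Decorator: record how long each call to func takes."""
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return func(*args, **kwargs)
            finally:
                self.record(time.monotonic() - start)
        return wrapper
```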

If you are looking to add another metric type, this should be done in metrics.py by adding your own class and then defining it in MetricGroup.
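
An illustrative new metric type following that recipe might look like this. Gauge and the MetricGroup internals shown here are assumptions, not code from metrics.py.

```python
class Gauge:
    """Example new metric: remembers the most recent value a function reported."""
    def __init__(self):
        self.value = None

    def set(self, value):
        self.value = value

class MetricGroup:
    """One group of metrics per function; new metric types get attached here."""
    def __init__(self, name):
        self.name = name
        self.gauge = Gauge()      # the newly defined metric type
        # self.timing = Timing()  # the existing timing metric (sketched above)
```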

To apply metrics to a given function, it must be given the decorator @metrics.[function name].time. For example, the metric pydps.worker.update_visitor_predictions.timing.count gives us the count for the worker function update_visitor_predictions.
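
Putting it together, a minimal runnable sketch of that decorator spelling follows. The SimpleNamespace registry is a stand-in assumption for whatever pydps actually defines; only the @metrics.[function name].time spelling comes from the text.

```python
import time
from types import SimpleNamespace

class FunctionTimer:
    """Slimmed-down timer; min/max/avg bookkeeping elided (see the sketch above)."""
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def time(self, func):
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return func(*args, **kwargs)
            finally:
                self.total += time.monotonic() - start
                self.count += 1
        return wrapper

# Stand-in registry: one metric group per function name.
metrics = SimpleNamespace(update_visitor_predictions=FunctionTimer())

@metrics.update_visitor_predictions.time
def update_visitor_predictions():
    pass  # worker logic

update_visitor_predictions()
print(metrics.update_visitor_predictions.count)  # -> 1
```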