main.py

This is the file responsible for starting the servers and collecting the data.

init()

The pydcs program starts by executing init() in main.py.

@click.command()
@click.option('--config_path', default='/etc/pydcs/config.yml', help='path to config file')
def init(config_path):

click is a Python library for building command line interfaces. You can read more about click here. When main.py is run, init() is called and may be passed the command line option --config_path, which defaults to /etc/pydcs/config.yml if no argument is given. --help may also be passed; it lists the available options, which here is likely just --config_path.

def init(config_path):
    config = yaml.load(open(config_path))

    # Setup colour logging
    logging_map = {
        'critical': logging.CRITICAL,
        'error':    logging.ERROR,
        'warning':  logging.WARNING,
        'info':     logging.INFO,
        'debug':    logging.DEBUG,
        'notset':   logging.NOTSET,
    }
    logging_level = config['server']['logging']
    pydcs.logger.setup(logging_map[logging_level])  

    logging.info('''
        {}
        # Starting application
          - main_workers = {}
          - retry_workers = {}
          - main_buffer_size = {}
          - retry_buffer_size = {}
          - machine_code = {}'''.format(
                open('ident').read(), 
                config['server']['main_workers'],
                config['server']['retry_workers'],
                config['server']['main_buffer_size'],
                config['server']['retry_buffer_size'],
                config['server']['machine_code']))

Once run, config is set to the parsed contents of the YAML file at the path given by --config_path, or at the default path defined in @click.option(). A logging_map dictionary then maps level names to the corresponding logging level constants; the level named in config['server']['logging'] is looked up in it and passed to pydcs.logger.setup. To learn more about the logger visit Logger. Once logging is set up, the contents of the ident file and the main server settings from the config are logged at INFO level.
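The level lookup can be tried in isolation. The following is a minimal sketch, not the pydcs code: the config dictionary is a hypothetical stand-in for the parsed config file with invented values, resolve_level is a hypothetical helper, and logging.basicConfig stands in for pydcs.logger.setup.

```python
import logging

# Same name-to-level mapping as in init().
LOGGING_MAP = {
    'critical': logging.CRITICAL,
    'error': logging.ERROR,
    'warning': logging.WARNING,
    'info': logging.INFO,
    'debug': logging.DEBUG,
    'notset': logging.NOTSET,
}

# Hypothetical stand-in for the parsed config file; values are illustrative.
config = {'server': {'logging': 'debug', 'main_workers': 4}}


def resolve_level(name, default=logging.INFO):
    """Return the logging constant for a level name, or `default` if unknown."""
    return LOGGING_MAP.get(str(name).lower(), default)


level = resolve_level(config['server']['logging'])
logging.basicConfig(level=level)  # stand-in for pydcs.logger.setup(level)
logging.debug('logging configured at %s', logging.getLevelName(level))
```

Note that init() indexes logging_map directly, so a level name that is not one of the six keys raises a KeyError there. The fallback to a default level is an addition made only in this sketch.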

def init(config_path):
    # Setup web servers
    wsgi_server = WSGIServer(('', 8000), app)

    # Setup app server
    server = Server(config)
    server.metrics.request = metrics

    # Setup g.server binding
    @app.before_request
    def before_request():
        g.server = server

    gevent.joinall([
        gevent.spawn(wsgi_server.serve_forever),
        gevent.spawn(server.start)
        ])

Once the config settings are logged, the WSGIServer is built. It takes the address to listen on, here ('', 8000), meaning port 8000 on all interfaces, and the Flask app that holds the routes. A Server object is also created from the config object. The variable metrics is assigned to server.metrics.request; it is an instance of MetricGroup. To learn more visit MetricGroup.

@app.before_request is a Flask decorator that registers a function to run before each request. To learn more read the Flask Docs. Here it binds the server to g.server so that request handlers can reach it. The g object is a simple namespace object that has the same lifetime as an application context. Put simply, g is a place to store data during a request. To learn more about g read the Flask Docs for g.

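Finally, gevent.spawn starts each of the two callables in the list in its own greenlet, so the WSGI server and the app server run concurrently, and gevent.joinall blocks until both greenlets have finished. To learn more about gevent read the gevent Docs.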
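The spawn and joinall pattern can be seen on a smaller scale in the sketch below. It assumes gevent is installed. The worker function is hypothetical; in init() the two callables are wsgi_server.serve_forever and server.start, which keep running rather than returning after a few steps.

```python
import gevent

events = []


def worker(name, steps):
    """Record each step, then yield so other greenlets can run."""
    for i in range(steps):
        events.append((name, i))
        gevent.sleep(0)  # cooperative yield back to the gevent hub


# Spawn two greenlets and block until both have finished.
jobs = [gevent.spawn(worker, 'a', 3), gevent.spawn(worker, 'b', 3)]
gevent.joinall(jobs)

print(events)
```

Because each worker yields after every step, neither greenlet has to finish before the other gets a turn. This is what allows the web server and the app server to share one process.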

handle()

handle() is called either directly in routing functions, such as report_email(), or indirectly through handle_json, as in our main routing function report3().

def handle(input_data, allow_payload=False, cost_in_pence=False):

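The function takes one mandatory argument, input_data, and two optional keyword arguments, allow_payload and cost_in_pence, both defaulting to False. input_data should look similar to: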

{u'redirect': u'', u'impression': {}, u'endpoint': None, u'sid': u'', u'vid': u'', u'referrer': u'', u'labels': [], u'syncs': [], u'payload': None, u'pageUrl': u'', u'full': True, u'ip': u'62.255.101.34', u'aid': u'client1', u'events': []}

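The entire function is wrapped in a try-except. It first tries to create a visit schema, binding the request, the server and account information stored on g before the request was handled, and whether the cost is in pence, which is set by the cost_in_pence keyword argument. gevent.sleep(0) then yields control so that other greenlets can run. If the buffer queue is full, gevent.queue.Full is caught, the exception is logged, and the function returns False, None, 'Queue is full', {}.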

try:        
    schema = VisitSchema().bind(
        request=request,
        server=g.server,
        accounts=g.server.accounts,
        cost_in_pence=cost_in_pence)
    gevent.sleep(0)
    ...

except gevent.queue.Full:
    logging.exception("Queue is full")
    # return {'status': False, 'message': 'Queue is full'}
    return False, None, 'Queue is full', {}
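The queue-full path can be reproduced without a running server. The sketch below is an illustration under stated assumptions, not the pydcs implementation: it uses the standard library's queue.Queue as a stand-in for gevent.queue, enqueue_visit is a hypothetical helper, and the shape of the success tuple is assumed, since only the failure tuple appears in the excerpt above.

```python
import logging
import queue

# Stand-in for the server's bounded buffer; capacity 1 to force an overflow.
buffer = queue.Queue(maxsize=1)


def enqueue_visit(visit):
    """Try to buffer a visit; on overflow return the failure tuple from handle()."""
    try:
        buffer.put_nowait(visit)  # raises queue.Full instead of blocking
    except queue.Full:
        logging.exception("Queue is full")
        return False, None, 'Queue is full', {}
    return True, visit, '', {}  # success shape is an assumption


first = enqueue_visit({'aid': 'client1'})   # fits in the buffer
second = enqueue_visit({'aid': 'client1'})  # buffer is already full
```

Returning a failure tuple instead of re-raising lets the calling route decide how to report the overflow to the client.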

The rest of the handle() function deserializes the data to create a Visit object. Once a visit is created, it checks the cookies from the request in order to set the visitor and session tokens. If the user requests a third party response, the account and visitor token information is added to request.cookie. In addition, a visit cookie is created and appended to visit.cookies.