Overview
All functions are static methods on the Segmenter Rules class. They all take the same parameters and must return a bool (True or False).
A function should be designed to know exactly what it expects to receive from the "function" parameter and how to interpret it. The function object holds the user-configured variables, including operator and value.
    @staticmethod
    def custom_function(worker, conn, visit, function, *args, **kwargs):
        # Run the check for this rule, then return a bool.
        return True  # or False
Params
Here is a breakdown of the params passed in:
Name | Type | Description |
---|---|---|
worker | BaseWorker | Object representing the Worker class that has called this function. The Worker object holds many functions relevant to the current thread and connection object, including SQL fetching and dictionary conversion. |
conn | Umysql.Connection | Umysql object to make DB queries. |
visit | SegmenterVisit | The Visit class specific for Segmenter. This holds basic information to get Visitor/Visit specific information. By this point the class has both visit_id as .id and visitor_id as .visitor_id. |
function | AccountFunction | This object holds the relevant information for this specific function as configured by a cubed customer in our db. |
Worker
For more information about the Worker class, please see its helper functions.
SegmenterVisit
    def __init__(self):
        self.id = None
        self.visitor_id = None
        self.visitor_token = None
        self.session_token = ""
        self.created = arrow.utcnow()
Name | Type | Description |
---|---|---|
id | Int | This is the PK Id from attrib_visit |
visitor_id | Int | This is the PK Id from attrib_visitor |
visitor_token | String | This is the token from attrib_visitor |
session_token | String | This is the token from attrib_visit |
You should only need id and visitor_id to get any data needed for your function to work. The tokens are provided if needed.
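As a rough illustration of a rule that relies only on id and visitor_id, the sketch below runs a hypothetical rule against stand-in objects. FakeVisit, FakeWorker and visitor_has_previous_visit are invented for this example. The id column on attrib_visit and the list-of-rows return shape of find_single_full_row are assumptions inferred from this page, not confirmed behaviour.

```python
class FakeVisit:
    """Stand-in for SegmenterVisit with only the two ids populated."""
    def __init__(self, visit_id, visitor_id):
        self.id = visit_id
        self.visitor_id = visitor_id


class FakeWorker:
    """Stand-in for the Worker: returns canned rows instead of querying a DB."""
    def __init__(self, rows):
        self.rows = rows
        self.calls = []

    def find_single_full_row(self, conn, query, params):
        # Record the call so we can see which ids the rule passed in.
        self.calls.append((query, params))
        return self.rows


def visitor_has_previous_visit(worker, conn, visit, function, *args, **kwargs):
    """Hypothetical rule: True if the visitor has any visit other than this one."""
    query = '''
        select a.id from attrib_visit a
        where a.visitor_id = %s and a.id <> %s
        limit 1
    '''
    rows = worker.find_single_full_row(conn, query, (visit.visitor_id, visit.id))
    return len(rows) > 0


worker = FakeWorker(rows=[{"id": 41}])
visit = FakeVisit(visit_id=42, visitor_id=7)
print(visitor_has_previous_visit(worker, None, visit, function=None))  # True
print(worker.calls[0][1])  # (7, 42)
```

The two ids are the only pieces of the visit object that the rule touches; the tokens are left unused.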
AccountFunction
    def __init__(self):
        self.data_type = None
        self.category = None
        self.value = None
        self.operator = None
        self.has = None
Name | Type | Description |
---|---|---|
data_type | String | Holds the data type this function should interpret itself as. Mostly used for the Dashboard frontend but might be needed at this processing level. |
category | String | This tells the function if it is at the Visit or Visitor level. This is set from the frontend but may be needed during processing if the function should operate at either level specifically. |
value | String | All values are stored as String to make it easier. You can use the data_type if you are unsure of how it should be interpreted, though a function should by design know exactly what it is being passed and behave very specifically. |
operator | String | Please see operators for further information. |
has | Bool | This is set in the frontend and is used here to decide whether the function should treat itself as the opposite, similar to returning not success. |
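Because value always arrives as a String, a function that is unsure how to read it could fall back on data_type. The sketch below shows one way to do that. The helper name coerce_value and the data_type names ("int", "float", "bool", "string") are assumptions made for illustration; this page does not list the values actually stored.

```python
def coerce_value(value, data_type):
    """Convert the stored string into a Python value.

    The data_type names handled here are assumed, not taken from the real
    configuration; adjust them to whatever your account stores.
    """
    converters = {
        "int": int,
        "float": float,
        "bool": lambda raw: raw.strip().lower() in ("1", "true", "yes"),
        "string": str,
    }
    try:
        convert = converters[data_type]
    except KeyError:
        raise ValueError("unsupported data_type: {!r}".format(data_type))
    return convert(value)


print(coerce_value("172800", "int"))  # 172800
print(coerce_value("true", "bool"))   # True
```

Raising on an unknown data_type, rather than guessing, keeps a misconfigured function from silently comparing the wrong kind of value.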
Example
time_since_last_visit
This function finds the time between a Visitor's current visit and their previous visit, and compares it against the configured value. The operators allowed in the frontend include "==", ">", "<" and so on, as it is just comparing a number. We store time-based data as seconds where possible.
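To make the number comparison concrete, here is a minimal sketch that applies an operator string to an elapsed time in seconds. It uses a lookup table from Python's operator module and does not reflect how the real rule evaluates its condition. The name compare_seconds, the set of operator strings and the sample values are all assumptions for illustration.

```python
import operator

# Assumed set of operator strings; the frontend may offer more or fewer.
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def compare_seconds(elapsed_seconds, op, value):
    """Return True if `elapsed_seconds <op> value` holds.

    `value` is a string, as stored on the function object, so it is
    converted to an int before comparing.
    """
    return OPERATORS[op](elapsed_seconds, int(value))


# Three days since the previous visit, compared against a two-day threshold.
print(compare_seconds(259200, ">", "172800"))  # True
# One day since the previous visit, same threshold.
print(compare_seconds(86400, ">", "172800"))   # False
```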
So when function is passed in, we can expect the following:
function.operator will be something like ">"
function.value will be a number of seconds stored as a String, like "172800"
If we were to receive these 2 values, we're saying in our code:
"if the time between the visitor's current visit and their previous visit is greater than 172800 seconds (2 days), return True".
    @staticmethod
    def time_since_last_visit(worker, conn, visit, function, *args, **kwargs):
        '''
        Check the time between this visitor's current visit and their
        previous visit against the configured operator and value
        '''
        # Fetch the visitor's two most recent visits, newest first.
        query = '''
            select a.first_visit, a.last_visit from attrib_visit a
            where a.visitor_id = %s
            order by a.first_visit desc
            limit 2
        '''
        rows = worker.find_single_full_row(conn, query, (visit.visitor_id, ))
        # A visitor with fewer than two visits has no previous visit to compare.
        if len(rows) < 2:
            return False
        recent = arrow.get(rows[0])
        previous = arrow.get(rows[1])
        # total_seconds() covers the whole gap; .seconds would drop whole days.
        seconds = int((recent - previous).total_seconds())
        # Build the comparison from the operator and value held on function.
        condition = prepare_expression(seconds, function)
        result = eval(condition)
        logging.debug('time_since_last_visit: {}\n:{}'.format(condition, result))
        return result
To break this simple function down into steps:
- Get only the specific data needed for this calculation.
- Use the helper function on the Worker class to extract all rows returned.
- Convert the values into arrow objects for easy DateTime calculations.
- Get the seconds difference.
- Use the prepare_expression helper to combine the value we've calculated from the DB with the function object holding the operator and value set by the customer.
- Pass the prepared expression into Python's eval() and return that bool result.
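For the "seconds difference" step, it helps to know that subtracting two arrow objects yields a standard datetime.timedelta. Its .seconds attribute holds only the remainder after whole days are removed, while total_seconds() covers the full span. The sketch below uses plain datetime objects and made-up timestamps (both assumptions, chosen to keep the example dependency-free) to show the difference.

```python
from datetime import datetime

previous = datetime(2024, 1, 1, 12, 0, 0)
recent = datetime(2024, 1, 3, 13, 0, 0)  # 2 days and 1 hour later

delta = recent - previous

# Only the part of the gap left over after whole days are removed.
print(delta.seconds)               # 3600
# The whole gap, which is what a threshold such as 172800 needs.
print(int(delta.total_seconds()))  # 176400
```

With a threshold of 172800 seconds, the first figure would compare as "less than" even though more than two days have passed.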
Note
If the "has" variable was set to False, we would return the OPPOSITE of what the function here is doing. Meaning we could do:
    if not function.has:
        return not result
But this is already handled inside prepare_expression.
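The real prepare_expression is not shown on this page. Purely as an assumption about how such a helper might fold "has" into the expression, the sketch below builds a comparison string and wraps it in a negation when has is False. The names prepare_expression_sketch and FakeFunction, and the allowed operator set, are hypothetical.

```python
ALLOWED_OPERATORS = {"==", "!=", ">", ">=", "<", "<="}  # assumed set


class FakeFunction:
    """Stand-in for AccountFunction with only the fields used here."""
    def __init__(self, operator, value, has=True):
        self.operator = operator
        self.value = value
        self.has = has


def prepare_expression_sketch(left, function):
    """Build a string such as '259200 > 172800', negated when has is False."""
    if function.operator not in ALLOWED_OPERATORS:
        raise ValueError("unexpected operator: {!r}".format(function.operator))
    # int() on both sides keeps arbitrary text out of the string passed to eval().
    expression = "{} {} {}".format(int(left), function.operator, int(function.value))
    if not function.has:
        expression = "not ({})".format(expression)
    return expression


condition = prepare_expression_sketch(259200, FakeFunction(">", "172800", has=False))
print(condition)        # not (259200 > 172800)
print(eval(condition))  # False
```

Handling the negation in one helper means individual rules never need their own "has" branch.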
Tests
We use pytest to run tests against all of our functions within rules.py. To run a test, navigate into the test directory with cd /srv/pydps/tests/segmenter, then run pytest * to check all functions or pytest {file}.py to check a specific file.
All test functions are located in tests/segmenter/functions/ and are named after the function they test, prefixed with test_. When creating a test, you will need to use the visit-generator package to generate a visit that is stored in your database; this lets us create specific conditions when testing our functions. Additionally, helpers.py contains many helper classes with generic paths, events, and products that can be used to generate these visits.
Note
When creating your condition object, please ensure you use the correct Condition<Type> class found within condition_builder.py; otherwise this can lead to incorrect data-type failures when testing your function(s).
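The APIs of visit-generator, helpers.py and condition_builder.py are not shown on this page, so the sketch below does not use them. It only illustrates the general shape of a pytest test for a rule, using hand-written stubs and a simplified stand-in for the rule itself. StubWorker, StubVisit, StubFunction and the row layout are all assumptions made for this example.

```python
from datetime import datetime


class StubWorker:
    """Returns canned (first_visit, last_visit) rows instead of querying a DB."""
    def __init__(self, rows):
        self.rows = rows

    def find_single_full_row(self, conn, query, params):
        return self.rows


class StubVisit:
    visitor_id = 7


class StubFunction:
    def __init__(self, operator, value):
        self.operator = operator
        self.value = value


def time_since_last_visit(worker, conn, visit, function):
    """Simplified stand-in for the rule under test (handles '>' and '<' only)."""
    rows = worker.find_single_full_row(conn, "stub query", (visit.visitor_id,))
    if len(rows) < 2:
        return False
    seconds = int((rows[0][0] - rows[1][0]).total_seconds())
    threshold = int(function.value)
    return seconds > threshold if function.operator == ">" else seconds < threshold


def test_time_since_last_visit_true_when_gap_exceeds_value():
    rows = [
        (datetime(2024, 1, 4), datetime(2024, 1, 4)),  # current visit
        (datetime(2024, 1, 1), datetime(2024, 1, 1)),  # previous visit
    ]
    function = StubFunction(">", "172800")
    assert time_since_last_visit(StubWorker(rows), None, StubVisit(), function)


def test_time_since_last_visit_false_with_single_visit():
    rows = [(datetime(2024, 1, 4), datetime(2024, 1, 4))]
    function = StubFunction(">", "172800")
    assert not time_since_last_visit(StubWorker(rows), None, StubVisit(), function)
```

In the real suite the stubs would be replaced by a generated visit stored in the database and the matching Condition<Type> object.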