
Overview

We use AWS to host some of our infrastructure. AWS is a cloud computing platform that provides a wide range of services, including computing power, storage, and databases. We use AWS for domain registration, EC2 instances, and load balancers. All of our AWS infrastructure lies within a single region: Europe (Ireland), eu-west-1. Below is a simplified diagram of our AWS infrastructure.

[Diagram: AWS infrastructure]

Note

Most of our infrastructure (Control, Dash, Data Collection Service, Data Processing, etc.) is hosted by Positive Internet.

VPC

We use a VPC (Virtual Private Cloud) to launch AWS resources into a virtual network that we have defined. This network closely resembles a traditional network that you might operate in your own data center. A VPC requires a subnet, a route table, and a security group to function.

Subnets

A subnet is a range of IP addresses in the VPC. We use subnets to divide the VPC IP address range into multiple sub-ranges and to control the traffic flow between resources and the internet. Each subnet must reside entirely within one Availability Zone and cannot span zones. Subnets can be public or private: a public subnet has a route to an internet gateway, while a private subnet does not.
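The idea of dividing one address range into sub-ranges can be sketched with Python's standard ipaddress module. The 10.0.0.0/16 block and the /24 subnet size below are assumptions chosen for illustration, not our actual VPC layout.

```python
import ipaddress

# Hypothetical VPC CIDR block, chosen only for illustration.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Split the /16 into /24 sub-ranges; each one could back a single subnet
# that lives in exactly one Availability Zone.
subnets = list(vpc.subnets(new_prefix=24))


def subnet_for(address):
    """Return the sub-range containing the address, or None if it is outside the VPC range."""
    ip = ipaddress.ip_address(address)
    for subnet in subnets:
        if ip in subnet:
            return subnet
    return None


print(len(subnets))              # 256
print(subnets[0], subnets[1])    # 10.0.0.0/24 10.0.1.0/24
print(subnet_for("10.0.1.25"))   # 10.0.1.0/24
```

Whether a given sub-range is public or private is not a property of the addresses themselves; it depends on the route table associated with the subnet.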

Route Tables

Route tables determine where network traffic is directed. Each subnet in a VPC must be associated with a route table, and we use route tables to control the routing of network traffic between subnets and the internet.
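As a rough illustration of how a route table decides where traffic goes, the sketch below picks the most specific matching route (longest prefix) for a destination address. The CIDR blocks and target names are hypothetical and do not reflect our real route tables.

```python
import ipaddress

# Hypothetical route tables as (destination CIDR, target) pairs.
PUBLIC_ROUTES = [
    ("10.0.0.0/16", "local"),
    ("0.0.0.0/0", "internet-gateway"),
]
PRIVATE_ROUTES = [
    ("10.0.0.0/16", "local"),
]


def resolve_target(routes, destination):
    """Return the target of the most specific route matching the destination, or None."""
    ip = ipaddress.ip_address(destination)
    matches = []
    for cidr, target in routes:
        network = ipaddress.ip_network(cidr)
        if ip in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None
    # The longest prefix is the most specific route.
    return max(matches)[1]


print(resolve_target(PUBLIC_ROUTES, "10.0.3.4"))        # local
print(resolve_target(PUBLIC_ROUTES, "93.184.216.34"))   # internet-gateway
print(resolve_target(PRIVATE_ROUTES, "93.184.216.34"))  # None
```

The last call shows why a subnet without a default route to an internet gateway is private: traffic to an internet address has no matching route.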

Site-to-Site VPN

We have set up a Site-to-Site VPN connection between our VPC and the Positive Internet LAN. This allows Positive Internet to access our resources in the VPC, and vice versa. The VPN connection uses IPsec to secure the connection between the customer (Positive) gateway and the virtual private gateway. The customer gateway is the device on the customer side of the Site-to-Site VPN connection. The virtual private gateway is the VPN concentrator on the AWS side of the Site-to-Site VPN connection.

Note

Without this VPN connection, Positive Internet would not be able to access our resources in the VPC, and we would not be able to access their resources on their LAN. This would be problematic as Positive Internet hosts our databases (amongst many other things), meaning that our EC2 instances would not be able to access the databases without the VPN connection.

Domain Registration

We use AWS to register our domains. The process is straightforward and can be done through the AWS Management Console. The domains are then managed through the Route 53 service. The table below shows the main domains we have registered through AWS.

| Domain | Description |
| --- | --- |
| withcubed.com | Public domain for Cubed |
| withcubed.internal | Internal domain for Cubed resources |
| withcubed.local | Internal domain for Cubed resources inside the VPC |
| cubed.ai | Public domain for Cubed AI, our marketing website |

What is Route 53?

Route 53 is a scalable and highly available Domain Name System (DNS) web service. It is designed to give developers and businesses an extremely reliable and cost-effective way to route end users to Internet applications by translating human-readable names like www.example.com into the numeric IP addresses associated with servers.

Load Balancers

We use Elastic Load Balancing to distribute incoming application or network traffic across multiple targets, such as EC2 instances, containers, and IP addresses. Elastic Load Balancing automatically scales its request handling capacity in response to incoming traffic. Our load balancers include:

| Load balancer | Description | Target(s) | HTTPS redirect |
| --- | --- | --- | --- |
| Positive-DCS | Distributes traffic to our data collection service (DCS) hosted by Positive Internet. | 10.3.61.1 | No |
| Positive-Prod-Dash-LB | This load balancer is used to distribute our production dashboard traffic hosted by Positive Internet. | 10.3.62.1 | Yes |
| uat-load-balancer | Allows us to give SSL certificates (via AWS Certificate Manager) to our UAT EC2 instances. | 10.3.63.17, 10.3.63.18 | Yes |
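To make "distributing traffic across multiple targets" concrete, here is a toy round-robin picker over the two uat-load-balancer targets listed above. This is only an illustration of the concept; the algorithm Elastic Load Balancing actually applies depends on the load balancer type and its configuration.

```python
from itertools import cycle

# Target addresses taken from the uat-load-balancer entry in the table above.
UAT_TARGETS = ["10.3.63.17", "10.3.63.18"]


def round_robin(targets):
    """Return an endless iterator that hands out targets in rotation."""
    return cycle(targets)


picker = round_robin(UAT_TARGETS)
for request_id in range(4):
    # Each incoming request is sent to the next target in the rotation.
    print(request_id, next(picker))
```

With two targets, requests alternate between 10.3.63.17 and 10.3.63.18.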

EC2 Instances

We use EC2 instances to host some of our services. EC2 is a web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers. Use cases within Cubed include hosting our machine learning models, user acceptance testing (UAT), our tag simulation, these internal docs, and a DNS server. Our main EC2 instances include:

| Instance | Description |
| --- | --- |
| page_attribution (Page Attribution) | Built from an AMI, this instance trains the model on the latest customer journey data and updates the page attribution model files inside the S3 bucket. |
| model_trainer (Customer Attribution) | Built from an AMI, this instance trains the model on the latest customer journey data and updates the customer attribution model files inside the S3 bucket. |

Debugging EC2 Instances

  • If you are having trouble connecting to an EC2 instance, you can check the instance's system log. This log can provide information about the instance's startup process. To view the system log, open the Amazon EC2 console at https://console.aws.amazon.com/ec2/, choose Instances in the navigation pane, select the instance, and then choose Actions > Monitor and troubleshoot > Get system log.
  • You can also SSH into the instance for more detailed debugging. You will need the instance's public IP address and the private key associated with the instance. The command is: ssh -i /path/to/aws_private_key ubuntu@public-ip-address. Ask your manager for the private key if you do not have it, and make sure your IP address is allowed in the security group of the instance you are trying to connect to.
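Before digging into keys or logs, it can help to confirm that the SSH port is reachable at all. The helper below is a minimal sketch using only the standard library; the function name and the default timeout are our own choices for illustration.

```python
import socket


def port_is_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
```

For example, port_is_open("public-ip-address") returning False often points to a security group that does not allow your IP address, although a stopped instance or an SSH daemon that is not running would give the same result.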

SQS Queues

Upon account creation, each client is assigned its own SQS queue. This decouples the data collection service (DCS) from the data processing service (DPS). The DCS writes visitor data to the queue, and the DPS reads the data from the queue, processes it, and writes the results to the database.
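The decoupling described above follows the producer/consumer pattern. The sketch below imitates it in a single process with Python's queue module standing in for SQS; the function names and the visit fields are hypothetical, and the real services talk to SQS rather than to an in-memory queue.

```python
import json
import queue

# In-memory stand-in for one client's queue (an assumption for illustration).
client_queue = queue.Queue()


def collect(visit):
    """Producer side (the DCS role): serialise a visit and enqueue it."""
    client_queue.put(json.dumps(visit))


def process_all():
    """Consumer side (the DPS role): drain the queue and return the decoded visits."""
    results = []
    while True:
        try:
            body = client_queue.get_nowait()
        except queue.Empty:
            break
        results.append(json.loads(body))
    return results


collect({"visitor_id": "v1", "page": "/pricing"})
collect({"visitor_id": "v2", "page": "/"})
print(process_all())
```

Because the producer only ever writes to the queue and the consumer only ever reads from it, either side can slow down or restart without the other needing to know.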

Lambda Functions

We use Lambda functions to offer real-time model predictions for our customer attribution model. The Lambda function is written in Python. It is triggered by an API Gateway request and returns the model prediction. See more about how we use Lambda functions here.
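A Python Lambda function behind API Gateway generally has the shape sketched below. This is not our production handler: it assumes the API Gateway proxy integration (the request body arrives as a JSON string under "body"), and predict is a hypothetical placeholder for the real model call.

```python
import json


def predict(features):
    """Hypothetical placeholder for the model; always returns a dummy score."""
    return 0.5


def lambda_handler(event, context):
    """Handle an API Gateway proxy request and return a JSON prediction."""
    try:
        features = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": json.dumps({"error": "invalid JSON"})}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"prediction": predict(features)}),
    }
```

The handler can be exercised locally by calling lambda_handler({"body": "{}"}, None), which is a convenient way to test request parsing without deploying.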

Security Groups

We use security groups to control the traffic to our EC2 instances. A security group acts as a virtual firewall for your instance to control inbound and outbound traffic. When you launch an instance in a VPC, you can assign up to five security groups to the instance. Security groups act at the instance level, not the subnet level. Therefore, each instance in a subnet in your VPC could be assigned to a different set of security groups. Some of our main security groups include:

| Security group | Description |
| --- | --- |
| VisScore Internal Web Servers | Allows traffic to our internal web servers. |
| SSH LIVE Web Servers | Allows SSH access to our live web servers. All of our SSH connections belong here. |
| bitbucket-pipelines | Allows Bitbucket Pipelines to connect to our EC2 instances and other infrastructure. |
| VisScore Docs Server | Allows traffic to our internal documentation server. |
| VisScore DNS Server | Allows traffic to our internal DNS server. |
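Security group rules are allow rules only, so inbound traffic that matches no rule is denied. The sketch below models that check for a single connection attempt; the rule contents are hypothetical (203.0.113.10 is a documentation address) and are not copied from any of the groups listed above.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class InboundRule:
    protocol: str      # e.g. "tcp"
    port: int          # single port, for simplicity
    source_cidr: str   # source range the rule allows


# Hypothetical rule set: SSH allowed from one address only.
SSH_RULES = [InboundRule("tcp", 22, "203.0.113.10/32")]


def is_allowed(rules, protocol, port, source_ip):
    """Return True if any rule allows the connection; otherwise it is denied."""
    ip = ipaddress.ip_address(source_ip)
    return any(
        rule.protocol == protocol
        and rule.port == port
        and ip in ipaddress.ip_network(rule.source_cidr)
        for rule in rules
    )


print(is_allowed(SSH_RULES, "tcp", 22, "203.0.113.10"))  # True
print(is_allowed(SSH_RULES, "tcp", 22, "198.51.100.7"))  # False
```

This is why an SSH connection from a new location fails until that IP address is added to the relevant group.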

Secrets Manager

We use Secrets Manager to store and manage our secrets. Secrets Manager helps you protect access to your applications, services, and IT resources without the upfront investment and ongoing maintenance costs of operating your own infrastructure. It enables you to rotate, manage, and retrieve database credentials, API keys, and other secrets throughout their lifecycle.

Specifically, we use Secrets Manager to store database credentials, service login credentials (Grafana, InfluxDB, etc.), server credentials, and more.
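Retrieving a secret from code usually comes down to one call and a JSON decode. The helper below is a sketch that assumes the secret value is stored as a JSON object string. The secret name and the stub client are hypothetical and exist only so the example runs without AWS credentials.

```python
import json


def load_secret(client, secret_id):
    """Fetch a secret and decode it, assuming its value is a JSON object string."""
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])


class FakeSecretsClient:
    """Stand-in for a Secrets Manager client, used to try the helper offline."""

    def get_secret_value(self, SecretId):
        payload = {"username": "example", "password": "not-a-real-password"}
        return {"Name": SecretId, "SecretString": json.dumps(payload)}


creds = load_secret(FakeSecretsClient(), "example/database")
print(creds["username"])  # example
```

Against the real service, the client would typically be created with boto3.client("secretsmanager", region_name="eu-west-1") and passed to load_secret in place of the stub.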