Server Layout
Each server has roughly the same layout:
Systemd services
We use systemd to manage all of our services, so they are restarted in the event of a crash or system restart.
Here are the common services you might find running on the VMs:
Service Name | Description | Config Path |
---|---|---|
backend | Django/Dashboard | /etc/systemd/system/backend.service |
pydcs | DCS | /etc/systemd/system/pydcs.service |
pydps | DPS | /etc/systemd/system/pydps.service |
promtail | Promtail agent | /etc/systemd/system/promtail.service |
telegraf | Telegraf agent | /etc/systemd/system/telegraf.service |
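As a quick way to check these units on a VM, the following is a minimal sketch that asks `systemctl is-active` for the state of each service named in the table. The helper names (`is_active`, `service_report`) are illustrative only and are not part of any existing tooling; the sketch assumes it runs on a host where `systemctl` is on the PATH.

```python
import subprocess
from typing import Callable, Dict, Iterable

# Unit names taken from the services table above.
SERVICES = ["backend", "pydcs", "pydps", "promtail", "telegraf"]


def is_active(unit: str) -> str:
    """Return the state printed by `systemctl is-active` (e.g. "active", "failed")."""
    result = subprocess.run(
        ["systemctl", "is-active", unit],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit code only means the unit is not active
    )
    return result.stdout.strip() or "unknown"


def service_report(
    units: Iterable[str], probe: Callable[[str], str] = is_active
) -> Dict[str, str]:
    """Map each unit name to its reported state.

    `probe` is injectable so the function can be exercised without systemd.
    """
    return {unit: probe(unit) for unit in units}
```

Calling `service_report(SERVICES)` on a VM returns a dictionary keyed by unit name; any value other than `active` is worth following up with `systemctl status <unit>`.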
Promtail
Promtail is an agent that ships the contents of local log files (e.g. /var/log/attrib/backend.error.log) to a Grafana Loki instance.
Configuration
Promtail's configuration is stored at /etc/promtail/config.yaml. An example might look like:
server:
  http_listen_port: 9080
  grpc_listen_port: 0
positions:
  filename: /data/promtail/positions.yaml
clients:
  - url: http://loki.cubed.internal:3100/loki/api/v1/push
scrape_configs:
  - job_name: dcs
    static_configs:
      - targets:
          - localhost
        labels:
          hostname: cubed-stge-dcs-a
          environment: stge
          service: dcs
          __path__: /var/log/pydcs/*.log
Note that this is where the labels are applied. You can later filter by these labels in Loki/Grafana, using a query such as {service="dcs", environment="stge"}.
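To illustrate how those labels turn into a query, here is a minimal sketch that builds a LogQL stream selector from a label dictionary and a URL for Loki's `query_range` HTTP endpoint. The function names are hypothetical, and the sketch assumes the query API is reachable on the same host and port as the push URL in the example configuration, which may not hold in every environment.

```python
from urllib.parse import urlencode


def build_selector(labels: dict) -> str:
    """Build a LogQL stream selector such as {service="dcs", environment="stge"}."""
    parts = []
    for name, value in labels.items():
        # Escape backslashes and double quotes so the value stays a valid string.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{name}="{escaped}"')
    return "{" + ", ".join(parts) + "}"


def query_range_url(base_url: str, labels: dict, limit: int = 100) -> str:
    """Return a URL for Loki's query_range endpoint for the given labels."""
    params = urlencode({"query": build_selector(labels), "limit": limit})
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{params}"
```

For example, `build_selector({"service": "dcs", "environment": "stge"})` produces the selector quoted above, and `query_range_url("http://loki.cubed.internal:3100", {"service": "dcs"})` gives a URL that can be fetched with any HTTP client.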
Telegraf
Telegraf is an agent that ships system metrics (CPU, memory, etc.) as well as custom metrics to InfluxDB. Most metrics used by the DCS dashboards in Grafana are captured using Telegraf and the DCS /@server endpoints.
Telegraf is configured remotely, using InfluxDB's web interface.
NFS Mounts
The code for each project is not stored on the server directly. Instead, it is mounted over NFS from a central location (usually cubed-nfs-a/b or cubed-stge-nfs). See the utility servers documentation on NFS mounts for more information.
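When debugging a server where the code appears to be missing, it can help to confirm that the NFS mount is actually present. The following is a minimal sketch that lists the NFS entries from text in `/proc/mounts` format. The `Mount` type and `nfs_mounts` function are illustrative names, and the export and mount paths in the usage note are hypothetical, since the real paths are documented elsewhere.

```python
from typing import List, NamedTuple


class Mount(NamedTuple):
    source: str  # e.g. "host:/exported/path"
    target: str  # local mount point
    fstype: str  # filesystem type, e.g. "nfs" or "nfs4"


def nfs_mounts(mounts_text: str) -> List[Mount]:
    """Return the NFS entries found in text formatted like /proc/mounts."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        source, target, fstype = fields[:3]
        if fstype in ("nfs", "nfs4"):
            found.append(Mount(source, target, fstype))
    return found
```

On a server, this could be called as `nfs_mounts(open("/proc/mounts").read())`. A line such as `cubed-nfs-a:/export/code /srv/code nfs4 rw,relatime 0 0` (paths hypothetical) would be returned as a single `Mount`, while an empty list suggests the share is not mounted.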