Microservices Monitoring Architecture

This is a concrete build guide for a microservices monitoring architecture on Linux. It uses the same monitoring ecosystem covered earlier:

Prometheus (metrics storage and querying)
Alertmanager (alert routing and grouping)
Grafana (dashboards)
Node Exporter (Linux host metrics)
cAdvisor (container level CPU, memory, network, filesystem)
Blackbox Exporter (synthetic probes: HTTP, TCP, ICMP)
OpenTelemetry Collector (optional but recommended) for traces and unified telemetry routing

If your microservices run directly on bare metal or VMs as systemd services, you can skip cAdvisor. If they run in containers, keep it.

This guide is written for tech-savvy readers but stays step-by-step.

Target architecture

On each Linux node that runs microservices

node_exporter on :9100
cAdvisor on :8080 (containers only)
OpenTelemetry Collector on :4317 (OTLP gRPC) and :4318 (OTLP HTTP), optional, if you plan to add tracing later

Central monitoring host

Prometheus on :9090
Alertmanager on :9093
Grafana on :3000
Blackbox exporter on :9115

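Prometheus scrapes the exporters (including Blackbox, which runs the probes) and evaluates alert rules. Grafana queries Prometheus for dashboards. Alertmanager groups, routes, and sends the alerts that Prometheus fires.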

0) Prereqs

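One monitoring server (a VM is fine) running Linux
Linux microservices nodes reachable from the monitoring server
Firewall rules:
- the monitoring host can reach each node on 9100 (node_exporter) and 8080 (cAdvisor, containers only), plus every Blackbox probe target
- your admin IP can reach Grafana (3000), Prometheus (9090), and Alertmanager (9093); keep exporter ports closed to everyone else, since they have no authentication by default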
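Before installing anything, you can confirm the firewall rules from the monitoring host. The sketch below assumes bash with its built-in /dev/tcp redirection and coreutils timeout, so no extra tools are needed; check_ports is a hypothetical helper name and srv-01/srv-02 are the example hosts used in this guide. It only proves a TCP connection opens, not that an exporter is answering yet.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each host:port accepts a TCP connection.
# Uses bash's /dev/tcp pseudo-device, with a 3 s limit per target.
check_ports() {
  local target host port
  for target in "$@"; do
    host="${target%:*}"   # everything before the last colon
    port="${target##*:}"  # everything after the last colon
    if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      echo "OPEN ${target}"
    else
      echo "CLOSED ${target}"
    fi
  done
}

# Example (run on the monitoring host; replace with your node names):
# check_ports srv-01:9100 srv-01:8080 srv-02:9100 srv-02:8080
```

Any CLOSED line for a port you expect to use points at a firewall rule or a service that is not listening yet.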

1) Install Node Exporter on every Linux microservices node

1.1 Create user and install binary

sudo useradd --no-create-home --shell /usr/sbin/nologin node_exporter || true
cd /tmp
# copy the tarball to the server, then:
tar -xzf node_exporter-*.linux-amd64.tar.gz
sudo cp node_exporter-*/node_exporter /usr/local/bin/
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
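Prometheus project releases typically publish a sha256sums.txt file next to the tarballs. If you copy that file to /tmp as well, a small check catches truncated or tampered downloads before you install them, and it works for every tarball in this guide. A minimal sketch, assuming GNU coreutils sha256sum; verify_tarball is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: check one tarball against a sha256sums.txt-style file
# (lines of "<sha256>  <filename>", the format sha256sum itself produces).
verify_tarball() {
  local tarball="$1" sums="$2" name dir
  name="$(basename "$tarball")"
  dir="$(dirname "$tarball")"
  # Keep only the checksum line for this tarball, then let sha256sum -c
  # compare it from the tarball's directory so the relative name resolves.
  awk -v n="$name" '$2 == n || $2 == "*" n' "$sums" |
    (cd "$dir" && sha256sum -c -)
}

# Usage (in /tmp, after copying both files there):
# verify_tarball node_exporter-*.linux-amd64.tar.gz sha256sums.txt
```

A non-zero exit (or a "FAILED" line) means the tarball should not be installed.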

1.2 Systemd service

Create /etc/systemd/system/node_exporter.service:

[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter \
  --web.listen-address=":9100" \
  --collector.systemd \
  --collector.processes
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Enable and start

sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
curl -s http://127.0.0.1:9100/metrics | head

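The default collectors cover host-level CPU, memory, disk, and network. The two extra flags add systemd unit states (--collector.systemd) and process and thread counts (--collector.processes).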
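To confirm that the collectors enabled by --collector.systemd and --collector.processes are producing data, you can search the exposition output for specific metric names. A minimal sketch, assuming the Prometheus text format on stdin; has_metric is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the Prometheus text format on stdin
# contains at least one sample line for the given metric name.
has_metric() {
  # A sample line starts with the exact name, followed by "{" (labels) or a space.
  grep -qE "^${1}[{ ]"
}

# Usage on a node:
# curl -s http://127.0.0.1:9100/metrics | has_metric node_systemd_unit_state && echo "systemd collector OK"
# curl -s http://127.0.0.1:9100/metrics | has_metric node_processes_state && echo "processes collector OK"
```

Matching on "{" or a space keeps a name such as node_load from accidentally matching node_load1.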

2) If you run containers: install cAdvisor on each node

cAdvisor exposes container CPU, memory, network, and filesystem metrics.

2.1 Run cAdvisor with Docker

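CADVISOR_VERSION=vX.Y.Z   # set to a tag from the cAdvisor releases page; pinning is safer than :latest
docker run -d --name=cadvisor --restart=always \
  --privileged \
  --device=/dev/kmsg \
  -p 8080:8080 \
  -v /:/rootfs:ro \
  -v /var/run:/var/run:ro \
  -v /sys:/sys:ro \
  -v /var/lib/docker/:/var/lib/docker:ro \
  -v /dev/disk/:/dev/disk:ro \
  "gcr.io/cadvisor/cadvisor:${CADVISOR_VERSION}"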

Verify

curl -s http://127.0.0.1:8080/metrics | head

If you do not use Docker, cAdvisor can also monitor containers managed by containerd, but the Docker setup above is the simplest baseline.
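A quick way to confirm that cAdvisor sees your containers is to list the distinct values of the name label on its CPU metric. A sketch, assuming Docker containers carry a non-empty name label in cAdvisor output while raw cgroups have an empty one; list_containers is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the distinct container names found in cAdvisor
# output on stdin, taken from the "name" label of container_cpu_usage_seconds_total.
# Series with an empty or missing name label (raw cgroups) are skipped.
list_containers() {
  grep -E '^container_cpu_usage_seconds_total[{]' |
    grep -oE '[{,]name="[^"]+"' |
    sed -E 's/^[{,]name="(.*)"$/\1/' |
    sort -u
}

# Usage on a node:
# curl -s http://127.0.0.1:8080/metrics | list_containers
```

If your microservice containers are missing from the list, check the volume mounts in the docker run command first.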

3) Install Prometheus on the monitoring host

3.1 Create user and dirs

sudo useradd --no-create-home --shell /usr/sbin/nologin prometheus || true
sudo mkdir -p /etc/prometheus /var/lib/prometheus /etc/prometheus/rules
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus

3.2 Install binaries

cd /tmp
tar -xzf prometheus-*.linux-amd64.tar.gz
sudo cp prometheus-*/prometheus prometheus-*/promtool /usr/local/bin/
# Console templates ship only with some (2.x) release tarballs; skip if absent
sudo cp -r prometheus-*/consoles prometheus-*/console_libraries /etc/prometheus/ 2>/dev/null || true
sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool
sudo chown -R prometheus:prometheus /etc/prometheus

3.3 Prometheus config with jobs for node_exporter and cAdvisor

Create /etc/prometheus/prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - /etc/prometheus/rules/*.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["127.0.0.1:9093"]

scrape_configs:
  - job_name: "node"
    static_configs:
      - targets:
          - "srv-01:9100"
          - "srv-02:9100"
  - job_name: "cadvisor"
    static_configs:
      - targets:
          - "srv-01:8080"
          - "srv-02:8080"
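Listing nodes by hand gets tedious past a few hosts. One option is to generate the targets list and paste it into the relevant job. A sketch, assuming you keep node names in a shell list; gen_targets is a hypothetical helper, and its indentation matches the static_configs blocks in the file above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a YAML "targets" list for one port and many hosts,
# indented to sit under a job's static_configs in prometheus.yml.
gen_targets() {
  local port="$1" host
  shift
  echo "      - targets:"
  for host in "$@"; do
    printf '          - "%s:%s"\n' "$host" "$port"
  done
}

# Usage:
# gen_targets 9100 srv-01 srv-02 srv-03   # node job
# gen_targets 8080 srv-01 srv-02 srv-03   # cadvisor job
```

For larger or frequently changing fleets, Prometheus's file-based service discovery (file_sd_configs) avoids editing prometheus.yml at all.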

3.4 Systemd service

Create /etc/systemd/system/prometheus.service:

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --web.listen-address=":9090"
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Start

sudo systemctl daemon-reload
sudo systemctl enable --now prometheus

Check targets at http://monitoring-01:9090/targets (substitute your monitoring host's name). Every node and cAdvisor target should show as UP.

4) Install Alertmanager on the monitoring host

4.1 Install

sudo useradd --no-create-home --shell /usr/sbin/nologin alertmanager || true
sudo mkdir -p /etc/alertmanager /var/lib/alertmanager
sudo chown -R alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager
cd /tmp
tar -xzf alertmanager-*.linux-amd64.tar.gz
sudo cp alertmanager-*/alertmanager alertmanager-*/amtool /usr/local/bin/
sudo chown alertmanager:alertmanager /usr/local/bin/alertmanager /usr/local/bin/amtool

4.2 Basic config

Create /etc/alertmanager/alertmanager.yml:

global:
  resolve_timeout: 5m

route:
  group_by: ["alertname", "instance", "job"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: "default"

receivers:
  - name: "default"
    email_configs:
      - to: "oncall@example.com"
        from: "alertmanager@example.com"
        smarthost: "smtp.example.com:587"
        auth_username: "alertmanager@example.com"
        auth_password: "REPLACE_ME"
        send_resolved: true
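To test routing and the SMTP settings without waiting for a real incident, you can post a synthetic alert to Alertmanager's v2 API (POST /api/v2/alerts, which accepts a JSON array of alerts). The sketch below only builds the payload; the alert name, labels, and summary are arbitrary assumptions, and it sets the same labels as the group_by keys so the alert groups like a real one. test_alert_payload is a hypothetical helper name; amtool alert add is a CLI alternative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a one-alert JSON payload for Alertmanager's
# POST /api/v2/alerts endpoint. Values are assumed to be plain strings
# without quotes or backslashes (no JSON escaping is done).
test_alert_payload() {
  local name="${1:-TestAlert}" instance="${2:-manual-test}"
  printf '[{"labels":{"alertname":"%s","instance":"%s","job":"manual","severity":"warning"},' \
    "$name" "$instance"
  printf '"annotations":{"summary":"Test alert sent by hand"}}]\n'
}

# Usage on the monitoring host:
# curl -s -X POST -H 'Content-Type: application/json' \
#   -d "$(test_alert_payload)" http://127.0.0.1:9093/api/v2/alerts
```

With group_wait: 30s the email should arrive after about half a minute. Because the payload sets no endsAt, Alertmanager should treat the alert as resolved once resolve_timeout (5m) passes and, with send_resolved: true, send a resolved notice.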

4.3 Systemd service

Create /etc/systemd/system/alertmanager.service:

[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --storage.path=/var/lib/alertmanager \
  --web.listen-address=":9093"
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Start

sudo systemctl daemon-reload
sudo systemctl enable --now alertmanager

5) Add microservices-ready alert rules

Create /etc/prometheus/rules/microservices-core.yml:

groups:
- name: microservices-core
  rules:

  # Host CPU saturation
  - alert: HostHighCPU
    expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
    for: 10m
    labels: {severity: warning}
    annotations:
      summary: "High CPU usage on {{ $labels.instance }}"
      description: "CPU usage > 90% for 10m."

  # Host memory pressure indicator
  - alert: HostLowMemAvailable
    expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) < 0.10
    for: 10m
    labels: {severity: warning}
    annotations:
      summary: "Low MemAvailable on {{ $labels.instance }}"
      description: "MemAvailable < 10% for 10m."

  # Disk space low
  - alert: HostDiskSpaceLow
    expr: (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) < 0.10
    for: 15m
    labels: {severity: warning}
    annotations:
      summary: "Disk space low on {{ $labels.instance }}"
      description: "Filesystem free < 10% for 15m."

  # Disk busy time is consistently high (contention signal)
  - alert: HostDiskBusy
    expr: rate(node_disk_io_time_seconds_total[5m]) > 0.6
    for: 15m
    labels: {severity: warning}
    annotations:
      summary: "Disk device busy on {{ $labels.instance }}"
      description: "Device busy > 60% of time for 15m."

  # Network errors and drops
  - alert: HostNetworkRxErrors
    expr: rate(node_network_receive_errs_total[5m]) > 0
    for: 10m
    labels: {severity: warning}
    annotations:
      summary: "Network RX errors on {{ $labels.instance }}"
      description: "Receive errors increasing."

  - alert: HostNetworkRxDrops
    expr: rate(node_network_receive_drop_total[5m]) > 0
    for: 10m
    labels: {severity: warning}
    annotations:
      summary: "Network RX drops on {{ $labels.instance }}"
      description: "Receive drops increasing."

Validate and reload:

sudo promtool check rules /etc/prometheus/rules/microservices-core.yml
sudo systemctl reload prometheus || sudo systemctl restart prometheus

This gives you baseline alerting for the infrastructure layer. Treat the thresholds and for: durations as starting points and tune them to your workloads.
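The HostHighCPU expression works backwards from idle time. rate() turns each CPU's node_cpu_seconds_total{mode="idle"} counter into idle seconds per second (a 0 to 1 fraction), avg by(instance) averages that across CPUs, multiplying by 100 gives idle percent, and 100 minus that is busy percent. The arithmetic for one CPU and two counter readings, sketched in awk; cpu_busy_pct is a hypothetical helper, and real rate() also handles counter resets and extrapolation, which this ignores:

```shell
#!/usr/bin/env bash
# Hypothetical helper: busy % from two readings of one CPU's idle-seconds
# counter taken interval_s seconds apart (no counter-reset handling).
cpu_busy_pct() {
  local idle_start="$1" idle_end="$2" interval_s="$3"
  awk -v a="$idle_start" -v b="$idle_end" -v t="$interval_s" \
    'BEGIN { idle_frac = (b - a) / t; printf "%.1f\n", 100 - idle_frac * 100 }'
}

# 0.6 s of idle time over a 10 s window: 6% idle, so 94.0% busy,
# which satisfies the "> 90" condition in HostHighCPU.
cpu_busy_pct 1000 1000.6 10
```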

6) Add synthetic monitoring for service endpoints with Blackbox Exporter

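Microservice incidents are often caused by a failing dependency or a broken network path rather than by host resource exhaustion, and host metrics alone will not show them. Synthetic probes test your endpoints from the outside, the way a client would.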

6.1 Install and run Blackbox Exporter

On monitoring host:

sudo useradd --no-create-home --shell /usr/sbin/nologin blackbox || true
sudo mkdir -p /etc/blackbox_exporter
sudo chown -R blackbox:blackbox /etc/blackbox_exporter
cd /tmp
tar -xzf blackbox_exporter-*.linux-amd64.tar.gz
sudo cp blackbox_exporter-*/blackbox_exporter /usr/local/bin/
sudo chown blackbox:blackbox /usr/local/bin/blackbox_exporter

Create /etc/blackbox_exporter/blackbox.yml:

modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      preferred_ip_protocol: "ip4"
      valid_http_versions: ["HTTP/1.1","HTTP/2.0"]
  tcp_connect:
    prober: tcp
    timeout: 5s
  icmp:
    prober: icmp
    timeout: 5s

Systemd service /etc/systemd/system/blackbox_exporter.service:

[Unit]
Description=Blackbox Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=blackbox
Group=blackbox
Type=simple
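ExecStart=/usr/local/bin/blackbox_exporter --config.file=/etc/blackbox_exporter/blackbox.yml --web.listen-address=":9115"
# The icmp module needs raw sockets; grant only that capability to the unprivileged user
AmbientCapabilities=CAP_NET_RAW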
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Start:

sudo systemctl daemon-reload
sudo systemctl enable --now blackbox_exporter

6.2 Configure Prometheus to scrape Blackbox probes

Add to /etc/prometheus/prometheus.yml under scrape_configs:

  - job_name: "blackbox-http"
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://service-a.example.com/health
          - https://service-b.example.com/health
    relabel_configs:
      # Pass the listed URL to the exporter as ?target=...
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Scrape the Blackbox Exporter itself, not the service
      - target_label: __address__
        replacement: 127.0.0.1:9115

Validate with promtool check config /etc/prometheus/prometheus.yml, then reload Prometheus.

This gives you:

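availability checks (probe_success is 1 when a probe passes, 0 when it fails)
end-to-end response time as seen from the monitoring host (probe_duration_seconds)
a signal that maps closely to what users experience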
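The three relabel rules in the blackbox-http job turn each listed URL into the target parameter of a request to the exporter, so Prometheus ends up requesting something like http://127.0.0.1:9115/probe?module=http_2xx&target=<url>. Building that URL yourself is a handy way to debug a failing probe from the monitoring host. A bash sketch; urlencode and probe_url are hypothetical helpers, and the exporter address defaults to the one used in this guide:

```shell
#!/usr/bin/env bash
# Hypothetical helper: percent-encode a string for use in a query parameter,
# keeping only RFC 3986 unreserved characters as-is (ASCII input assumed).
urlencode() {
  local s="$1" out="" c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9.~_-]) out+="$c" ;;
      *) printf -v hex '%%%02X' "'$c"; out+="$hex" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Hypothetical helper: the /probe URL Prometheus requests after relabeling.
probe_url() {
  local module="$1" target="$2" exporter="${3:-127.0.0.1:9115}"
  printf 'http://%s/probe?module=%s&target=%s\n' \
    "$exporter" "$module" "$(urlencode "$target")"
}

# Usage on the monitoring host:
# curl -s "$(probe_url http_2xx https://service-a.example.com/health)" | grep -E '^probe_(success|duration_seconds)'
```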

7) Install Grafana and build a minimal dashboard set

Install Grafana on the monitoring host (for example from Grafana's official package repository for your distribution), then:

7.1 Add Prometheus data source

URL: http://localhost:9090
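If you would rather not click through the UI, or want the data source to survive a rebuild, Grafana can load data sources from YAML files in its provisioning directory (/etc/grafana/provisioning/datasources on a standard package install). A sketch that writes such a file; the function name, the file name prometheus.yml, and the data source name are assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Grafana data source provisioning file that
# points at the local Prometheus. Run as root for the default directory.
write_grafana_datasource() {
  local dir="${1:-/etc/grafana/provisioning/datasources}"
  local url="${2:-http://localhost:9090}"
  mkdir -p "$dir"
  cat > "$dir/prometheus.yml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${url}
    isDefault: true
EOF
}

# Usage (as root), then restart Grafana so it reads the file:
# write_grafana_datasource && systemctl restart grafana-server
```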

7.2 Build three dashboards (keep it simple)

Dashboard A: Host overview

CPU usage by mode
Load average
MemAvailable ratio
Disk free per filesystem
Network bytes in/out
Network drops/errors

Dashboard B: Container overview (if using cAdvisor)

Container CPU usage
Container memory usage and limits
Container restarts (cAdvisor alone does not count restarts; this needs Kubernetes metrics such as kube-state-metrics if you move there later)
Container network throughput

Dashboard C: Synthetic checks

Probe success
Probe duration
Probe DNS, connect, and TLS timings (blackbox exports these as probe_http_duration_seconds, labelled by phase)
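Each panel in these dashboards is a PromQL query over the metrics from the exporters installed earlier. As a starting point, the hypothetical helper below maps made-up panel keys to queries (the metric names come from node_exporter, cAdvisor, and blackbox_exporter), so you can test a query against Prometheus's HTTP API before pasting it into Grafana:

```shell
#!/usr/bin/env bash
# Hypothetical helper: starter PromQL for the dashboard panels above.
# Panel keys are illustrative; adjust label filters to your environment.
panel_query() {
  case "$1" in
    cpu_by_mode)     echo 'sum by (instance, mode) (rate(node_cpu_seconds_total[5m]))' ;;
    load)            echo 'node_load1' ;;
    mem_available)   echo 'node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes' ;;
    disk_free)       echo 'node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}' ;;
    net_in)          echo 'rate(node_network_receive_bytes_total[5m])' ;;
    net_out)         echo 'rate(node_network_transmit_bytes_total[5m])' ;;
    net_drops)       echo 'rate(node_network_receive_drop_total[5m])' ;;
    container_cpu)   echo 'sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[5m]))' ;;
    container_mem)   echo 'container_memory_working_set_bytes{name!=""}' ;;
    container_limit) echo 'container_spec_memory_limit_bytes{name!=""}' ;;
    container_net)   echo 'sum by (name) (rate(container_network_receive_bytes_total{name!=""}[5m]))' ;;
    probe_success)   echo 'probe_success' ;;
    probe_duration)  echo 'probe_duration_seconds' ;;
    probe_phases)    echo 'probe_http_duration_seconds' ;;
    *) echo "unknown panel: $1" >&2; return 1 ;;
  esac
}

# Usage on the monitoring host:
# curl -sG --data-urlencode "query=$(panel_query container_cpu)" http://localhost:9090/api/v1/query
```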

Ivan Dabić

Scale Confidently with BlueGrid.io
