Linux Monitoring Stack How-To: Prometheus, Node Exporter, Alertmanager, Grafana

This guide sets up a practical monitoring ecosystem for CPU, memory, disk I/O, and network on Linux servers using:

Prometheus for metrics collection and storage
Node Exporter on each Linux server for OS and hardware metrics
Grafana for dashboards
Alertmanager for alert routing and grouping

You can run everything on one monitoring VM or split it later. Steps below assume Ubuntu/Debian, but the same approach works on most Linux distros with minor package and service path differences.

Architecture

Each Linux server runs node_exporter (pull model)
A central Prometheus scrapes node_exporter endpoints
Prometheus evaluates alert rules
Alertmanager receives alerts and sends notifications (email, Slack, PagerDuty, OpsGenie, etc.)
Grafana queries Prometheus and renders dashboards

0) Prereqs and conventions

Monitoring host: monitoring-01
Monitored servers: srv-01, srv-02, etc.
Ports:
- Node Exporter: 9100
- Prometheus: 9090
- Alertmanager: 9093
- Grafana: 3000

Firewall minimum:

Allow monitoring-01 to reach srv-* on TCP 9100
Allow your admin IPs to reach monitoring-01 on 9090, 9093, 3000
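On Ubuntu, ufw is the usual firewall front end, and the minimum above could look like the sketch below. The helper names and the example addresses (10.0.0.5 for monitoring-01, 203.0.113.0/24 for the admin network) are hypothetical. The functions only print ufw commands, so you can review them before piping them to a root shell.

```shell
# Hypothetical helpers: print (not run) ufw rules for the firewall minimum.

# ufw_rules_node MONITORING_IP
# Output is meant for every srv-* host: only monitoring-01 may scrape node_exporter.
ufw_rules_node() {
  printf 'ufw allow from %s to any port 9100 proto tcp\n' "$1"
}

# ufw_rules_monitoring ADMIN_SOURCE...
# Output is meant for monitoring-01: each admin IP/CIDR may reach
# Prometheus (9090), Alertmanager (9093) and Grafana (3000).
ufw_rules_monitoring() {
  local src port
  for src in "$@"; do
    for port in 9090 9093 3000; do
      printf 'ufw allow from %s to any port %s proto tcp\n' "$src" "$port"
    done
  done
}
```

For example, run ufw_rules_node 10.0.0.5 | sudo sh on each srv-* host and ufw_rules_monitoring 203.0.113.0/24 | sudo sh on monitoring-01. If ufw is not enabled yet, allow SSH before running sudo ufw enable so you do not lock yourself out.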

Run each component under its own unprivileged system user (recommended). The steps below create node_exporter, prometheus, and alertmanager users for this.

1) Install Node Exporter on each Linux server

1.1 Create a user and directories

sudo useradd --no-create-home --shell /usr/sbin/nologin node_exporter || true
sudo mkdir -p /opt/node_exporter
cd /opt/node_exporter

1.2 Download and install Node Exporter

Download the latest Linux amd64 release you trust from the node_exporter releases page on GitHub (or copy the tarball over) into /opt/node_exporter, then extract and install it:

sudo tar -xzf node_exporter-*.linux-amd64.tar.gz
sudo cp node_exporter-*.linux-amd64/node_exporter /usr/local/bin/
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
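Each node_exporter, Prometheus, and Alertmanager release on GitHub publishes a sha256sums.txt file next to the tarballs, so the download can be checked before extraction. A minimal sketch, assuming GitHub's usual release download URL layout and GNU coreutils sha256sum; the helper names are hypothetical and VERSION stands for whichever release you picked.

```shell
# Hypothetical helpers for fetching and verifying a Prometheus-project release.

# release_url PROJECT VERSION [FILE]
# Prints the GitHub download URL; FILE defaults to the linux-amd64 tarball.
release_url() {
  local project="$1" version="$2"
  local file="${3:-${project}-${version}.linux-amd64.tar.gz}"
  printf 'https://github.com/prometheus/%s/releases/download/v%s/%s\n' \
    "$project" "$version" "$file"
}

# verify_download DIR
# Checks every file in DIR that is listed in DIR/sha256sums.txt; entries for
# files that were not downloaded are skipped (--ignore-missing).
verify_download() {
  ( cd "$1" && sha256sum --check --ignore-missing sha256sums.txt )
}
```

In /opt/node_exporter, for example: sudo wget "$(release_url node_exporter "$VERSION")" "$(release_url node_exporter "$VERSION" sha256sums.txt)", then verify_download /opt/node_exporter before the tar step. The same helpers work for the prometheus and alertmanager tarballs in sections 2.2 and 3.1.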

1.3 Systemd service

Create /etc/systemd/system/node_exporter.service:

[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter \
  --web.listen-address=":9100" \
  --collector.systemd \
  --collector.processes
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Enable and start:

sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
sudo systemctl status node_exporter --no-pager

Verify locally:

curl -s http://127.0.0.1:9100/metrics | head

That is enough to expose:

CPU: usage by mode, load averages, context switches
Memory: MemAvailable, swap, paging stats
Disk: filesystem space, diskstats (I/O), iowait context via CPU metrics
Network: interface bytes, errors, drops, TCP stats
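Rather than eyeballing the head output, you can confirm that one representative metric family per area is present. A sketch with a hypothetical helper name: it reads the Prometheus text exposition format on stdin, and the families it checks are the ones the alert rules in section 4 query.

```shell
# check_node_metrics: reads text-format metrics on stdin and reports whether
# each required family has at least one sample line.
# Returns non-zero if any family is missing.
check_node_metrics() {
  local text family missing=0
  text=$(cat)
  for family in node_cpu_seconds_total node_memory_MemAvailable_bytes \
                node_filesystem_avail_bytes node_disk_io_time_seconds_total \
                node_network_receive_errs_total; do
    # Sample lines start with the metric name followed by '{' (labels) or a space;
    # '# HELP' and '# TYPE' lines never match.
    if grep -q "^${family}[{ ]" <<<"$text"; then
      echo "ok       $family"
    else
      echo "MISSING  $family"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: curl -s http://127.0.0.1:9100/metrics | check_node_metrics. A MISSING line often points to a disabled collector or, for disk and network, a host with no matching devices.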

2) Install Prometheus on the monitoring host

2.1 Create user and directories

sudo useradd --no-create-home --shell /usr/sbin/nologin prometheus || true
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus

2.2 Download Prometheus and install binaries

cd /tmp   # assumes the prometheus-*.linux-amd64.tar.gz release tarball is already here
sudo tar -xzf prometheus-*.linux-amd64.tar.gz
sudo cp prometheus-*/prometheus /usr/local/bin/
sudo cp prometheus-*/promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool
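# Optional: consoles/ and console_libraries/ ship only in Prometheus 2.x tarballs, and this setup does not use them
sudo cp -r prometheus-*/consoles prometheus-*/console_libraries /etc/prometheus/ 2>/dev/null || true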
sudo chown -R prometheus:prometheus /etc/prometheus

2.3 Prometheus config

Create /etc/prometheus/prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - /etc/prometheus/rules/*.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["127.0.0.1:9093"]

scrape_configs:
  - job_name: "node"
    static_configs:
      - targets:
          - "srv-01:9100"
          - "srv-02:9100"
        labels:
          env: "prod"

Create rules directory:

sudo mkdir -p /etc/prometheus/rules
sudo chown -R prometheus:prometheus /etc/prometheus/rules
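Editing static_configs and reloading gets tedious as the fleet grows. One common alternative, not used elsewhere in this guide, is file-based service discovery: in the "node" job, replace static_configs with file_sd_configs listing files: ["/etc/prometheus/targets/*.json"], and Prometheus picks up changes to those files without a reload. A sketch of a hypothetical helper that prints a target group in the file_sd JSON format, assuming every host runs node_exporter on port 9100 (the targets directory is also an assumption).

```shell
# gen_node_targets ENV HOST...
# Prints one file_sd target group: every HOST on node_exporter's port 9100,
# labelled env=ENV. Hostnames are inserted verbatim (no JSON escaping).
gen_node_targets() {
  local env="$1" host sep=""
  shift
  printf '[{"targets":['
  for host in "$@"; do
    printf '%s"%s:9100"' "$sep" "$host"
    sep=","
  done
  printf '],"labels":{"env":"%s"}}]\n' "$env"
}
```

For example: sudo mkdir -p /etc/prometheus/targets, then gen_node_targets prod srv-01 srv-02 | sudo tee /etc/prometheus/targets/node-prod.json. The env label that static_configs set in the config now travels with the target group.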

2.4 Systemd service for Prometheus

Create /etc/systemd/system/prometheus.service:

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --web.listen-address=":9090"
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Start:

sudo systemctl daemon-reload
sudo systemctl enable --now prometheus
sudo systemctl status prometheus --no-pager

Verify:

Open Prometheus UI on http://monitoring-01:9090
Check Status -> Targets and confirm nodes are UP
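The same check can be scripted against the Prometheus HTTP API, which is handy after adding hosts. A minimal sketch with a hypothetical function name: it sends the query up == 0 to /api/v1/query and treats an empty result as "all targets UP". It matches on the compact JSON Prometheus returns instead of using a JSON parser, which is an assumption you may prefer to replace with jq.

```shell
# all_targets_up [PROM_URL]
# Returns 0 if "up == 0" has an empty result (every target UP),
# 1 (and prints the response) if any target is down, 2 if the request fails.
all_targets_up() {
  local url="${1:-http://127.0.0.1:9090}" body
  body=$(curl -fsS --get --data-urlencode 'query=up == 0' "$url/api/v1/query") || return 2
  case "$body" in
    *'"status":"success"'*'"result":[]'*) return 0 ;;
    *) printf '%s\n' "$body"; return 1 ;;
  esac
}
```

For example, all_targets_up http://monitoring-01:9090 && echo "all targets UP" fits at the end of a provisioning script.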

3) Add alerting with Alertmanager

3.1 Install Alertmanager

sudo useradd --no-create-home --shell /usr/sbin/nologin alertmanager || true
sudo mkdir -p /etc/alertmanager /var/lib/alertmanager
sudo chown -R alertmanager:alertmanager /etc/alertmanager /var/lib/alertmanager
cd /tmp   # assumes the alertmanager-*.linux-amd64.tar.gz release tarball is already here
sudo tar -xzf alertmanager-*.linux-amd64.tar.gz
sudo cp alertmanager-*/alertmanager /usr/local/bin/
sudo cp alertmanager-*/amtool /usr/local/bin/
sudo chown alertmanager:alertmanager /usr/local/bin/alertmanager /usr/local/bin/amtool

3.2 Alertmanager config

Create /etc/alertmanager/alertmanager.yml (example routes to email, replace with your receiver of choice):

global:
  resolve_timeout: 5m

route:
  group_by: ["alertname", "instance", "job"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: "default"

receivers:
  - name: "default"
    email_configs:
      - to: "oncall@example.com"                    # placeholder addresses, use your own
        from: "alertmanager@example.com"
        smarthost: "smtp.example.com:587"
        auth_username: "alertmanager@example.com"
        auth_password: "REPLACE_ME"
        send_resolved: true

3.3 Systemd service

Create /etc/systemd/system/alertmanager.service:

[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --storage.path=/var/lib/alertmanager \
  --web.listen-address=":9093"
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Start:

sudo systemctl daemon-reload
sudo systemctl enable --now alertmanager
sudo systemctl status alertmanager --no-pager
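Before relying on it, validate the file with amtool check-config /etc/alertmanager/alertmanager.yml and push a synthetic alert through the whole routing path. A sketch using the Alertmanager v2 API (POST /api/v2/alerts); the function name and label values are hypothetical. Because no endsAt is sent, Alertmanager should treat the alert as resolved after resolve_timeout (5m above), so with send_resolved: true you should see both a firing and a resolved email.

```shell
# send_test_alert [AM_URL] [INSTANCE]
# POSTs one synthetic alert to Alertmanager's v2 API. Label values are examples;
# "job" and "instance" are set so the alert is grouped like a real node alert.
send_test_alert() {
  local url="${1:-http://127.0.0.1:9093}" instance="${2:-srv-01:9100}"
  local payload
  payload=$(printf '[{"labels":{"alertname":"TestAlert","severity":"warning","job":"node","instance":"%s"},"annotations":{"summary":"Synthetic test alert for %s"}}]' \
    "$instance" "$instance")
  curl -fsS -H 'Content-Type: application/json' \
    --data "$payload" "$url/api/v2/alerts"
}
```

With group_wait at 30s, the first notification should arrive shortly after running send_test_alert on monitoring-01.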

4) Create alert rules for CPU, memory, disk I/O, network

Create /etc/prometheus/rules/linux-core.yml:

groups:
- name: linux-core
  rules:

  # CPU: sustained high usage (iowait counts as busy here, since only idle time is subtracted)
  - alert: HostHighCPU
    expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "High CPU usage on {{ $labels.instance }}"
      description: "CPU usage > 90% for 10m."

  # CPU: high iowait indicates storage bottleneck symptoms
  - alert: HostHighIOWait
    expr: (avg by(instance) (rate(node_cpu_seconds_total{mode="iowait"}[5m])) * 100) > 10
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "High CPU iowait on {{ $labels.instance }}"
      description: "IO wait > 10% for 10m (often disk contention or slow storage)."

  # Memory: low MemAvailable
  - alert: HostLowMemoryAvailable
    expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) < 0.10
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Low available memory on {{ $labels.instance }}"
      description: "MemAvailable < 10% for 10m."

  # Swap: sustained swap usage above 25% of swap space
  - alert: HostSwapInUse
    expr: (node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes) / node_memory_SwapTotal_bytes > 0.25
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "Swap usage high on {{ $labels.instance }}"
      description: "Swap usage > 25% for 15m."

  # Disk space: filesystem almost full (excludes tmpfs and overlay)
  - alert: HostDiskSpaceLow
    expr: (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) < 0.10
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "Disk space low on {{ $labels.instance }} ({{ $labels.mountpoint }})"
      description: "Filesystem free < 10% for 15m."

  # Disk I/O: elevated disk time (busy) per device, rough contention signal
  - alert: HostDiskDeviceBusy
    expr: rate(node_disk_io_time_seconds_total[5m]) > 0.6
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "Disk device {{ $labels.device }} busy on {{ $labels.instance }}"
      description: "Device busy more than 60% of the time (io_time > 0.6s/s) for 15m."

  # Network: interface receive errors or drops increasing
  - alert: HostNetworkRxErrors
    expr: rate(node_network_receive_errs_total[5m]) > 0
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Network receive errors on {{ $labels.instance }} ({{ $labels.device }})"
      description: "Network RX errors increasing."

  - alert: HostNetworkRxDrops
    expr: rate(node_network_receive_drop_total[5m]) > 0
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Network receive drops on {{ $labels.instance }} ({{ $labels.device }})"
      description: "Network RX drops increasing."

Validate and reload:

sudo promtool check config /etc/prometheus/prometheus.yml
sudo promtool check rules /etc/prometheus/rules/linux-core.yml
sudo systemctl reload prometheus || sudo systemctl restart prometheus
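promtool can also unit-test the rules against synthetic series, so you do not have to wait for a real host to run low on memory. A sketch that writes a test file for HostLowMemoryAvailable in promtool's rule unit-test format (input_series, alert_rule_test); the helper name and the test file location are hypothetical. The series hold MemAvailable at 5% of MemTotal for 20 minutes, so at the 15-minute mark the alert, with its for: 10m, should be firing.

```shell
# write_memory_rule_test DIR
# Writes DIR/linux-core_test.yml, a promtool unit test for HostLowMemoryAvailable.
write_memory_rule_test() {
  mkdir -p "$1"
  cat > "$1/linux-core_test.yml" <<'EOF'
rule_files:
  - /etc/prometheus/rules/linux-core.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Units do not matter here, only the 5% ratio (21 samples, one per minute).
      - series: 'node_memory_MemAvailable_bytes{instance="srv-01:9100", job="node"}'
        values: '5+0x20'
      - series: 'node_memory_MemTotal_bytes{instance="srv-01:9100", job="node"}'
        values: '100+0x20'
    alert_rule_test:
      - eval_time: 15m
        alertname: HostLowMemoryAvailable
        exp_alerts:
          - exp_labels:
              severity: warning
              instance: srv-01:9100
              job: node
            exp_annotations:
              summary: "Low available memory on srv-01:9100"
              description: "MemAvailable < 10% for 10m."
EOF
}
```

For example: write_memory_rule_test /tmp/rule-tests && promtool test rules /tmp/rule-tests/linux-core_test.yml. If you later change the rule's annotations, update exp_annotations to match, since promtool compares them exactly.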

Notes for sane alerting:

Start with warning thresholds and longer for: durations
Alert on symptoms that require action, not on every spike
Add labels like team, service, env once you have ownership mapping

5) Install Grafana and build dashboards

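Install Grafana from Grafana’s official APT repository (it is not in the default Ubuntu/Debian repositories) and start it with sudo systemctl enable --now grafana-server. Then: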

Open Grafana UI on http://monitoring-01:3000
Add Prometheus data source:
- URL: http://localhost:9090
Import a Node Exporter dashboard:
- In Grafana, go to Dashboards -> Import
- Find a Node Exporter dashboard on grafana.com/grafana/dashboards (the community "Node Exporter Full" dashboard, ID 1860, is a common choice) and enter its ID
- Select your Prometheus data source
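If you prefer config over clicks, Grafana can also create the data source at startup from a provisioning file in /etc/grafana/provisioning/datasources/ (the directory used by the Debian/Ubuntu packages). A sketch with a hypothetical helper that writes such a file; the data source name "Prometheus" is an assumption. Grafana reads provisioning files on start, so restart grafana-server afterwards.

```shell
# write_grafana_datasource DIR [PROM_URL]
# Writes DIR/prometheus.yaml, a Grafana data source provisioning file that
# points at Prometheus and marks it as the default data source.
write_grafana_datasource() {
  local dir="$1" url="${2:-http://localhost:9090}"
  mkdir -p "$dir"
  cat > "$dir/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $url
    isDefault: true
EOF
}
```

For example: sudo bash -c "$(declare -f write_grafana_datasource); write_grafana_datasource /etc/grafana/provisioning/datasources" && sudo systemctl restart grafana-server.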

Recommended panels to keep front-and-center:

CPU usage by mode (user, system, iowait, idle)
Load average vs CPU cores
Memory: MemAvailable, cache, swap used, major page faults if available
Disk: filesystem free %, disk busy time, read/write throughput
Network: bytes in/out, drops, errors, retransmits, if you collect them
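For the load-versus-cores panel, a typical PromQL expression divides the 1-minute load by the CPU count, for example node_load1 / count without (cpu, mode) (node_cpu_seconds_total{mode="idle"}). The same ratio can be checked on a host from /proc/loadavg and nproc. A sketch with a hypothetical helper name; values that stay above 1.0 mean more runnable (or, on Linux, uninterruptible I/O-waiting) tasks than cores.

```shell
# load_per_core LOADAVG_LINE CORES
# LOADAVG_LINE is the content of /proc/loadavg ("1m 5m 15m running/total lastpid").
# Prints the 1-minute load average divided by the number of cores.
load_per_core() {
  printf '%s\n' "$1" | awk -v cores="$2" '{ printf "%.2f\n", $1 / cores }'
}
```

For example: load_per_core "$(cat /proc/loadavg)" "$(nproc)".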

6) Disk I/O and network depth

Node Exporter covers most needs. Two practical additions are worth considering when you want more clarity:

A) Add network reachability and latency checks (optional)

Use Blackbox Exporter to probe:

ICMP ping latency and packet loss to critical endpoints
HTTP checks for service health

This surfaces end-to-end symptoms such as latency and packet loss, which interface counters alone cannot show.
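Blackbox Exporter is queried per target through its /probe endpoint (port 9115 by default), and its bundled example configuration defines modules such as icmp and http_2xx. Before wiring it into Prometheus with relabeling, a single probe can be tried by hand. A sketch with a hypothetical helper name, assuming Blackbox Exporter runs on monitoring-01 with that example configuration; ICMP probes need raw-socket privileges (for example CAP_NET_RAW) for the exporter process.

```shell
# probe_once TARGET [MODULE] [BLACKBOX_URL]
# Runs one Blackbox Exporter probe, prints probe_success and
# probe_duration_seconds, and returns 0 only if the probe succeeded
# (2 if the exporter itself could not be reached).
probe_once() {
  local target="$1" module="${2:-icmp}" bb="${3:-http://127.0.0.1:9115}" out
  out=$(curl -fsS --get --data-urlencode "module=$module" \
    --data-urlencode "target=$target" "$bb/probe") || return 2
  grep -E '^probe_(success|duration_seconds) ' <<<"$out"
  grep -qx 'probe_success 1' <<<"$out"
}
```

For example: probe_once 10.0.0.1 for ping, or probe_once https://example.com http_2xx for an HTTP check.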

B) Add per-service metrics later

For databases and apps, use their exporters once you have the base host layer stable. Do not start here. Start with the node metrics first.

7) What “good” looks like after setup

After you finish, you should be able to answer these quickly:

CPU: Is the box actually CPU-bound, or mostly waiting on I/O?
Memory: Is there real memory pressure, or is it just page cache?
Disk: Is storage getting slower before it fills up?
Network: Are there drops or errors, or rising latency between dependencies?

If an incident happens, you can correlate:

latency and errors at the service level (later)
with CPU iowait, disk device busy time, memory pressure, and network drops

That is the core ecosystem.

Ivan Dabić

Scale Confidently with BlueGrid.io
