Configure → Data Collection → Prometheus

These instructions only apply to resources that will use Prometheus software. Configuration instructions for PCP are here.

This section gives example configuration settings for Prometheus running on the compute nodes of an HPC cluster. These configuration guidelines are based on the Prometheus collection setup at CCR Buffalo.

Prerequisites

The recommended exporters should be installed on every compute node as described in the install section.

Configuration

After the exporters have been installed on each compute node, Prometheus should be configured to scrape the endpoints provided by the exporters. The following basic Prometheus configuration is the recommended configuration for use with the summarization software.

prometheus.yml

global:
  scrape_interval: 30s
  scrape_timeout: 30s

scrape_configs:
  - file_sd_configs:
    - files:
      - "/etc/prometheus/file_sd/targets.json"
    job_name: compute
    relabel_configs:
    - regex: ([^.]+)..*
      replacement: $1
      source_labels:
      - __address__
      target_label: host

Scrape Interval

The scrape_interval configuration sets the frequency at which Prometheus scrapes metrics exposed by the exporters. It is recommended for Prometheus to scrape exporters every 30 seconds, but this can vary depending on the number of nodes being monitored and storage limitations.

File-Based Service Discovery

Prometheus can be configured to automatically monitor or stop monitoring nodes as they become available or unavailable. This is managed by the file_sd_configs configuration. This configuration allows Prometheus to dynamically scrape nodes as they become available without having to restart the Prometheus server. Prometheus listens for changes to the files configured under files and automatically scrapes those configured targets. An example targets.json is below:

targets.json

[
  {
    "targets": [
      "cpn-01:9100",
      "cpn-01:9306",
      "cpn-02:9100",
      "cpn-02:9306",
      ...
    ],
    "labels": {
      "cluster": "resource_name",
      "environment": "production",
      "role": "compute"
    }
  }
]

One advantage to using file-based service discovery is that Prometheus can be configured to add pre-defined labels to metrics scraped from groups of targets. This can be set up across multiple clusters or environments like in the example. More information about Prometheus’s file-based service discovery can be found here.

Relabeling

Prometheus can edit labels attached to metrics as they are scraped. The relabeling configured under relabel_configs in prometheus.yml above converts the fqdn returned by the default __address__ label into just the hostname labeled as “host”.

NOTE: The summarization software’s default prometheus/mapping.json expects the “__address__” to be relabeled as “host”. The name of the target relabel must match the name defined in the “params” section of the prometheus mapping file.