This guide explains how to configure the Job Performance (SUPReMM) XDMoD module.
Prerequisites
Ensure that Open XDMoD is installed and configured correctly, and that the shredder and ingestor scripts have been run successfully, before configuring the SUPReMM module. Do not begin the configuration steps until the accounting data has been loaded into XDMoD.
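For reference, a typical invocation of those scripts looks like the following; the resource name, log format, and input file path here are placeholders, so substitute the values for your installation:
# xdmod-shredder -r resource1 -f slurm -i /var/log/slurm/accounting.log
# xdmod-ingestor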
Double check that the `timezone` and `shared_jobs` configuration settings in the `resources.json` configuration file are correct for the resources that have performance data. These settings are documented in the main configuration guide. The `timezone` setting determines the timezone of time data displayed in the XDMoD Job Viewer tab. The `shared_jobs` setting determines whether the accounting data will be processed to check for jobs that share compute nodes.
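For example, a `resources.json` entry with these settings might look like the following; the resource name and timezone shown here are illustrative:
[
    {
        "resource": "resource1",
        "name": "Resource 1",
        "resource_type": "hpc",
        "shared_jobs": false,
        "timezone": "US/Eastern"
    }
]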
Configuration
The Job Performance (SUPReMM) XDMoD module adds an additional main menu item to the XDMoD interactive setup software. Run the script as follows:
# xdmod-setup
and select the ‘SUPReMM’ option in the main menu. The `xdmod-setup` script usage is described below.
After configuring the XDMoD module, the next step is to install and configure the job summarization software.
The `xdmod-setup` script
`xdmod-setup` is an interactive setup script for XDMoD. After installing the xdmod-supremm module, there will be an additional option titled “SUPReMM” in the main menu. Select that option to show the SUPReMM module configuration menu. The options in the menu are listed below:
Setup database
This option creates the necessary SUPReMM-module specific database schemas and tables in the XDMoD data warehouse. You will need to provide the credentials for your MySQL root user, or another user that has privileges to create databases. Two database schemas will be created, `modw_etl` and `modw_supremm`. The database user that is specified in your `portal_settings.ini` will be granted access to these databases.
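The schema creation is roughly equivalent to the following SQL; this is a sketch, assuming the database user from `portal_settings.ini` is 'xdmod'@'localhost':
mysql> CREATE DATABASE modw_etl;
mysql> CREATE DATABASE modw_supremm;
mysql> GRANT ALL ON modw_etl.* TO 'xdmod'@'localhost';
mysql> GRANT ALL ON modw_supremm.* TO 'xdmod'@'localhost';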
The script also prompts for the location of the document database that contains the job summary data (i.e., the MongoDB instance). Enter the URI in the standard MongoDB connection string format (see the MongoDB documentation for the syntax). You must specify the database name in the connection URI. If the database is not specified, the MongoDB driver defaults to the ‘test’ database, which will not contain the job summary data. The default database name is ‘supremm’, so, for example, if you have installed MongoDB on the same server as XDMoD, you would use the following URI:
mongodb://localhost:27017/supremm
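If the MongoDB instance runs on a different host or requires authentication, the same connection string format applies; the hostname and credentials below are placeholders:
mongodb://xdmod:password@mongohost.example.org:27017/supremm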
The script also runs the `acl-config` command that is used to update the access controls in XDMoD. If you prefer to run this command manually, use the following command:
# acl-config
The `acl-config` command is documented in the XDMoD command reference.
Configure resources
The setup script automatically detects the existing resources in the XDMoD data warehouse and lists them. If no “Edit resource” options appear in the list, quit the setup and complete the steps listed in the shredder and ingestor guides before re-running the setup script.
By default, all resources are disabled. You must select the “Edit resource” option for each resource that you wish to appear in the SUPReMM realm and follow the prompts to enable the resource and set the correct options. The “Dataset mapping” should be set to ‘pcp’ if processing job summaries generated from PCP data and ‘prometheus’ if processing job summaries generated from Prometheus data.
SUPReMM configuration files
The SUPReMM module configuration files are located in the `etc` directory of the installation prefix, or `/etc/xdmod` for the RPM distribution.
supremm_resources.json
Defines all of the resources that have Job Performance data that will be ingested and displayed in XDMoD. Each object in the array represents the configuration for a single resource. All resources listed in this file must also have entries in the `resources.json` and `resource_specs.json` main configuration files (described in the main configuration guide).
{
    "resources": [
        {
            "resource": "resource1",
            "resource_id": 1,
            "enabled": true,
            "datasetmap": "pcp",
            "hardware": {
                "gpfs": "gpfs0",
                "network": [
                    "em1",
                    "eno1"
                ],
                "mounts": {
                    "projects": "/projects",
                    "home": "/user",
                    "util": "/util"
                }
            }
        }
    ]
}
The value of the `resource` parameter should be identical to the `resource` parameter in the `resources.json` main configuration file.
The value of the `resource_id` must be the id of the resource in the XDMoD data warehouse. This value is obtained automatically by the interactive setup script. It can be manually obtained by running the following SQL query:
mysql> SELECT id FROM `modw`.`resourcefact` WHERE code = "%resource%";
where `%resource%` should be replaced with the `resource` parameter from the `resources.json` main configuration file.
The `datasetmap` option allows the ingestion of Job Performance data from different data sources. Currently PCP and Prometheus are the only supported data sources.
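For example, a resource whose job summaries were generated from Prometheus data would set the following in its `supremm_resources.json` entry:
"datasetmap": "prometheus"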
The `hardware` property is used by the dataset mapping code to process metrics that have device-specific names. There are configurable mappings for Ethernet network devices, GPFS devices, and mounted NFS filesystems. The XDMoD statistics for each mapping setting are shown in the table below.
| Configuration Property | XDMoD Statistics | XDMoD Group Bys | Job Viewer Summary tab statistics |
| --- | --- | --- | --- |
| `hardware.gpfs` | "Average gpfs receive rate", "Average gpfs transmit rate" | "GPFS bytes received" | "Parallel filesystem gpfs *" |
| `hardware.network` | "Average eth0 receive rate", "Average eth0 transmit rate" | "Net Eth0 *" | |
| `hardware.mounts.projects` | "Avg /projects write rate" | "Mount point "projects" data *" | |
| `hardware.mounts.home` | "Avg /home write rate" | "Mount point "home" data *" | |
| `hardware.mounts.util` | "Avg /util write rate" | "Mount point "util" data *" | |
The mapping allows multiple modes of operation. The mapping software can compute the sum of all the statistics collected from the devices. It can also be given a list of device names in priority order and will use the statistics from the first device that is found. This feature is particularly useful for heterogeneous clusters. The list below describes the appropriate value to set in the configuration file for a given scenario.
- Specify the name of the device as reported by the O/S on the compute nodes.
- Specify an empty string if the device is absent or you do not wish the data to appear in XDMoD.
- Specify the string `all` if you would like the metric in XDMoD to be the sum of all of the detected devices on the compute nodes.
- Specify a list of device names. The mapping software will use the first device name in the list that is present in the summary statistics for each job.
Examples
An example hardware configuration setting is shown below:
"hardware": {
"gpfs": "gpfs0.example.edu",
"network": [
"em1",
"eno1"
],
"mounts": {
"projects": "/projects",
"home": "/user",
"util": "/util"
}
}
In this example the mapping would work as follows:
- The various XDMoD GPFS parallel filesystem statistics would be based on the GPFS filesystem `gpfs0.example.edu` mounted on the compute nodes.
- The various XDMoD Ethernet statistics would be based on the data read from and written to the `em1` device on compute nodes that had an `em1` Ethernet device, and the `eno1` device on compute nodes that did not have an `em1` device but did have an `eno1` device.
- The XDMoD `projects` filesystem statistics would be from the statistics collected from the NFS filesystem mounted at `/projects` on the compute nodes.
- The XDMoD `home` filesystem statistics would be from the statistics collected from the NFS filesystem mounted at `/user` on the compute nodes.
- The XDMoD `util` filesystem statistics would be from the statistics collected from the NFS filesystem mounted at `/util` on the compute nodes.
"hardware": {
"gpfs": "all",
"network": "eth1"
}
In this example the mapping would work as follows:
- The various XDMoD GPFS parallel filesystem statistics would be based on the sum of the statistics for all of the mounted GPFS filesystems.
- The various XDMoD Ethernet statistics would be based on the data read from and written to the `eth1` device on compute nodes.
- No data would be stored in XDMoD for NFS filesystems.
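As a further illustration, a hypothetical configuration that disables Ethernet data entirely while still summing all mounted GPFS filesystems would use an empty string for the network device:
"hardware": {
    "gpfs": "all",
    "network": ""
}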
portal_settings.d/supremm.ini
Contains the configuration settings that allow XDMoD to connect to the job summary document database. The only supported `db_engine` is MongoDB.
[jobsummarydb]
db_engine = "MongoDB"
uri = "mongodb://localhost:27017/supremm"
db = "supremm"
The URI syntax is described in the MongoDB documentation. You must specify the database name in the connection URI. If the database is not specified, the MongoDB driver defaults to the ‘test’ database, which will not contain the job summary data.
Advanced Configuration Options
The resource configuration file `supremm_resources.json` has optional advanced configuration settings for each resource.
The `$.resources[*].collection` option overrides the collection name in MongoDB. This option can be used to set a non-default collection name.
The `$.resources[*].db` option specifies the name of the section in the portal_settings file that contains the database configuration settings. This setting can be used to support an XDMoD instance ingesting data from multiple MongoDB databases.
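As an illustrative sketch, a resource entry using both options might look like the following; the collection and section names here are hypothetical:
{
    "resource": "resource2",
    "resource_id": 2,
    "enabled": true,
    "datasetmap": "pcp",
    "collection": "resource2_summaries",
    "db": "jobsummarydb2"
}
The `portal_settings.d/supremm.ini` file would then need a corresponding [jobsummarydb2] section containing the connection settings for the second MongoDB database.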