Configure → Job Summarization

This guide explains how to configure the Job Summarization software.

Prerequisites

The Job Performance (SUPReMM) XDMoD module must be installed and configured before configuring the Job Summarization software.

Setup Script

The Job Summarization software includes a setup script to help you configure your installation. The script will prompt for information needed to configure the software and update the configuration files and databases. If you have modified your configuration files manually, be sure to make backups before running this command:

# supremm-setup

The setup script needs to be run as a user that has write access to the configuration files. You may either specify a writable path name when prompted (and then manually copy the generated configuration files) or run the script as the root user.

The setup script has an interactive ncurses-based menu-driven interface. A description of the main menu options is below:

Create configuration file

This section prompts for the configuration settings for the XDMoD datawarehouse and the MongoDB database. The script will automatically detect the resources from the XDMoD datawarehouse and prompt for the settings for each of them.

Create database tables

This section will create the database tables that are needed for the job summarization software.

The default connection settings are read from the configuration file (but can be overridden). You must supply the username and password of a database user account that has CREATE privileges on the XDMoD modw_supremm database.

Initialize MongoDB Database

This section will add required data to the MongoDB database.

The default connection settings are read from the configuration file (but can be overridden).

Configuration Guide

The SUPReMM job summarization software is configured using a JSON-style file: it uses JSON syntax but also permits line-based comments (lines starting with // are ignored by the parser).

This file is stored in /etc/supremm/config.json for RPM-based installs or under [PREFIX]/etc/supremm/config.json for source code installs, where [PREFIX] is the path that was passed to the install script.

The paths shown in this configuration guide show the default values for RPM-based installs. For source code installs you will need to adjust the paths in the examples to match the installed location of the package.
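
The comment handling described above can be illustrated with a short Python sketch. This is not the parser used by the summarization software; the function name load_commented_json is hypothetical, and the sketch assumes that a comment line may have leading whitespace before the // marker.

```python
import json


def load_commented_json(text):
    """Parse JSON text after dropping whole-line // comments."""
    kept = [
        line for line in text.splitlines()
        # Assumption: leading whitespace before // is allowed.
        if not line.lstrip().startswith("//")
    ]
    return json.loads("\n".join(kept))


example = """
{
    // temporary files for the summarization run
    "summary": {
        "archive_out_dir": "/dev/shm/supremm",
        "subdir_out_format": "%r/%j"
    }
}
"""
config = load_commented_json(example)
print(config["summary"]["archive_out_dir"])  # /dev/shm/supremm
```

Only whole-line comments are removed, so a value such as a mongodb:// URI that contains // in the middle of a line is left untouched.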

The top level properties are listed in the table below:

| Setting | Description |
| --- | --- |
| summary | Contains configuration settings for the summarize_jobs.py script. |
| resources | Contains details about the compute resources. |
| datawarehouse | Contains configuration to access XDMoD's database. |
| outputdatabase | Contains configuration settings for the database used to store the job summary data. |
| xdmodroot | This optional setting defines the path to the XDMoD configuration directory. It is only used if the summarization software runs on the machine where the XDMoD software is installed. If present, the software reads the XDMoD database configuration directly from the XDMoD portal settings file, so the database settings do not need to be specified a second time. |

Summary settings

The summary element contains configuration for the summarize_jobs.py script.

{
    ...
    "summary": {
        "archive_out_dir": "/dev/shm/supremm_test",
        "subdir_out_format": "%r/%j"
    }
}

| Setting | Example value | Description |
| --- | --- | --- |
| archive_out_dir | /dev/shm/supremm | Path to a directory that is used to store temporary files. The summary script will try to create the directory if it does not exist. The default value is a path under /dev/shm because this is the typical location of a tmpfs filesystem. The summarization software performance is typically improved by using tmpfs for temporary files, but this is not required. |
| subdir_out_format | %r/%j | Specifies the path under the archive_out_dir to be used for temporary files during the summarization of each job. Different subdirectories should be used for each job because jobs are processed in parallel. The format string includes the following substitutions: %r is replaced by the resource name and %j by the job identifier. Additionally, any valid format specifiers of the strftime function are permitted. The strftime function is called with the end time of the job. |
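
The substitutions in subdir_out_format can be sketched in Python as follows. This is an illustration only, not the code used by the summarization software: the function name expand_subdir and the sample job values are hypothetical, and the order of operations is an assumption. Because %r and %j are also valid strftime directives, the sketch replaces them before calling strftime.

```python
from datetime import datetime


def expand_subdir(fmt, resource, job_id, end_time):
    """Expand %r and %j, then apply strftime with the job end time."""
    # %r and %j are also strftime directives, so substitute them first.
    # Assumption: the resource name and job id contain no % characters.
    partial = fmt.replace("%r", resource).replace("%j", job_id)
    return end_time.strftime(partial)


end = datetime(2024, 1, 15, 13, 30)
print(expand_subdir("%r/%j", "my_cluster_name", "12345", end))
# my_cluster_name/12345
print(expand_subdir("%r/%Y-%m-%d/%j", "my_cluster_name", "12345", end))
# my_cluster_name/2024-01-15/12345
```

Including a date component, as in the second call, is one way to keep the number of entries in a single directory small.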

Resource settings

The “my_cluster_name” string and the value of the resource_id field should be set to the values of the code and id columns, respectively, in the Open XDMoD modw.resourcefact table in the datawarehouse.

{
    ...
    "resources": {
        "my_cluster_name": {
            "enabled": true,
            "resource_id": 1,
            "batch_system": "XDMoD",
            "hostname_mode": "hostname",
            "pcp_log_dir": "/data/pcp-logs/my_cluster_name",
            "batchscript": {
                "path": "/data/jobscripts/my_cluster_name",
                "timestamp_mode": "start"
            }
        }
    }
}

The various settings are described in the table below:

| Setting | Allowed values | Description |
| --- | --- | --- |
| enabled | true or false | If set to false then this resource will be ignored by the software. |
| resource_id | [integer] | The value from the id column in the modw.resourcefact table in the XDMoD database. |
| batch_system | XDMoD | Sets the module used to obtain job accounting information. This should be set to XDMoD. |
| hostname_mode | hostname or fqdn | Determines how compute node names as reported by the resource manager are compared with the node name information from the PCP archives. If the resource manager reports just the hostname for compute nodes in the accounting logs then this value should be set to hostname. If the resource manager reports full domain names in the accounting logs then this value should be set to fqdn (see also the host_name_ext setting below). Typically, the Slurm resource manager reports just the hostname in the accounting logs. |
| host_name_ext | [domain name] | If the hostname_mode is fqdn and the host_name_ext is specified then the string will be appended to the node name from the PCP archives if it is absent. This is used to work around misconfigured /etc/hosts files on the compute nodes that result in only the hostname information being recorded in the PCP archive metadata. This setting is ignored if the hostname_mode is set to hostname and may be omitted in this case. |
| datasource | pcp or prometheus | Data collection software used to monitor the resource. |
| pcp_log_dir | [filesystem path] | Path to the PCP log files for the resource. |
| prom_host | [hostname] | Hostname for the Prometheus server monitoring the resource. |
| prom_user | [username] | Username for basic authentication to the Prometheus server. |
| prom_password | [password] | Password for basic authentication to the Prometheus server. |
| batchscript.path | [filesystem path] | Path to the batch script files. The batch scripts must be stored following the naming convention described in the job script documentation. Set this to an empty string if the batch script files are not saved. |
| batchscript.timestamp_mode | start, submit, end or none | How to interpret the directory timestamp names for the batch scripts. start means that the directory name corresponds to the job start time, submit to the job submit time, and end to the job end time; none means that the timestamp should not be included in the job lookup. |
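
The effect of hostname_mode and host_name_ext on node name matching can be pictured with the Python sketch below. It is a simplified model under stated assumptions, not the matching code in the summarization software: the function name pcp_name_for_comparison and the example host names are hypothetical, the sketch assumes that hostname mode compares only the part of the name before the first dot, and it accepts host_name_ext with or without a leading dot.

```python
def pcp_name_for_comparison(pcp_name, hostname_mode, host_name_ext=None):
    """Return the PCP node name in the form used for matching."""
    if hostname_mode == "hostname":
        # Assumption: compare on the short host name only.
        return pcp_name.split(".")[0]
    if hostname_mode == "fqdn":
        ext = (host_name_ext or "").lstrip(".")
        # Append the domain only when it is configured and absent.
        if ext and not pcp_name.endswith("." + ext):
            return pcp_name + "." + ext
        return pcp_name
    raise ValueError("unknown hostname_mode: %r" % hostname_mode)


print(pcp_name_for_comparison("cpn-01.cluster.example.org", "hostname"))
# cpn-01
print(pcp_name_for_comparison("cpn-01", "fqdn", "cluster.example.org"))
# cpn-01.cluster.example.org
```

In the second call the archive metadata holds only the short name, which is the misconfiguration that host_name_ext is intended to work around.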


Database authentication settings

The configuration file supports two different mechanisms to specify the access credentials for the Open XDMoD datawarehouse. Choose one of these options. Either:

  1. Specify the path to the Open XDMoD install location (and the code will use the Open XDMoD configuration directly) or
  2. Specify the location and access credentials directly.

If the summarization software is installed on the same machine as Open XDMoD then (1) is the recommended option. Otherwise use option (2).

Option (1) XDMoD path specification

If the summarization software is installed on the same machine as Open XDMoD then ensure the config.json has the following settings:

{
    ...
    "xdmodroot": "/etc/xdmod",
    "datawarehouse": {
        "include": "xdmod://datawarehouse"
    }
}

Here xdmodroot should be set to the location of the XDMoD configuration directory, typically /etc/xdmod for RPM-based installs. Note that the user account that runs the summarization scripts needs read permission on the XDMoD configuration files. For an RPM-based install, the xdmod user account has the correct permissions.
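
A configuration that uses an xdmod:// include depends on xdmodroot being set. The Python sketch below shows one way to check this on an already-parsed configuration. The helper names are hypothetical and the check is not part of the summarization software; it only assumes that includes appear as an "include" key directly inside a top-level section, as in the example above.

```python
def sections_needing_xdmodroot(config):
    """List top-level sections whose "include" value uses xdmod://."""
    found = []
    for name, section in config.items():
        if isinstance(section, dict):
            include = section.get("include", "")
            if isinstance(include, str) and include.startswith("xdmod://"):
                found.append(name)
    return sorted(found)


def check_xdmod_includes(config):
    """Raise ValueError if xdmod:// includes are used without xdmodroot."""
    needing = sections_needing_xdmodroot(config)
    if needing and not config.get("xdmodroot"):
        raise ValueError("xdmodroot must be set for: " + ", ".join(needing))
    return needing


config = {
    "xdmodroot": "/etc/xdmod",
    "datawarehouse": {"include": "xdmod://datawarehouse"},
}
print(check_xdmod_includes(config))  # ['datawarehouse']
```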

Option (2) Direct DB credentials

If the summarization software is installed on a dedicated machine (separate from the Open XDMoD server), then the XDMoD datawarehouse location and access credentials should be specified as follows:

Create a file called .supremm.my.cnf in the home directory of the user that will run the job summarization software. This file must include the username and password for the Open XDMoD datawarehouse MySQL server:

[client]
user=[USERNAME]
password=[PASSWORD]

Then ensure that the “datawarehouse” section of the config.json file has settings like the following, where XDMOD_DATABASE_FILL_ME_IN should be set to the hostname of the XDMoD database server:

{
    ...
    "datawarehouse": {
        "db_engine": "MySQLDB",
        "host": "XDMOD_DATABASE_FILL_ME_IN",
        "defaultsfile": "~/.supremm.my.cnf"
    }
}
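
Before running the summarization software with Option (2), the defaults file can be sanity-checked with a small Python sketch such as the one below. The function name check_defaults_file is hypothetical. The permission check is an addition that does not come from this guide: it reflects the general practice of keeping a file that contains a password unreadable by other users. The sketch also assumes the file is simple enough to be read by Python's configparser module.

```python
import configparser
import os
import stat


def check_defaults_file(path):
    """Return a list of problems found in a MySQL defaults file."""
    path = os.path.expanduser(path)
    if not os.path.isfile(path):
        return ["file not found: " + path]
    problems = []
    # The file holds a password, so warn if group or others can access it.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append("file is accessible to other users (mode %o)" % mode)
    # interpolation=None keeps % characters in passwords literal.
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read(path)
    for key in ("user", "password"):
        if not parser.has_option("client", key):
            problems.append("missing [client] option: " + key)
    return problems


print(check_defaults_file("~/.supremm.my.cnf"))
```

An empty list means that the file exists, is private to its owner, and has both the user and password options in the [client] section.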

MongoDB settings

If you used Option (1) XDMoD path specification in the datawarehouse configuration then use the following configuration settings:

{
    ...
    "outputdatabase": {
        "include": "xdmod://jobsummarydb"
    }
}

Otherwise the MongoDB settings can be specified directly. The outputdatabase.uri should be set to the URI of the MongoDB server that will be used to store the job-level summary documents. The URI syntax is described in the MongoDB documentation. You must specify the database name in the connection URI string in addition to specifying it in the dbname field:

{
    ...
    "outputdatabase": {
        "type": "mongodb",
        "uri": "mongodb://localhost:27017/supremm",
        "dbname": "supremm"
    }
}
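
Since the database name has to appear both in the URI and in the dbname field, a quick consistency check can catch a mismatch early. The following Python sketch uses only the standard library and treats the path component of the URI as the database name. The function names are hypothetical and the check is not part of the summarization software.

```python
from urllib.parse import urlparse


def uri_database_name(uri):
    """Return the database name from the path part of a MongoDB URI."""
    return urlparse(uri).path.lstrip("/")


def check_outputdatabase(section):
    """Raise ValueError unless the uri names the same database as dbname."""
    in_uri = uri_database_name(section["uri"])
    if not in_uri:
        raise ValueError("uri does not include a database name")
    if in_uri != section["dbname"]:
        raise ValueError(
            "uri names %r but dbname is %r" % (in_uri, section["dbname"])
        )


check_outputdatabase({
    "type": "mongodb",
    "uri": "mongodb://localhost:27017/supremm",
    "dbname": "supremm",
})
print(uri_database_name("mongodb://localhost:27017/supremm"))  # supremm
```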

Set up the Database

The summarization software uses relational database tables to keep track of which jobs have been summarized, when, and with which version of the software. These tables are added to the modw_supremm schema that was created when the Open XDMoD SUPReMM module was installed. The database creation script is located in the /usr/share/supremm/setup directory and should be run on the XDMoD datawarehouse DB instance.

$ mysql -u root -p < [PATH TO PYTHON SITE PACKAGES]/supremm/assets/modw_supremm.sql

Where [PATH TO PYTHON SITE PACKAGES] is the path to the Python site-packages install directory (/usr/lib64/python2.7/site-packages for a CentOS 7 RPM install and /usr/lib64/python3.6/site-packages for a Rocky 8 RPM install).
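
If the site-packages location is not known, it can be looked up from Python itself. The sketch below builds the expected path of an asset file from the interpreter's platform-specific site-packages directory. The function name asset_path is hypothetical, and the sketch assumes that the supremm package was installed for the same Python interpreter that runs the snippet.

```python
import os
import sysconfig


def asset_path(name, site_packages=None):
    """Build the expected path of a supremm asset file."""
    if site_packages is None:
        # Assumption: supremm is installed in this interpreter's platlib.
        site_packages = sysconfig.get_path("platlib")
    return os.path.join(site_packages, "supremm", "assets", name)


print(asset_path("modw_supremm.sql"))
```

The printed path can then be used in place of the [PATH TO PYTHON SITE PACKAGES]/supremm/assets/ portion of the commands in this section, after confirming that the file exists.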

Set up MongoDB

$ mongo [MONGO CONNECTION URI] [PATH TO PYTHON SITE PACKAGES]/supremm/assets/mongo_setup.js

Where [MONGO CONNECTION URI] is the URI of the MongoDB database.