Job Efficiency Algorithm

The Job Efficiency Dashboard shows job information broken down by efficiency category. The categorization algorithm is implemented in SQL and runs during the data aggregation step. The algorithm can be customized by editing the definition file:

/etc/xdmod/etl/etl_macros.d/jobefficiency/job_categorization.sql

If the definition file is modified, new jobs are categorized using the updated algorithm. Existing jobs are not automatically re-categorized.

The contents of the file must be a valid SQL fragment. The fragment is used inside a SELECT statement. The default algorithm is shown below:

-- ----------------------------------------------------------------------------
-- Classify a job based on the performance statistics
-- ----------------------------------------------------------------------------
CASE
    WHEN cpu_user IS NULL THEN
        -1
    WHEN cpu_user < 0.1 AND COALESCE(max_memory, 1.0) < 0.5 THEN
        2
    ELSE
        1
END

The output must be one of three values:

  • -1 if the algorithm is unable to categorize the job
  • 1 to mark the job as efficient
  • 2 to mark the job as inefficient
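Before editing the definition file, it can help to preview how a candidate fragment would distribute jobs across these three values. The query below is a minimal sketch for that purpose: it wraps the default fragment in a hand-written SELECT against the fact table and counts jobs per category. The aliases category and job_count are illustrative names chosen here, and this is not necessarily the statement that the aggregation step itself builds.

```sql
-- Preview sketch (not part of Open XDMoD): count jobs per category
-- for a candidate fragment. Replace the CASE expression with the
-- fragment under test.
SELECT
    CASE
        WHEN cpu_user IS NULL THEN
            -1
        WHEN cpu_user < 0.1 AND COALESCE(max_memory, 1.0) < 0.5 THEN
            2
        ELSE
            1
    END AS category,       -- illustrative alias
    COUNT(*) AS job_count  -- illustrative alias
FROM `modw_supremm`.`job`
GROUP BY category;
```

Any category other than -1, 1, or 2 in the result indicates that the fragment can produce an invalid output. On a large fact table this query may take some time to run.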

The SQL query can use any column from the job performance fact table modw_supremm.job. The documentation for each column is provided in the COMMENT field of the table definition. The comment field can be viewed using the following statement in the mysql command line client:

SHOW FULL COLUMNS FROM `modw_supremm`.`job`;
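The output of SHOW FULL COLUMNS can be wide and hard to read in a terminal. As an alternative sketch, the same metadata can be read from the standard MySQL information_schema.COLUMNS table, selecting only the fields of interest. The LIKE filter on cpu% is just an illustration and can be removed or changed.

```sql
-- List name, type, nullability and comment for selected columns
-- of the job performance fact table.
SELECT
    COLUMN_NAME,
    COLUMN_TYPE,
    IS_NULLABLE,
    COLUMN_COMMENT
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'modw_supremm'
  AND TABLE_NAME = 'job'
  AND COLUMN_NAME LIKE 'cpu%'  -- illustrative filter
ORDER BY ORDINAL_POSITION;
```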

Not all performance metrics will be present for all jobs. If a column in the database is nullable then a null value is used to indicate that the corresponding metric was not present.
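Because a missing metric appears as NULL, a fragment has to decide explicitly what to do in that case. The default algorithm substitutes 1.0 for a missing max_memory, so a job without memory data is never marked inefficient. The sketch below shows a stricter alternative, assuming both cpu_user and max_memory are nullable: a job is reported as uncategorized whenever either metric is absent.

```sql
-- Sketch: treat a job as uncategorizable (-1) if either metric
-- is missing, instead of substituting a default for max_memory.
CASE
    WHEN cpu_user IS NULL OR max_memory IS NULL THEN
        -1
    WHEN cpu_user < 0.1 AND max_memory < 0.5 THEN
        2
    ELSE
        1
END
```

Which behavior is preferable depends on how often the memory metric is missing on a given resource. The stricter form moves those jobs out of the efficient category and into the uncategorized one.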

Examples

The example below shows how to categorize jobs solely based on the CPU User metric. The example will mark jobs with CPU User less than 20% as inefficient.

-- ----------------------------------------------------------------------------
-- Categorize jobs based only on the value of CPU User with a threshold
-- of 20% (i.e. a ratio of 0.2)
-- ----------------------------------------------------------------------------
CASE
    WHEN cpu_user IS NULL THEN
        -1
    WHEN cpu_user < 0.2 THEN
        2
    ELSE
        1
END

The example below shows how to use different criteria based on other accounting information about the job. In this case, if the job ran on a partition with a name that starts with ‘gpu’ then the job’s efficiency is determined by the GPU usage. Otherwise the CPU usage is used. In both cases a 10% (ratio of 0.1) threshold is used: jobs below the threshold are marked as inefficient. The job’s partition (also known as the queue) is stored as text in the queue_id column.

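-- ----------------------------------------------------------------------------
-- Categorize jobs on GPU partitions by GPU utilization and all other jobs
-- by CPU User, with a threshold of 10% (i.e. a ratio of 0.1) in both cases
-- ----------------------------------------------------------------------------
CASE
    WHEN
        queue_id LIKE 'gpu%'
    THEN
        CASE
            WHEN gpu0_nv_utilization IS NULL THEN -1
            WHEN gpu0_nv_utilization < 0.1 THEN 2
            ELSE 1
        END
    ELSE
        CASE
            WHEN cpu_user IS NULL THEN -1
            WHEN cpu_user < 0.1 THEN 2
            ELSE 1
        END
END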

Reprocessing existing jobs

If the SQL definition is updated then all new jobs will be categorized with the new definition. Existing data that has already been ingested into Open XDMoD will not automatically be reprocessed. All jobs can be reprocessed by running the following command:

/usr/share/xdmod/tools/etl/etl_overseer.php --last-modified-start-date 2000-01-01 -p jobefficiency.aggregation -p jobefficiency.joblist

This will reaggregate all job efficiency data.