PCP Configuration - Compute Nodes

Obtain the suggested version of PCP (3.11.4)

We have tested the install and configuration on CentOS 6 and CentOS 7.

  • https://bintray.com/pcp/el6/pcp/3.11.4/
  • https://bintray.com/pcp/el7/pcp/3.11.4/

The “Set Me Up” link on the Bintray website provides instructions for adding a yum repository. Alternatively, the files can be downloaded individually from the “Files” link.

Install the desired modules on all compute nodes you wish to monitor

For a full install with all monitors being used at CCR:

# yum install pcp-3.*.x86_64.rpm pcp-conf-3.*.x86_64.rpm pcp-libs-3.*.x86_64.rpm \
           pcp-libs-devel-3.*.x86_64.rpm pcp-pmda-gpfs-3.*.x86_64.rpm \
           pcp-pmda-logger-3.*.x86_64.rpm pcp-pmda-lustre-3.*.x86_64.rpm \
           pcp-pmda-mic-3.*.x86_64.rpm pcp-pmda-nfsclient-3.*.x86_64.rpm \
           pcp-pmda-nvidia-gpu-3.*.x86_64.rpm pcp-pmda-perfevent-3.*.x86_64.rpm \
           pcp-pmda-slurm-3.*.x86_64.rpm pcp-system-tools-3.*.x86_64.rpm \
           perl-PCP-LogImport-3.*.x86_64.rpm perl-PCP-PMDA-3.*.x86_64.rpm \
           python-pcp-3.*.x86_64.rpm

PMDAs are the modules that monitor the actual subsystems you are interested in.

  • The packages that are strictly required:
    • pcp
    • pcp-libs
    • pcp-libs-devel
    • pcp-conf
    • perl-PCP-PMDA
    • python-pcp

# yum install pcp-3.*.x86_64.rpm pcp-libs-* pcp-conf-3.*.x86_64.rpm \
  perl-PCP-PMDA-3.*.x86_64.rpm python-pcp-3.*.x86_64.rpm

Dependencies

  • pcp-pmda-nvidia-gpu
    • Depends on Nvidia NVML
  • pcp-pmda-perfevent
    • Depends on libpfm
  • pcp-pmda-slurm
    • Depends on the SLURM perl bindings
  • pcp-pmda-gpfs
    • Depends on the GPFS mmfs tools being installed
  • pcp-pmda-infiniband
    • Depends on the libmad and libumad packages

Configuration Templates

The SUPReMM summarization package includes template files that can be used to configure PCP collection on the compute nodes. The package itself does not need to be installed on the compute nodes; however, you may wish to install it on a test node to obtain the PCP configuration file templates.

Package installation instructions are documented on the SUPReMM summarization package page.

Alternatively, the templates may be extracted directly from the source tarball.

These templates are available:

/usr/share/supremm/templates/pmlogger/control

  • Moved to: /etc/pcp/pmlogger
    • Remove any existing files under: /etc/pcp/pmlogger/control.d
  • THIS CHANGE MUST BE MADE
    • Edit the file so that the logs are written to shared space accessible by the SUPReMM processing machine
    • “PCP_LOG_DIR/pmlogger/LOCALHOSTNAME”
      • Changed to something like: “/<GLOBAL_SHARED_SPACE>/supremm/pmlogger/LOCALHOSTNAME”
      • Where “LOCALHOSTNAME” is that exact literal string
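
The edit to the control file can be scripted. The following is a minimal sketch, assuming the control file contains the literal string PCP_LOG_DIR/pmlogger/LOCALHOSTNAME as described above. The function name set_pmlogger_log_root and the shared directory argument are illustrative and are not part of PCP or SUPReMM.

```shell
# Point the pmlogger control file at a shared log directory.
# $1: path to the control file
# $2: shared directory root (site-specific; must not contain "|" or "&")
set_pmlogger_log_root() {
    control_file="$1"
    shared_root="$2"
    tmp_file=$(mktemp) || return 1
    # Replace only the directory prefix; the literal LOCALHOSTNAME token is kept.
    sed "s|PCP_LOG_DIR/pmlogger/LOCALHOSTNAME|${shared_root}/pmlogger/LOCALHOSTNAME|g" \
        "$control_file" > "$tmp_file" || { rm -f "$tmp_file"; return 1; }
    # Write back through the original file so its ownership and mode are kept.
    cat "$tmp_file" > "$control_file"
    rm -f "$tmp_file"
}

# Example (as root), with the shared path replaced by your own:
#   set_pmlogger_log_root /etc/pcp/pmlogger/control /<GLOBAL_SHARED_SPACE>/supremm
```

Review the file after running the function, since a control file that does not contain the expected string is left unchanged without an error.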

/usr/share/supremm/templates/pmlogger/pmlogger-supremm.config

  • Moved to /etc/pcp/pmlogger
  • Can be updated to change metrics logged or frequency
    • You may wish to log less frequently than the default 30-second interval until you have confirmed the impact on your system and storage utilization
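
As an illustration of changing the logging interval, here is a hedged sketch. It assumes the intervals in the configuration file are written in the pmlogger form "every 30 seconds"; check the template before relying on this, because an interval written differently will not be matched. The function name set_pmlogger_interval is hypothetical.

```shell
# Change a logging interval in a pmlogger configuration file.
# $1: configuration file, $2: current interval in seconds, $3: new interval in seconds
set_pmlogger_interval() {
    config_file="$1"
    old_secs="$2"
    new_secs="$3"
    # Accept only plain non-negative integers for both intervals.
    case "$old_secs" in ''|*[!0-9]*) echo "invalid interval: $old_secs" >&2; return 1 ;; esac
    case "$new_secs" in ''|*[!0-9]*) echo "invalid interval: $new_secs" >&2; return 1 ;; esac
    tmp_file=$(mktemp) || return 1
    sed "s/every ${old_secs} seconds/every ${new_secs} seconds/" \
        "$config_file" > "$tmp_file" || { rm -f "$tmp_file"; return 1; }
    # Write back through the original file so its ownership and mode are kept.
    cat "$tmp_file" > "$config_file"
    rm -f "$tmp_file"
}

# Example (as root): log every 60 seconds instead of every 30
#   set_pmlogger_interval /etc/pcp/pmlogger/pmlogger-supremm.config 30 60
```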

The following configurations can be enabled if you want the PCP monitoring processes to be watched and restarted automatically

  • This configuration ensures that PCP logging is restarted if any of the monitoring processes die. This can happen in particular when a system is under high I/O load, because PCP monitoring can stall while the OS deals with the I/O requests. The PMIE process monitors and logs this condition and restarts the daemons as appropriate. This configuration is not strictly necessary, but without it PCP must be restarted manually if the monitoring processes die.
    • This requires allowing the “pcp” user to run “sudo /etc/pcp/pmie/pcp-restart.sh” without a password
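
One way to grant this permission is a sudoers drop-in file. The sketch below only prints the rule; the function name pcp_sudoers_rule and the drop-in file name used in the comments are hypothetical choices, and your site may manage sudoers differently.

```shell
# Print a sudoers rule that lets the pcp user run the restart script
# as root without a password.
pcp_sudoers_rule() {
    echo 'pcp ALL=(root) NOPASSWD: /etc/pcp/pmie/pcp-restart.sh'
}

# Example (as root); "pcp-restart" is an arbitrary drop-in file name:
#   pcp_sudoers_rule > /etc/sudoers.d/pcp-restart
#   chmod 0440 /etc/sudoers.d/pcp-restart
#   visudo -cf /etc/sudoers.d/pcp-restart    # syntax check before relying on it
```

If your sudoers configuration sets "Defaults requiretty", a line such as "Defaults:pcp !requiretty" may also be needed, since the restart script is not run from a terminal.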

/usr/share/supremm/templates/pmda-logger/logger.conf

  • Moved to /var/lib/pcp/config/logger
  • No changes should be necessary

/usr/share/supremm/templates/pmie/control

  • Moved to /etc/pcp/pmie
  • No changes should be necessary

/usr/share/supremm/templates/pmie/pmie-supremm.config

  • Moved to /etc/pcp/pmie
  • No changes should be necessary

/usr/share/supremm/templates/pmie/pcp-restart.sh

  • Moved to /etc/pcp/pmie
  • Set execute bit

/usr/share/supremm/templates/pmie/procpmda_check.sh

  • Moved to /etc/pcp/pmie
  • Set execute bit
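
The template moves listed above can be collected into one function. This is a sketch under the assumption that the templates live in the directory layout shown above. The function name install_supremm_templates and the optional destination prefix (useful for a trial run into a scratch directory) are illustrative. It does not remove files under /etc/pcp/pmlogger/control.d and does not edit the log directory in the control file, so those steps still need to be done separately.

```shell
# Copy the SUPReMM PCP templates into place.
# $1: template root (normally /usr/share/supremm/templates)
# $2: optional prefix prepended to every destination path (empty for a real install)
install_supremm_templates() {
    tpl="$1"
    prefix="${2:-}"
    mkdir -p "$prefix/etc/pcp/pmlogger" "$prefix/etc/pcp/pmie" \
             "$prefix/var/lib/pcp/config/logger" || return 1
    cp "$tpl/pmlogger/control" "$tpl/pmlogger/pmlogger-supremm.config" \
       "$prefix/etc/pcp/pmlogger/" || return 1
    cp "$tpl/pmda-logger/logger.conf" "$prefix/var/lib/pcp/config/logger/" || return 1
    cp "$tpl/pmie/control" "$tpl/pmie/pmie-supremm.config" \
       "$tpl/pmie/pcp-restart.sh" "$tpl/pmie/procpmda_check.sh" \
       "$prefix/etc/pcp/pmie/" || return 1
    # The two helper scripts must be executable.
    chmod +x "$prefix/etc/pcp/pmie/pcp-restart.sh" \
             "$prefix/etc/pcp/pmie/procpmda_check.sh"
}

# Example (as root):
#   install_supremm_templates /usr/share/supremm/templates
```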

Enable logging modules (PMDAs)

  • By default, in order to be lightweight, PCP does not enable all logging modules (PMDAs)
  • They may be enabled by creating a “.NeedInstall” file which instructs the PCP framework to enable the PMDA on next restart.
# touch /var/lib/pcp/pmdas/slurm/.NeedInstall
# touch /var/lib/pcp/pmdas/nvidia/.NeedInstall
# touch /var/lib/pcp/pmdas/gpfs/.NeedInstall
# touch /var/lib/pcp/pmdas/nfsclient/.NeedInstall
# touch /var/lib/pcp/pmdas/perfevent/.NeedInstall
# touch /var/lib/pcp/pmdas/mic/.NeedInstall
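
When only some of the PMDA packages are installed on a node, it can help to create the marker file only where the PMDA directory actually exists. The following is a small sketch of that idea; the function name mark_pmdas_for_install is hypothetical.

```shell
# Mark PMDAs for installation on the next PCP restart.
# $1: PMDA base directory (normally /var/lib/pcp/pmdas)
# remaining arguments: PMDA directory names, e.g. slurm nvidia gpfs
mark_pmdas_for_install() {
    pmdas_dir="$1"
    shift
    rc=0
    for pmda in "$@"; do
        if [ -d "$pmdas_dir/$pmda" ]; then
            touch "$pmdas_dir/$pmda/.NeedInstall" || rc=1
        else
            # The PMDA package is probably not installed on this node.
            echo "PMDA directory not found, skipping: $pmdas_dir/$pmda" >&2
            rc=1
        fi
    done
    return $rc
}

# Example (as root):
#   mark_pmdas_for_install /var/lib/pcp/pmdas slurm nvidia gpfs nfsclient perfevent mic
```

The function returns a non-zero status if any requested PMDA directory was missing, which makes it easy to spot nodes with an incomplete package set.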

Start PCP

# systemctl enable pmcd pmlogger pmie
# systemctl start pmcd pmlogger pmie
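
The systemctl commands above apply to systemd hosts such as CentOS 7. CentOS 6 uses SysV init instead. The sketch below prints the commands for either case so they can be reviewed before being run. It assumes the PCP packages install init scripts named pmcd, pmlogger and pmie on CentOS 6, which you should verify on your system; the function name pcp_service_commands is hypothetical.

```shell
# Print the commands that enable and start the PCP services.
# $1: init system, either "systemd" (e.g. CentOS 7) or "sysv" (e.g. CentOS 6)
pcp_service_commands() {
    case "$1" in
        systemd)
            echo "systemctl enable pmcd pmlogger pmie"
            echo "systemctl start pmcd pmlogger pmie"
            ;;
        sysv)
            # chkconfig and service handle one service per invocation here.
            for svc in pmcd pmlogger pmie; do
                echo "chkconfig $svc on"
            done
            for svc in pmcd pmlogger pmie; do
                echo "service $svc start"
            done
            ;;
        *)
            echo "unknown init system: $1" >&2
            return 1
            ;;
    esac
}

# Example (as root), after reviewing the printed commands:
#   pcp_service_commands sysv | sh
```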

Check for Errors

It may take several seconds for all the PMDAs to start up.

$ cat /var/log/pcp/pmcd/*.log
$ cat /<GLOBAL_SHARED_SPACE>/supremm/pmlogger/`hostname`/pmlogger.log
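
Rather than reading each log in full, the logs can be filtered for suspicious lines. The sketch below is a heuristic only: the keywords "error", "warning" and "fail" are an assumption about what problem lines tend to contain, not a documented PCP log format, and the function name scan_pcp_logs is hypothetical.

```shell
# Print lines that look like problems in the given log files.
# Returns 0 if nothing suspicious was found, 1 if matching lines were printed,
# and 2 if the search itself failed (for example, a log file is missing).
scan_pcp_logs() {
    [ "$#" -gt 0 ] || { echo "usage: scan_pcp_logs LOGFILE..." >&2; return 2; }
    # -H prefixes each match with the file name, -n with the line number.
    grep -H -n -i -E 'error|warning|fail' "$@"
    case $? in
        0) return 1 ;;
        1) return 0 ;;
        *) return 2 ;;
    esac
}

# Example:
#   scan_pcp_logs /var/log/pcp/pmcd/*.log
```

A clean result does not prove that every PMDA started, so it is still worth checking the running processes and the archives.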

Check for Running processes

  • Ensure that the pmcd, pmlogger, pmie, and any PMDA processes you enabled are running
$ ps -ef | grep -i pcp
    pcp      36927     1  0 Sep22 ?        00:00:00 pmie -b -h local: -l /var/log/pcp/pmie/srv-u13-27.cbls.ccr.buffalo.edu/pmie.log -c /etc/pcp/pmie/pmie-config.ubccr
    pcp      37221     1  0 Sep22 ?        00:00:03 /usr/libexec/pcp/bin/pmcd
    root     37223 37221  0 Sep22 ?        00:00:00 /var/lib/pcp/pmdas/root/pmdaroot
    root     37224 37221  0 Sep22 ?        00:11:43 /var/lib/pcp/pmdas/proc/pmdaproc -A
    root     37225 37221  0 Sep22 ?        00:00:00 /var/lib/pcp/pmdas/xfs/pmdaxfs -d 11
    root     37226 37221  0 Sep22 ?        00:00:13 /var/lib/pcp/pmdas/linux/pmdalinux
    pcp      37227 37221  0 Sep22 ?        00:00:03 /var/lib/pcp/pmdas/logger/pmdalogger /var/lib/pcp/config/logger/logger.conf
    pcp      37228 37221  0 Sep22 ?        00:00:30 /var/lib/pcp/pmdas/perfevent/pmdaperfevent -d 127
    pcp      42945     1  0 00:10 ?        00:00:00 pmlogger -r -m pmlogger_daily -P -l pmlogger.log -c /etc/pcp/pmlogger/pmlogger-config.ubccr 20150924.00.10
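
The same check can be automated with pgrep, which is convenient when verifying many nodes. This is a minimal sketch: the function name missing_pcp_processes is hypothetical, and the extra process names passed as arguments (such as pmdaperfevent) depend on which PMDAs you enabled.

```shell
# Print the names of expected PCP processes that are not running.
# pmcd, pmlogger and pmie are always checked; extra process names
# (for example PMDA binaries) may be passed as arguments.
missing_pcp_processes() {
    rc=0
    for proc in pmcd pmlogger pmie "$@"; do
        # pgrep -x matches the process name exactly
        if ! pgrep -x "$proc" >/dev/null 2>&1; then
            echo "$proc"
            rc=1
        fi
    done
    return $rc
}

# Example:
#   missing_pcp_processes pmdaperfevent pmdalogger || echo "some PCP processes are not running"
```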

Check that archives are being created

It may take several seconds to minutes for the log to accumulate data, depending on your logging frequency.

$ cd /<GLOBAL_SHARED_SPACE>/supremm/pmlogger/`hostname`
$ ls
    20150924.14.16-00.0
    20150924.14.16-00.index
    20150924.14.16-00.meta

$ pmdumplog -a 20150924.14.16-00

Ensure that the archives have the metrics you expect.
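
One way to check for specific metrics is to query the archive with pminfo and its -a option. The sketch below treats a metric as missing when pminfo prints nothing on standard output for it. The function name missing_archive_metrics is hypothetical, and the metric names in the example are only illustrations; use the metrics listed in your pmlogger configuration.

```shell
# Print the metrics that cannot be found in a PCP archive.
# $1: archive base name (for example 20150924.14.16-00)
# remaining arguments: metric names to look for
missing_archive_metrics() {
    archive="$1"
    shift
    rc=0
    for metric in "$@"; do
        # pminfo lists matching metric names on stdout; errors go to stderr.
        found=$(pminfo -a "$archive" "$metric" 2>/dev/null)
        if [ -z "$found" ]; then
            echo "$metric"
            rc=1
        fi
    done
    return $rc
}

# Example, run from the directory that holds the archive:
#   missing_archive_metrics 20150924.14.16-00 kernel.all.load hinv.ncpu
```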