CPU Load Average (CPL1)

Description

This probe measures the average processor load over the last 1, 5 and 15 minutes. A value of 1 means that one CPU is fully loaded. A value of up to 0.7 is typically considered healthy. On a system with more than one core, this healthy value scales with the number of cores. E.g. on a quad core system, a load average of 3 is perfectly healthy. A higher load average means that things are being slowed down due to a lack of CPU power. Since it is hard to tell at which load average services actually start to be affected, it is advised not to set your CRITICAL threshold too low.
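The relation between load average and core count can be illustrated with a small sketch. It only shows the arithmetic described above and is not how the probe itself is implemented; the function name per_core_load is hypothetical.

```python
def per_core_load(load_average: float, cores: int) -> float:
    """Return the load average divided by the number of CPU cores.

    A result of 1.0 means that, on average, every core is fully loaded.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_average / cores


# Quad core example from the text: a load average of 3 on 4 cores.
print(per_core_load(3.0, 4))  # prints 0.75
```

On a Unix-like host, the three raw averages can be read with os.getloadavg() and the core count with os.cpu_count(), which makes it possible to try the function on live values.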

Release notes

Version 1.0 - General deployment
  • Feature: Initial version
  • Dependency:
    • SNMP Agent v2.0+

Resource configuration interface

GUI unavailable.

Resource parameters

  • load warning level: the minimum load average that generates a WARNING alarm. The recommended value is 0.75 times the number of CPU cores. Some experience is typically required to find a value that works for your system. The default value is 10, which is definitely not a normal load.
  • load critical level: the minimum load average that generates a CRITICAL alarm. It is recommended either not to set this at all or to set it high enough to avoid false alarms. Whether a given load, even one as high as 10, actually affects service depends on various factors. Some experience is typically required to find a value that works for your system. The default value is 50, which ensures there are no false alarms.
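As a starting point for the load warning level, the recommendation of 0.75 times the number of CPU cores can be computed as in the sketch below. The function name and the factor parameter are hypothetical helpers for illustration; the resulting value is only a first guess that still needs tuning from experience.

```python
def recommended_warning_level(cores: int, factor: float = 0.75) -> float:
    """Suggest a load warning level for a host with the given core count.

    The default factor of 0.75 follows the recommendation in this document.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return factor * cores


# A quad core system gets a suggested warning level of 3.0.
print(recommended_warning_level(4))  # prints 3.0
```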

Performance graphs

This probe generates a graph with the following performance metrics:

  • load (1 minute average): a value of 1 means that 1 CPU is fully loaded on average.

Alarms

This probe can report the following alarm states:

  • WARNING: any of the 1, 5 or 15 minute load averages surpassed the load warning level parameter
  • CRITICAL: any of the 1, 5 or 15 minute load averages surpassed the load critical level parameter
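The alarm rules above can be sketched as follows. This is an illustration under stated assumptions, not the probe's actual code: "surpassed" is read as strictly greater than the level, the "OK" state for the no-alarm case is an invented label, and the critical level is optional because it may be left unset.

```python
from typing import Optional, Sequence


def alarm_state(load_averages: Sequence[float],
                warning_level: float,
                critical_level: Optional[float] = None) -> str:
    """Classify the 1, 5 and 15 minute load averages into an alarm state.

    An alarm is raised if ANY of the averages surpasses the level, so it is
    enough to compare the highest of the three values.
    """
    peak = max(load_averages)
    # Assumption: "surpassed" means strictly greater than the level.
    if critical_level is not None and peak > critical_level:
        return "CRITICAL"
    if peak > warning_level:
        return "WARNING"
    return "OK"  # Assumed label for "no alarm"; not named in this document.


# With the default levels (warning 10, critical 50):
print(alarm_state((12.0, 4.0, 2.0), 10.0, 50.0))  # prints WARNING
```

Checking CRITICAL before WARNING means the most severe matching state is the one reported.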

Possible causes

A high CPU load may have various causes.

Possible consequences

A high CPU load may cause the system to respond more slowly than normal. At higher loads, voice quality may be impacted and various services may start to fail.

Possible actions

The following actions may be taken in response to a high load:
  • Investigate which process is responsible for the high load.
    • SOP Shell > Diagnostics > System > CPU/Memory usage

Copyright © Escaux SA