Disk Space (DPS1)

Description

This probe measures the available and used space on a specific file system (disk partition).

Release notes

Version 1.2 - General deployment
  • Bugfix: Incorrect value for percentage used (M5239)
  • Dependency:
    • SNMP Agent module v2.0+
    • escaux-nagios-plugins 1.3+

Version 1.1 - General deployment
  • Feature: Thresholds can be checked against a value in MB instead of a percentage
  • Dependency:
    • SNMP Agent module v2.0+

Version 1.0 - General deployment
  • Feature: Initial version
  • Dependency:
    • SNMP Agent module v2.0+

Resource configuration interface

GUI unavailable.

Resource parameters

  • partition: the name of the file system to be monitored. Possible values are:
    • /: the root (system) file system: a relatively small partition that contains the most critical parts of the system.
    • /data: this larger partition contains all user data, such as voicemails.
  • warning level: generate a WARNING alarm if more than this percentage is used. The default value is 85 (%). The value should be chosen to give enough headroom and time to take action to avoid a CRITICAL alarm.
  • critical level: generate a CRITICAL alarm if more than this percentage is used. The default value is 95 (%). The value should be chosen such that it is (almost) certain that service is currently impacted and immediate action is required.
  • Check type: leave empty if the level values are expressed as percentages; enter 'bu' if the level values are expressed in megabytes.

Example

To monitor the root partition with a warning at 80% and a critical alarm at 99%:
  • partition: /
  • warning level: 80
  • critical level: 99

Performance graphs

This probe generates a graph with the following performance metrics:

  • Size: the size of the partition. This is a fixed value.
  • Used: the space that is currently in use.
  • Free: the space that is currently free.
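For illustration, the three metrics can be read locally with Python's standard library. This is only a sketch: the probe itself depends on the SNMP Agent module, and the function name partition_metrics and the dictionary keys are hypothetical, not part of the probe.

```python
import shutil


def partition_metrics(path="/"):
    """Return size, used and free space (in MB) of the file system holding `path`.

    Hypothetical helper for illustration; the real probe collects its data
    through the SNMP Agent module, not through this call.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    mb = 1024 * 1024
    return {
        "size_mb": usage.total / mb,
        "used_mb": usage.used / mb,
        "free_mb": usage.free / mb,
        # Guard against a zero-sized file system to avoid division by zero.
        "used_percent": 100.0 * usage.used / usage.total if usage.total else 0.0,
    }
```

On some file systems, used plus free is slightly less than the size because a share of the blocks is reserved for the system.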

Alarms

This probe can report the following alarm states:

  • WARNING: the used space on the given partition has surpassed the warning level threshold.
  • CRITICAL: the used space on the given partition has surpassed the critical level threshold.
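The threshold comparison can be sketched as follows. This is an assumed reading of the parameter descriptions ("more than" taken as a strict comparison), not the probe's actual implementation; the function name alarm_state and the "OK" return value are hypothetical. The defaults of 85 and 95 are the documented default levels.

```python
def alarm_state(used, warning=85, critical=95):
    """Map a usage value to an alarm state.

    `used` and both levels must share one unit (percent, or MB when the
    check type is set accordingly). "OK" is an assumed name for the
    no-alarm case; the documentation only names WARNING and CRITICAL.
    """
    if warning > critical:
        raise ValueError("warning level must not exceed critical level")
    if used > critical:
        return "CRITICAL"
    if used > warning:
        return "WARNING"
    return "OK"
```

With the levels from the example above (80 and 99), a partition that is 85% full would be reported as WARNING under this reading.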

Possible causes

No likely root-causes are documented yet.

Possible consequences

  • As long as there is enough space for the processes that require it, no service is impacted or even degraded.
  • If there is not enough space, any process that requires it may fail in various ways, with very severe consequences. This includes:
    • telephony
    • netDesktop, netConsole
    • (almost) anything

Possible actions

  • Look at the graph:
    • If the usage has been increasing slowly over a long period of time, you can predict when the used space will reach 100% and so determine the urgency.
    • If the slope is rather steep, this is not normal: something may be generating excessive logs, for instance. Again, you should determine the urgency by predicting when it will reach 100%.
    • If the increase was instantaneous and the level is constant again, something happened at that time, but the situation is now stable.
  • Contact Escaux
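The prediction of when usage will reach 100% can be approximated by fitting a straight line through values read off the graph. The sketch below assumes a roughly linear trend and uses a plain least-squares fit; the function name hours_until_full and the sample format (hour, used percentage) are hypothetical.

```python
def hours_until_full(samples, full=100.0):
    """Estimate hours until usage reaches `full`.

    `samples` is a list of (hour, used_percent) pairs. A least-squares line
    is fitted through them; None is returned when usage is flat or falling.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must cover more than one point in time")
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None  # no growth, so no predicted fill time
    intercept = mean_u - slope * mean_t
    t_full = (full - intercept) / slope  # time at which the line hits `full`
    last_t = max(t for t, _ in samples)
    return max(0.0, t_full - last_t)


# Daily readings of 80%, 82% and 84% used:
print(hours_until_full([(0, 80.0), (24, 82.0), (48, 84.0)]))
```

For these readings the estimate is about 192 hours (8 days) after the last sample. A steep or irregular slope makes a linear estimate unreliable, so treat the result as a rough indication of urgency only.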

Copyright © Escaux SA