Disk Space (DPS1)
Description
This probe measures the available and used space on a specific file system (disk partition).
Release notes
Version 1.2 - General deployment
- Bugfix: Incorrect value for percentage used (M5239).
- Dependency:
- SNMP Agent module v2.0+
- escaux-nagios-plugins 1.3+
Version 1.1 - General deployment
- Feature: Thresholds can be checked in megabytes (MB) instead of as a percentage.
- Dependency:
Version 1.0 - General deployment
- Feature: Initial version
- Dependency:
Resource configuration interface
Resource parameters
- partition: the name of the file system to be monitored. Possible values are:
- /: the root (system) file system, a relatively small partition that contains the most critical parts of the system.
- /data: a larger partition that contains all user data, such as voicemails.
- warning level: generate a WARNING alarm if more than this percentage is used. The default value is 85 (%). The value should be chosen to give enough headroom and time to take action to avoid a CRITICAL alarm.
- critical level: generate a CRITICAL alarm if more than this percentage is used. The default value is 95 (%). The value should be chosen such that it is (almost) certain that service is currently impacted and immediate action is required.
- Check type: leave empty if the level values are expressed as a percentage; enter 'bu' if the values are expressed in megabytes.
Example
To monitor the root partition with a warning at 80% and a critical alarm at 99%:
- partition: /
- warning level: 80
- critical level: 99
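The way the two levels map to alarm states can be sketched as follows. This is a minimal illustration, not the probe's actual implementation: the function name `classify` and its arguments are hypothetical, and the behaviour of the 'bu' check type (comparing the levels against used megabytes) is an assumption, since this page only says that the values are then expressed in megabytes.

```python
def classify(used_mb, size_mb, warning, critical, check_type=""):
    """Return 'OK', 'WARNING' or 'CRITICAL' for one partition reading.

    Hypothetical helper: names and signature are illustrative only.
    """
    if check_type == "bu":
        # Assumption: in 'bu' mode the levels are compared to used megabytes.
        value = used_mb
    else:
        # Default mode: the levels are percentages of the partition size.
        value = 100.0 * used_mb / size_mb

    # An alarm is raised only when MORE than the level is used.
    if value > critical:
        return "CRITICAL"
    if value > warning:
        return "WARNING"
    return "OK"


# The example above: warning at 80%, critical at 99%, on a 1000 MB partition.
print(classify(850, 1000, 80, 99))
print(classify(995, 1000, 80, 99))
```

With these settings a partition that is 85% full yields a WARNING, and one that is 99.5% full yields a CRITICAL alarm.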
This probe generates a graph with the following performance metrics:
- Size: the size of the partition. This is a fixed value.
- Used: the space that is currently in use.
- Free: the space that is currently free.
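To make the three metrics concrete, the sketch below reads the same kind of values for a local path with Python's standard library. It is only an illustration: the probe itself depends on the SNMP Agent module and escaux-nagios-plugins and may collect and round the values differently, and the helper name `partition_metrics` is hypothetical.

```python
import shutil


def partition_metrics(path="/"):
    """Return size, used and free space of the file system holding `path`, in MB."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    mb = 1024 * 1024
    return {
        "size_mb": usage.total / mb,
        "used_mb": usage.used / mb,
        "free_mb": usage.free / mb,
    }


print(partition_metrics("/"))
```

On some file systems Used plus Free is slightly less than Size, because part of the space is reserved for the system.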
Alarms
This probe can report the following alarm states:
- WARNING: the used space on the given partition has surpassed the warning level threshold.
- CRITICAL: the used space on the given partition has surpassed the critical level threshold.
Possible causes
No likely root-causes are documented yet.
Possible consequences
- As long as there is enough space for the processes that need it, no service is impacted or even degraded.
- If there is not enough space, any process that needs it may fail in various ways, with very severe consequences. This includes:
- telephony
- netDesktop, netConsole
- (almost) anything
Possible actions
- Look at the graph:
- If the usage is slowly increasing over a long period of time, you can predict when the used space will reach 100% and so determine the urgency.
- If the slope is rather steep, this is not normal. Something may be generating excessive logs, for instance. Again, determine the urgency by predicting when the used space will reach 100%.
- If the increase was instantaneous and the level is constant again, something happened at that time but the situation is now stable.
- Contact Escaux
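The prediction described for the graph can be approximated with a straight-line fit through a few readings. The sketch below is one possible way to do this, assuming the growth is roughly linear; the function name `hours_until_full` and the sample format (hours, used percentage) are hypothetical and the values must be read from the graph by hand.

```python
def hours_until_full(samples):
    """Estimate the hours from the last sample until 100% is used.

    `samples` is a list of (hours, used_percent) pairs read from the graph.
    Returns None if there are too few samples or the usage is not growing.
    """
    n = len(samples)
    if n < 2:
        return None

    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples taken at the same moment

    # Least-squares slope, in percentage points per hour.
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None  # flat or decreasing: no predicted fill time

    _, last_u = samples[-1]
    return max(0.0, (100.0 - last_u) / slope)


# Three daily readings: 80%, 82% and 84% used.
print(hours_until_full([(0, 80), (24, 82), (48, 84)]))
```

A growth of two percentage points per day from 84% leaves about eight days (roughly 192 hours) before the partition is full. A steep slope gives a much smaller number and therefore a higher urgency.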