Watchdog

Description

This module installs the Software Watchdog along with a check script and a repair script. The scripts use the files in /escaux/etc/watched to monitor processes, or anything else that those files define.

Release notes

Version 1.5.0 - Early deployment
  • Feature: Convert plugins to new library model (M14774)
  • Potential update impact level 1 (no critical impact expected; the update can be applied without risk of breaking critical functionality): You will have to upgrade the Shell module and possibly other modules as well. Check the release notes of Shell module 1.23.0 to find out which modules need to be upgraded. (M14774)
  • Dependency:
    • System Base module >= 1.0.0
    • Shell Module >= 1.23.0
    • Baseline >= 2

Version 1.4.1 - General deployment
  • Bugfix: Default values could be incorrect when upgrading from a previous version to one that introduces new options. (Backport) (M10261)
  • Dependency:
    • System Base module >= 1.0.0

Version 1.4.0 - General deployment
  • Feature: Stop Communication Server when the configured number of MB is reached (M9014)
  • Dependency:
    • System Base module >= 1.0.0

Version 1.3.0 - General deployment
  • Improvement: Reduce module installation time on high latency networks (M8766)
  • Dependency:
    • System Base module >= 1.0.0

Version 1.2.2 - General deployment
  • Bugfix: Packages were not cached between SOP reboots (M7066)
  • Dependency:
    • System Base module >= 1.0.0

Version 1.2.1 - General deployment
  • Bugfix: There were multiple concurrent repair processes when we detected one failure (M3253)
  • Bugfix: Watchdog processes could cause an out-of-memory error on Baseline 3
  • Bugfix: Always create /var/run/watched directory when starting watchdog checks because this directory is deleted at reboot (M6905)
  • Improvement: The mserver sanity check is now launched by cron instead of by watchdog (M6057)
  • Dependency:
    • System Base module >= 1.0.0

Version 1.2.0 - Deprecated
  • Feature: Compatibility with Baseline 3
  • Deprecated: Can cause an out-of-memory error on Baseline 3
  • Dependency:
    • System Base module >= 1.0.0

Version 1.1.0 - General deployment
  • Feature: Support for Baseline 2. Potential update impact level 2: should this update contain a bug, it might have critical impact. Respect dependencies and retest your most important callflows and applicative integrations.
  • Feature: Added safetynet

Version 1.0.4 - General deployment
  • Feature: Logging to syslog

Version 1.0.3 - General deployment
  • Enabling a service doesn't start it. Fixes trouble updating mserver and dependencies

Version 1.0.2 - General deployment
  • Only one watchdog_check can run at the same time

Version 1.0.1 - General deployment
  • Forced timeout of check scripts: a script that has not finished after 20 seconds is killed and marked as failed.
  • Kernel module (softdog) is now used.

Version 1.0.0 - General deployment
  • Initial release

Module configuration interface

Stop processes on low disk space
Minimal free disk space in MB

Module parameters

  • Stop processes on low disk space: Stop processes when the limit given in "Minimal free disk space in MB" is reached. Currently only Asterisk is stopped.
  • Minimal free disk space in MB: Minimum amount of free disk space, in megabytes. When free space reaches this limit, processes are shut down.
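
The interaction between these two parameters can be sketched as follows. This is only an illustration and not the module's actual implementation: the function names, the checked path, and the choice to treat "limit reached" as "free space at or below the configured value" are all assumptions.

```python
import shutil


def free_space_mb(path="/"):
    """Return the free space, in whole megabytes, of the filesystem holding path.

    The default path "/" is an assumption; the real module may check another one.
    """
    return shutil.disk_usage(path).free // (1024 * 1024)


def should_stop_processes(free_mb, minimal_free_mb, enabled=True):
    """Decide whether processes should be stopped (hypothetical helper).

    enabled         -- stands in for "Stop processes on low disk space"
    minimal_free_mb -- stands in for "Minimal free disk space in MB"

    Assumption: the limit counts as reached when free space is at or
    below the configured minimum.
    """
    return bool(enabled) and free_mb <= minimal_free_mb


if __name__ == "__main__":
    limit_mb = 100  # example value, not a documented default
    current_mb = free_space_mb("/")
    if should_stop_processes(current_mb, limit_mb):
        print(f"{current_mb} MB free, limit {limit_mb} MB: processes would be stopped")
    else:
        print(f"{current_mb} MB free, limit {limit_mb} MB: nothing to do")
```

With the feature disabled, the check never requests a stop, whatever the amount of free space.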

Shell plugin to restore processes stopped by Stop processes on low disk space feature

Once processes have been shut down by the "Stop processes on low disk space" feature, safetynet no longer restarts them automatically. To re-enable automatic restarts, start the shell and go to "Subsystems/Disk full/Enable safetynet services disabled by disk full". This lists the processes that have been re-enabled in safetynet. The processes are then restarted within one minute.

Note that disk space must be freed first. Otherwise, watchdog will stop the processes again.

Post-install actions

If this module is installed on a Baseline 1 High Availability SOP that is currently in standby mode, you need to use the shell plugin to deactivate the processes. This plugin is available in the High Availability module version > 2.6.0.
Copyright © Escaux SA