Service Assurance: net.Console 3.2

The goal of this document is to provide a set of tools and guidelines to localize the cause of an incident and the procedure to repair it.

If a section is marked with ESCAUX Support only, this indicates that only Escaux engineers have sufficient access rights to perform the tasks mentionned in this sections. In that case please contact ESCAUX.

Do not restart the application management server if not needed, this will cause all clients to be disconnected and all states will be lost (it becomes as important as asterisk itself)

System overview

This section provides a quick overview of the net.Console architecture. It is important to understand these concepts when dealing with net.Console related problems. This section is only a summary of the "net.Console Product Architecture".

We distinguish the following subsections:
  • the runtime view
  • the directory and CDR synchronisation
  • the active/active redundancy view

Runtime interactions

The following picture provides an overview of all the interactions involved in the runtime solution.

DirectedGraphPlugin_1.png diagram

Directory and CDR synchronization view

The net.Console provides the possibility to insert contacts coming from an external database into the net.Console directory. These contacts are then merged together with the contacts pulled from the SOP server. The 3rd party contact database interacts with the "Unified Communication Model" module.

For reporting purposes, it is possible to pull the raw CDRs (Call Detail Records) from the Call Management module.

DirectedGraphPlugin_1.png diagram

Active-active configuration

The following picture shows the active-active configuration of the net.Console. In the active-active architecture we always have an primary server and a secondary server. The primary server handles all the incoming and outgoing call. When the primary server fails, all the net.Console clients and SNOM phones start working with the secondary server and the secondary server handles all incoming and outgoing calls. This is called 'failover mode'.

When the primary SOP detects a problem with one of his critical processes he will execute a SMITH procedure (Shoot Myself In The Head). As a result of this SMITH procedure, all incoming calls on the primary SOP are rejected and a congestion signal is sent to the upstream gateway. The upstream gateway then sends the call to the secondary SOP.

When the secondary SOP receives the call, he will consider the primary SOP to be in trouble. To avoid flapping problems, the secondary SOP will actively try to kill the primary server by sending a STONITH command to the primary SOP (Shoot The Other Node In The Head).

The failover from primary to secondary server is an automatic process. The recovery from secondary to primary is a manual process. This avoids flapping issues. Once the problem on the primary server has been corrected, the recovery procedure can be initiated manually. This is explained in a later section.

DirectedGraphPlugin_1.png diagram

Troubleshooting information sources

Various informations sources are at the disposal to allow the identification of net.Console problems.

The telephony process running on the SOP is based upon the asterisk process. The asterisk console is accessible via the Sop shell menu (Shell > Diagnostic > Telephony >asterisk console). The following commands are useful:

  • set verbose 9: this enables full verbosity. The verbose messages are displayed in the console and in /var/log/asterisk/messages if the full debugging has been enabled in the module
  • set verbose 0: disable the verbose.
  • set debug 9: enable the debugging entry. The debugging entry are only written in /var/log/messages. This requires the full logging to be enable in asterisk module.
  • show queues: list all the queues and the status of the queue members on the SOP
  • show channels: list the channels and provide the current number of calls on the SOP
  • sip show peers like : list the status of all the SIP peers with a name like the filter. For example a filter "sip show peers like SDO" provides the status of all the SNOM phones. "sip show peers like SOA" provides the status of all the SIP trunks.

Asterisk logging is accessible via the Sop shell menu
DONE Navigate to: Shell > Diagnostic > Telephony > view asterisk log

Asterisk CDRs is accessible via the Sop shell menu

DONE Navigate to: Shell > Diagnostic > Telephony > asterisk call data record

The following log file are available for trouble shooting:

  • On the SOP, asterisk generates the following files:
    • /var/log/asterisk/events: contains anomalies detected by asterisk (NOTICE, WARNING, ERROR)
    • /var/log/asterisk/messages: contains complete logging can be found after enabling it (NOTICE, WARNING, ERROR, DEBUG, VERBOSE,DTMF)
    • /var/log/astersik/queue_log: contains the specific output of the queue application: changes in member status and calls handled by the queue
    • /var:log/asterisk/cdr-csv/Master.csv: contains the call data records of all the calls handled by Asterisk
  • asterisk also provides verbose information in its console

Application Management Server console

The application management server (java backend process) is at the heart of the net.Console service. This process can be controlled via the SOP shell application.

DONE Navigate to: Shell > Subsystems > Management Server

The following operations are possible:
  • Start: start the java process
  • Stop: stop the java process
  • Show status netConsole controller service: show whether the net.Console server is active or not. This requires the net.Console availability probe.
  • Activate netConsole controller service: This specific to the active-active configuration. By activating the net.Console controller service, the net.Console is forced to start accepting incoming calls. This requires the net.Console availability probe and its check at the beginning of the reception callflows.
  • Deactivate netConsole controller service: This is specific to the active-active configuration. Force the net.Console to block incoming calls to the reception. This requires the net.Console availability probe and its check at the beginning of the reception callflows. On the primary SOP this will trigger the fail-over. On the secondary SOP (Active-Active configuration), the call are not blocked but the secondary SOP will stop forcing the deactivation of the primary SOP.

Escaux Support only - The following mshell commands are useful:

  • mshell ps: show the status of all the Java OSGI Bundles installed on the SOP. They must all be active.
  • mshell cc.calls: list the calls monitored by the Call Management module
  • mshell qc.list: list the queue monitored by the Call Management module
  • mshell qc.show : list the member and status of the queue
  • mshell ncctl.list: list the session in the net.Console module opened by the net.Console applications
  • mshell ps.list: list the phone status computed by the Call Management module

Escaux Support only - All logs from the Application Management server are located on the SOP. They are a main source of information to debug net.Console related issues.

  • /var/log/mserver/local.log: On the SOP, all the java based module (Call Management, net.Console, Unified Communication Model, Application Management server) logs in a single file

net.Console client log file

  • <User's home folder>\.escaux\netconsole3\netconsole.log: On the Windows PC of the reception desk, the logging of the local net.Console application

Call Management CDRs

The call management produces call data records available in:
  • /data/mserver

The following document gives the HTTP urls enabling the download of the CDR files:

Failover and recovery status ( Escaux Support only )

This section needs to be clarified !!

There are two important files which are used in determining the behavior of both the primary and secondary SOP, they are present on both:

  • SMITH flag: /opt/mserver/SMITH
  • STONITH flag: /opt/mserver/STONITH

In a normal situation both files contain 0 on both the primary and secondary SOP. if one or more files contain a 1 consult following table to assess the status.

When a SOP is healthy, it means that the SOP is accepting incoming calls (The net.Console availability probe remains OK so that calls are still being accepted in the queue of this SOP). Not healthy means it is refusing calls. This has no implication on the failover behavior, which is determined separately.

The scenario being invalid indicates that this configuration should not occur. The cases where an invalid scenario is seen is probably due to misconfiguration or human intervention. You are advised to check the configuration.

Valid scenario Primary SOP Secondary SOP Primary SOP Secondary SOP Note
SMITH STONITH SMITH STONITH Behavior Behavior
Yes 0 0 0 0 Healthy. Healthy. This is what you should see in normal operation.
No 0 0 0 1 Healthy. Healthy. Failover mechanism is disabled, clients won't be notified about activity to do the failover. Primary SOP doesn't STONITH the secondary SOP!
Yes 0 0 1 0 Healthy. Healthy. Failover mechanism is disabled, clients won't be notified about activity to do the failover. Secondary SOP was either deactivated or has a problem.
No 0 0 1 1 Healthy. Healthy. Failover mechanism is disabled, clients won't be notified about activity to do the failover. Primary SOP doesn't STONITH the secondary SOP!
Yes 0 1 0 0 Not healthy. If mserver is running, it will also do a SMITH. Healthy. Activity was detected and clients will receive a notification for switch-over. activity which triggered STONITH to the primary SOP. This is the behavior when working in failover mode when the secondary SOP has detected a call.
No 0 1 0 1 Not healthy. If mserver is running, it will also do a SMITH. Healthy. It will not notify clients on activity detection, nor STONITH the primary SOP (although STONITH is enabled on the primary SOP). Primary SOP doens't STONITH the secondary SOP!
Yes 0 1 1 0 Not healthy. If mserver is running, it will also do a SMITH. Healthy. The secondary SOP's activity detection mechanism is disabled by the deactivate command. This is the case when a restore is done.
No 0 1 1 1 Not healthy. If mserver is running, it will also do a SMITH. Healthy. The secondary SOP's activity detection mechanism is disabled by the deactivate command. Primary SOP doens't STONITH the secondary SOP!
Yes 1 0 0 0 Not healthy. Healthy. Failover mechanism is enabled, clients will be notified from the moment activity is detected and STONITH will be enabled on the primary SOP. This is an intermediate state.
No 1 0 0 1 Not healthy. Healthy. Failover mechanism is disabled, clients won't be notified about activity. Primary SOP doesn't STONITH the secondary SOP!
Yes 1 0 1 0 Not healthy. Healthy. Failover mechanism is disabled, clients won't be notified about activity. This is done to prevent the failover from being triggered during recovery. This scenario can only occur in case there was a SMITH on the primary SOP, no activity was detected yet on the secondary SOP and the recovery started.
No 1 0 1 1 Not healthy. Healthy. Failover mechanism is disabled, clients will not be notified about activity to do the failover. Primary SOP doesn't STONITH the secondary SOP!
Yes 1 1 0 0 Not healthy. The secondary SOP has done a STONITH on the primary SOP. SMITH was also triggered. (To determine which happened first, see logging on primary SOP.) Healthy. Clients will be notified on activity detection and repeatedly STONITH the primary SOP. This scenario occurs when working in failover mode.
No 1 1 0 1 Not healthy. The secondary SOP has done a STONITH on the primary SOP. The secondary SOP is no longer requesting the primary SOP to go in standby. Primary SOP doesn't STONITH the secondary SOP!
Yes 1 1 1 0 Not healthy. The secondary SOP has done a STONITH on the primary SOP The secondary SOP is accepting calls but is not reporting activity anymore to the clients. The recovery is taking place.
No 1 1 1 1 Not healthy. The secondary SOP has done a STONITH on the primary SOP The secondary SOP is accepting calls but is not reporting activity anymore to the clients. Primary SOP doens't STONITH the secondary SOP!

  • set to "0" indicates that the secondary SOP is marked as active
    • On the primary SOP this indicates that the secondary SOP is requesting the primary to remain in standby and should reject any incoming calls. The net.Console availability probe will therefore be marked as not OK so that the reception callflow rejects the call.
    • On the secondary this indicates that the secondary SOP is currently actively requesting the primary to go in standby. This is done by setting the STONITH flag on the primary via a ssh command.

How to handle a probe indicating a critical issue

The table hereafter provides the probes and the actions to be taken.
Probe name Impact Cause Action
CPU Load Average CPU Load Level Server behaviour is unstable A process is taking too much CPU Contact Escaux to check CPU load. If the server is too unstable force the fail-over.
DB Consistency directory_entry Call are rejected by the server Directory table corruption Automatic fail-over should be triggered. Contact Escaux to repair the database
DB Consistency attribute_state no impact on the net.Console Internal table corruption If this comes with unstable server, fail-over can be forced. Contact Escaux to repair the database
Disk Space used space of disk partition /data The disk usage is too high If the space used reach 100%, net.Console can be impacted Contact Escaux to clean up the disk. If disk space used is 100% force the fail-over
Disk Space used space of disk partition / The disk usage is too high If the space used reach 100%, net.Console will be impacted Contact Escaux to clean up the disk. If disk space used is 100% force the fail-over
Ethernet traffic Server can be unstable A process is generating too much traffic on the network If the server is unstable force the fail-over. Contact Escaux to take network trace and identify the process.
Memory usage Server can slow down and get unstable A process is taking too much CPU and needs to be restarted If the server is unstable, force the fail-over otherwise contact Escaux to identify the exact process taking too much memory
Sip self test Asterisk Call rejected by the server Asterisk deadlock Fail-over should be triggered automatically. Contact Escaux to check asterisk
Web Server net.Console cannot be started. Already started net.Console are not impacted. Web server is down Contact Escaux to check the Web Server
net.Console availability Calls to the reception are rejected net.Console server or one of its dependencies is down The fail-over should be automatic. Contact Escaux to investigate the exact root cause

How to trouble shoot a net.Console PC application which does not start

  • If the net.Console application GUI is not displayed, on the PC rename the netconsole3 home folder "<User's Home folder>\.escaux\netconsole3" to "<User's Home folder>\.escaux\netconsole3.old". This is probably an issue related to the configuration files on the PC.
  • Check that the net.Console client could properly connect to the net.Console server. If not the buttons remain greyed out.
  • Check all the required elements of a reception desk:
    • the general queue defined in the net.Console X900 must be created in Asterisk (Asterisk console command "show queue GeneralQueueName")
    • the personal queue defined in the net.Console X900 must be created in Asterisk (Asterisk console command "show queue PersonalQueueName")
    • the personal extension defined in the net.Console X900 exists in the internal directory
    • the primary and secondary phone attached to the personal extension must be registered in asterisk (Asterisk console command "sip show peers like PhoneName"). Both IP address of the primary and secondary identity must be the same since this is actually the same phone.

If none of the client starts, this can be related to one of these processes:
  • Escaux Support only The Application management server must be properly started. All the bundles must be in the state 'Active'
00000549/sop1:~# mshell ps
START LEVEL 4
ID State Level Name
[ 0] [Active ] [ 0] System Bundle (2.0.1)
[ 1] [Active ] [ 3] Apache Felix File Install (2.0.0)
[ 2] [Active ] [ 2] Escaux Config (2.0.0)
[ 3] [Active ] [ 2] Apache Felix Shell Service (1.4.1)
[ 4] [Active ] [ 2] Apache Felix Remote Shell (1.0.4)
[ 5] [Active ] [ 2] OPS4J Pax Logging - API (1.4)
[ 6] [Active ] [ 2] OPS4J Pax Logging - Service (1.4)
[ 7] [Active ] [ 1] Apache Felix Configuration Admin Service (1.2.4)
[ 8] [Active ] [ 4] Escaux Timer (1.0.1)
[ 9] [Active ] [ 4] Escaux AsteriskAPI (1.4.0)
[ 10] [Active ] [ 4] Escaux Attributes (1.1.7)
[ 11] [Active ] [ 4] Escaux Shell (1.0.3)
[ 12] [Active ] [ 4] Escaux FSM (1.1.1)
[ 13] [Active ] [ 4] Escaux Remoting (1.2.1)
[ 14] [Active ] [ 4] Escaux netConsoleControllerSipServer (1.5.2)
[ 15] [Active ] [ 4] Escaux Asterisk (1.0.0)
[ 16] [Active ] [ 4] Escaux Management (1.0.3)
[ 17] [Active ] [ 4] Escaux Clustering (1.1.2)
[ 18] [Active ] [ 4] Escaux CallEntryRepository (1.4.0)
[ 19] [Active ] [ 4] Escaux CdrProducer (2.1.1)
[ 20] [Active ] [ 4] Escaux CallEntry (1.2.1)
[ 21] [Active ] [ 4] Escaux netConsoleFsm (1.8.1)
[ 22] [Active ] [ 4] Escaux Utils (3.1.0)
[ 23] [Active ] [ 4] Escaux netConsoleSipHealthProvider (1.3.2)
[ 24] [Active ] [ 4] Escaux netConsoleController (2.0.0)
[ 25] [Active ] [ 4] Escaux SipService (2.0.0)
[ 26] [Active ] [ 4] Escaux LoggingOSGI (1.1.0)
[ 27] [Active ] [ 4] Escaux AuthServiceSipServer (2.0.0)
[ 28] [Active ] [ 4] Escaux Db (2.0.3)
[ 29] [Active ] [ 4] Escaux UCModelAccess (5.0.2)
[ 30] [Active ] [ 4] Escaux ShellOSGI (1.2.0)
[ 31] [Active ] [ 4] Escaux AuthService (2.0.0)
[ 32] [Active ] [ 4] Escaux SopConfiguration (1.1.0)
[ 33] [Active ] [ 4] Escaux CdrCsv (1.0.3)
[ 34] [Active ] [ 4] Escaux ContactManager (1.2.0)
[ 35] [Active ] [ 4] Escaux Osgi (1.2.0)
[ 36] [Active ] [ 4] Escaux ContactManagerImpl (1.1.1)
[ 37] [Active ] [ 4] Escaux netConsoleControllerImpl (1.5.2)
[ 38] [Active ] [ 4] Escaux HttpXml (1.0.0)
[ 39] [Active ] [ 4] Asterisk-Java (1.0.0.M3)
[ 40] [Active ] [ 4] Escaux UCModel (4.1.0)
[ 41] [Active ] [ 4] Escaux netConsoleFSMSIPServer (1.4.0)
[ 42] [Active ] [ 4] Escaux netConsoleFSMImpl (1.6.2)
[ 43] [Active ] [ 4] Escaux AuthServiceImpl (2.0.1)
[ 44] [Active ] [ 4] Escaux PhoneControl (1.3.2)
[ 45] [Active ] [ 4] Escaux UCModelSync (1.2.0)
[ 46] [Active ] [ 4] Escaux ContactManagerSipServer (1.1.4)
00000549/sop1:~#
  • Asterisk must be started
00000549*CLI> sip show peers like SDO
Name/username              Host            Dyn Nat ACL Port     Status
SDO20038/SDO20038          10.113.105.162   D          2048     OK (38 ms)
SDO20037/SDO20037          10.113.105.162   D          2048     OK (38 ms)
SDO20036/SDO20036          10.113.105.168   D          2048     OK (38 ms)
SDO20035/SDO20035          10.113.105.168   D          2048     OK (38 ms)
SDO20034/SDO20034          10.113.105.167   D          2048     OK (50 ms)
SDO20033/SDO20033          10.113.105.167   D          2048     OK (42 ms)
SDO20032/SDO20032          10.113.105.166   D          2048     OK (40 ms)
SDO20031/SDO20031          10.113.105.166   D          2048     OK (40 ms)
SDO20030/SDO20030          10.113.105.155   D          2048     OK (39 ms)
SDO20029/SDO20029          10.113.105.155   D          2048     OK (51 ms)
SDO20028/SDO20028          10.113.105.165   D          2048     OK (42 ms)
SDO20027/SDO20027          10.113.105.165   D          2048     OK (19 ms)
SDO20026/SDO20026          10.113.105.163   D          2048     OK (39 ms)
SDO20025/SDO20025          10.113.105.163   D          2048     OK (19 ms)
SDO20022/SDO20022          10.113.105.164   D          2048     OK (41 ms)
SDO20002/SDO20002          (Unspecified)    D          0        UNKNOWN
SDO20001/SDO20001          (Unspecified)    D          0        UNKNOWN
SDO20003/SDO20003          (Unspecified)    D          0        UNKNOWN
SDO20004/SDO20004          (Unspecified)    D          0        UNKNOWN
SDO20007/SDO20007          (Unspecified)    D          0        UNKNOWN
SDO20008/SDO20008          (Unspecified)    D          0        UNKNOWN
SDO20013/SDO20013          (Unspecified)    D          0        UNKNOWN
SDO20014/SDO20014          (Unspecified)    D          0        UNKNOWN
SDO20009/SDO20009          10.113.105.160   D          2048     OK (41 ms)
SDO20023/SDO20023          10.113.104.169   D          2048     OK (38 ms)
SDO20012/SDO20012          10.113.105.160   D          2048     OK (41 ms)
SDO20011/SDO20011          10.113.104.139   D          2048     OK (38 ms)
SDO20010/SDO20010          10.113.104.140   D          2048     OK (40 ms)
SDO20006/SDO20006          10.113.104.140   D          2048     OK (40 ms)
SDO20005/SDO20005          10.113.104.139   D          2048     OK (38 ms)
SDO20024/SDO20024          10.113.104.169   D          2048     OK (38 ms)
SDO20015/SDO20015          10.136.0.141     D          2070     OK (50 ms)
SDO20016/SDO20016          10.136.0.141     D          2070     OK (50 ms)
SDO20017/SDO20017          10.136.0.92      D          2070     OK (51 ms)
SDO20018/SDO20018          10.136.0.92      D          2070     OK (48 ms)
SDO20019/SDO20019          10.113.105.161   D          2048     OK (40 ms)
SDO20020/SDO20020          10.113.105.161   D          2048     OK (40 ms)
SDO20021/SDO20021          10.113.105.164   D          2048     OK (41 ms)
38 sip peers [30 online , 8 offline]
  • Escaux Support only Mysql must be started
00000549/sop1:~# mysql -p -e "show processlist"
Enter password:
+-------+------+-----------------------------+-------+---------+------+-------+----------------+
| Id    | User | Host                        | db    | Command | Time | State | Info           |
+-------+------+-----------------------------+-------+---------+------+-------+----------------+
|     1 | root | localhost                   | cdrdb | Sleep   | 0    |       | NULL           |
|    12 | root | localhost                   | sop   | Sleep   | 0    |       | NULL           |
|    14 | root | localhost                   | cdrdb | Sleep   | 8    |       | NULL           |
|    19 | root | localhost                   | cdrdb | Sleep   | 0    |       | NULL           |
| 19979 | root | localhost.localdomain:41529 | cdrdb | Sleep   | 0    |       | NULL           |
| 19990 | root | localhost.localdomain:41623 | sop   | Sleep   | 3130 |       | NULL           |
| 24429 | root | localhost.localdomain:58512 | cdrdb | Sleep   | 0    |       | NULL           |
| 24785 | root | localhost.localdomain:52063 | sop   | Sleep   | 430  |       | NULL           |
| 24786 | root | localhost.localdomain:52064 | sop   | Sleep   | 1030 |       | NULL           |
| 24787 | root | localhost.localdomain:52066 | sop   | Sleep   | 6430 |       | NULL           |
| 24788 | root | localhost.localdomain:52067 | sop   | Sleep   | 2830 |       | NULL           |
| 24789 | root | localhost.localdomain:52069 | sop   | Sleep   | 7030 |       | NULL           |
| 24865 | root | localhost.localdomain:60312 | sop   | Sleep   | 130  |       | NULL           |
| 24910 | root | localhost.localdomain:49831 | sop   | Sleep   | 8530 |       | NULL           |
| 24911 | root | localhost.localdomain:49832 | sop   | Sleep   | 8830 |       | NULL           |
| 24912 | root | localhost.localdomain:49833 | sop   | Sleep   | 730  |       | NULL           |
| 24956 | root | localhost.localdomain:49398 | sop   | Sleep   | 3730 |       | NULL           |
| 24957 | root | localhost.localdomain:49399 | sop   | Sleep   | 6130 |       | NULL           |
| 24977 | root | localhost.localdomain:49554 | sop   | Sleep   | 7330 |       | NULL           |
| 24978 | root | localhost.localdomain:49555 | sop   | Sleep   | 5530 |       | NULL           |
| 24979 | root | localhost.localdomain:49556 | sop   | Sleep   | 3430 |       | NULL           |
| 24980 | root | localhost.localdomain:49557 | sop   | Sleep   | 7630 |       | NULL           |
| 25018 | root | localhost.localdomain:60921 | sop   | Sleep   | 1630 |       | NULL           |
| 25019 | root | localhost.localdomain:60922 | sop   | Sleep   | 1930 |       | NULL           |
| 25020 | root | localhost.localdomain:60923 | sop   | Sleep   | 5230 |       | NULL           |
| 25021 | root | localhost.localdomain:60924 | sop   | Sleep   | 2530 |       | NULL           |
| 25022 | root | localhost.localdomain:60925 | sop   | Sleep   | 4930 |       | NULL           |
| 25023 | root | localhost.localdomain:60926 | sop   | Sleep   | 7930 |       | NULL           |
| 25056 | root | localhost.localdomain:49800 | sop   | Sleep   | 4630 |       | NULL           |
| 25079 | root | localhost.localdomain:50024 | sop   | Sleep   | 2230 |       | NULL           |
| 25105 | root | localhost.localdomain:44410 | sop   | Sleep   | 1330 |       | NULL           |
| 25106 | root | localhost.localdomain:44411 | sop   | Sleep   | 5830 |       | NULL           |
| 25182 | root | localhost.localdomain:51470 | sop   | Sleep   | 6730 |       | NULL           |
| 25244 | root | localhost.localdomain:33646 | sop   | Sleep   | 8230 |       | NULL           |
| 25245 | root | localhost.localdomain:33647 | sop   | Sleep   | 4330 |       | NULL           |
| 25246 | root | localhost.localdomain:33648 | sop   | Sleep   | 4030 |       | NULL           |
| 36301 | root | localhost.localdomain:40600 | cdrdb | Sleep   | 0    |       | NULL           |
| 58033 | root | localhost                   | cdrdb | Sleep   | 3858 |       | NULL           |
+-------+------+-----------------------------+-------+---------+------+-------+----------------+

How to troubleshoot a net.Console which does not show incoming calls ( Escaux Support only )

The calls to the reception are going through the following layers:
  • asterisk
  • Call Management
  • net.Console server
  • net.Console client

Check asterisk

Check that the call arrives in asterisk.

The typical steps of the reception callflow are the following:
  • Check that the net.Console server is available via the action CheckResourceGroupAvailability. The net.Console server register itself via a SIP peer of the type NPA1
  • Check that the call is not rejected via the action 'Congestion'. This is done if the server is not seen as reachable
  • Check that the call is sent to one of the queue members. As soon as one of the member gets idle, the queue will send the call to this member

 
00000037*CLI> set verbose 9
Verbosity was 0 and is now 9
    -- Executing Macro("SIP/SOA20003-b770e648", "Action_CheckResourceGroupAvailability_1.01
    |*110||general reception for active-active support||||||||||NPA10001||||||||||*11011
    |*11012||||||||||||||||||||||||||||||") in new stack
    -- Executing Set("SIP/SOA20003-b770e648", "Resources=NPA10001|result=0|param_count=1"
    ) in new stack
    -- Executing Set("SIP/SOA20003-b770e648", "param=NPA10001") in new stack
    -- Executing GotoIf("SIP/SOA20003-b770e648", "0?loopexit") in new stack
    -- Executing Set("SIP/SOA20003-b770e648", "param_count=2.000000") in new stack
    -- Executing ChanIsAvail("SIP/SOA20003-b770e648", "SIP/NPA10001|sj") in new stack
    -- Executing NoOp("SIP/SOA20003-b770e648", "NPA10001 - 0 - 0") in new stack
    -- Executing Set("SIP/SOA20003-b770e648", "result=1") in new stack
    -- Executing Goto("SIP/SOA20003-b770e648", "loop") in new stack
    -- Goto (macro-Action_CheckResourceGroupAvailability_1.01,s,2)
    -- Executing Set("SIP/SOA20003-b770e648", "param=") in new stack
    -- Executing GotoIf("SIP/SOA20003-b770e648", "1?loopexit") in new stack
    -- Goto (macro-Action_CheckResourceGroupAvailability_1.01,s,500)
    -- Executing GotoIf("SIP/SOA20003-b770e648", "1?502") in new stack
    -- Goto (macro-Action_CheckResourceGroupAvailability_1.01,s,502)
    -- Executing Goto("SIP/SOA20003-b770e648", "NoRestrict|*11012|1") in new stack
    -- Goto (NoRestrict,*11012,1)
  == Channel 'SIP/SOA20003-b770e648' jumping out of macro 
  'Action_CheckResourceGroupAvailability_1.01'
    -- Executing Macro("SIP/SOA20003-b770e648", "Action_SetCallerID_1.00|*11012||General.Default
    ||||||||||General ESCAUX<0>|*1101202||||||||||||||||||||||||||||||||||||||||") in new stack
    -- Executing SetCallerID("SIP/SOA20003-b770e648", "General ESCAUX<0>") in new stack
    -- Executing Goto("SIP/SOA20003-b770e648", "NoRestrict|*1101202|1") in new stack
    -- Goto (NoRestrict,*1101202,1)
  == Channel 'SIP/SOA20003-b770e648' jumping out of macro 'Action_SetCallerID_1.00'
    -- Executing Macro("SIP/SOA20003-b770e648", "Action_Queue_1.03|*1101202|||||||||||
    |GeneralQueue|600|*110120203| 1|r|n|||||*110120211|*110120212|||||||||||||||||||||||
    |||||||")
     in new stack
    -- Executing SetVar("SIP/SOA20003-b770e648", "queue_name=GeneralQueue") in new stack
    -- Executing SetVar("SIP/SOA20003-b770e648", "timeout=600") in new stack
    -- Executing Set("SIP/SOA20003-b770e648", "timeout_extension=*110120203
    |nomember_extension=*110120211") in new stack
    -- Executing GotoIf("SIP/SOA20003-b770e648", "0?7:5") in new stack
    -- Goto (macro-Action_Queue_1.03,s,5)
    -- Executing GotoIf("SIP/SOA20003-b770e648", "0?20:6") in new stack
    -- Goto (macro-Action_Queue_1.03,s,6)
    -- Executing SetVar("SIP/SOA20003-b770e648", "GeneralQueue_counter=1") in new stack
    -- Executing SetVar("SIP/SOA20003-b770e648", "ring=r") in new stack
    -- Executing NoOp("SIP/SOA20003-b770e648", "") in new stack
    -- Executing GotoIf("SIP/SOA20003-b770e648", "0?10:11") in new stack
    -- Goto (macro-Action_Queue_1.03,s,11)
    -- Executing Queue("SIP/SOA20003-b770e648", "GeneralQueue|rt|||600") in new stack
    -- Called SIP/SDO10003
    -- SIP/SDO10003-08202b38 is ringing
    -- SIP/SDO10003-08202b38 is ringing
  == Parsing '/etc/asterisk/manager.conf': Found
  == Manager 'manager' logged on from 172.16.35.137
  == Manager 'manager' logged off from 172.16.35.137
    -- SIP/SDO10003-08202b38 is ringing
    -- SIP/SDO10003-08202b38 is ringing
00000037*CLI>

Check the call tracked by the Call Management module ( Escaux Support only )

Check that the Call Management properly tracks the calls.

The following command list all the calls tracked by the Call Management.
      00000037/sop1:~# mshell cc.calls
RECEIVED SDO10003 [SIP/SDO10003-08200218 -> SIP/SOA20003-b7714378] General Escaux 0 2000 
currentid(1285683753.54) External - id: 1285683753.54

DELIVERED SOA20003 [SIP/SOA20003-b7714378 -> SIP/SDO10003-08200218] General Escaux 0 2000 
currentid(1285683753.52) External - id: 1285683753.52

The call management will produce an entry for each channel in asterisk. The following information are respectively provided:
  • The call state as defined in the CSTA standard
  • The device attached to the channel
  • The channel name
  • The bridged channel name
  • The caller name
  • The caller number
  • The called number
  • The channel unique id used by Asterisk
  • The external/internal marker
  • The call ID which is the first unique ID provided by Asterisk for this channel. This one are available in the CDR database in mysql.

Further investigation can be done in the log file /var/log/mserver/local.log by searching for the call id, for example "1285683753.52". The following usefull information can be found:
  • asterisk event. For example, a call being bridged by Asterisk
2010-09-28 10:40:42,144 fsm            DEBUG [call_6] command: BridgeLinkEventCallCommand 
[uid1: 1285663238.1066, uid2: 1285663238.1068, ch1: SIP/172.16.35.167-b77031a8
, ch2:SIP/SDO10003-b7714138] [CallControl EP (36)]
  • call state transition. For example, the call seen as answered by the Call Management
2010-09-28 10:40:42,144 CallControl    DEBUG statechange: [RECEIVED] -> [ESTABLISHED] 
Ctx(key=1285663238.1068,ids=1285663238.1068,1285663238.1066,SDO10003
,SIP/SDO10003-b7714138,Ringing) [CallControl EP (36)]

Check the net.Console server ( Escaux Support only )

Check that the net.Console state has been properly updated.

The following command list all the net.Console sessions.
00000037/sop1:~# mshell ncctl.list
#controllers: 2
user:netconsole6 id:490688df-06e1-42f4-ac96-bfdaaa35b6ef state:LOGGEDOUTSTATE 
l1:null l2:null

user:netconsole4 id:c0f34d7c-173f-43be-8ca1-dcd7750b72f0 state:NEWINCOMINGSTATE 
l1:0(Technical test) l2:null

The following information are respectively provided:
  • The login used to start the net.Console session
  • The net.Console session ID. This one is different for every restart of the application
  • The state of the net.Console
  • The caller Id and caller name of the call in line 1
  • The caller Id and caller name of the call in line 2

Further investigation can be done in the log file /var/log/mserver/local.log by searching for the net.Console session id and the time of the call. The session ID is recreated for each restart of the net.Console client. One can find t
  • the command recieved from the client:
2010-09-28 11:13:29,240 nc.fsm.srv     DEBUG [47a0d88a-282e-4e42-a78e-2625723d4f94] 
received postCommand:DirectedParkCommand (350) [] [SipDispatcher requestExecutor-1]
  • the net.Console state transition
2010-09-28 11:13:29,242 fsm            DEBUG [nc_47a0d88a-282e-4e42-a78e-2625723d4f94_2] 
state (entry): DIRPARKINITSTATE [SipDispatcher requestExecutor-1]

Check the net.Console client

On the client side the only way to investigate an issue is to look at the local log file in <User's home folder>\.escaux\netconsole3. In the log file we can find:
  • State update containing all the information about what to diplay in the GUI. It is sent by the net.Console server for each net.Console state transition:
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<NetConsoleFsmContext camp="false" dirPark="false" fsmId="f94d3a08-deb6-4d57-8572-78e3fd5bfcde" 
inCommand="true" lastIdleStatePaused="false" 
loggedout="false" seq="14" stateId="11" stateType="CONVERSATIONSTATE">
<availableCommands>
<NetConsoleCommandType type="RINGINGEVENTCOMMAND"/>
<NetConsoleCommandType type="TIMEOUTEVENTCOMMAND"/>
<NetConsoleCommandType type="PAUSECOMMAND"/>
<NetConsoleCommandType type="HANGUPCOMMAND"/>
<NetConsoleCommandType type="HANGUPEVENTCOMMAND"/>
<NetConsoleCommandType type="HOLDEVENTCOMMAND"/>
<NetConsoleCommandType type="COUPLECOMMAND"/>
<NetConsoleCommandType type="DIALCOMMAND"/>
<NetConsoleCommandType type="ABANDONEVENTCOMMAND"/>
<NetConsoleCommandType type="RESETCOMMAND"/>
<NetConsoleCommandType type="CAMPCOMMAND"/>
<NetConsoleCommandType type="DIRECTEDPARKCOMMAND"/>
<NetConsoleCommandType type="PARKCOMMAND"/>
<NetConsoleCommandType type="HOLDCOMMAND"/>
</availableCommands>
<Line1>
<CallEntry callId="1285766723.88" disposition="net console four " name="Polycom 330 " 
number="3360" ownerId="f94d3a08-deb6-4d57-8572-78e3fd5bfcde" ownerName="netconsole4">
<callnote>RU1QVFk=</callnote>
<history>
<CallEntryEvent nextTime="-1" sentTime="1285766756792" time="1285766754165">
<eventnote>QUNDRVBU</eventnote>
<property key="line" value="LINE1"/>
<property key="type" value="line"/>
<property key="state" value="CONVERSATION"/>
<property key="command" value="ACCEPTEVENTCOMMAND"/>
</CallEntryEvent>
<CallEntryEvent nextTime="1285766754165" sentTime="1285766756792" time="1285766753863">
<eventnote>VEFLRUJBQ0s=</eventnote>
<property key="line" value="LINE1"/>
<property key="type" value="line"/>
<property key="state" value="INRING"/>
<property key="command" value="TAKEBACKCOMMAND"/>
</CallEntryEvent>
<CallEntryEvent nextTime="1285766753863" sentTime="1285766756792" time="1285766743053">
<eventnote>RlJFRV9GT1JNQVQ6NjcwNSAobG9uZyBjb252ZXJzYXRpb24gcm9ib3QgKQ==</eventnote>
<property key="available_commands" value="TAKEBACKCOMMAND,COUPLECOMMAND"/>
<property key="type" value="supervision"/>
<property key="number" value="6705"/>
<property key="state" value="DIR_PARKED"/>
<property key="command" value="DIRECTEDPARKCOMMAND"/>
</CallEntryEvent>
<CallEntryEvent nextTime="1285766743053" sentTime="1285766756792" time="1285766725395">
<eventnote>QUNDRVBU</eventnote>
<property key="line" value="LINE1"/>
<property key="type" value="line"/>
<property key="state" value="CONVERSATION"/>
<property key="command" value="ACCEPTEVENTCOMMAND"/>
</CallEntryEvent>
<CallEntryEvent nextTime="1285766725395" sentTime="1285766756792" time="1285766723458">
<eventnote>UklOR0lORw==</eventnote>
<property key="line" value="LINE1"/>
<property key="type" value="line"/>
<property key="state" value="INRING"/>
<property key="command" value="RINGINGEVENTCOMMAND"/>
</CallEntryEvent>
<CallEntryEvent nextTime="1285766723458" sentTime="1285766756792" time="1285766723376">
<eventnote>UVVFVUVE</eventnote>
<property key="queue" value="PERSONAL"/>
<property key="available_commands" value="TAKEBACKCOMMAND,COUPLECOMMAND"/>
<property key="type" value="queue"/>
<property key="command" value="QUEUEDEVENTCOMMAND"/>
</CallEntryEvent>
</history>
</CallEntry>
</Line1>
</NetConsoleFsmContext>

where we can find:
  • availableCommands: the button which must be enabled
  • line 1 the description of the call to be displayed in line 1
  • line 2 the description of the call to be displayed in line 2
  • each CallEntry corresponds to a call
  • history provides the sequence of events corresponding for this call

How to force a Snom phone to get its configuration

  • Get the IP address of the Snom phone. This one can be obtain by clicking on the "?" button of the Snom phone
  • Check the VlanId configured on the phone. This can be done directly on the Snom phone by clicking on the "Snom" button and then navigate up to the VLAN menu. Alternatively this can be checked on the Snom's web interface in the menu Setup > Advanced (+admin password) > QoS/Security
  • Get the MAC address. This one is available in the SNom's web interface (Status > System Information)
  • Download the snom phone configuration file. This can be done via a tftp client (tftp)
  • Go to: Snom's Web > Setup > Advanced > Update and put the file perviously downloaded then click on "Load"

How to force a Snom phone to upgrade

  • Check the VlanId configured on the phone. This can be done directly on the Snom phone by clicking on the "Snom" button and then navigate up to the VLAN menu. Alternatively this can be checked on the Snom's web interface in the menu Setup > Advanced (+admin password) > QoS/Security
  • If the vlan Id is not correct, go on the phone keypads, go Settings > VLAN settings and press 'ID'. Enter the Vlan ID and press the "V" button. Then reboot the phone (Settings > Configuration > Reboot)

If the phone still does not get upgrade after reboot do the following:
  • On the Snom's Web interface. Check the firmware version in Status > System Information > Firmware - version

If the version is 6 (for example 6.5.30). Force the update to the firmware 7
  • Go to: 'Snom Web > Setup > Software Update' and enter the following URL "http://<SOP's IP>/snom/snom320-from6to7-7.1.30-bf.bin" and then press 'Load'. Check that the phone restart. If the snom ask for confirmation, press "v".

After upgrade, the Vlan Id is lost and might need to be re-entered once more.

If the Snom phone does not upgrade to 7.3.30, force the upgrade via the following URL "http://<SOP's IP>/snom/snom320-7.3.30-SIP-f.bin"

After upgrade, the Vlan Id is lost and might need to be re-entered once more.

How to reset all the processes required for the net.Console

  • Restart mysql: /etc/init.d/msysql restart Escaux Support only

  • Restart Asterisk

DONE Navigate to: shell> Telephony > diagnostic> asterisk console > restart now

  • Restart the Application Management Server
DONE Navigate to: shell > Subsystem > Managament Server > Start/Stop
  • Restart the net.Console application on the Windows PC

Quick check list of the health of a net.Console

  • On the BMS check the status of the probe (Go to: smp014.escaux.com > Service Details > 549.sop, 549.svc, 551.sop, 551.svc)
  • Primary and Secondary: Check Application Management: mshell ps must show all bundles active Escaux Support only
  • Primary and Secondary: Number of mysql connection must be lower than 65 (mysql -p -e "show processlist") Escaux Support only
  • Primary and Secondary: asterisk is accessible (asterisk -rx "show channels", number of channels and calls are consistent)
  • Primary and Secondary: Snom phones are registered (asterisk -rx "sip show peers like SDO")
  • Primary and Secondary: Queue state and net.Console state are consistent (asterisk -rx "show queues" versus "mshell ncctl.list", logged in/logged out/paused/unpaused are consistent) Escaux Support only
  • Both active-active config are consistent ("mshell ncctl.list" on both active-active SOP gives the same output) Escaux Support only
  • Primary and secondary: more /opt/mserver/SMITH and more /opt/mserver/STONITH contains "0" Escaux Support only
  • Primary and secondary: net.Console availability probe is REACHABLE (asterisk -rx "sip show peers like NPA")

How to handle an issue reported by a receptionist while handling a call ( Escaux Support only )

Ask to reproduce the issue, to click on the bug report button, to select the call in the drop down, describe the issue and click on Send.

The bug report contains the following elements:
            **** Note: Failed to transfer the call
            **** Line 1 ****
            [id=1285601325.350104][number=SDR50002]
            [name=SDR50002][disposition=netconsole4 ]
            [ownerid=a462d035-4fdc-4fe4-9469-3005ed844cdf][ownername=netconsole4]
            [note=EMPTY]
              [history:CallEntryEvent on 1285601335889: HOLD, {line=LINE1, type=line, 
              state=ONHOLD, command=HOLDEVENTCOMMAND}]
              [history:CallEntryEvent on 1285601333284: ACCEPT, {line=LINE1, type=line, 
              state=CONVERSATION, command=ACCEPTEVENTCOMMAND}]
              [history:CallEntryEvent on 1285601333050: TAKEBACK, {line=LINE1, type=line, 
              state=INRING, command=TAKEBACKCOMMAND}]
              [history:CallEntryEvent on 1285601325763: QUEUED, {queue=PERSONAL, 
              available_commands=TAKEBACKCOMMAND,COUPLECOMMAND, type=queue, 
              command=QUEUEDEVENTCOMMAND}]
            
            **** Line 2 ****
            [id=1285601337.350153][number=3360]
            [name=Polycom 330 ][disposition=]
            [ownerid=a462d035-4fdc-4fe4-9469-3005ed844cdf][ownername=netconsole4]
            [note=EMPTY]
              [history:CallEntryEvent on 1285601354111: ANSWER, {line=LINE2, type=line, 
              state=CONVERSATION, command=CONVERSATIONEVENTCOMMAND}]
              [history:CallEntryEvent on 1285601338844: DIAL, {line=LINE2, type=line, 
              state=OUTRING, command=DIALCOMMAND}]            
        

The bug report is available in the Application Management Server (/var/log/mserver/local.log) and provides a snapshot of the last known state. In it we find:
  • Note: The text entered by the receptionist
  • Line 1 (resp. Line 2): All the information about the state of the line 1 (resp. line2)
    • id: Call ID of the call in line 1 as identified by the Call Management module
    • number: Connected line number
    • name: Connected line name
    • ownerid: net.Console session Id of the net.Console module
    • owner name: name of the receptionist as defined in the first name and last name of the personal extension
    • note: park note entered by the receptionist
    • history: history of the events which occured for this call

How to investigate an issue to access the CDR files ( Escaux Support only )

  • Check that CDR exists in /data/mserver
  • Check that there are accessible through Apache:
00000037/sop1:/# cd /data/tmp
00000037/sop1:/data/tmp# wget "http://127.0.0.1/api/cdr/csv/getcdr?date=20101002"
--14:57:22--  http://127.0.0.1/api/cdr/csv/getcdr?date=20101002
           => `getcdr?date=20101002'
Connecting to 127.0.0.1:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 92,460 [text/plain]

100%[================================================>] 92,460        --.--K/s

14:57:22 (141.08 MB/s) - `getcdr?date=20101002' saved [92460/92460]

  • Check Apache config
00000037/sop1:/etc/apache2/conf.d# cat cdr.conf
<IfModule mod_proxy.c>

<Proxy *>
Order deny,allow
Allow from all
</Proxy>

ProxyPass /api/cdr/ http://127.0.0.1:4445/api/cdr/
ProxyPassReverse /api/cdr/ http://127.0.0.1:4445/api/cdr/

</IfModule>
  • Check /var/log/mserver/local.log for any exception

How to handle issue about incorrect phone status ( Escaux Support only )

Check phone status on net.Console SOP

Phone status is provided by the Call Management module. The following status are provided:
  • Idle: no calls on the phone
  • Busy internal: phone busy and the call is an internal call
  • Busy external: phone is busy and the call is an external call

Beside this the following registration statuses are provided:
  • Registered
  • Unregistered

The phone status is an aggregated view of the calls tracked by the Call Management. In a cluster, the Call Management is installed on each SOP and takes care of the broadcast of the phone statuses within the cluster.

In order to check the status, go on the net.Console SOP and do the following command ( Escaux Support only ):
00000037/sop1:~# mshell ps.list
---------------------
Size:88
SDR40001:Idle(Unknown) -:0 Unregistered (Remote)
SDR50002:Idle(Unknown) 172.16.34.117:5060 Registered (Remote)
SDO20001:Idle(Unknown) 172.16.34.69:2048 Registered (Local)
SDO20002:Idle(Unknown) 172.16.34.69:2048 Registered (Local)
SDO20003:Idle(Unknown) 172.16.34.89:2048 Registered (Local)
SDO10003:Idle(Unknown) 172.16.34.127:2048 Registered (Local)

The following information can respectively be found:
  • The name of the phone
  • The phone state: Idle or Busy
  • The internal/external call marker:
    • Internal: internal call
    • External: external call
    • Unknown: undetermined internal/external marker
  • IP address of the phone
  • Registration state: Registered or Unregistered
  • Source of the phone status:
    • Local: concern a phone installed on the local SOP
    • Remote: concern a phone installed on another SOP of the cluster
    • Aggregated: concern a phone installed on several SOPs

Check phone status on remote SOP ( Escaux Support only )

In the same way the phone status of a remote phone can be checked directly on the remote SOP and compared with the local status in order to check that there are properly synchronized.

Check phone status in the net.Console client

If the phone status is correct on the side of the net.Console SOP, this is probably an issue at the level of the net.Console client.

Get the net.Console client log file. The following information can be found.

MESSAGE sip:f94d3a08-deb6-4d57-8572-78e3fd5bfcde@172.16.34.250:3040;transport=tcp SIP/2.0
Via: SIP/2.0/TCP 0.0.0.0:3040;branch=z9hG4bK0de66ce5e5351f4c8dac854ee7d4c52f3537
;received=172.16.35.137;rport=51713
Max-Forwards: 70
CSeq: 39 MESSAGE
From: "NetConsoleController f94d3a08-deb6-4d57-8572-78e3fd5bfcde" 
<sip:NetConsoleController.f94d3a08-deb6-4d57-8572-78e3fd5bfcde@172.16.35.137:3040
;transport=tcp>;tag=9d0b2e41
To: "f94d3a08-deb6-4d57-8572-78e3fd5bfcde" 
<sip:f94d3a08-deb6-4d57-8572-78e3fd5bfcde@172.16.34.250:3040;transport=tcp>;tag=1571429497
Call-ID: 37116508d0f630ed373b8fffa0b84d99@0.0.0.0
EscauxMessageDescription: PhoneStatusSet
Content-Type: application/propertiesxml
Content-Length: 483

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>phonestatus update</comment>
<entry key="SDR60001">Idle|Unknown</entry>
<entry key="SDX20003">Idle|Unknown</entry>
<entry key="SDR10002">Idle|Unknown</entry>
<entry key="SDG30003">Idle|Unknown</entry>
<entry key="SDR50001">Idle|Unknown</entry>
<entry key="SDR70001">Idle|Unknown</entry>
<entry key="">Idle|Unknown</entry>
</properties>

If the phone status are received by the server. This must be an bug in the rendering of the phone status which depends on the way the search is done in the directory.

How to identify the reason of a fail-over (active-active) ( Escaux Support only )

Check on the primary SOP when the net.Console has been deactivated ( Escaux Support only )

In the Application Management Server logging /var/log/mserver/local.log, look for the first occurrence of the SOP deactivation:
2010-09-30 13:10:37,796 nc.health      WARN  Aggregated netConsoleControllerService 
health is not OK [SipDispatcher requestExecutor-441]

Just above you should find the reason of the fail-over. For example,
com.escaux.ucmodel.access.ModelException: Failed read probe:mysql.attribute_state
        at com.escaux.ucmodel.access.impl.ModelServiceImpl.getProbe(ModelServiceImpl.java:367)
        at com.escaux.netconsole.health.impl
        .NetConsoleControllerServiceOptionsHandlerSipListener
        .isHealthy(NetConsoleControllerServiceOptionsHandlerSipListener.java:
72)
        at com.escaux.netconsole.health.impl
        .NetConsoleControllerServiceOptionsHandlerSipListener
        .handleOptions(NetConsoleControllerServiceOptionsHandlerSipListener.j
ava:38)
        at com.escaux.sip.AbstractSessionListener.processRequest
        (AbstractSessionListener.java:163)
        at com.escaux.sip.impl.SipDispatcher$2.run(SipDispatcher.java:141)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: com.mysql.jdbc.CommunicationsException: Communications link failure 
due to underlying exception:

Check on the secondary SOP when calls started to be accepted in the secondary SOP

You can have a look in asterisk CDRs files: Shell > Diagnostic > Telephony > view asterisk call data record

Check that the secondary did not force the deactivation of the primary SOP ( Escaux Support only )

On the secondary SOP, look in the Application Management Server logging (/var/log/mserver/local.log) for the first occurrence of
2010-09-30 17:30:49,835 nc.ctl.impl    DEBUG Setting activity detected to true 
[CallControl EP (65)]

This indicates that a call for the reception has been detected on the secondary SOP. The call which triggered it can be found a bit before:

2010-09-30 17:30:49,811 CallControl    DEBUG [AED][Event: NewChannel]
[channel=SIP/10.126.201.4-0818b5e8,uid=1285860649.5535,channelstatedesc=Ring
,calleridname=Volet 1
55,calleridnum=96568,calledid=91313] [CallControl EP (65)]
2010-09-30 17:30:49,811 CallControl    DEBUG Channel cache 1:[SIP/10.126.201.4-0818b5e8] 
[CallControl EP (65)]
2010-09-30 17:30:49,811 CallControl    INFO  New incoming call from event  
[CallControl EP (65)]
2010-09-30 17:30:49,811 fsm.cc         DEBUG [call_11] Creating a new FSM 
(com.escaux.fsm.Fsm@6f558a) [CallControl EP (65)]
2010-09-30 17:30:49,811 fsm.cc         DEBUG [call_11] state (entry): NULL 
[CallControl EP (65)]
2010-09-30 17:30:49,811 fsm.cc         DEBUG [call_11] command: DetectedCallEventCallCommand
 [call=1285860649.5535;ch=SIP/10.126.201.4-0818b5e8;phone=10.126.201.4] [C
allControl EP (65)]
2010-09-30 17:30:49,811 CallControl    DEBUG changing unique id: (null)->(1285860649.5535)
 [CallControl EP (65)]
2010-09-30 17:30:49,811 fsm.cc         DEBUG [call_11] state (entry): DETECTED
 [CallControl EP (65)]
2010-09-30 17:30:49,811 CallControl    DEBUG statechange: [NULL] -> [DETECTED]
 Ctx(key=1285860649.5535,ids=1285860649.5535,null,10.126.201.4,SIP/10.126.201.4-0818b5e8
,Ring) [CallControl EP (65)]
2010-09-30 17:30:49,811 CallControl    DEBUG Registering call 1285860649.5535
 state=DETECTED [CallControl EP (65)]
2010-09-30 17:30:49,812 CallControl    DEBUG [AED][Event: NewExten]
[appdata=ReceptionistQueue=COLRPO1313,uid=1285860649.5535,
channel=SIP/10.126.201.4-0818b5e8,extension=91313] [CallControl EP (65)]
2010-09-30 17:30:49,813 CallControl    DEBUG [AED][Event: NewExten]
[appdata=varname_1=ReceptionistQueue,uid=1285860649.5535
,channel=SIP/10.126.201.4-0818b5e8,extensio
n=91313] [CallControl EP (65)]
2010-09-30 17:30:49,814 CallControl    DEBUG [AED][Event: NewExten]
[appdata=LastUserSop1=00000549,uid=1285860649.5535
,channel=SIP/10.126.201.4-0818b5e8,extension=9131
3] [CallControl EP (65)]
2010-09-30 17:30:49,815 CallControl    DEBUG [AED][Event: NewExten]
[appdata=LastUserSop2=00000551,uid=1285860649.5535
,channel=SIP/10.126.201.4-0818b5e8,extension=9131
3]

Check for the rejected call on the primary SOP ( Escaux Support only )

If the call has been sent to the secondary SOP, this is either because the primary SOP was not reachable or because the primary SOP rejected the call.

To find the call being rejected, check the Application Management Server log file on the primary SOP at the same time than the call found on the secondary.

In this case the call has been explicitly rejected by the callflow because the backend was not available.
2010-09-30 17:30:49,782 CallControl    DEBUG [AED][Event: NewExten]
[appdata=Action_CheckResourceGroupAvailability_1.01|*531|
|Colruyt-Receptionist for active-active|||
|||||||NPA10001||||||||||*53111|*53112||||||||||||||||||||||||||||||,uid=1285860649.13719
,channel=SIP/10.126.201.4-b78e3448,extension=*531] [CallControl EP (77)]
2010-09-30 17:30:49,782 CallControl    DEBUG [AED][Event: NewExten]
[appdata=Resources=NPA10001|result=0|param_count=1,uid=1285860649.13719
,channel=SIP/10.126.201.4-b78e3448,extension=s] [CallControl EP (77)]
2010-09-30 17:30:49,782 CallControl    DEBUG [AED][Event: NewExten]
[appdata=param=NPA10001,uid=1285860649.13719
,channel=SIP/10.126.201.4-b78e3448,extension=s] [CallCo
ntrol EP (77)]
2010-09-30 17:30:49,783 CallControl    DEBUG [AED][Event: NewExten]
[appdata=0?loopexit,uid=1285860649.13719,channel=SIP/10.126.201.4-b78e3448,extension=s]
 [CallControl EP (77)]
2010-09-30 17:30:49,783 CallControl    DEBUG [AED][Event: NewExten]
[appdata=param_count=2.000000,uid=1285860649.13719
,channel=SIP/10.126.201.4-b78e3448,extension=s] [CallControl EP (77)]
2010-09-30 17:30:49,783 CallControl    DEBUG [AED][Event: NewExten]
[appdata=SIP/NPA10001|sj,uid=1285860649.13719
,channel=SIP/10.126.201.4-b78e3448,extension=s] [CallControl EP (77)]
2010-09-30 17:30:49,783 CallControl    DEBUG [AED][Event: NewExten]
[appdata=next,uid=1285860649.13719
,channel=SIP/10.126.201.4-b78e3448,extension=s] [CallControl EP (77)]
2010-09-30 17:30:49,783 CallControl    DEBUG [AED][Event: NewExten]
[appdata=NPA10001 - 5 - 0,uid=1285860649.13719
,channel=SIP/10.126.201.4-b78e3448,extension=s] [CallControl EP (77)]
2010-09-30 17:30:49,784 CallControl    DEBUG [AED][Event: NewExten]
[appdata=result=0,uid=1285860649.13719,channel=SIP/10.126.201.4-b78e3448,extension=s] 
[CallControlEP (77)]
2010-09-30 17:30:49,784 CallControl    DEBUG [AED][Event: NewExten][appdata=loop
,uid=1285860649.13719,channel=SIP/10.126.201.4-b78e3448,extension=s] 
[CallControl EP (77)]
2010-09-30 17:30:49,784 CallControl    DEBUG [AED][Event: NewExten]
[appdata=param=,uid=1285860649.13719
,channel=SIP/10.126.201.4-b78e3448,extension=s] [CallControl EP (77)]
2010-09-30 17:30:49,784 CallControl    DEBUG [AED][Event: NewExten]
[appdata=1?loopexit,uid=1285860649.13719
,channel=SIP/10.126.201.4-b78e3448,extension=s] [CallControl EP (77)]
2010-09-30 17:30:49,784 CallControl    DEBUG [AED][Event: NewExten]
[appdata=0?502,uid=1285860649.13719,channel=SIP/10.126.201.4-b78e3448,extension=s] 
[CallControl EP(77)]
2010-09-30 17:30:49,785 CallControl    DEBUG [AED][Event: NewExten][appdata=NoRestrict|*53111
|1,uid=1285860649.13719,channel=SIP/10.126.201.4-b78e3448,extension=s] [CallControl EP (77)]
2010-09-30 17:30:49,785 CallControl    DEBUG [AED][Event: NewExten]
[appdata=Action_Congestion_1.01|*53111||Reject the call||||||||||No|0|||||||||||||||||||||||||
|||||||||||||||,uid=1285860649.13719,channel=SIP/10.126.201.4-b78e3448,extension=*53111] 
[CallControl EP (77)]
2010-09-30 17:30:49,785 CallControl    DEBUG [AED][Event: NewExten]
[appdata=0?2:3,uid=1285860649.13719,channel=SIP/10.126.201.4-b78e3448,extension=s] 
[CallControl EP(77)]
2010-09-30 17:30:49,785 phonestatus    DEBUG Incoming remote phonestate: SDS60052:Ring(Internal)
 10.126.200.88:5061 Registered (Remote) from line SDS60052,10.126.200.88,5061,Ring|Internal
 ,Registered from 10.126.201.4 [11632138@qtp-2175170-1 - Acceptor0 LocalConnector@0.0.0.0:1 
 - /api/phonestatus/push]
2010-09-30 17:30:49,785 phonestatus    DEBUG New registration: SDS60052:Registered 1 
[11632138@qtp-2175170-1 - Acceptor0 LocalConnector@0.0.0.0:1 - /api/phonestatus/push]

How to restore the traffic on the primary SOP after a fail-over to the secondary (active-active)

ALERT! If the net.Console SOP is also used for simple telephony, check that the window maintenance has been notified. This operation requires a maintenance window in order to restart the connected net.Console application if any.

Make sure that all the required process are healthy on the primary SOP:
  • Step 1: restart mysql:/etc/init.d/mysql restart ( Escaux Support only )
  • Step 2: restart asterisk
DONE Navigate to: shell > Diagnostic > Telephony > Restart now
  • Step 3: check that asterisk reload is complete by checking, for example the SIP peers
DONE Navigate to: shell > Diagnostic > Telephony > Asterisk console > sip show peers like SDO
  • Step 4: check the queues in asterisk are all in paused state so that calls are not send to the receptionist
DONE Navigate to: shell > Diagnostic > Telephony > Asterisk console > show queues
00000037*CLI> show queues
GeneralQueue has 0 calls (max unlimited) in 'rrmemory' strategy (0s holdtime), W:0, C:0, A:0, 
SL:0.0% within 0s
   Members: >
      SIP/SDO10003 (dynamic) (Not in use) has taken no calls yet
   No Callers
RcptQueue4   has 0 calls (max unlimited) in 'ringall' strategy (0s holdtime), W:0, C:0, A:0, 
SL:0.0% within 0s
   Members: >
      SIP/SDO10003 (dynamic) (Not in use) has taken no calls yet
   No Callers
  • Step 5: if queue is not in paused state remove the member from the queue
00000037*CLI> remove queue member SIP/SDO10003 from GeneralQueue
Removed interface 'SIP/SDO10003' from queue 'GeneralQueue'
00000037*CLI>
  • Step 6: restart the Application Management Server
DONE Navigate to: shell > Subsystem > Management Server > Start/Stop
  • Step 7: check that all net.Console are reconnected and net.Console state are consistent: same amount of sessions are listed on both sops. And one SOP and only one SOP the state is PASSIVE.( Escaux Support only )
  • Step 8: make sure that the secondary SOP does not force the primary SOP to block the call (STONITH procedure). The calls will still be accepted on the secondary SOP:
DONE Navigate to: Secondary SOP > Shell > Subsystems > Management Server > Deactivate netConsole controler service
  • Step 9: make sure that the primary SOP stops blocking incoming calls
DONE Navigate to: Primary SOP > Shell > Subsystems > Management Server > Activate netConsole controler service
  • Step 10: inform customer that all receptionists must initiate the recovery procedure:
    • take back all the calls from the private queue, ask the customer to call back and terminate the call
    • when the private queue is empty, terminate the net.Console client and restart. After restart the net.Console must be connected to the primary SOP (see Window title bar) and calls should start coming in again.
  • Step 11: reactivate the fail-over on the secondary SOP:
DONE Navigate to: Secondary SOP > Shell > Subsystems > Management Server > Activate netConsole controler service
  • Step 12: check the status of the secondary SOP, it must be active:
DONE Navigate to: Secondary SOP > Shell > Subsystems > Management Server > Show status netConsole controler service
  • Step 13: check that all the receptionists have switched back to the primary SOP. Check again the consistency of the net.Console sessions via mshell ncctl.list on each SOP. Each session must be exists on the primary and secondary SOP. On the secondary SOP, the session should be in PASSIVESTATE ( Escaux Support only )

How to force a failover from the primary SOP to the secondary SOP (active-active)

If an issue occurs on the primary SOP which need to be neutralised but did not cause the fail-over, the following procedure can be used to block the traffic on the primary SOP and to force the fail-over

Deactivate the primary SOP

DONE Navigate to: Primary SOP > Shell > Subsystems > Management Server > Deactivate netConsole controler service

Force the processes to stop ( Escaux Support only )

ALERT! This should be done if calls are still being send to the primary SOP even though the net.Console service has been deactivated.

  • Disable asterisk in safetynet. Edit the following file and put a "#" in front of /bin/bash
00000037/sop1:/opt/mserver# cat /etc/safetynet/asterisk.conf
job Asterisk   {
        /usr/sbin/asterisk;
        #/bin/bash -c '/etc/init.d/asterisk start'
}
  • Stop asterisk via the console (asterisk -r > stop now)
  • If this does not work use kill -9
  • Stop mserver: /etc/init.d/mserver stop
Copyright © Escaux SA