Administrator Guide: Active-Active redundancy

Overview

Active-Active redundancy lets SIP devices that support the feature define one of the SOPs in the cluster as a secondary SIP server. The secondary server then acts as a "hot swappable" standby.

The primary SOP and the secondary SOP do not need to contain exactly the same list of devices. This enables n+1 redundancy instead of the 2xn redundancy offered by the High Availability module. A typical configuration adds one extra SOP to the cluster, which acts as the backup SOP for all the SIP devices of the cluster.

Note, however, that the behavior of Active-Active redundancy depends on the SIP device. Please check the exact behavior of each device that will be used.

The following table describes the behavior of the various supported phones in case of failover:

Phone type | Incoming call | Outbound call
Polycom | Phone registered on both SOPs and always ready to route calls to the phone | In case of SOP1 SIP port unavailability, the call is sent through SOP2 after the SIP transaction timeout
Unidata Wireless7700 & 7800 | Phone registered on both SOPs and always ready to route calls to the phone | In case of SOP1 SIP port unavailability, the call is sent through SOP2 after the SIP transaction timeout
Eyebeam Softphone | Phone registered on both SOPs and always ready to route calls to the phone | Once the phone registration refresh fails, you need to switch manually to identity2 in the Eyebeam settings
Aastra | Phone registered on first SOP and only registering on second SOP when the first one is unavailable (with downtime) | Once the phone registration refresh fails, the switch to the second SOP is done automatically (with downtime)

Failover and recovery are automatic: failover happens when the SIP port of SOP1 becomes unavailable, and recovery happens when it is available again. SOP1 cannot be put in an administrative mode that forces the traffic to SOP2. The same effect can nevertheless be obtained by shutting down the switch port.

Unlike the Active-Standby SOPs, the modules can be installed independently of each other.

Requirements

ALERT! Incoming external calls should always arrive on the primary SOP of the extensions. In particular, with IMS, no load balancing should be done between the primary SOP and the secondary SOP. Calls can be sent to the secondary SOP when the primary SOP is down.

Modules:
  • The Phone Support modules and their versions depend on the resources used. Check the dependencies of each of them.
  • Unified Communication Model module v1.8.3 or higher (optional: only if you want the dynamic profile parameters and status to be synchronized between the primary and the secondary SOP)

Resources:
  • Aastra:
    • 6730i phone v1.02 or higher
    • 6731i phone v1.02 or higher
    • 6737i phone v1.11.0 or higher
    • 6739i phone v1.02 or higher
    • 6751i phone v1.02 or higher
    • 6753i phone v1.02 or higher
    • 6755i phone v1.02 or higher
    • 6757i phone v1.02 or higher
    • Aastra Virtual Phone phone v1.02 or higher
  • Eyebeam:
    • EyebeamAudio phone v1.19 or higher
  • Polycom
    • IP330 phone v1.0 or higher
    • IP331 phone v3.02 or higher
    • IP335 phone v3.02 or higher
    • IP450 phone v3.02 or higher
    • IP500 phone v2.20 or higher
    • IP550 phone v1.0 or higher
    • IP560 phone v1.0 or higher
    • IP650 phone v1.0 or higher
    • IP670 phone v1.0 or higher
    • IP6000 phone v1.0 or higher
    • IP7000 phone v4.10 or higher
    • VVX500 phone v4.13 or higher
    • Polycom Virtual Phone phone v1.0 or higher
  • Snom
    • SNOM300 phone v3.1 or higher
    • SNOM320 phone v3.1 or higher
    • SNOM360 phone v3.1 or higher
    • SNOM370 phone v3.1 or higher
    • SNOM821 phone v3.8 or higher
    • SNOM870 phone v3.8 or higher
  • Unidata
    • Wireless7700 phone v1.1 or higher

Actions:
  • StartDynamicApplication v5.8 or higher

Service enabling

This section describes the operations needed to make your cluster ready for Active-Active redundancy.

Activation of the Active-Active intra-cluster routing

Active-Active mode for incoming calls relies on the StartDynamicApplication action. This action routes the call to the secondary SOP when the primary SOP has failed, and always routes to the primary SOP when it is up and running.
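
The routing rule described above can be summarised by the following minimal Python sketch. It only illustrates the decision and is not the implementation of StartDynamicApplication: the function name and the two reachability flags are assumptions made for the example.

```python
from typing import Optional


def select_target_sop(primary_up: bool, secondary_up: bool) -> Optional[str]:
    """Return the SOP that should receive an incoming call (hypothetical helper)."""
    if primary_up:
        # The primary SOP is always preferred when it is up and running.
        return "SOP1"
    if secondary_up:
        # The secondary SOP is used only when the primary one has failed.
        return "SOP2"
    # Neither SOP is reachable: the call cannot be delivered.
    return None
```

Note that the secondary SOP is never chosen while the primary one is reachable, which matches the requirement that incoming calls arrive on the primary SOP.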

Navigate to: SMP > Communication Studio > Callflow Assignment

Make sure that each status of the user's profiles uses the required version of STARTDYNAMICAPPLICATION in the column application selector.

SIP qualify activation

In order for a SOP to know that a remote SOP or a SIP device is unreachable, SIP qualify must be enabled. To simplify the administration, it is better to control SIP qualify through the asterisk-1.2 module.

Step 1: Activate the SIP qualify in the module

For each SOP do the following.

Navigate to: SMP > Advanced > Modules installation

Search for asterisk-1.2x and click on Edit. Verify the following parameter:
  • Qualify SIP devices registration: A typical value is 16

Step 2: Set the default SIP qualify setting in the MeshSIPTrunk

Some versions of the MeshSIPTrunk offer the possibility to override the default SIP qualify setting defined in the asterisk-1.2 module.

Navigate to: SMP > Resources > Interfaces

Check for each MeshSIPTrunk the following parameter:
  • Qualify: default

Step 3: Set the default SIP qualify setting on the IP phones

Some IP Phone resources offer the possibility to override the default SIP qualify setting defined in the asterisk-1.2 module.

Navigate to: SMP > Resources > IP Phones

Check for each IP Phone the following parameter:
  • Qualify SIP devices registration: Default (Keep the default server setting)

This can be done via the bulk admin.

In order to apply all the changes, an apply-cluster change will be needed.

Activation of the dynamic profile synchronization

Install the Unified Communication Model and set the following parameter:
  • Profile synchronization: yes

ALERT! Activating this feature is not recommended if you have more than 5000 extensions.

Activation of the dynamic profile synchronization with SOP API 4.5.0 and Cluster & Active-Active Support 1.7.0

This is now recommended instead of the method based on Unified Communication Model.

Since SOP API 4.5.0, every change to a dynamic profile that is defined on two SOPs is directly propagated to both SOP1 and SOP2. This mechanism can fail if the other SOP is not available. It is therefore recommended to also install Cluster & Active-Active Support 1.7.0 and activate the dynamic profile synchronization. This ensures that profile parameters are always synchronized, even after a crash of one of the SOPs.

SOP API 4.5.0 also propagates reconfigure requests. This means that if a user updates their speed dials using Escaux Connect and then reboots their Active-Active phone, the configuration file is updated on both SOPs directly. If the second SOP was not available when the phone was reconfigured, Cluster & Active-Active Support 1.7.0 regenerates the configuration when the SOP comes back online.
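
The combination of direct propagation and later resynchronization can be pictured with the toy model below. It is a sketch under assumptions, not the code of the SOP API or of the Cluster & Active-Active Support module: the class, its attributes and the idea of a per-SOP list of pending changes are all hypothetical and only illustrate why a resynchronization step is needed after an outage.

```python
from collections import defaultdict


class ProfileSync:
    """Toy model: write a change to every reachable SOP, remember the rest."""

    def __init__(self, sops):
        self.store = {sop: {} for sop in sops}     # parameters held by each SOP
        self.online = {sop: True for sop in sops}  # reachability of each SOP
        self.pending = defaultdict(dict)           # changes missed while offline

    def update(self, key, value):
        """Propagate a profile change directly to all SOPs that are online."""
        for sop in self.store:
            if self.online[sop]:
                self.store[sop][key] = value
            else:
                self.pending[sop][key] = value

    def bring_online(self, sop):
        """Replay the changes a SOP missed once it is reachable again."""
        self.online[sop] = True
        self.store[sop].update(self.pending.pop(sop, {}))
```

In this model, a change made while SOP2 is down reaches SOP1 immediately and reaches SOP2 only when bring_online("SOP2") is called, which is the role the text assigns to the synchronization module.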

Service delivery

How to configure an extension in Active-Active?

Step 1: Configure secondary SOP in the primary and secondary phone

Navigate to: SMP > Resources > IP Phones

Click on Add, or on Edit if the phone already exists.

Change the following parameters if needed:
  • SOP1: primary SOP of the phone
  • SOP2: secondary SOP of the phone

This will enable the phone to register on both SOP1 and SOP2.

ALERT! The primary and the secondary phone may not both support Active-Active. In that case, when a failure occurs, only the phone that supports Active-Active will ring and be usable for outgoing calls.

Step 2: Configure secondary SOP in internal directory

Navigate to: SMP > Internal Directory

Click on Add, or on Edit if the extension already exists.

Change the following parameters if needed:
  • SOP1: primary SOP of the extension
  • SOP2: secondary SOP of the extension

This makes it possible to route an incoming call to the secondary SOP if the primary SOP fails.

ALERT! Important remarks
  • The SOP1 and SOP2 configured in the directory and on the phone must be the same. If this is not the case, the routing of the calls will fail.
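
Because a mismatch between the directory and the phone breaks call routing, it can be worth checking the two configurations against each other before applying changes. The Python sketch below shows one way to do it. It assumes that the SOP1/SOP2 pairs have already been collected into two dictionaries keyed by extension; how that data is obtained from the SMP is outside the scope of this example, and the SOP names used are placeholders.

```python
def find_sop_mismatches(directory, phones):
    """Return the extensions whose (SOP1, SOP2) pair differs between both sources."""
    mismatches = []
    for extension, directory_sops in directory.items():
        phone_sops = phones.get(extension)
        # Extensions without a phone entry are skipped: there is nothing to compare.
        if phone_sops is not None and phone_sops != directory_sops:
            mismatches.append(extension)
    return sorted(mismatches)


# Hypothetical data: extension -> (SOP1, SOP2)
directory = {"1001": ("sop37", "sop34"), "1002": ("sop37", "sop34")}
phones = {"1001": ("sop37", "sop34"), "1002": ("sop34", "sop37")}

print(find_sop_mismatches(directory, phones))  # ['1002']
```

Extension 1002 is reported because its primary and secondary SOPs are swapped on the phone, even though the same two SOPs are used.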

Step 3: Do an Apply-Extension-Change or an Apply-Cluster-Change

For an apply-extension-change:
Navigate to: SMP > Directory

Search for the extension and click on the 'Apply Extension Change' icon.

Alternatively, an apply-cluster-change can be done:
Navigate to: SMP > Apply Changes > Apply Cluster Changes

Step 4: Reload the configuration of the phone

Navigate to: SMP > Advanced > Phone Status

Search for your phone and click 'reboot'.

Note that the Eyebeam has a specific provisioning mechanism.

The general administrator guide for the Eyebeam Softphone is available here.

Active-Active configuration is supported on the Eyebeam. For it to work, the extension in the internal directory must be set on two SOPs, and the resource (SDX2aaaa) must likewise be configured on SOP1 and SOP2.

Two identities will have to be created on the Eyebeam phone. Both of them will use the resource name as Username and Authorization user name.

Identity 1:
  • User name: resource name (SDX2aaaa)
  • Authorization user name: resource name (SDX2aaaa)
  • Domain: IP address of SOP 1.
  • Dialplan: #1\a\a.T|[0-9a-zA-Z+*#].T;match=1;prestrip=2;
  • Reregister every: 600

Identity 2:
  • User name: resource name (SDX2aaaa)
  • Authorization user name: resource name (SDX2aaaa)
  • Domain: IP address of SOP 2.
  • Dialplan: #2\a\a.T|[0-9a-zA-Z+*#].T;match=1;prestrip=2;
  • Reregister every: 600
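
The two identities differ only in the Domain and in the digit after '#' in the Dialplan. The Python sketch below makes that explicit by building both parameter sets from the resource name and the two SOP addresses. It does not produce an Eyebeam configuration file, whose format is not described here; the function name is hypothetical and the IP addresses in the usage example are documentation placeholders.

```python
def eyebeam_identities(resource_name, sop1_ip, sop2_ip):
    """Return the settings of identity 1 and identity 2 as a list of dictionaries."""
    identities = []
    for index, domain in enumerate((sop1_ip, sop2_ip), start=1):
        identities.append({
            "User name": resource_name,
            "Authorization user name": resource_name,
            "Domain": domain,
            # Same dialplan for both identities, except for the leading #1 or #2.
            "Dialplan": "#%d\\a\\a.T|[0-9a-zA-Z+*#].T;match=1;prestrip=2;" % index,
            "Reregister every": 600,
        })
    return identities


for identity in eyebeam_identities("SDX2aaaa", "192.0.2.1", "192.0.2.2"):
    print(identity)
```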

If you use a batch script to download the configuration file in TFTP and generate the local Eyebeam configuration file, you will need to use the required version of the Eyebeam phone resource and adapt your batch script.

How to configure the net.Console in Active-Active?

Please refer to the net.Console administration guide for this topic.

How to configure Cisco ATA devices in Active-Active?

DNS Setup

In order to enable Active-Active for the Cisco ATA devices, you need to set an SRV record in your DNS server.

An SRV record is basically a name that points to one or more target host names, each of which is resolved through an A record. Each target carries a priority and a weight, which determine the order in which the targets are tried when the name is resolved.

It is therefore a way to define a primary server and backup servers that are used when the primary server does not answer.

In order to configure Active-Active for the Cisco ATA devices in an Escaux environment, you will need to set one SRV record and two A records.

Example:

  • SRV record
             _sip._udp.SDS40002.sop35.escaux.com        SRV 0  1 5060   37.dev.escaux.com.
                                                        SRV 10 1 5060   34.dev.escaux.com.

ALERT! The prefix '_sip._udp.' is mandatory

  • A records
             37.dev.escaux.com  A               172.16.35.137
             34.dev.escaux.com  A               172.16.35.96

root@00000004:/tftpboot# nslookup 
> set type=srv
>  _sip._udp.SDS40002.sop35.escaux.com    
Server:      172.16.35.252
Address:   172.16.35.252#53

Non-authoritative answer:
_sip._udp.SDS40002.sop35.escaux.com   service = 10 1 5060 34.soplan.escaux.com.
_sip._udp.SDS40002.sop35.escaux.com   service = 0 1 5060 37.soplan.escaux.com.

Authoritative answers can be found from:
escaux.com   nameserver = ns2.mydyndns.org.
escaux.com   nameserver = ns4.mydyndns.org.
escaux.com   nameserver = ns1.escaux.com.
escaux.com   nameserver = ns3.mydyndns.org.
34.soplan.escaux.com   internet address = 172.16.36.26
37.soplan.escaux.com   internet address = 172.16.36.22
ns1.escaux.com   internet address = 213.246.219.72
ns2.mydyndns.org   internet address = 208.76.60.2
ns3.mydyndns.org   internet address = 208.76.61.2
ns4.mydyndns.org   internet address = 208.76.58.2
ns4.mydyndns.org   has AAAA address 2600:2004::76
> 
> exit

root@00000004:/tftpboot# dig srv _sip._udp.SDS40002.sop35.escaux.com

; <<>> DiG 9.7.0-P1 <<>> srv _sip._udp.SDS40002.sop35.escaux.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20094
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 4, ADDITIONAL: 7

;; QUESTION SECTION:
;_sip._udp.SDS40002.sop35.escaux.com. IN   SRV

;; ANSWER SECTION:
_sip._udp.SDS40002.sop35.escaux.com. 3492 IN SRV 0 1 5060 37.soplan.escaux.com.
_sip._udp.SDS40002.sop35.escaux.com. 3492 IN SRV 10 1 5060 34.soplan.escaux.com.

;; AUTHORITY SECTION:
escaux.com.      1430   IN   NS   ns3.mydyndns.org.
escaux.com.      1430   IN   NS   ns4.mydyndns.org.
escaux.com.      1430   IN   NS   ns1.escaux.com.
escaux.com.      1430   IN   NS   ns2.mydyndns.org.

;; ADDITIONAL SECTION:
34.soplan.escaux.com.   3492   IN   A   172.16.36.26
37.soplan.escaux.com.   3492   IN   A   172.16.36.22
ns1.escaux.com.      1175   IN   A   213.246.219.72
ns2.mydyndns.org.   686   IN   A   208.76.60.2
ns3.mydyndns.org.   5362   IN   A   208.76.61.2
ns4.mydyndns.org.   5362   IN   A   208.76.58.2
ns4.mydyndns.org.   5362   IN   AAAA   2600:2004::76

;; Query time: 10 msec
;; SERVER: 172.16.35.252#53(172.16.35.252)
;; WHEN: Fri Feb 28 15:18:22 2014
;; MSG SIZE  rcvd: 341

root@00000004:/tftpboot# 
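
The effect of the priority field in the SRV example can be sketched in Python as follows. This is a simplified illustration of the ordering rule of DNS SRV records, not the resolver of the Cisco ATA: the function name is hypothetical, and where the SRV specification uses a weighted random choice among targets of equal priority, this sketch simply puts the heavier target first so that the result is deterministic.

```python
def order_srv_targets(records):
    """Order (priority, weight, port, target) tuples as a failover client tries them."""
    # A lower priority value is tried first; weight only matters between
    # targets that share the same priority.
    ordered = sorted(records, key=lambda record: (record[0], -record[1]))
    return [target for _priority, _weight, _port, target in ordered]


records = [
    (10, 1, 5060, "34.dev.escaux.com."),
    (0, 1, 5060, "37.dev.escaux.com."),
]

print(order_srv_targets(records))
# ['37.dev.escaux.com.', '34.dev.escaux.com.']
```

With the values of the example, the target with priority 0 acts as the primary server and the one with priority 10 as the backup.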


SMP Setup

Navigate to: SMP > Resources > IP Phones > phone resource

Set the primary SOP to 37.sop.

Navigate to: General Resource Parameters > SOP 1

Set the secondary SOP to 34.sop.

Navigate to: General Resource Parameters > SOP 2

Set the SIP Server to the SRV DNS name you previously configured.

In our example, that would be: SDS40002.sop35.escaux.com

Navigate to: Line Configuration > SIP Server

Enable DNS SRV lookup on the Cisco ATA device by setting the following two options to 'Yes'.

Navigate to: Misc. Settings > DNS Settings > Use SRV DNS

Navigate to: Misc. Settings > DNS Settings > DNS SRV Auto Prefix

How to add an extra PSTN Gateway?

A call coming from the PSTN on a Gateway SOP is mapped locally to an extension of the cluster and then routed to the primary or secondary SOP via the intra-cluster routing. No specific configuration is required for this.

A call made from an IP phone to the PSTN is routed to the Gateway via the routes configured in the extra-cluster routing. This extra-cluster routing is specific to each setup. The following steps provide an overview of what needs to be done.

We take here the example where all the routes defined in the extra-cluster routing are sent to the DefaultOut route group, which then routes the call via a Goto.Interface outgoing action.

Step 1: Configure the extra-cluster routing on each application SOP

For each SOP that has no connection to the PSTN, do the following:
Navigate to: SMP > Select the SOP > Communication Routing > Extra-Cluster Routing

Search for the DefaultOut route group, click on 'Goto.Interface' and enter the following settings:
  • Default Outgoing Interface: Put here the MeshSIPTrunk of the primary PSTN gateway
  • Fallback Interface 1: Put here the MeshSIPTrunk of the secondary PSTN gateway
  • Caller ID Policy: Transparent
  • Options: Network ring tone

In this way, if the primary gateway fails, the call is routed to the secondary gateway. The Caller ID Policy and the Options are controlled by the gateways, depending on the integration needs.

Step 2: Configure the extra-cluster routing on each gateway SOP

Search for the DefaultOut route group, click on 'Goto.Interface' and enter the following settings:
  • Default Outgoing Interface: Put the first trunk to access the PSTN (e.g. first PRI trunk)
  • Fallback Interface 1: Put the first fallback trunk to access the PSTN interface (e.g. second PRI trunk)
  • Caller ID Policy: For example: 'translate' or 'as set in global parameter'. The actual value will depend on the operator
  • Options: Network ring tone (with some operators, 'PBX ring tone' is used in order to prevent some integration issues)

ALERT! Important remarks:
  • On a gateway SOP, unless it is also used as an application SOP, do not put another gateway in the list of fallback interfaces. This can lead to a loop between the two gateways. Instead, the application SOP can have multiple gateways. If the request from app_sop -> gateway1 fails, app_sop will fall back to gateway2. gateway1 must not reroute calls to gateway2.
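
The fallback order and the remark above can be sketched in Python as follows. This is only an illustration of the "default interface, then fallback interfaces" logic, not the behavior of the Goto.Interface action itself: the function, the availability callback and the interface names are hypothetical.

```python
def route_outgoing_call(interfaces, is_available):
    """Try the default interface first, then each fallback interface in order."""
    for name in interfaces:
        if is_available(name):
            return name
    # No interface could take the call.
    return None


# Application SOP: the default and the fallback point to two different gateways.
app_sop_interfaces = ["MeshSIPTrunk_gateway1", "MeshSIPTrunk_gateway2"]

# Gateway SOP: only PSTN trunks are listed, never the other gateway,
# so a call can never bounce between gateway1 and gateway2.
gateway1_interfaces = ["PRI_1", "PRI_2"]

# Example: gateway1 is down, so the application SOP falls back to gateway2.
print(route_outgoing_call(app_sop_interfaces,
                          lambda name: name != "MeshSIPTrunk_gateway1"))
```

Keeping the choice between gateways on the application SOP only is what prevents the loop: a gateway either reaches the PSTN through one of its own trunks or fails the call.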

How to configure a queue in Active-Active?

Dependencies

  • Cluster & Active-Active Support version 1.4 or higher.
  • Queue resource version 2.2 or higher.
  • Asterisk-1.2x module version 2.33.1 or higher.
  • Watchdog module version 1.1 or higher.

Queue setup

In the definition of the queue, select the two SOPs the queue will be registered in, and set the option Active-active queue synchronization to yes.

Cluster & Active-Active Support module

In the configuration of the module, select a value for the Queue synchronization interval.

Phones configuration

In order for the Active-Active queue synchronization to work properly, the primary SOP for the queue and the primary SOP for a phone must be the same.

Limitations

For now, the member status is the only information synchronized. A member can be LOGGEDIN, PAUSED or LOGGEDOUT.

Expected behavior

The configuration used is the following:
  • "Qualify SIP devices registration" in Asterisk module set to yes
  • "Qualify SIP devices registration" in resource set to Default

Remarks:
  • Note that if you want to be able to login/logout using PUM while the primary SOP is down, you have to set up a redundant DHCP server that will tell the phone to use another SOP as TFTP server. See AppNotePersonalUserMobility for details.

Polycom phones

Configuration | Expected downtime for outgoing calls | Expected downtime for incoming calls
The extension is directly linked to the physical resource | less than 25 seconds | less than 25 seconds
The extension is linked to a virtual phone | less than 25 seconds | less than 35 seconds

ALERT! Important remarks:
  • Note that while the primary SOP is down, and for less than 5 minutes, outgoing calls take an additional 5 seconds to be established.
  • Note that once the primary SOP comes back up, the phone is unreachable (but is still able to make outgoing calls) until it registers again on the primary SOP. This can take approximately 15 seconds.

Aastra phones

Configuration | Expected downtime for outgoing calls | Expected downtime for incoming calls
The extension is directly linked to the physical resource | less than 35 seconds | less than 75 seconds
The extension is linked to a virtual phone | less than 35 seconds | less than 130 seconds

ALERT! Important remarks:
  • Note that the "Registration refresh period (seconds)" was set to 60 in the resource to minimize the downtime.

Snom Phones

Expected downtime for outgoing calls | Expected downtime for incoming calls
less than 90 seconds | less than 20 seconds

ALERT! Important remarks:
  • Note that once the primary SOP comes back up, the phone is unreachable (but is still able to make outgoing calls) until it registers again on the primary SOP. This can take approximately 230 seconds.

Frequently asked questions

What kind of traffic is exchanged between the SOPs and how much bandwidth does it consume?

The following information is exchanged between the SOPs in order to keep them synchronized:
  • Phone statuses: The phone statuses are broadcast by the SOP of the phone to all the other SOPs in the cluster. This enables desktop applications such as net.Console and net.Desktop to display cluster-wide phone statuses. The bandwidth consumed by this traffic is around 1 kbyte per call. At startup, the SOP also re-synchronizes its cache with the rest of the cluster.
  • Prompt synchronization: When a new prompt is uploaded to the master prompt repository, it is replicated to the other SOPs of the cluster.
  • Queue status: If a queue is configured in Active-Active with the synchronization enabled, the status is pushed to the secondary or the primary SOP. The synchronization uses 300 bytes per queue member status change. Since this only concerns login, logout, pause and unpause, this traffic can be ignored in terms of bandwidth consumption. At startup of the SOP, the queues are re-synchronized against the statuses on the other SOPs of the cluster.
  • Extension status and profile parameters: If dynamic profiles are used, each time a user changes their status or a profile parameter such as a forward number, this information is broadcast to the rest of the cluster. This enables the secondary SOP to process the calls according to the right status and the right profile parameters. The statuses are also used by some desktop applications such as net.Console or net.Desktop. Each parameter or status change causes the exchange of 300 bytes of data. Since the statuses and the profile parameters are not changed often, this traffic can be ignored in terms of bandwidth consumption. At startup of the SOP, the status and profile parameters are re-synchronized with the rest of the cluster in order to refresh the local cache.

Note that a call is handled either on the primary or on the secondary SOP of an extension. No mechanism is foreseen to synchronize call contexts between the SOPs. The Active-Active setup generates no specific SIP or RTP traffic, so the usual dimensioning rules apply.
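
As a rough order of magnitude, the figures above can be combined in a short Python calculation. This is a back-of-the-envelope sketch under assumptions: 1 kbyte is taken as 1000 bytes, the startup re-synchronization and prompt replication are ignored, and the load used in the example (3600 calls and 1200 status changes per hour) is invented for illustration.

```python
PHONE_STATUS_BYTES_PER_CALL = 1000  # "around 1 kbyte per call" (assumed 1000 bytes)
STATUS_CHANGE_BYTES = 300           # per queue member, status or profile parameter change


def sync_traffic_bytes_per_second(calls_per_hour, status_changes_per_hour):
    """Estimate the average synchronization traffic sent by one SOP."""
    total_bytes_per_hour = (calls_per_hour * PHONE_STATUS_BYTES_PER_CALL
                            + status_changes_per_hour * STATUS_CHANGE_BYTES)
    return total_bytes_per_hour / 3600


print(sync_traffic_bytes_per_second(3600, 1200))  # 1100.0
```

Even with one call per second, the estimate stays around 1 kbyte per second, and the phone status updates account for most of it.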

Can I use load balancing on a SIP trunk with my SIP provider?

This is not supported.

Suppose that you have two SOPs, that your SIP provider load-balances a SIP trunk across both SOPs, and that your phone is registered on both SOPs. If the two SOPs can no longer see each other while the phone can still reach both, call transfer is no longer possible. Indeed, if a call is received on the first line via the secondary SOP, the call initiated on the second line in order to transfer it is sent to the primary SOP. When the user confirms the transfer, the SOPs cannot bridge the two calls because they are located on two different SOPs.

DHCP redundancy

When you don't have any special DHCP setup the behavior is as follows:
  • DHCP should be enabled on Fusion 1, disabled on Fusion 2
  • When connection to Fusion 1 is lost, DHCP should be enabled on Fusion 2
  • When connection to Fusion 1 is re-established, DHCP should be disabled on Fusion 2, enabled on Fusion 1 to make sure phones register on Fusion 1

Copyright © Escaux SA