Administrator Guide: SMP Redundancy

This document explains how to activate or deactivate an SMP (i.e. set it to active or standby).

What it does

The SMP redundancy mechanism takes care of:

  • setting up/stopping the MySQL replication,
  • starting/stopping the scheduler for the tasks,
  • starting/stopping SSHMON.

Requirements

  • SMP Admin Module 1.11.0 or higher
  • 2 SMP "sites"
  • An alias (DNS record) whose IP address is switched to the active site.

Status of an SMP

Initial state

The default status of an SMP is "active". The current status is stored in the file /escaux/etc/smp_status.ini.

During the installation:
  • if the file does not exist, it is created and the status is set to "active",
  • if the file exists but contains no value, the status is set to "active",
  • if the file exists and contains a value, it is left unchanged.
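
The three installation rules above can be sketched as a small shell function. This is only an illustration: the function name is hypothetical, and the layout of smp_status.ini is not documented here, so the sketch assumes a single "status=<value>" line.

```shell
#!/bin/sh
# Sketch of the install-time rule for the SMP status file.
# ASSUMPTION: the status is stored as one "status=<value>" line; the real
# layout of /escaux/etc/smp_status.ini may differ.
init_smp_status() {
    file="$1"
    if [ ! -e "$file" ]; then
        # File missing: create it with the default status.
        printf 'status=active\n' > "$file"
    elif ! grep -Eq '^status=.+' "$file"; then
        # File present but without a value: set the default status.
        printf 'status=active\n' > "$file"
    fi
    # Otherwise the existing value is left untouched.
}
```

Called on a scratch copy, for example init_smp_status /tmp/smp_status.ini, the function only writes when no value is present, which is why a secondary site still has to be set to "standby" explicitly.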

This means that after installation, the secondary site must be set to "standby" to prepare it correctly for switchover:
sop> deactivate_smp

Switchover procedures

Activate the secondary site

If there are issues with the primary site, you will have to activate the secondary site. This is straightforward but requires a few steps.

Verify replication status on the secondary site

Before activating the secondary site, verify that it is correctly replicating data from the primary site, using the repl_status.pl command. This must be verified on these SMP types: web, engine and reporting.

root@smp:~# repl_status.pl
'Connect_Retry' => '60'
'Exec_Master_Log_Pos' => '917895'
'Last_Errno' => '0'
'Last_Error' => ''
'Last_IO_Errno' => '0'
'Last_IO_Error' => ''
'Last_SQL_Errno' => '0'
'Last_SQL_Error' => ''
'Master_Host' => '192.168.2.11'
'Master_Log_File' => 'mysql-bin.000356'
'Master_Port' => '3306'
'Master_SSL_Allowed' => 'No'
'Master_SSL_CA_File' => ''
'Master_SSL_CA_Path' => ''
'Master_SSL_Cert' => ''
'Master_SSL_Cipher' => ''
'Master_SSL_Key' => ''
'Master_SSL_Verify_Server_Cert' => 'No'
'Master_Server_Id' => '1217'
'Master_User' => 'repl'
'Read_Master_Log_Pos' => '917895'
'Relay_Log_File' => 'mysql-relay-bin.000789'
'Relay_Log_Pos' => '918041'
'Relay_Log_Space' => '918240'
'Relay_Master_Log_File' => 'mysql-bin.000356'
'Replicate_Do_DB' => ''
'Replicate_Do_Table' => ''
'Replicate_Ignore_DB' => 'sop,cdrdb,mysql'
'Replicate_Ignore_Server_Ids' => ''
'Replicate_Ignore_Table' => ''
'Replicate_Wild_Do_Table' => 'tmp_smp_member.%,smp.%,cdrdb\_________.%,sop\_________.%'
'Replicate_Wild_Ignore_Table' => 'sop\_________.temp_directory_entry'
'Seconds_Behind_Master' => '0'
'Skip_Counter' => '0'
'Slave_IO_Running' => 'Yes'
'Slave_IO_State' => 'Waiting for master to send event'
'Slave_SQL_Running' => 'Yes'
'Until_Condition' => 'None'
'Until_Log_File' => ''
'Until_Log_Pos' => '0'

Pay special attention to these parameters:
'Slave_IO_Running' => 'Yes'
'Slave_IO_State' => 'Waiting for master to send event'
'Slave_SQL_Running' => 'Yes'

If replication is not running properly, it is recommended to restart it before activating the SMP by running the CLI commands stop_mysql_replication and start_mysql_replication, in that order.
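
The check on Slave_IO_Running and Slave_SQL_Running can also be scripted. The following is a minimal sketch, assuming the output of repl_status.pl keeps the 'Key' => 'Value' format shown above; the function name replication_ok is hypothetical and not part of the SMP tooling.

```shell
#!/bin/sh
# Reads repl_status.pl output on stdin and succeeds only if both
# replication threads report 'Yes'.
# ASSUMPTION: each line has the form 'Key' => 'Value' as in the sample output.
replication_ok() {
    awk -F"'" '
        $2 == "Slave_IO_Running"  { io  = $4 }
        $2 == "Slave_SQL_Running" { sql = $4 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

A possible use on the secondary site is: repl_status.pl | replication_ok && echo "replication is running". A missing parameter is treated the same as a value other than 'Yes'.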

Activate services on the secondary site

Set the SMP status to active on the secondary site. This needs to be done on the SMP web, engine and reporting machine types.

root@smp:~# set_smp_status.pl active
2014/01/17 15:14:57 16947 [ INFO] Set SMP status to active
2014/01/17 15:14:57 16947 [ INFO] Running scripts
2014/01/17 15:14:57 16947 [DEBUG] running script '/escaux/etc/smp_redundancy.d/A10reset'
2014/01/17 15:14:57 16947 [ INFO] Exiting...

Notes:
  • This command will stop replication from the primary site to the secondary site.
  • It is not required to first put the primary site in standby. However, any changes you make to the primary site afterwards will be lost.

Accept one public key of the SOPs

Pick any one of the SOPs and accept its public key:

Navigate to: Server Configuration > Accept key

Set the primary SMP to standby mode

Now set the primary SMP to standby mode so that it is ready for a rollback in case of issues with the secondary site. This needs to be done on the SMP web, engine and reporting machine types.

root@smp:~# set_smp_status.pl standby
2014/01/17 16:32:30 24583 [ INFO] Set SMP status to standby
2014/01/17 16:32:30 24583 [ INFO] Running scripts
2014/01/17 16:32:30 24583 [DEBUG] running script '/escaux/etc/smp_redundancy.d/S20setup'

Do you want to continue? Your local data will be erased and replaced by the content of the remote database
Type 'Yes' to continue, anything else to skip setting up replication.
Yes
Stopping local replication slave.
Stopping the local database server.
mysql stop/waiting
Cloning databases from 'w.x.y.z'. This can take a while depending on the size of the databases.
Starting the local database server.
mysql start/running, process 27866
Setting up and starting replication from 'w.x.y.z'.
2014/01/17 16:32:36 24583 [DEBUG] running script '/escaux/etc/smp_redundancy.d/S20stop_schedule'
2014/01/17 16:32:36 24583 [ INFO] Exiting...

If you accidentally skipped setting up replication, you can still start it manually:
root@smp:~# repl_setup.pl

Do you want to continue? Your local data will be erased and replaced by the content of the remote database
Type 'Yes' to continue, anything else to skip setting up replication.
Yes
Stopping local replication slave.
Stopping the local database server.
mysql stop/waiting
Cloning databases from 'w.x.y.z'. This can take a while depending on the size of the databases.
Starting the local database server.
mysql start/running, process 27866
Setting up and starting replication from 'w.x.y.z'.

Notes:
  • This operation locks both SMPs for the duration of the synchronization.
  • This can take several minutes, depending on the amount of data and the transfer speeds.

Rolling back to the primary site

Rolling back to the primary site is done in exactly the same way as activating the secondary site. The steps are summarized here:

  • Verify replication from the secondary to the primary site (run repl_status.pl on the web, engine and reporting machines of the primary site).
  • Activate the primary site (run set_smp_status.pl active on the web, engine and reporting machines of the primary site).
  • Accept a single SSH public key on the primary site.

Subsystems

MySQL replication

Set up and start the replication

The replication mechanism is started when you set the SMP to standby. You can also start it manually with the script repl_setup.pl. This will revoke the privileges for the 'root' user.

Stop the mechanism completely

The replication mechanism is stopped when you set the SMP to active. You can also stop it manually with the script repl_reset.pl. This will also restore the privileges for the 'root' user.

Check the status

You can check the status of the host with repl_status.pl.

Alias DNS record

Point the DNS record of the alias to the IP address of the active site.

What if you need to replicate too much data

When MySQL replication is started, the MySQL database of the primary server is locked. If you need to migrate a lot of data, it can be useful to sync the data first, before starting the replication.

rsync -a -e 'ssh -F /etc/ssh/keys/www-data-config -q' $host:/var/lib/mysql/smp /var/lib/mysql/ >/dev/null 2>&1
rsync -a -e 'ssh -F /etc/ssh/keys/www-data-config -q' $host:/var/lib/mysql/sop_* /var/lib/mysql/ >/dev/null 2>&1
rsync -a -e 'ssh -F /etc/ssh/keys/www-data-config -q' $host:/var/lib/mysql/cdrdb_* /var/lib/mysql/ >/dev/null 2>&1

These commands do not lock the MySQL tables, so the customer is not impacted. Be aware that the first run will take some time; subsequent runs take only a few moments.

After running these commands several times, you can start MySQL replication via the normal procedure explained above. That procedure runs rsync again, but because most of the data has already been copied, it takes only about a minute, so the impact on the customer is minimal.
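
Repeating the pre-sync can be wrapped in a small loop. The sketch below is only an illustration under stated assumptions: the function name, the pass-count argument and the RSYNC override (handy for a dry run) are hypothetical and not part of the SMP tooling. Unlike the commands above, it does not discard rsync output and it stops at the first failing transfer.

```shell
#!/bin/sh
# Runs the three pre-sync rsync commands for one remote host, several times.
# ASSUMPTIONS: presync_databases, the pass count and the RSYNC override are
# illustrative names introduced for this sketch.
presync_databases() {
    host="$1"
    passes="${2:-2}"
    pass=1
    while [ "$pass" -le "$passes" ]; do
        for pattern in smp 'sop_*' 'cdrdb_*'; do
            # The wildcard is passed on literally so the remote side expands it.
            ${RSYNC:-rsync} -a -e 'ssh -F /etc/ssh/keys/www-data-config -q' \
                "$host:/var/lib/mysql/$pattern" /var/lib/mysql/ || return 1
        done
        pass=$((pass + 1))
    done
}
```

Running RSYNC=echo presync_databases w.x.y.z 1 prints the arguments that would be passed to rsync without copying anything, which is a cheap way to check the host and paths first.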

Restart replication on standby

Sometimes the standby machine reports an error and stops replicating (for example, our support SMP after ICT maintenance). To set up replication between the primary and the standby again, do the following:
  • Log on to the secondary SMP.
  • Start a screen session.
  • Run repl_reset.pl.
  • Then run repl_setup.pl.
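
The last two steps above can be chained so that the setup never runs after a failed reset. This is a minimal sketch: the function name is hypothetical, and it assumes both scripts return a non-zero exit status on failure, which this guide does not confirm.

```shell
#!/bin/sh
# Re-create replication on the standby: reset first, then set up again.
# ASSUMPTION: repl_reset.pl and repl_setup.pl exit non-zero when they fail.
resetup_replication() {
    if ! repl_reset.pl; then
        echo "repl_reset.pl failed, not running repl_setup.pl" >&2
        return 1
    fi
    # repl_setup.pl asks for confirmation; answer 'Yes' at its prompt.
    repl_setup.pl
}
```

As in the manual procedure, it is best called from inside a screen session (for example one started with screen -S repl), so that a dropped SSH connection does not interrupt the database clone.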

What is not taken into account

DNS settings

The SMP redundancy mechanism does not set or change DNS settings. This is left to the administrator.

Copyright © Escaux SA