Manually sync Percona database servers

Sometimes, for example right after a bundle deploy, the database cluster fails to sync across units.

In such cases, you can force a manual sync by:

  • stopping all mysql instances

  • stopping juju agents on all units

  • starting one instance, forcing it to run as master if necessary

  • starting the additional instances

  • starting juju agents on all units

The detailed procedure is described below.

stop mysql on all units

$ juju run --application percona-cluster "sudo systemctl stop mysql"

make sure all mysql processes are indeed stopped; do not rely on systemctl status mysql alone

$ juju run --application percona-cluster "ps -ef | grep mysql"
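
Because grep mysql can match its own command line, the output of the check above may never be empty. A minimal sketch of a stricter check follows; the function name count_mysql_procs is hypothetical, and it only counts matching lines in ps -ef style output supplied on stdin.

```shell
# Count lines mentioning mysql in `ps -ef` style output read from stdin.
# The bracket in '[m]ysql' is the usual trick to keep a grep command line
# (which contains "[m]ysql", not "mysql") from matching itself.
# Prints 0 when nothing matches; `|| true` hides grep's non-zero exit status.
count_mysql_procs() {
    grep -c '[m]ysql' || true
}
```

On a unit this could be used as ps -ef | count_mysql_procs; a result of 0 suggests that no mysql process is left. The same pattern can be passed through juju run by replacing grep mysql with grep '[m]ysql' in the command above.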

stop juju agents on all units

$ juju ssh percona-cluster/N sudo service jujud-unit-percona-cluster-N stop

(repeat the command on the other units)
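
To avoid retyping the command for each unit, the per-unit commands can be generated in a loop. This is only a sketch: the helper names agent_cmd and agent_cmds are hypothetical, and the unit numbers passed to them are assumptions that should be taken from juju status percona-cluster. The helpers only print the commands, so they can be reviewed before being run.

```shell
# Print the juju command that stops or starts the unit agent on unit N.
# Usage: agent_cmd <stop|start> <unit-number>
agent_cmd() {
    action=$1
    n=$2
    echo "juju ssh percona-cluster/$n sudo service jujud-unit-percona-cluster-$n $action"
}

# Print one command per unit number given on the command line.
# Usage: agent_cmds <stop|start> <unit-number>...
agent_cmds() {
    action=$1
    shift
    for n in "$@"; do
        agent_cmd "$action" "$n"
    done
}
```

For a three-unit cluster numbered 0, 1 and 2 (an assumption), agent_cmds stop 0 1 2 prints the three stop commands; piping the output to sh executes them.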

connect to one of the servers and force mysql to restart as master

Note

You should perform the following operations on the last-acting master, otherwise mysql will fail to start. If you are unable to determine the last-acting master, edit file /var/lib/percona-xtradb-cluster/grastate.dat and set safe_to_bootstrap: 1 before restarting mysql.
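
The state file mentioned in the note can be inspected and edited from the shell. The sketch below assumes the usual Galera layout of grastate.dat, with one "key: value" pair per line (including seqno and safe_to_bootstrap); the function names are hypothetical. Comparing seqno across the stopped nodes is a common way to guess which node has the most recent data, but treat that as a heuristic rather than a guarantee.

```shell
# Print the value of a key (for example seqno or safe_to_bootstrap)
# from a grastate.dat file.
# Usage: grastate_get <key> <file>
grastate_get() {
    key=$1
    file=$2
    sed -n "s/^$key:[[:space:]]*//p" "$file"
}

# Set safe_to_bootstrap to 1 in place, keeping the original as <file>.bak.
# Usage: grastate_force_bootstrap <file>
grastate_force_bootstrap() {
    file=$1
    sed -i.bak 's/^safe_to_bootstrap:.*/safe_to_bootstrap: 1/' "$file"
}
```

On a unit, and run as root, grastate_get seqno /var/lib/percona-xtradb-cluster/grastate.dat prints the saved sequence number, and grastate_force_bootstrap on the same path applies the change described in the note.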

  • Edit the file /etc/mysql/percona-xtradb-cluster.conf.d/mysqld.cnf; you should find the line:

    wsrep_cluster_address=gcomm://<ip_1>,<ip_2>

where <ip_1>, <ip_2> are the IP addresses of the other members of the cluster.

  • Comment out this line, then add a copy of it below with the IP addresses removed, leaving:

    wsrep_cluster_address=gcomm://
    
  • Finally, start MySQL:

    $ sudo service mysql start
    
  • The mysql server may still fail to start because of failed transactions. In that case, try the following procedure.

Looking in the log file /var/log/mysql/error.log you may find messages like:

2023-01-30T23:57:22.828179Z 0 [ERROR] Found 7 prepared transactions! It means that mysqld was not...binlog or tc.log file) was manually deleted after a crash.
You have to start mysqld with --tc-heuristic-recover switch to commit or rollback pending transactions.
  • Issue the following command:

    $ sudo mysqld --tc-heuristic-recover=ROLLBACK
    

This should fix the failed transactions.

start mysql on the remaining nodes, as usual

$ juju ssh percona-cluster/N sudo systemctl start mysql

go back to the first node, stop mysql, revert the change to wsrep_cluster_address and restart mysql

On the first node, edit the file /etc/mysql/percona-xtradb-cluster.conf.d/mysqld.cnf, remove the empty wsrep_cluster_address=gcomm:// line and uncomment the original line:

wsrep_cluster_address=gcomm://<ip_1>,<ip_2>

Then restart mysql:

$ sudo service mysql restart
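
The edit to wsrep_cluster_address on the first node, and its later revert, could also be scripted with sed instead of a manual edit. This is a minimal sketch, assuming the file contains exactly one active wsrep_cluster_address line with a non-empty address list; the function names are hypothetical, and each call leaves a copy of the previous file as <file>.bak.

```shell
# Comment out the active wsrep_cluster_address line and add an empty
# gcomm:// line right below it, so the node starts as a new cluster.
# Usage: wsrep_bootstrap_on <config-file>
wsrep_bootstrap_on() {
    conf=$1
    sed -i.bak 's|^\(wsrep_cluster_address=gcomm://..*\)$|#\1\
wsrep_cluster_address=gcomm://|' "$conf"
}

# Revert: delete the empty gcomm:// line and uncomment the original line.
# Usage: wsrep_bootstrap_off <config-file>
wsrep_bootstrap_off() {
    conf=$1
    sed -i.bak -e '/^wsrep_cluster_address=gcomm:\/\/$/d' \
               -e 's|^#\(wsrep_cluster_address=gcomm://..*\)$|\1|' "$conf"
}
```

Run as root, wsrep_bootstrap_on /etc/mysql/percona-xtradb-cluster.conf.d/mysqld.cnf would be called before the first start, and wsrep_bootstrap_off on the same file once the other nodes have joined. Checking the result with grep wsrep_cluster_address before restarting mysql is a sensible precaution.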

start juju agents on all the units

$ juju ssh percona-cluster/N sudo service jujud-unit-percona-cluster-N start

(repeat the command on the other units)

Now your Percona cluster should be in sync.
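
One way to confirm the result is to look at the Galera status variables wsrep_local_state_comment and wsrep_cluster_size on each node. The following is a sketch under assumptions: the helper names are hypothetical, and the input is expected to be tab-separated name/value pairs as printed by the mysql client in batch mode.

```shell
# Read tab-separated "name<TAB>value" status lines on stdin and print
# the value of the variable named in $1.
wsrep_value() {
    awk -F'\t' -v name="$1" '$1 == name { print $2 }'
}

# Succeed only if the status on stdin reports the node as Synced and the
# cluster size equals the expected number of nodes given in $1.
wsrep_is_synced() {
    expected_size=$1
    status=$(cat)
    [ "$(printf '%s\n' "$status" | wsrep_value wsrep_local_state_comment)" = "Synced" ] &&
    [ "$(printf '%s\n' "$status" | wsrep_value wsrep_cluster_size)" = "$expected_size" ]
}
```

On a unit of a three-node cluster this might be used as sudo mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" | wsrep_is_synced 3 && echo "in sync". How the mysql client authenticates (for example with -u root -p) depends on the deployment and is not covered here.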

Percona units in error hook failed: "leader-elected"

One percona-cluster unit may fail with this error during configuration. To fix it, issue the following command:

$ juju resolved --no-retry percona-cluster/N

According to Canonical support, this bug should be fixed for good by upgrading the percona-cluster charm to v268:

$ juju upgrade-charm percona-cluster

Percona-cluster error: Leader UUID != Unit UUID

Two or more percona-cluster units (non-leader units) may fail with this error.

If you check the debug logs (juju debug-log --include <percona-unit>) of the units in error, you should see entries similar to:

2020-07-09 15:04:07 DEBUG leader-settings-changed File "/var/lib/juju/agents/unit-percona-cluster-2/charm/hooks/percona_utils.py", line 662, in update_bootstrap_uuid
2020-07-09 15:04:07 DEBUG leader-settings-changed cluster_state_uuid)
2020-07-09 15:04:07 DEBUG leader-settings-changed percona_utils.InconsistentUUIDError: Leader UUID ('eb31ead5-c1f1-11ea-98b8-be3a8df2897a') != Unit UUID ('eb31ead5-c1f1-11ea-98b8-be3a8df2897b')

The problem is that the percona-cluster leader has UUID eb31ead5-c1f1-11ea-98b8-be3a8df2897a, while the other units expect this UUID to be eb31ead5-c1f1-11ea-98b8-be3a8df2897b.

To fix this issue, proceed as follows:

  1. Run the following command on the percona-cluster leader unit:

    juju run --unit <percona-cluster-leader> "leader-get bootstrap-uuid"
    
    **Note**: This should be eb31ead5-c1f1-11ea-98b8-be3a8df2897a
    
  2. Now, run the following command (again, on the percona-cluster leader unit):

    juju run --unit <percona-cluster-leader> "leader-set bootstrap-uuid=eb31ead5-c1f1-11ea-98b8-be3a8df2897b"
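
The UUIDs in your deployment will differ from the ones in this example, so they have to be read from your own logs. As a sketch, the two values can be pulled out of an InconsistentUUIDError line with sed; the function name uuid_from_error is hypothetical, and the pattern assumes the message format shown in the log excerpt above.

```shell
# Extract the leader or unit UUID from an InconsistentUUIDError log line
# read from stdin.
# Usage: uuid_from_error <Leader|Unit>
uuid_from_error() {
    sed -n "s/.*$1 UUID ('\([0-9a-f-]*\)').*/\1/p"
}
```

For example, juju debug-log --replay --no-tail --include <percona-unit> | grep InconsistentUUIDError | tail -n 1 | uuid_from_error Unit prints the UUID the failing unit expects, which is the value passed to leader-set in step 2.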