OpenStack Release Upgrade¶
Instructions for upgrading an OpenStack cloud deployed with Juju.
These are the steps for an upgrade from the Mitaka to the Newton OpenStack release, adapted from [1].
For upgrading to Ocata see the release notes.
Warning
For Ocata use openstack-origin=cloud:xenial-ocata
For upgrading to Pike see the release notes.
Warning
For Pike use openstack-origin=cloud:xenial-pike
For upgrading to Queens see the release notes.
Warning
For Queens use openstack-origin=cloud:xenial-queens
For upgrading OpenStack Juju charms see [2].
In order to align with the naming suggested by the OpenStack documentation, as a preliminary step we rename the “Service Project” to services:
$ juju config keystone service-tenant=services
Check the status of the services, in particular Keystone:
$ juju status keystone
Model Controller Cloud/Region Version
cloudbase garrmaas garr/Bari 2.1.2
App Version Status Scale Charm Store Rev OS Notes
defaultgw-ba1-cl2 active 3 defaultgw jujucharms 6 ubuntu
keystone-ba1-cl2 9.3.0 active 3 keystone local 3 ubuntu
keystone-hacluster-ba1-cl2 active 3 hacluster jujucharms 33 ubuntu
nrpe-keystone-ba1-cl2 unknown 3 nrpe jujucharms 21 ubuntu
Charm upgrades¶
Ensure the nrpe-xxx charms use at least version nrpe-30.
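The installed revision appears in the Rev column of juju status; if it is lower, upgrade the subordinate charm first (the application name below is the one from the status output above):
$ juju status | grep nrpe
$ juju upgrade-charm nrpe-keystone-ba1-cl2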
For Ocata we are using the 17.08 release of the OpenStack Charms.
$ juju upgrade-charm nova-cloud-controller
$ juju upgrade-charm openstack-dashboard
$ juju upgrade-charm ceph-radosgw
$ juju upgrade-charm percona-cluster
$ juju upgrade-charm neutron-api-hacluster
$ juju upgrade-charm neutron-gateway
$ juju upgrade-charm neutron-ovs
$ juju upgrade-charm nova-compute
$ juju upgrade-charm ceilometer
$ juju upgrade-charm ceilometer-agent
$ juju upgrade-charm ceph-proxy
$ juju upgrade-charm cinder
$ juju upgrade-charm cinder-ceph
$ juju upgrade-charm glance
$ juju upgrade-charm gnocchi
$ juju upgrade-charm memcached
$ juju upgrade-charm nagios
$ juju upgrade-charm neutron-api
$ juju upgrade-charm ntp
$ juju upgrade-charm postgresql
$ juju upgrade-charm rabbitmq-server
Clean the database¶
Remove expired tokens from the Keystone database keystone. Find the leader:
$ juju run --application keystone is-leader
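A possible way to capture the number of the leader unit directly in a shell variable, assuming three units keystone/0, keystone/1 and keystone/2, is:
$ for i in 0 1 2; do
  if [ "$(juju run --unit keystone/$i is-leader)" = "True" ]; then L=$i; fi;
done
$ echo $L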
Assuming the leader unit is L:
$ juju ssh keystone/$L sudo keystone-manage token_flush
Upgrading the OpenStack Services¶
In a rolling upgrade of an OpenStack service, each unit within a service is upgraded one at a time, thus rolling the update across the service.
This is the procedure to perform the upgrade on each service:
configure the charm of the service for managed upgrade
pause the services on the leader unit in a cluster
perform the upgrade
resume the services on the leader unit
Step 1 relies on the action-managed-upgrade and openstack-origin configuration options: the former defers the upgrade so that it can be triggered on each unit with the openstack-upgrade action, while the latter specifies the repository from which to download the upgraded packages for a service. Both are changed using the juju config command.
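As a sketch, the four steps for a hypothetical service named myservice, whose leader unit is L, would look like this:
$ juju config myservice action-managed-upgrade=true
$ juju config myservice openstack-origin=cloud:xenial-queens
$ juju run-action myservice/$L --wait pause
$ juju run-action myservice/$L --wait openstack-upgrade
$ juju run-action myservice/$L resume
The same pause, openstack-upgrade, resume sequence is then repeated on the remaining units.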
Keystone¶
In order to speed up the upgrade, temporarily disable saml2:
$ juju config keystone enable-saml2=false
Find the leader:
$ juju run --application keystone is-leader
Assuming that the leader is L, stop the identity service:
$ juju run-action keystone/$L --wait pause
Configure for the upgrade:
$ juju config keystone action-managed-upgrade=true
Set the origin for the upgrade:
$ juju config keystone openstack-origin=cloud:xenial-queens
Launch the upgrade:
$ juju run-action keystone/$L --wait openstack-upgrade
Resume the service:
$ juju run-action keystone/$L resume
Repeat the process for the other units of the service:
$ for i in {0..2}; do
juju run-action keystone/$i --wait pause;
juju run-action keystone/$i --wait openstack-upgrade;
juju run-action keystone/$i resume;
done
Warning
If you get the following error in apache2.log of the leader unit:
InternalError (1054, "Unknown column 'user.created_at' in 'field list'")
you need to manually upgrade the Keystone database.
Pause the leader unit of the service, run the migrations, and then resume it:
$ juju run-action keystone/$L --wait pause
$ juju ssh keystone/$L sudo -u keystone keystone-manage --config-file /etc/keystone/keystone.conf db_sync --expand
$ juju ssh keystone/$L sudo -u keystone keystone-manage --config-file /etc/keystone/keystone.conf db_sync --migrate
$ juju ssh keystone/$L sudo -u keystone keystone-manage --config-file /etc/keystone/keystone.conf db_sync --contract
$ juju run-action keystone/$L resume
Check the status:
$ juju status keystone
Model Controller Cloud/Region Version
cloudbase garrmaas garr/Bari 2.1.2
App Version Status Scale Charm Store Rev OS Notes
defaultgw-ba1-cl2 active 3 defaultgw jujucharms 6 ubuntu
keystone-ba1-cl2 10.0.1 active 3 keystone local 3 ubuntu
keystone-hacluster-ba1-cl2 active 3 hacluster jujucharms 33 ubuntu
nrpe-keystone-ba1-cl2 unknown 3 nrpe jujucharms 21 ubuntu
Enable saml2:
$ juju config keystone enable-saml2=true
Workaround for incomplete relations¶
With Juju 2.2 a problem occurs when creating new relations after the upgrade. For the moment, Canonical suggests the following workaround.
Figure out the relation between percona and keystone:
$ juju run --unit keystone-ba1-cl2/$L "relation-ids shared-db"
shared-db:33
Set allowed_units on the relation:
$ juju run --unit keystone/$L "relation-set -r shared-db:33 allowed_units='keystone/0 keystone/1 keystone/2'"
Check the results:
$ juju run --unit keystone/$L "relation-get -r shared-db:33 - keystone/0"
allowed_units: keystone/0 keystone/1 keystone/2
database: keystone
hostname: 10.4.4.153
private-address: 10.4.4.153
username: keystone
Reset incomplete relations:
$ juju remove-relation glance keystone
Reset keystone:
$ juju resolved --no-retry keystone/$L
Wait for it to come back and then:
$ juju add-relation glance keystone
All other relations should now be established correctly. Check it with:
$ juju status glance
RabbitMQ¶
If upgrading the charm fails, you may need to do this:
$ juju ssh rabbitmq-server/$L sudo /bin/mkdir -p /usr/local/lib/nagios/plugins
on every unit $L.
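Assuming three rabbitmq-server units, this can be done with:
$ for i in {0..2}; do
  juju ssh rabbitmq-server/$i sudo /bin/mkdir -p /usr/local/lib/nagios/plugins;
done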
Glance¶
$ juju config glance action-managed-upgrade=true
$ juju config glance openstack-origin=cloud:xenial-queens
Find the leader:
$ juju run --application glance is-leader
Assuming that the leader is L:
$ juju run-action glance/$L --wait pause
$ juju run-action glance/$L --wait openstack-upgrade
$ juju run-action glance/$L resume
Repeat on the other units of the service:
$ for i in {0..2}; do
juju run-action glance/$i --wait pause;
juju run-action glance/$i --wait openstack-upgrade;
juju run-action glance/$i resume;
done
Ceph¶
Upgrade charm ceph-proxy:
$ juju upgrade-charm ceph-proxy
Cinder¶
$ juju config cinder action-managed-upgrade=true
$ juju config cinder openstack-origin=cloud:xenial-queens
Find the leader, then apply this on the leader unit L:
$ juju run-action cinder/$L --wait pause
$ juju run-action cinder/$L --wait openstack-upgrade
$ juju run-action cinder/$L resume
Repeat on the other units of the service:
$ for i in {0..2}; do
juju run-action cinder/$i --wait pause;
juju run-action cinder/$i --wait openstack-upgrade;
juju run-action cinder/$i resume;
done
For Ocata and Pike, there is a bug still being investigated. As a temporary fix, apply the following patch on each unit to the file /usr/lib/python2.7/dist-packages/oslo_messaging/_utils.py (note: we did not observe this issue upgrading from Ocata to Pike):
*** _utils.py.orig 2017-10-05 15:39:26.728073723 +0000
--- _utils.py 2017-10-05 15:42:08.308323044 +0000
***************
*** 20,25 ****
--- 20,28 ----
:param imp_version: The version implemented
:param version: The version requested by an incoming message.
"""
+ # Attardi: workaround for error in cinder-scheduler: "Requested message version, 3.0 is incompatible."
+ return True
+
if imp_version is None:
return True
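For example, assuming the patch is saved locally as _utils.patch, it can be copied to each unit and applied with:
$ for i in {0..2}; do
  juju scp _utils.patch cinder/$i:/tmp/;
  juju ssh cinder/$i sudo patch /usr/lib/python2.7/dist-packages/oslo_messaging/_utils.py /tmp/_utils.patch;
done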
and then issue:
$ for i in {0..2}; do \
juju ssh cinder/$i sudo service cinder-scheduler restart;
done
Cinder Ceph¶
No OpenStack upgrade is currently available, but check by doing:
$ juju config cinder-ceph action-managed-upgrade=true
$ juju config cinder-ceph openstack-origin=cloud:xenial-queens
If present, apply the same procedure as above.
Rados GW¶
No OpenStack upgrade is currently available, but check by doing:
$ juju config rados-gw action-managed-upgrade=true
$ juju config rados-gw openstack-origin=cloud:xenial-queens
If present, apply the same procedure as above.
Nova Controller¶
When upgrading to xenial-ocata you need to do this first:
$ juju upgrade-charm nova-cloud-controller
Upgrade OpenStack:
$ juju config nova-cloud-controller action-managed-upgrade=true
$ juju config nova-cloud-controller openstack-origin=cloud:xenial-queens
On the leader unit L do:
$ juju run-action nova-cloud-controller/$L --wait pause
$ juju run-action nova-cloud-controller/$L --wait openstack-upgrade
Upgrade the database on the leader unit L of the service (only for Mitaka to Newton):
$ juju ssh nova-cloud-controller/$L sudo nova-manage db sync
$ juju ssh nova-cloud-controller/$L sudo nova-manage api_db sync;
$ juju ssh nova-cloud-controller/$L sudo nova-manage db online_data_migrations;
$ juju ssh nova-cloud-controller/$L sudo service nova-api-os-compute restart;
$ juju ssh nova-cloud-controller/$L sudo service nova-consoleauth restart;
$ juju ssh nova-cloud-controller/$L sudo service nova-scheduler restart;
$ juju ssh nova-cloud-controller/$L sudo service nova-conductor restart;
$ juju ssh nova-cloud-controller/$L sudo service nova-novncproxy restart;
$ juju run-action nova-cloud-controller/$L resume
Nova Compute¶
Ensure the presence of a relation between nova-compute and percona-cluster:
$ juju add-relation nova-compute percona-cluster
For upgrading to Ocata, also do this:
$ juju add-relation nova-compute cinder-ceph
Perform the upgrade:
$ juju config nova-compute action-managed-upgrade=true
$ juju config nova-compute openstack-origin=cloud:xenial-queens
On the leader unit L do:
$ juju run-action nova-compute/$L --wait pause
$ juju run-action nova-compute/$L --wait openstack-upgrade
$ juju run-action nova-compute/$L resume
Warning
If you get the following error:
juju run-action nova-compute/$L --wait pause
action-id: <id action>
message: exit status 1
status: failed
you need to set the action-managed-upgrade configuration parameter to false and then back to true.
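That is, for nova-compute:
$ juju config nova-compute action-managed-upgrade=false
$ juju config nova-compute action-managed-upgrade=true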
Complete the upgrade on the other units:
$ for i in {0..2}; do
juju run-action nova-compute/$i --wait pause;
juju run-action nova-compute/$i --wait openstack-upgrade;
juju run-action nova-compute/$i resume;
done
N.B. While upgrading our production cluster, one of our 25 compute nodes failed the upgrade with a “config-changed” error.
Looking at the log of juju-unit-nova-compute, we found that the failure was caused by a virsh command returning an error:
DEBUG config-changed subprocess.CalledProcessError: Command '['virsh', '-c', 'qemu:///system', 'secret-list']' returned non-zero exit status 1
ERROR juju.worker.uniter.operation runhook.go:107 hook "config-changed" failed: exit status 1
and after some debugging we noted that the file /etc/libvirt/libvirtd.conf was empty!
So, we deduced that something went wrong during the installation of the new packages. The solution was to:
- uninstall and reinstall libvirt-bin
- uninstall and reinstall nova-libvirt
- reboot the node
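A minimal sketch of that recovery, run directly on the affected compute node and assuming the package names listed above, is:
$ sudo apt-get remove libvirt-bin nova-libvirt
$ sudo apt-get install libvirt-bin nova-libvirt
$ sudo reboot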
After the reboot all the instances on the hypervisor were shut down; restarting them with nova was enough to make them run again.
Neutron API¶
$ juju config neutron-api action-managed-upgrade=true
$ juju config neutron-api openstack-origin=cloud:xenial-queens
On the leader unit L do:
$ juju run-action neutron-api/$L --wait pause;
$ juju run-action neutron-api/$L --wait openstack-upgrade;
$ juju run-action neutron-api/$L resume;
Complete the upgrade on the other units:
$ for i in {0..2}; do
juju run-action neutron-api/$i --wait pause;
juju run-action neutron-api/$i --wait openstack-upgrade;
juju run-action neutron-api/$i resume;
done
Neutron Gateway¶
$ juju config neutron-gateway action-managed-upgrade=true
$ juju config neutron-gateway openstack-origin=cloud:xenial-queens
$ juju run-action neutron-gateway/$L openstack-upgrade
Warning
If you get the following error:
juju run-action neutron-gateway/$L openstack-upgrade
action-id: <id action>
message: exit status 1
status: failed
you need to set the action-managed-upgrade configuration parameter to false and then back to true.
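That is:
$ juju config neutron-gateway action-managed-upgrade=false
$ juju config neutron-gateway action-managed-upgrade=true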
Note on upgrading the neutron-gateway charm: we enabled neutron HA on our two neutron-gateway servers. After upgrading the charm, the routers were no longer reachable. The reason was that the routers were in standby state on both servers:
$ neutron l3-agent-list-hosting-router admin-router-test
+--------------------------------------+------------+----------------+-------+----------+
| id | host | admin_state_up | alive | ha_state |
+--------------------------------------+------------+----------------+-------+----------+
| 775a4d38-d6b7-4f6d-a34b-e5327563e2e5 | pa1-r2-s09 | True | :-) | standby |
| 9f9a2fe0-561f-43e8-aa94-1f3224266f06 | pa1-r1-s11 | True | :-) | standby |
+--------------------------------------+------------+----------------+-------+----------+
We solved the issue by disabling and then re-enabling HA on the router:
$ neutron router-update admin-router-test --admin_state_up=false
$ neutron router-update admin-router-test --ha=false
$ neutron router-update admin-router-test --admin_state_up=true
$ neutron router-update admin-router-test --admin_state_up=false
$ neutron router-update admin-router-test --ha=true
$ neutron router-update admin-router-test --admin_state_up=true
OpenStack Dashboard¶
When upgrading to xenial-ocata you need to do this first:
$ juju upgrade-charm openstack-dashboard
Upgrade openstack-dashboard:
$ juju config openstack-dashboard action-managed-upgrade=true
$ juju config openstack-dashboard openstack-origin=cloud:xenial-queens
$ for i in {0..2}; do
juju run-action openstack-dashboard/$i --wait pause;
juju run-action openstack-dashboard/$i --wait openstack-upgrade;
juju run-action openstack-dashboard/$i resume;
done
Apply the patch for this bug, which causes an error in Apache:
https://bugs.launchpad.net/charm-openstack-dashboard/+bug/1678014
Ceilometer¶
$ juju config ceilometer-agent action-managed-upgrade=true
$ juju config ceilometer-agent openstack-origin=cloud:xenial-queens
$ juju run-action ceilometer-agent/$L openstack-upgrade
$ juju config ceilometer action-managed-upgrade=true
$ juju config ceilometer openstack-origin=cloud:xenial-queens
$ juju run-action ceilometer/$L openstack-upgrade
gnocchi (csd-garr charm)¶
The Gnocchi charm has the openstack-origin option but lacks the action-managed-upgrade option. As a consequence it is not possible to upgrade the service to Pike with this procedure.
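A quick way to check which of these options a charm exposes is to filter its configuration, e.g.:
$ juju config gnocchi | grep -e openstack-origin -e action-managed-upgrade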
General troubleshooting¶
In the following we outline problems that may appear on any service during maintenance operations.
Message queue (Rabbit)¶
The connection between a service and the message queue may break during the upgrade process. It is advisable to log on to the service units after the upgrade and check the logs for error messages. Problems with the message queue are reported as “... ERROR oslo_messaging ... connection timed out | connection refused | host unreachable”.
In general the solution is to log on to the rabbit units and restart rabbitmq-server:
$ service rabbitmq-server restart
Afterwards check that the rabbitmq cluster is active:
$ rabbitmqctl cluster_status
Cluster status of node 'rabbit@juju-08eaf8-91-lxd-54' ...
[{nodes,[{disc,['rabbit@juju-08eaf8-156-lxd-9','rabbit@juju-08eaf8-90-lxd-63',
'rabbit@juju-08eaf8-91-lxd-54']}]},
{running_nodes,['rabbit@juju-08eaf8-156-lxd-9',
'rabbit@juju-08eaf8-90-lxd-63',
'rabbit@juju-08eaf8-91-lxd-54']},
{cluster_name,<<"rabbit@juju-08eaf8-128-lxd-48.maas">>},
{partitions,[]}]
In this case we see 3 running nodes, i.e. the cluster is complete.
We once ran into a more complicated issue: rabbit lost the rabbit user corresponding to a service (glance)! To check users and permissions, run the following commands, whose output in a healthy situation is shown here:
$ rabbitmqctl list_users
Listing users ...
cinder []
glance []
guest [administrator]
nagios-rabbitmq-server-ct1-cl1-15 []
nagios-rabbitmq-server-ct1-cl1-16 []
nagios-rabbitmq-server-ct1-cl1-17 []
nagios-rabbitmq-server-ct1-cl1-18 []
nagios-rabbitmq-server-ct1-cl1-19 []
neutron []
nova []
$ rabbitmqctl list_user_permissions glance
Listing permissions for user "glance" ...
openstack .* .* .*
The solution in this case was to remove and add the relation between glance and rabbit, which reconfigured the services correctly.
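Assuming the application names used elsewhere in this guide, that amounts to:
$ juju remove-relation glance rabbitmq-server
$ juju add-relation glance rabbitmq-server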