In the last post of this series, we learned how to install and configure RMQ for vCloud Director. This post extends that setup: I will add one more node to my RMQ deployment to form a cluster for high availability.
What data is replicated in an RMQ Cluster?
All data/state required for the operation of a RabbitMQ broker is replicated across all nodes. An exception to this is message queues, which by default reside on one node, though they are visible and reachable from all nodes. To replicate queues across nodes in a cluster, see the documentation on high availability.
Note: Before proceeding with cluster formation, please ensure the following:
1: Use the same versions of the Erlang and RMQ server rpms that are installed on the master node.
2: RMQ nodes address each other using domain names, either short names or FQDNs. Therefore, the hostnames of all cluster members must be resolvable from all cluster nodes, as well as from any machine on which command line tools such as rabbitmqctl might be used.
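If you are not using DNS, one way to satisfy this requirement is with static /etc/hosts entries on both nodes. A minimal sketch (the IP addresses below are placeholders for illustration only):

192.168.1.11   rmqsrv01.alex.local rmqsrv01
192.168.1.12   rmqsrv02.alex.local rmqsrv02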
Here are the high-level steps for cluster formation:
- Have a single node running (rmqsrv01).
- Stop another node (rmqsrv02).
- Reset the stopped node (rmqsrv02).
- Cluster the other node to the root node.
- Start the stopped node.
Step 1) Install the same version of Erlang and the rabbitmq-server rpm that is installed on the master node. The steps for doing so are documented in my previous post.
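To confirm that both nodes run identical versions, you can compare the installed packages on each node (a quick sanity check, assuming an RPM-based distribution as in the previous post); the output of both commands should match:

[root@rmqsrv01 ~]# rpm -qa | grep -Ei 'erlang|rabbitmq-server'
[root@rmqsrv02 ~]# rpm -qa | grep -Ei 'erlang|rabbitmq-server'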
Step 2) Copy the Erlang cookie from the master node to the second node.
RabbitMQ nodes and CLI tools (e.g. rabbitmqctl) use a cookie to determine whether they are allowed to communicate with each other. For two nodes to be able to communicate they must have the same shared secret called the Erlang cookie. The cookie is just a string of alphanumeric characters. Every cluster node must have the same cookie.
On Linux systems, the cookie is typically located at /var/lib/rabbitmq/.erlang.cookie or $HOME/.erlang.cookie.
[root@rmqsrv01 ~]# cat /var/lib/rabbitmq/.erlang.cookie
PKATXPIFMSTGSLHMVFPU
Copy the Erlang cookie file to the /var/lib/rabbitmq directory on the second node.
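One way to do this is with scp (a sketch, assuming root SSH access from the master node to the second node). The cookie must be owned by the rabbitmq user and readable only by its owner:

[root@rmqsrv01 ~]# scp /var/lib/rabbitmq/.erlang.cookie root@rmqsrv02:/var/lib/rabbitmq/
[root@rmqsrv02 ~]# chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
[root@rmqsrv02 ~]# chmod 400 /var/lib/rabbitmq/.erlang.cookie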
Additionally, you can verify the md5sum of the cookie:
[root@rmqsrv01 ~]# md5sum /var/lib/rabbitmq/.erlang.cookie
df97d981d5733d95f0c3191969bc234a  /var/lib/rabbitmq/.erlang.cookie

[root@rmqsrv02 ~]# md5sum /var/lib/rabbitmq/.erlang.cookie
df97d981d5733d95f0c3191969bc234a  /var/lib/rabbitmq/.erlang.cookie
Step 3) Stop the RMQ app and reset the node.
[root@rmqsrv02 ~]# rabbitmqctl stop_app
Stopping rabbit application on node rabbit@rmqsrv02 ...

[root@rmqsrv02 ~]# rabbitmqctl reset
Resetting node rabbit@rmqsrv02 ...
At this point, if you check the cluster status, you will see only one node listed as a disc node, and running_nodes will list only the node on which you are checking the status.
[root@rmqsrv01 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@rmqsrv01 ...
[{nodes,[{disc,[rabbit@rmqsrv01]}]},
 {running_nodes,[rabbit@rmqsrv01]},
 {cluster_name,<<"rabbit@rmqsrv01.alex.local">>},
 {partitions,[]},
 {alarms,[{rabbit@rmqsrv01,[]}]}]

[root@rmqsrv02 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@rmqsrv02 ...
[{nodes,[{disc,[rabbit@rmqsrv02]}]},
 {running_nodes,[rabbit@rmqsrv02]},
 {cluster_name,<<"rabbit@rmqsrv02.alex.local">>},
 {partitions,[]},
 {alarms,[{rabbit@rmqsrv02,[]}]}]
Step 4) Add the second node to the cluster.
[root@rmqsrv02 ~]# rabbitmqctl join_cluster --ram rabbit@rmqsrv01
Clustering node rabbit@rmqsrv02 with rabbit@rmqsrv01 ...
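As outlined in the high-level steps, start the stopped node again once it has joined the cluster:

[root@rmqsrv02 ~]# rabbitmqctl start_app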
Now if you check the cluster status, you will see both nodes listed: rmqsrv01 as a disc node and rmqsrv02 as a RAM node (since we joined it with the --ram switch).
[root@rmqsrv01 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@rmqsrv01 ...
[{nodes,[{disc,[rabbit@rmqsrv01]},{ram,[rabbit@rmqsrv02]}]},
 {running_nodes,[rabbit@rmqsrv02,rabbit@rmqsrv01]},
 {cluster_name,<<"rabbit@rmqsrv01.alex.local">>},
 {partitions,[]},
 {alarms,[{rabbit@rmqsrv02,[nodedown]},{rabbit@rmqsrv01,[]}]}]

[root@rmqsrv02 rabbitmq]# rabbitmqctl cluster_status
Cluster status of node rabbit@rmqsrv02 ...
[{nodes,[{disc,[rabbit@rmqsrv01]},{ram,[rabbit@rmqsrv02]}]},
 {running_nodes,[rabbit@rmqsrv01,rabbit@rmqsrv02]},
 {cluster_name,<<"rabbit@rmqsrv01.alex.local">>},
 {partitions,[]},
 {alarms,[{rabbit@rmqsrv01,[]},{rabbit@rmqsrv02,[]}]}]
Note: By default, a node joins the cluster as a disc node and stores its state on disk. You can instead attach a node to the cluster as a RAM node, which keeps most of its state in memory, by passing the --ram switch when joining:
# rabbitmqctl stop_app
# rabbitmqctl join_cluster --ram rabbit@rmqsrv02
The node type can be changed later with the change_cluster_node_type command, passing either disc or ram as the desired type.
# rabbitmqctl change_cluster_node_type <disc | ram>
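For example, to convert rmqsrv02 from a RAM node back to a disc node (the application must be stopped before changing the type):

[root@rmqsrv02 ~]# rabbitmqctl stop_app
[root@rmqsrv02 ~]# rabbitmqctl change_cluster_node_type disc
[root@rmqsrv02 ~]# rabbitmqctl start_app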
It is recommended to have at least one disc node in the cluster so that cluster state is persisted to disk and message loss can be avoided in case of a disaster.
Set the HA Policy
The following command sets a policy that mirrors all queues across all nodes and keeps them synchronized automatically:
[root@rmqsrv01 ~]# rabbitmqctl set_policy ha-all "" '{"ha-mode":"all","ha-sync-mode":"automatic"}'
Setting policy "ha-all" for pattern [] to "{"ha-mode":"all","ha-sync-mode":"automatic"}" with priority "0" ...
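You can verify that the policy has been applied by listing the policies on the default vhost:

[root@rmqsrv01 ~]# rabbitmqctl list_policies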
I hope this post was informative. Feel free to share it on social media if you found it worth sharing. Be sociable 🙂