TIBCO EMS 8 Fault Tolerance + NFS4

Introduction

Starting from my “CentOS 6 + EMS” setup, I need to build a decent FT/HA demo environment for test purposes.

My clients often try to improve EMS availability in case of a disaster or a simple software error. This is not because EMS itself is unstable, but mostly to manage the risks related to MOM rollover (FT), HA and DR. TIBCO does provide guidelines on the subject, but no precise recommendation on which combination of OS and distributed file system to use. Additionally, TIBCO experts usually push for file stores over database stores: they state that database stores should not be considered for performance reasons, even though a DB store strikes me as the simplest possible FT setup.

Many clients are then encouraged to build a software or hardware file-sharing system to allow multi-site FT. This article describes how to implement such a solution FOR TEST PURPOSES using a FREE SOFTWARE distributed file system (NFS4 on CentOS 6). The other popular option is to go with specialized HARDWARE SAN solutions. For budget reasons, I wonder whether I can build a setup that performs well enough without going in that expensive direction.

According to the official EMS documentation, the requirements for a distributed EMS fault tolerance back-end file system are:

  • Write Order
  • Synchronous Write Persistence
  • Distributed File Locking
  • Unique Write Ownership

Does CentOS 6 + NFS4 comply with all these rules? I honestly don’t know. TIBCO does not provide a list of free/software or hardware storage solutions that precisely follow these guidelines. They point out that NFS4 could, in some instances, comply if all of the above principles are respected.

DISCLAIMER: If TIBCO cannot certify an OS/FS pairing for EMS FT, I certainly can’t either. PLEASE CONSIDER THE FOLLOWING HOW-TO AS A SUGGESTED TEST CONFIGURATION. PLEASE DO NOT APPLY THIS SOLUTION IN A PRODUCTION ENVIRONMENT WITHOUT PROPER TESTING.

Side-line editorial: I believe the future belongs to easily distributed MOMs, and that this kind of difficult setup will disappear over time. Currently, some MOMs (IBM MQ, RabbitMQ) don’t make FT that hard to implement (but do not support WAN connections). In contrast, other MOMs have FT setups similar to EMS (ActiveMQ).

To summarize, I aim to create something like this:

[Figure: EMS FT goal. Latency will be introduced for testing purposes.]

NFS4 setup

As written above, start by going through my CentOS/EMS how-to and my CentOS firewall how-to.

Then, I suggest following an article like this one from cyberciti.biz to set up NFS4, along with this one from digitalocean.com.

On my VM image, here is what it looked like:

yum install nfs-utils nfs4-acl-tools portmap
[Screenshot: yum shows that all packages are already installed.]
It seems my CentOS 6 VM already had all of these.

OK, nothing to do then… moving on to sharing the stores.
Let’s edit /etc/exports and add this:

vim /etc/exports

# add this line to allow all hosts in the VirtualBox subnet (assuming your address is in 10.0.2.*, like mine)
/opt/cfgtibco/tibco/cfgmgmt/ems 10.0.2.0/24(rw,sync,no_root_squash)
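If you prefer a script over vim, the same edit can be sketched as below. This is a minimal sketch, not part of the original setup: the add_export function is hypothetical, it appends the export line only when it is missing (so re-running it never duplicates the entry), and its third argument lets you try it on a scratch file instead of /etc/exports.

```shell
#!/bin/bash
# Hypothetical helper: append an NFS export line unless that exact line exists.
# Arguments: directory, client spec, optional exports file (default /etc/exports).
add_export() {
    local dir="$1" clients="$2" file="${3:-/etc/exports}"
    local line="${dir} ${clients}(rw,sync,no_root_squash)"
    # -x: whole-line match, -F: fixed string (dots and parentheses are literal)
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# On the "Primary" VM, as root:
#   add_export /opt/cfgtibco/tibco/cfgmgmt/ems 10.0.2.0/24
#   exportfs -ra   # if nfs is already running, re-read /etc/exports
#   exportfs -v    # list the active exports and their options
```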

Turn on the relevant services (chkconfig makes them start at boot):

chkconfig rpcbind on
chkconfig nfs on
service rpcbind start
service nfs start

[Screenshot: NFS service starts.]

Just to be on the safe side, I explicitly disabled NFSv2 and NFSv3, as mentioned here. In /etc/sysconfig/nfs, I added (uncommented) these lines:

MOUNTD_NFS_V2="no"
MOUNTD_NFS_V3="no"
RPCNFSDARGS="-N 2 -N 3"

…and restarted the NFS service (service nfs restart).
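The /etc/sysconfig/nfs change can also be scripted. A minimal sketch, assuming GNU sed (as shipped with CentOS) and keys made only of letters, digits and underscores: the hypothetical set_sysconfig_var function deletes every existing line for a key, commented out or not, and appends a single fresh assignment, so the result is the same no matter how many times it runs.

```shell
#!/bin/bash
# Hypothetical helper: make KEY="value" the only assignment of KEY in a
# sysconfig-style file. Old lines for KEY (commented or not) are deleted and
# the new assignment is appended at the end of the file.
# Assumes GNU sed (-i, -E) and a KEY made of letters, digits and underscores.
set_sysconfig_var() {
    local key="$1" value="$2" file="${3:-/etc/sysconfig/nfs}"
    sed -i -E "/^[#[:space:]]*${key}=/d" "$file" || return 1
    printf '%s="%s"\n' "$key" "$value" >> "$file"
}

# The three settings used in this article (as root, then restart nfs):
#   set_sysconfig_var MOUNTD_NFS_V2 no
#   set_sysconfig_var MOUNTD_NFS_V3 no
#   set_sysconfig_var RPCNFSDARGS "-N 2 -N 3"
```

After the restart, cat /proc/fs/nfsd/versions should list versions 2 and 3 with a leading minus sign, which is a quick way to confirm the server now only offers NFSv4.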

Firewall and VirtualBox ports updates

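Now, let’s open the pertinent port. With NFSv2 and NFSv3 disabled, NFSv4 only needs TCP port 2049. If you read my firewall how-to, you know I use the iptables-restore command to load my configuration. Edit /root/iptables-save.txt and add a line like this one, above any catch-all REJECT rule in the INPUT chain (rules are evaluated in order):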

-A INPUT -p tcp -m tcp --dport 2049 -j ACCEPT

Then, we validate and save:

iptables-restore < /root/iptables-save.txt
iptables -L
/sbin/service iptables save
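Placement is the only tricky part of that edit. The sketch below shows one way to automate it; the add_nfs_rule function is hypothetical and assumes the file is in iptables-save format. Inside the *filter table it inserts the rule just before the first INPUT REJECT line (or before COMMIT if there is none), and does nothing if the rule is already present.

```shell
#!/bin/bash
# Hypothetical helper: add the NFSv4 rule to an iptables-save style file.
# Within the *filter table, the rule goes right before the first
# "-A INPUT ... -j REJECT" line, or before COMMIT when there is no such line.
add_nfs_rule() {
    local file="${1:-/root/iptables-save.txt}"
    local rule='-A INPUT -p tcp -m tcp --dport 2049 -j ACCEPT'
    grep -qxF -- "$rule" "$file" && return 0   # already present: nothing to do
    local tmp
    tmp=$(mktemp) || return 1
    awk -v rule="$rule" '
        /^\*/ { table = $0 }                   # remember *filter, *nat, ...
        !done && table == "*filter" && (/^-A INPUT .*-j REJECT/ || /^COMMIT/) {
            print rule; done = 1
        }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    local rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Then load and check it, as above:
#   add_nfs_rule && iptables-restore < /root/iptables-save.txt
#   iptables -L -n | grep 2049
```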

EMS primary instance setup

The “Primary” machine is almost ready: the EMS server still works (just start it to validate), and it uses the file stores locally, without NFS. We must modify the configuration slightly so that the “Secondary” machine can share the JSON configuration file.

Note: this is the real revolution of the new JSON configuration format. In the past, each machine in the cluster had its own, almost identical, set of “tibemsd.conf” files (some could be shared, but usually not the main one). This made little sense, since the vast majority of the data was the same for every machine in the cluster. Now a single configuration file is shared, and it includes the slight per-server differences needed for FT. (EMS User Guide, “Configuring Fault Tolerance in Central Administration”, page 539)

I use the EMS CA interface to change the relevant configuration:

[Screenshot: Server properties. NOTE: the IPs are wrong, see the section below.]
[Screenshot: Fault tolerance properties. NOTE: the IPs are wrong, see the section below.]

EMS secondary instance setup

To create the EMS “Secondary” machine, I suggest (all steps detailed below):

  • Creating a new script for the second EMS instance
  • Cloning the “Primary” VM
  • Validating the VirtualBox IP assignment
    • Correcting the configuration if necessary
  • Deleting the EMS data folder (config and stores) and the NFS export configuration on the “Secondary” VM
    • Creating an NFS client mount at the same location

EMS secondary script

Before cloning, I built this script (tibemsd64-2.sh in the EMS “bin” folder; it must be executable, e.g. chmod +x tibemsd64-2.sh) and exposed it as a shortcut on the desktop:

#!/bin/bash
# Start this EMS instance as the FT secondary, using the shared JSON configuration
cd /opt/tibco/ems/8.1/bin
./tibemsd64 -config "/opt/cfgtibco/tibco/cfgmgmt/ems/data/tibemsd.json" -secondary

Here is the shortcut command:

gnome-terminal -e /opt/tibco/ems/8.1/bin/tibemsd64-2.sh

Reference: (EMS User Guide, “Starting Fault Tolerant Server Pairs”, page 109)
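Since the secondary reads everything from the shared directory, a slightly more defensive start script can refuse to launch when that directory is incomplete, for example because the NFS share is not mounted. A minimal sketch: ems_preflight is a hypothetical function, and its default path follows this article's layout, so adjust it if yours differs.

```shell
#!/bin/bash
# Hypothetical pre-flight check for tibemsd64-2.sh: refuse to start when the
# shared EMS data directory is incomplete (e.g. the NFS share is not mounted).
# The default path follows this article's layout; adjust it for yours.
ems_preflight() {
    local dir="${1:-/opt/cfgtibco/tibco/cfgmgmt/ems/data}"
    if [ ! -r "$dir/tibemsd.json" ]; then
        echo "missing or unreadable: $dir/tibemsd.json" >&2
        return 1
    fi
    # This article's JSON configuration puts the store and the log file in datastore/.
    if [ ! -d "$dir/datastore" ] || [ ! -w "$dir/datastore" ]; then
        echo "missing or read-only: $dir/datastore" >&2
        return 1
    fi
    return 0
}

# In tibemsd64-2.sh, before the tibemsd64 line:
#   ems_preflight || exit 1
```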

“Primary” VM cloning

[Screenshot: Thanks VirtualBox!]

“Secondary” VM IP validation

When I first started my VMs, I found they had the SAME IP address (10.0.2.15). This is when I realized that I needed to switch the VMs' networking mode from “NAT” to “NAT Network”.

[Screenshot: changed in the VirtualBox main “Preferences”.]
[Screenshot: to be changed on each VM.]

As simple as that, but now, sadly, the IPs have changed! I ended up with 10.0.2.4 and 10.0.2.5.

Another side effect: the port-forwarding settings (tutorial here) of each individual VM are lost, and they have to be re-created on the NAT network itself, like this:

[Screenshot: new port forwarding for the NAT network.]
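The same network changes can also be made with VBoxManage instead of the Preferences dialog. The sketch below only prints the commands so you can review them before running anything. Assumptions, none of which come from the original setup: VirtualBox 5 or later, a NAT network named “NatNetwork”, hypothetical VM names EMS-Primary and EMS-Secondary, and hypothetical host ports 7222 and 7224 forwarded to each VM's EMS listener.

```shell
#!/bin/bash
# Hypothetical dry run: print (do not execute) the VBoxManage commands for the
# NAT Network switch. Assumes VirtualBox 5+, a NAT network named "NatNetwork",
# VMs named "EMS-Primary"/"EMS-Secondary" and host ports 7222/7224.
NET="NatNetwork"

# Port-forward rule syntax: name:proto:[host ip]:host port:[guest ip]:guest port
pf_rule() {
    printf '%s:%s:[]:%s:[%s]:%s\n' "$1" "$2" "$3" "$4" "$5"
}

print_vbox_commands() {
    echo "VBoxManage natnetwork add --netname $NET --network 10.0.2.0/24 --enable --dhcp on"
    # modifyvm only works while the VM is powered off.
    echo "VBoxManage modifyvm EMS-Primary --nic1 natnetwork --nat-network1 $NET"
    echo "VBoxManage modifyvm EMS-Secondary --nic1 natnetwork --nat-network1 $NET"
    # One forward per VM, each pointing at that VM's EMS listener (port 7222).
    echo "VBoxManage natnetwork modify --netname $NET --port-forward-4 \"$(pf_rule ems1 tcp 7222 10.0.2.4 7222)\""
    echo "VBoxManage natnetwork modify --netname $NET --port-forward-4 \"$(pf_rule ems2 tcp 7224 10.0.2.5 7222)\""
}
```

Skip the natnetwork add line if the network already exists from the Preferences dialog. Forwarding two different host ports is needed because both guests listen on 7222 while the host can bind each port only once.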

I have to adjust the JSON file by hand, since the EMS server and EMSCA processes won’t start without proper listeners.

Here is my updated JSON file:

{
	"acls":	[],
	"bridges":	[],
	"channels":	[],
	"durables":	[],
	"emsca":	{
		"advanced":	[],
		"appliance_options":	{
			"store_paths":	[]
		},
		"emsca_listens":	[{
				"url":	"tcp://10.0.2.4:7222"
			}, {
				"url":	"tcp://10.0.2.5:7222"
			}]
	},
	"factories":	[{
			"jndinames":	[],
			"name":	"ConnectionFactory",
			"ssl":	{
				"ssl_issuer_list":	[],
				"ssl_trusted_list":	[]
			},
			"ssl_issuer_list":	[],
			"ssl_trusted_list":	[],
			"type":	"generic",
			"url":	"tcp://7222"
		}, {
			"jndinames":	[],
			"name":	"FTConnectionFactory",
			"ssl":	{
				"ssl_issuer_list":	[],
				"ssl_trusted_list":	[]
			},
			"ssl_issuer_list":	[],
			"ssl_trusted_list":	[],
			"type":	"generic",
			"url":	"tcp://localhost:7222,tcp://localhost:7224"
		}, {
			"jndinames":	[],
			"name":	"SSLConnectionFactory",
			"ssl":	{
				"ssl_issuer_list":	[],
				"ssl_trusted_list":	[],
				"ssl_verify_host":	false
			},
			"ssl_issuer_list":	[],
			"ssl_trusted_list":	[],
			"type":	"generic",
			"url":	"ssl://7243"
		}, {
			"jndinames":	[],
			"name":	"GenericConnectionFactory",
			"ssl":	{
				"ssl_issuer_list":	[],
				"ssl_trusted_list":	[]
			},
			"ssl_issuer_list":	[],
			"ssl_trusted_list":	[],
			"type":	"generic",
			"url":	"tcp://7222"
		}, {
			"jndinames":	[],
			"name":	"TopicConnectionFactory",
			"ssl":	{
				"ssl_issuer_list":	[],
				"ssl_trusted_list":	[]
			},
			"ssl_issuer_list":	[],
			"ssl_trusted_list":	[],
			"type":	"topic",
			"url":	"tcp://7222"
		}, {
			"jndinames":	[],
			"name":	"QueueConnectionFactory",
			"ssl":	{
				"ssl_issuer_list":	[],
				"ssl_trusted_list":	[]
			},
			"ssl_issuer_list":	[],
			"ssl_trusted_list":	[],
			"type":	"queue",
			"url":	"tcp://7222"
		}, {
			"jndinames":	[],
			"name":	"FTTopicConnectionFactory",
			"ssl":	{
				"ssl_issuer_list":	[],
				"ssl_trusted_list":	[]
			},
			"ssl_issuer_list":	[],
			"ssl_trusted_list":	[],
			"type":	"topic",
			"url":	"tcp://localhost:7222,tcp://localhost:7224"
		}, {
			"jndinames":	[],
			"name":	"FTQueueConnectionFactory",
			"ssl":	{
				"ssl_issuer_list":	[],
				"ssl_trusted_list":	[]
			},
			"ssl_issuer_list":	[],
			"ssl_trusted_list":	[],
			"type":	"queue",
			"url":	"tcp://localhost:7222,tcp://localhost:7224"
		}, {
			"jndinames":	[],
			"name":	"SSLQueueConnectionFactory",
			"ssl":	{
				"ssl_issuer_list":	[],
				"ssl_trusted_list":	[],
				"ssl_verify_host":	false
			},
			"ssl_issuer_list":	[],
			"ssl_trusted_list":	[],
			"type":	"queue",
			"url":	"ssl://7243"
		}, {
			"jndinames":	[],
			"name":	"SSLTopicConnectionFactory",
			"ssl":	{
				"ssl_issuer_list":	[],
				"ssl_trusted_list":	[],
				"ssl_verify_host":	false
			},
			"ssl_issuer_list":	[],
			"ssl_trusted_list":	[],
			"type":	"topic",
			"url":	"ssl://7243"
		}],
	"groups":	[{
			"description":	"Administrators",
			"members":	[{
					"name":	"admin"
				}],
			"name":	"$admin"
		}],
	"model_version":	"1.0",
	"queues":	[{
			"name":	">"
		}, {
			"name":	"sample"
		}, {
			"name":	"queue.sample"
		}],
	"routes":	[{
			"name":	"EMS-SERVER2",
			"selectors":	[],
			"url":	"tcp://7022"
		}],
	"stores":	[{
			"file":	"meta.db",
			"file_crc":	false,
			"mode":	"async",
			"name":	"$sys.meta",
			"type":	"file"
		}, {
			"file":	"async-msgs.db",
			"file_crc":	false,
			"mode":	"async",
			"name":	"$sys.nonfailsafe",
			"type":	"file"
		}, {
			"file":	"sync-msgs.db",
			"file_crc":	false,
			"mode":	"sync",
			"name":	"$sys.failsafe",
			"type":	"file"
		}],
	"tibemsd":	{
		"authorization":	false,
		"console_trace":	null,
		"detailed_statistics":	"NONE",
		"flow_control":	false,
		"ft_activation":	null,
		"ft_active":	null,
		"ft_heartbeat":	null,
		"ft_reconnect_timeout":	null,
		"ft_ssl":	{
			"ssl_ciphers":	null,
			"ssl_expected_hostname":	null,
			"ssl_identity":	null,
			"ssl_issuer_list":	[],
			"ssl_password":	null,
			"ssl_private_key":	null,
			"ssl_trusted_list":	[],
			"ssl_verify_host":	null,
			"ssl_verify_hostname":	null
		},
		"jre_options":	[],
		"log_trace":	null,
		"logfile":	"/opt/cfgtibco/tibco/cfgmgmt/ems/data/datastore/logfile",
		"logfile_max_size":	null,
		"max_connections":	0,
		"max_msg_memory":	"512MB",
		"max_stat_memory":	"64MB",
		"msg_swapping":	true,
		"multicast":	false,
		"password":	null,
		"primary_listens":	[{
				"ft_active":	true,
				"url":	"tcp://10.0.2.4:7222"
			}],
		"rate_interval":	3,
		"routing":	false,
		"secondary_listens":	[{
				"url":	"tcp://10.0.2.5:7222",
				"ft_active":	true
			}],
		"server":	"EMS-SERVER",
		"server_rate_interval":	1,
		"ssl":	{
			"ssl_cert_user_specname":	"CERTIFICATE_USER",
			"ssl_dh_size":	null,
			"ssl_issuer_list":	[],
			"ssl_password":	null,
			"ssl_rand_egd":	null,
			"ssl_require_client_cert":	null,
			"ssl_server_ciphers":	null,
			"ssl_server_identity":	null,
			"ssl_server_key":	null,
			"ssl_trusted_list":	[],
			"ssl_use_cert_username":	null
		},
		"statistics":	true,
		"statistics_cleanup_interval":	30,
		"store":	"/opt/cfgtibco/tibco/cfgmgmt/ems/data/datastore",
		"tibrv_transports":	null,
		"track_correlation_ids":	null,
		"track_message_ids":	null
	},
	"tibrvcm":	[],
	"topics":	[{
			"name":	">"
		}, {
			"exporttransport":	"RV",
			"name":	"topic.sample.exported"
		}, {
			"importtransport":	"RV",
			"name":	"topic.sample.imported"
		}, {
			"name":	"sample"
		}, {
			"name":	"topic.sample"
		}],
	"transports":	[{
			"daemon":	null,
			"name":	"RV",
			"network":	null,
			"service":	null,
			"type":	"tibrv"
		}],
	"users":	[{
			"description":	"Administrator",
			"name":	"admin",
			"password":	null
		}, {
			"description":	"Main Server",
			"name":	"EMS-SERVER",
			"password":	null
		}, {
			"description":	"Route Server",
			"name":	"EMS-SERVER2",
			"password":	null
		}]
}
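Because this file is edited by hand, a quick syntax and listener check before starting EMS can save a confusing startup failure. A minimal sketch, assuming a Python interpreter is available (CentOS 6 ships Python 2.6 as python; the embedded code also runs on Python 3, and PY selects the interpreter). The hypothetical check_tibemsd_json function fails on invalid JSON and prints the primary and secondary listen URLs so you can compare them with each VM's address.

```shell
#!/bin/bash
# Hypothetical check for a hand-edited tibemsd.json: exits non-zero if the
# file is not valid JSON, otherwise prints the FT listen URLs.
# Assumes a Python interpreter; set PY=python on CentOS 6 (Python 2.6).
check_tibemsd_json() {
    local file="${1:-/opt/cfgtibco/tibco/cfgmgmt/ems/data/tibemsd.json}"
    "${PY:-python3}" - "$file" <<'EOF'
import json, sys
with open(sys.argv[1]) as f:
    cfg = json.load(f)  # a syntax error raises here and gives a non-zero exit
srv = cfg.get("tibemsd", {})
for key in ("primary_listens", "secondary_listens"):
    for listen in srv.get(key) or []:
        print("%s %s" % (key, listen.get("url")))
EOF
}
```

With the file above it should print one primary_listens line for tcp://10.0.2.4:7222 and one secondary_listens line for tcp://10.0.2.5:7222.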

NFS client setup

On the “Secondary” VM:

  • Disable the NFS server: remove the export entries from /etc/exports and restart the service (service nfs restart)
  • Remove the local configuration and stores:
    • rm -Rf /opt/cfgtibco/tibco/cfgmgmt/ems
  • Create a (test) NFS client:
    • mkdir /opt/cfgtibco/tibco/cfgmgmt/ems
      # as root, mount the directory exported by the "Primary" VM (10.0.2.4)
      mount -t nfs4 -v 10.0.2.4:/opt/cfgtibco/tibco/cfgmgmt/ems /opt/cfgtibco/tibco/cfgmgmt/ems
  • Once the mount is confirmed to work, unmount it (umount /opt/cfgtibco/tibco/cfgmgmt/ems) and add this line to /etc/fstab:
    • 10.0.2.4:/opt/cfgtibco/tibco/cfgmgmt/ems /opt/cfgtibco/tibco/cfgmgmt/ems   nfs    defaults 0 0
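After a reboot (or mount -a), it is worth confirming that the directory really is the NFS share and not an empty local folder left behind by the mkdir. A minimal sketch that reads the kernel's mount table, assuming the share appears there with type nfs4, which is how Linux normally reports NFSv4 mounts; the function name is hypothetical, and the second argument exists only so the check can be tried on a sample file.

```shell
#!/bin/bash
# Hypothetical check: succeed only if DIR is currently mounted with type nfs4.
# /proc/mounts fields: device, mount point, fs type, options, dump, pass.
is_nfs4_mounted() {
    local dir="$1" mounts="${2:-/proc/mounts}"
    awk -v d="$dir" '$2 == d && $3 == "nfs4" { found = 1 } END { exit !found }' "$mounts"
}

# On the "Secondary" VM:
#   is_nfs4_mounted /opt/cfgtibco/tibco/cfgmgmt/ems && echo "NFS share OK"
```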

GEMS update

The one remaining step is to update the GEMS configuration (see the first tutorial):

[Screenshot: we enter the URLs like typical Java clients: tcp://host1:port1,tcp://host2:port2]

Then, to be sure, let’s restart both VMs, validate that the NFS share is mounted automatically, and start both servers before starting GEMS:

[Screenshot: …bingo! 2 EMS servers in FT via NFS sharing!]
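To check basic reachability from the host or a client VM before blaming GEMS, the FT URL list can be probed directly. A minimal sketch with hypothetical function names, assuming bash (for its /dev/tcp redirection) and coreutils timeout: it splits a list such as tcp://10.0.2.4:7222,tcp://10.0.2.5:7222 into host/port pairs and reports which ones accept a TCP connection. A successful connect only means something is listening on that port; it does not tell you which EMS server is active.

```shell
#!/bin/bash
# Hypothetical helpers: split an FT URL list ("tcp://host1:port1,tcp://host2:port2")
# into host/port pairs, then report which pairs accept a TCP connection.
# A successful connect does not tell you which EMS server is the active one.
ft_endpoints() {
    local IFS=','
    local url
    for url in $1; do
        url="${url#*://}"                      # drop the tcp:// or ssl:// scheme
        printf '%s %s\n' "${url%:*}" "${url##*:}"
    done
}

probe_ft_url() {
    local host port
    ft_endpoints "$1" | while read -r host port; do
        # bash's /dev/tcp redirection opens a TCP connection; timeout stops
        # the attempt after 3 seconds if the host does not answer.
        if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
            echo "$host:$port accepts connections"
        else
            echo "$host:$port does not accept connections"
        fi
    done
}

# Example: probe_ft_url "tcp://10.0.2.4:7222,tcp://10.0.2.5:7222"
```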

I hope this testing rig will be useful to you!

3 thoughts on “TIBCO EMS 8 Fault Tolerance + NFS4”

  1. I get an error when I try to start the secondary EMS.
    What am I doing wrong?

    Reading configuration from '/opt/tibco/cfgmgmt/ems/data/tibemsd.json'.
    Configured as fault tolerant secondary.
    FATAL: failed to open log file '/opt/tibco/cfgmgmt/ems/data/datastore/logfile'.

    1. Hi Andrei, what do the commands “ls /opt/tibco/cfgmgmt/ems/data/datastore/” and “ls /opt/tibco/cfgmgmt/ems/data/datastore/logfile” return?
