How to Set Up an HA Cassandra Cluster

Apache Cassandra is a NoSQL database with flexible deployment options that’s highly performant (especially for writes), scalable, fault-tolerant, and proven in production. Alternative NoSQL databases include Amazon DynamoDB, Apache HBase, and MongoDB.

Assume we have the following servers:

Hostname        Public IP       Private IP
-----------     -----------     ----------
cassandra01     10.10.10.10     1.1.1.1
cassandra02     11.11.11.11     2.2.2.2
cassandra03     12.12.12.12     3.3.3.3

Install Cassandra on each of the three servers:

$ sudo apt update
$ sudo apt upgrade -y
$ sudo apt install openjdk-8-jdk apt-transport-https -y
$ java -version
$ sudo sh -c 'echo "deb http://www.apache.org/dist/cassandra/debian 40x main" > /etc/apt/sources.list.d/cassandra.list'
$ wget -q -O - https://www.apache.org/dist/cassandra/KEYS | sudo apt-key add -
$ sudo apt update
$ sudo apt install cassandra
$ sudo systemctl enable cassandra

Edit /etc/cassandra/cassandra.yaml on each of the three servers. Replace node_private_ip with that node's own private IP.

. . .

cluster_name: 'ProdCassandraCluster'

. . .

seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
         - seeds: "1.1.1.1:7000,2.2.2.2:7000,3.3.3.3:7000"

. . .

listen_address: node_private_ip

. . .

rpc_address: node_private_ip

. . .

endpoint_snitch: GossipingPropertyFileSnitch

. . .

auto_bootstrap: false
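
Because listen_address and rpc_address differ on every node, it can help to script that part of the edit. The following is a minimal sketch, not a definitive tool: set_node_addresses is a hypothetical helper name, and it assumes both keys already appear uncommented at the start of a line in cassandra.yaml.

```shell
# set_node_addresses FILE PRIVATE_IP
# Hypothetical helper (not part of Cassandra): points listen_address and
# rpc_address in the given cassandra.yaml at the node's private IP.
# Assumes both keys exist as uncommented, unindented lines.
set_node_addresses() {
    # -i.bak edits the file in place and keeps the original as FILE.bak
    sed -i.bak \
        -e "s/^listen_address:.*/listen_address: $2/" \
        -e "s/^rpc_address:.*/rpc_address: $2/" \
        "$1"
}

# Example, run as root on cassandra01:
#   set_node_addresses /etc/cassandra/cassandra.yaml 1.1.1.1
```

Commented-out lines such as "# listen_address: ..." are left untouched, and the .bak copy makes it easy to compare the result against the original.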

After adjusting the three Cassandra nodes, stop each node, delete its system data, and start it again:

$ sudo service cassandra stop
$ sudo rm -rf /var/lib/cassandra/data/system/*
$ sudo service cassandra start
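
Once the nodes are back up, it is worth confirming that they have found each other. The nodetool status command that ships with Cassandra lists every node with a two-letter state, where UN means Up/Normal. The sketch below counts those lines; count_up_normal is a hypothetical helper name, and it assumes the state code is the first field of each node line.

```shell
# count_up_normal
# Hypothetical helper: reads `nodetool status` output on stdin and prints
# how many nodes are reported as UN (Up/Normal).
count_up_normal() {
    # Count lines whose first whitespace-separated field is exactly "UN";
    # "n + 0" prints 0 instead of an empty string when nothing matched.
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Example, run on any of the three nodes:
#   nodetool status | count_up_normal
```

For the cluster described here, the expected result is 3. A lower number suggests a node is down or has not joined, in which case the seeds and listen_address settings are the first things to check.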

Running Behind HAProxy

Note: the DataStax Cassandra drivers support connection pooling and load balancing themselves, so connecting through them directly is recommended over HAProxy.

If you still want to run Cassandra behind an HAProxy instance that is part of the nodes' private network, follow these steps:

$ sudo apt-get install haproxy
$ sudo service haproxy start

Edit the HAProxy config file /etc/haproxy/haproxy.cfg and restart HAProxy afterwards.

defaults
    mode    tcp

frontend stats
    bind *:8404
    stats enable
    mode http
    stats uri /stats
    stats refresh 10s
    stats admin if LOCALHOST

frontend cassandra-cql
    description "Cassandra CQL"
    bind *:9042
    mode tcp
    option tcplog
    default_backend cassandra-cql

backend cassandra-cql
    description "Cassandra CQL"
    balance leastconn
    mode tcp
    server 1.1.1.1 1.1.1.1:9042 check
    server 2.2.2.2 2.2.2.2:9042 check
    server 3.3.3.3 3.3.3.3:9042 check
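
The backend needs one server line per Cassandra node, so the list grows with the cluster. As a rough sketch, the lines can be generated from the private IPs; render_backend_servers is a hypothetical helper name, and it follows the convention used above of naming each server after its IP.

```shell
# render_backend_servers PORT IP...
# Hypothetical helper: prints one HAProxy "server" line per IP, using the
# IP as the server name and enabling health checks with "check".
render_backend_servers() {
    port=$1
    shift
    for ip in "$@"; do
        printf '    server %s %s:%s check\n' "$ip" "$ip" "$port"
    done
}

# Example:
#   render_backend_servers 9042 1.1.1.1 2.2.2.2 3.3.3.3
```

After pasting the output into the backend section, running haproxy -c -f /etc/haproxy/haproxy.cfg checks the file for syntax errors before the restart.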

You should now be able to reach the Cassandra nodes through HAProxy (public IP 13.13.13.13):

$ cqlsh 13.13.13.13 9042

The HAProxy dashboard is available at http://13.13.13.13:8404/stats
