Cassandra: Replication factor and Consistency level

Cassandra is a well-known open-source NoSQL database. It is highly available, scalable, and distributed. To maintain availability, Cassandra replicates data across different nodes. Let's explore the features that give you control over data consistency and replication.

Data replication

Data replication means storing the same data on multiple nodes: each row of data is stored in multiple copies. This is how Cassandra ensures reliability and fault tolerance. Whenever a node goes down, there is always another node containing a copy. All replicas are equally important - there is no master replica. The total number of copies of each row across the cluster is called the Replication Factor.

💡
Replication Factor of 1 means that there is only one copy of each row in the cluster. If the node containing the row goes down, the row cannot be retrieved. Replication Factor of 2 means two copies of each row, where each copy is on a different node.

In general, the Replication Factor should not exceed the number of nodes in the cluster. The Replication Factor is defined together with a Replication Strategy.

Replication strategy

Replication Strategy defines how data is replicated across all the nodes in the cluster. It is defined per keyspace and is set during keyspace creation. There are two main strategies:

  • SimpleStrategy: Use only for a single datacenter and one rack.
  • NetworkTopologyStrategy: Highly recommended for most deployments. Easily expandable to multiple datacenters.

There is also LocalStrategy, dedicated to maintaining system tables. It is for Cassandra's internal use only, and its data is not replicated to other nodes.

cqlsh> DESCRIBE KEYSPACE system

CREATE KEYSPACE system WITH replication = {'class': 'LocalStrategy'}  AND durable_writes = true;

Simple Strategy

SimpleStrategy places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring without considering topology (rack or datacenter location).
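The clockwise walk can be illustrated with a toy sketch. This is not Cassandra's actual implementation - the ring is modeled as a plain list of nodes already ordered by token, and the node names are made up - but it shows how additional replicas simply follow the primary around the ring, ignoring racks and datacenters:

```python
def simple_strategy_replicas(ring, primary_index, replication_factor):
    """Toy model of SimpleStrategy: start at the node chosen by the
    partitioner and walk the ring clockwise, ignoring topology."""
    return [ring[(primary_index + i) % len(ring)]
            for i in range(replication_factor)]

# Nodes ordered by their position (token) on the ring - names are illustrative.
ring = ["node1", "node2", "node3", "node4"]

print(simple_strategy_replicas(ring, 2, 2))  # ['node3', 'node4']
print(simple_strategy_replicas(ring, 3, 2))  # ['node4', 'node1'] - wraps around
```

Note the wrap-around in the second call: the ring has no end, so the walk continues from the first node.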

Here is an example of test1 keyspace creation in cqlsh using SimpleStrategy. Replication Factor is set to 2.

cqlsh> CREATE KEYSPACE test1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2} AND durable_writes = true;

Network Topology Strategy

NetworkTopologyStrategy is used when a cluster is deployed across multiple datacenters, and it specifies how many replicas to place in each of them. Within a datacenter, it places replicas by walking the ring clockwise until reaching the first node in another rack. It attempts to place replicas on distinct racks because nodes in the same rack (or similar physical grouping) often fail at the same time due to power, cooling, or network issues.

In the following example, we create test2 keyspace in cqlsh using NetworkTopologyStrategy. There are two datacenters: datacenter1 and datacenter2. Replication Factors are set to 2 and 3 respectively.

cqlsh> CREATE KEYSPACE test2 WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 2, 'datacenter2': 3} AND durable_writes = true;

Common replication approaches

When deciding how many replicas to configure in each datacenter, there are two primary considerations: being able to satisfy reads locally, without incurring cross-datacenter latency, and handling failure scenarios.

The two most common ways to configure multiple datacenter clusters are:

  • Two replicas in each datacenter: This configuration tolerates the failure of a single node per replication group and still allows local reads at a consistency level of ONE.
  • Three replicas in each datacenter: This configuration tolerates either the failure of one node per replication group at a strong consistency level of LOCAL_QUORUM or multiple node failures per datacenter using consistency level ONE.

Consistency Level

The Cassandra Consistency Level is defined as the minimum number of Cassandra nodes that must acknowledge a read or write operation before the operation can be considered successful.

  • ALL: Writes must be written to all replica nodes in the cluster for that partition. Reads return the record after all replicas have responded; the read operation fails if a replica does not respond.
  • EACH_QUORUM: Strong consistency. Writes must be written to a quorum of replica nodes in each datacenter. Not supported for reads.
  • QUORUM: Writes must be written to a quorum of replica nodes across all datacenters. Reads return the record after a quorum of replicas from all datacenters has responded.
  • LOCAL_QUORUM: Strong consistency. Writes must be written to a quorum of replica nodes in the same datacenter as the coordinator. Reads return the record after a quorum of replicas in the coordinator's datacenter has responded. Avoids the latency of inter-datacenter communication.
  • ONE: Writes must be written to at least one replica node. Reads return a response from the closest replica, as determined by the snitch.
  • TWO: Writes must be written to at least two replica nodes. Reads return the most recent data from two of the closest replicas.
  • THREE: Writes must be written to at least three replica nodes. Reads return the most recent data from three of the closest replicas.
  • LOCAL_ONE: Writes must be sent to, and successfully acknowledged by, at least one replica node in the local datacenter. Reads return a response from the closest replica in the local datacenter.
  • ANY: Writes must be written to at least one node. If all replica nodes for the given partition key are down, the write can still succeed after a hinted handoff has been written; in that case the data is not readable until the replica nodes for that partition have recovered. Not supported for reads.
  • SERIAL: Not applicable to regular writes. Reads the current state of data without proposing a new addition or update; if a SERIAL read finds an uncommitted transaction in progress, it commits the transaction as part of the read. Similar in cost to QUORUM.
  • LOCAL_SERIAL: Same as SERIAL, but confined to the local datacenter. Similar in cost to LOCAL_QUORUM.

Quorums

A quorum is defined as the number of nodes that are required to acknowledge a read or write. It is calculated from the Replication Factors.

QUORUM is calculated as follows:

$$QUORUM = \left\lfloor \frac{\sum_{n=1}^{N} \text{RF}_n}{2} \right\rfloor + 1$$

where $\text{RF}_n$ is the Replication Factor of the $n$-th datacenter, $N$ is the number of datacenters, and the division is rounded down.
💡
For the purpose of quorum calculation, let's assume we have 3 datacenter Cassandra cluster, and Replication Factors are defined as follows: 2 for datacenter1, 4 for datacenter2, and 6 for datacenter3.
$$QUORUM = \lfloor (\text{RF}_{datacenter1} + \text{RF}_{datacenter2} + \text{RF}_{datacenter3}) / 2 \rfloor + 1$$
$$QUORUM = \lfloor (2 + 4 + 6) / 2 \rfloor + 1 = \lfloor 12 / 2 \rfloor + 1 = 6 + 1 = 7$$

In this particular case, 7 nodes must respond to the coordinator node for the request to succeed.
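The calculation above is a one-liner in code. A minimal sketch (integer floor division mirrors the rounding-down in the formula):

```python
def quorum(replication_factors):
    """QUORUM across all datacenters: floor(sum of RFs / 2) + 1."""
    return sum(replication_factors) // 2 + 1

print(quorum([2, 4, 6]))  # 7 - the three-datacenter example above
print(quorum([3]))        # 2 - single datacenter, RF = 3
```

The second call shows why the floor matters: with an odd total of 3, a quorum is 2 nodes, not 2.5.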

LOCAL_QUORUM is calculated using the Replication Factor of the local datacenter, i.e. the one the coordinator node belongs to. For the cluster from the previous example, we obtain 3 values of LOCAL_QUORUM, one per datacenter.

$$LOCAL\_QUORUM_{datacenter1} = \lfloor \text{RF}_{datacenter1} / 2 \rfloor + 1 = \lfloor 2 / 2 \rfloor + 1 = 2$$
$$LOCAL\_QUORUM_{datacenter2} = \lfloor \text{RF}_{datacenter2} / 2 \rfloor + 1 = \lfloor 4 / 2 \rfloor + 1 = 3$$
$$LOCAL\_QUORUM_{datacenter3} = \lfloor \text{RF}_{datacenter3} / 2 \rfloor + 1 = \lfloor 6 / 2 \rfloor + 1 = 4$$

The following LOCAL_QUORUM values were obtained:

  • a local quorum of 2 for datacenter1,
  • a local quorum of 3 for datacenter2,
  • a local quorum of 4 for datacenter3.

EACH_QUORUM means that a quorum of nodes in each datacenter must respond for the request to succeed. Taking the previous three-datacenter cluster as an example, a successful request would require acknowledgment from 2 nodes in datacenter1, 3 nodes in datacenter2, and 4 nodes in datacenter3. All in all, 9 nodes would have to confirm the request.
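Both LOCAL_QUORUM and EACH_QUORUM follow from the same floor(RF / 2) + 1 rule, applied per datacenter. A minimal sketch reproducing the numbers above:

```python
def local_quorum(rf):
    """Quorum within a single datacenter with Replication Factor `rf`."""
    return rf // 2 + 1

def each_quorum(rfs):
    """Total acknowledgments for EACH_QUORUM: a local quorum in every datacenter."""
    return sum(local_quorum(rf) for rf in rfs)

rfs = {"datacenter1": 2, "datacenter2": 4, "datacenter3": 6}

print({dc: local_quorum(rf) for dc, rf in rfs.items()})
# {'datacenter1': 2, 'datacenter2': 3, 'datacenter3': 4}
print(each_quorum(rfs.values()))  # 9
```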

Comparing EACH_QUORUM to QUORUM, we can see that QUORUM requires fewer nodes to confirm the transaction - 7 versus 9 nodes.

Conclusion

Today we took a look at data replication in Cassandra as well as replication strategies used for that purpose. We learned about read and write consistency levels.