Node:

A node is a server or system where data is stored.

Data Center:

A collection of related nodes is called a data center. A data center may be physical or virtual. Data centers should never span physical locations.

Cluster:

A cluster contains one or more data centers and can span physical locations.

Commit log:

Every database write operation is first appended to the commit log, which can be replayed for crash recovery.

Memtable:

After data is written to the commit log, the same data is written to the memtable, an in-memory structure that holds it temporarily.

SSTable:

Once the memtable reaches its size threshold, its data is flushed to an SSTable, which is written to disk sequentially. SSTables are maintained separately for each Cassandra table.
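The write path described above (commit log, then memtable, then flush to SSTable) can be sketched roughly as follows. This is a toy model, not Cassandra's real code: the class name ToyTable, the row-count threshold, and the in-memory lists standing in for on-disk files are all assumptions made for illustration.

```python
class ToyTable:
    """Toy model of the write path for one table (not Cassandra's actual code)."""

    def __init__(self, flush_threshold=3):
        # Hypothetical threshold: number of rows the memtable may hold.
        self.flush_threshold = flush_threshold
        self.commit_log = []  # append-only record of every write, for crash recovery
        self.memtable = {}    # temporary in-memory copy of recent writes
        self.sstables = []    # each entry stands in for one on-disk SSTable

    def write(self, key, value):
        # 1. Append to the commit log first so the write could be replayed after a crash.
        self.commit_log.append((key, value))
        # 2. Apply the same write to the memtable.
        self.memtable[key] = value
        # 3. Flush once the memtable reaches its threshold.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the rows out in key order, then start a fresh memtable.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}


table = ToyTable(flush_threshold=2)
table.write("user:1", "alice")
table.write("user:2", "bob")   # second write reaches the threshold and triggers a flush
print(table.sstables)          # [[('user:1', 'alice'), ('user:2', 'bob')]]
```

A real memtable threshold is based on memory size rather than row count, but the order of the steps is the same.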

Gossip:

A peer-to-peer communication protocol that nodes use to discover and share location and state information about the other nodes in the cluster. Each node persists this information locally so that it can be used immediately when the node restarts.
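The idea of sharing node information can be illustrated with a small sketch. It assumes each node keeps a map of node name to a (version, status) pair and that a higher version means newer information. The function name merge_gossip and the data layout are hypothetical and much simpler than the real protocol.

```python
def merge_gossip(local, remote):
    """Merge two views of the cluster, keeping the newer entry for each node.

    Both arguments map node name -> (version, status). A higher version
    is assumed to mean more recent information.
    """
    merged = dict(local)
    for node, (version, status) in remote.items():
        if node not in merged or version > merged[node][0]:
            merged[node] = (version, status)
    return merged


# Hypothetical views held by two nodes before they exchange state.
view_a = {"node-1": (3, "UP"), "node-2": (1, "UP")}
view_b = {"node-2": (2, "DOWN"), "node-3": (1, "UP")}
print(merge_gossip(view_a, view_b))
# {'node-1': (3, 'UP'), 'node-2': (2, 'DOWN'), 'node-3': (1, 'UP')}
```

After the exchange, the first node knows about node-3 and has the newer status for node-2.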

Partitioner:

The partitioner is responsible for distributing data across the nodes in the cluster. It decides which node receives the first copy (replica) of each piece of data.
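A minimal sketch of what a partitioner does, assuming each node owns a single token on a ring and a key belongs to the first node whose token is greater than or equal to the key's token. MD5 is used here only as a stand-in hash, and the function names, node names, and token values are hypothetical.

```python
import bisect
import hashlib


def token_for(partition_key):
    """Hash a partition key to a token in the range 0 .. 2**32 - 1 (stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def first_replica(token, ring):
    """Return the node that owns the token.

    ring is a list of (token, node_name) pairs sorted by token. The owner is
    the first node whose token is >= the given token, wrapping to the start.
    """
    tokens = [t for t, _ in ring]
    index = bisect.bisect_left(tokens, token) % len(ring)
    return ring[index][1]


# Hypothetical three-node ring, sorted by token.
ring = [(1_000_000_000, "node-a"), (2_500_000_000, "node-b"), (4_000_000_000, "node-c")]

print(first_replica(1_200_000_000, ring))  # node-b
print(first_replica(4_100_000_000, ring))  # node-a (wraps around the ring)
owner = first_replica(token_for("user:42"), ring)  # the same key always maps to the same node
```

Because the token depends only on the key, every node can work out where a key belongs without asking a central server.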

Replication factor:

The replication factor is the total number of replicas of each row across the cluster.

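Example:

A replication factor of 1 means there is only one copy of each row in the cluster. A replication factor of 2 means there are two copies of each row, each on a different node.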

Replica Placement Strategy:

There are two replica placement strategies.

1. Simple Strategy

Use this strategy when you have a single data center. It places the first replica on the node selected by the partitioner and the remaining replicas on the next nodes in the clockwise direction around the ring.
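The clockwise placement can be sketched as below. This assumes one token per node and a ring given as a sorted list of (token, node name) pairs. The function name and the sample ring are hypothetical.

```python
import bisect


def simple_strategy_replicas(token, ring, replication_factor):
    """Pick replicas by walking the ring clockwise from the token's owner.

    ring is a list of (token, node_name) pairs sorted by token.
    """
    tokens = [t for t, _ in ring]
    # First replica: the node chosen by the partitioner for this token.
    start = bisect.bisect_left(tokens, token) % len(ring)
    # There cannot be more replicas than nodes.
    count = min(replication_factor, len(ring))
    # Remaining replicas: the next nodes clockwise, wrapping around the ring.
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


ring = [(0, "node-a"), (25, "node-b"), (50, "node-c"), (75, "node-d")]
print(simple_strategy_replicas(30, ring, 3))  # ['node-c', 'node-d', 'node-a']
```

Token 30 falls to node-c, so with a replication factor of 3 the other two replicas go to node-d and then wrap around to node-a.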

2. Network Topology Strategy

Use this strategy when the cluster has, or is expected to have, more than one data center. It places replicas in each data center, and the number of replicas is set separately for each data center.
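A rough sketch of per-data-center placement, assuming the ring is walked clockwise and a node is chosen only while its data center still needs replicas. Rack awareness, which the real strategy also takes into account, is left out. The function name, node names, and replica counts are hypothetical.

```python
import bisect


def network_topology_replicas(token, ring, replicas_per_dc):
    """Choose replicas per data center by walking the ring clockwise.

    ring is a list of (token, node_name, dc_name) tuples sorted by token.
    replicas_per_dc maps dc_name -> number of replicas wanted in that data center.
    """
    tokens = [t for t, _, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    placed = {dc: [] for dc in replicas_per_dc}
    for i in range(len(ring)):
        _, node, dc = ring[(start + i) % len(ring)]
        # Take the node only if its data center still needs more replicas.
        if dc in placed and len(placed[dc]) < replicas_per_dc[dc]:
            placed[dc].append(node)
    return placed


ring = [
    (0, "a1", "dc1"), (10, "b1", "dc2"), (20, "a2", "dc1"),
    (30, "b2", "dc2"), (40, "a3", "dc1"), (50, "b3", "dc2"),
]
print(network_topology_replicas(15, ring, {"dc1": 2, "dc2": 1}))
# {'dc1': ['a2', 'a3'], 'dc2': ['b2']}
```

Here dc1 is asked for two replicas and dc2 for one, so each data center keeps its own copies of the row.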