A node is a server or system where data is stored.
A collection of nodes is called a data center. A data center may be physical or virtual. Data centers should never span physical locations.
A cluster contains one or more data centers and can span physical locations.
All database write operations are written to the commit log, which can be used for crash recovery.
After the data is written to the commit log, the same data is written to the memtable, where it is held temporarily in memory.
Once the memtable reaches its threshold, the data is flushed to an SSTable, which is written to disk sequentially. SSTables are maintained for each Cassandra table.
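The write path above (commit log, then memtable, then flush to an SSTable) can be sketched as a toy model. This is only an illustration, not Cassandra's implementation: the class name ToyTableStore, the threshold of 3 entries, and the in-memory lists standing in for disk files are all assumptions made for the example.

```python
class ToyTableStore:
    """Toy model of the write path for one table (hypothetical, not Cassandra code)."""

    def __init__(self, memtable_threshold=3):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory buffer of recent writes
        self.sstables = []     # immutable, sorted segments standing in for disk files
        self.memtable_threshold = memtable_threshold

    def write(self, key, value):
        # 1. Record the write in the commit log first, for crash recovery.
        self.commit_log.append((key, value))
        # 2. Apply the same write to the memtable.
        self.memtable[key] = value
        # 3. Flush once the memtable reaches its threshold.
        if len(self.memtable) >= self.memtable_threshold:
            self.flush()

    def flush(self):
        # The segment is written in sorted key order and never modified afterwards.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # Simplification: flushed data no longer needs the log for recovery.
        self.commit_log.clear()

    def recover(self):
        # After a crash the memtable is lost; replay the commit log to rebuild it.
        self.memtable = dict(self.commit_log)


store = ToyTableStore(memtable_threshold=3)
store.write("b", 2)
store.write("a", 1)
store.memtable = {}     # simulate a crash that loses in-memory data
store.recover()
print(store.memtable)   # {'b': 2, 'a': 1}
store.write("c", 3)     # third distinct key triggers a flush
print(store.sstables)   # [(('a', 1), ('b', 2), ('c', 3))]
```

The point of the ordering is that a write is durable in the log before it is visible in memory, so losing the memtable never loses acknowledged data.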
Gossip is a communication protocol that nodes use to discover and share location and state information about the other nodes in the cluster. Each node persists this information so that it can be used immediately when the node restarts.
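The node-discovery protocol described above can be illustrated with a very small sketch in which nodes exchange versioned state and keep the newest entry for each endpoint. Everything here is hypothetical: the GossipNode class, the (version, address) tuples, and the IP addresses are assumptions for illustration, and the real protocol exchanges more state than this.

```python
class GossipNode:
    """Toy node that learns about peers by exchanging state (hypothetical model)."""

    def __init__(self, name, address):
        self.name = name
        # endpoint name -> (version, address); starts with only this node's own entry
        self.known = {name: (1, address)}

    def merge(self, remote_state):
        # Keep an entry if it is new to us or newer than what we already hold.
        for endpoint, (version, address) in remote_state.items():
            if endpoint not in self.known or version > self.known[endpoint][0]:
                self.known[endpoint] = (version, address)

    def gossip_with(self, peer):
        # Exchange state in both directions.
        snapshot = dict(self.known)
        self.merge(peer.known)
        peer.merge(snapshot)


a = GossipNode("a", "10.0.0.1")
b = GossipNode("b", "10.0.0.2")
c = GossipNode("c", "10.0.0.3")

a.gossip_with(b)         # a and b now know each other
b.gossip_with(c)         # c learns about a indirectly, through b
print(sorted(c.known))   # ['a', 'b', 'c']
print(sorted(a.known))   # ['a', 'b']
```

Note that c never contacted a directly; information spreads through intermediaries, which is why a few rounds are enough for every node to learn about the whole cluster.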
The partitioner is responsible for distributing data across the nodes in the cluster. It determines which node receives the first copy (replica) of each piece of data.
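As a rough sketch of how a partitioner can map data to a node, the example below hashes a partition key to a token and picks the first node on a ring whose token is greater than or equal to it, wrapping around at the end. The 32-bit token space, the use of MD5 from Python's hashlib as the hash, and the node names are assumptions chosen to keep the example self-contained; they are not Cassandra's defaults.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # hypothetical token space, chosen small for readability


def token_for(partition_key):
    # Stand-in hash: first 4 bytes of MD5, interpreted as an unsigned integer.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE


def owner_of_token(token, ring):
    """ring: list of (token, node_name) pairs sorted by token."""
    tokens = [t for t, _ in ring]
    # First node whose token is >= the key's token; wrap to index 0 past the end.
    idx = bisect.bisect_left(tokens, token) % len(ring)
    return ring[idx][1]


def primary_node(partition_key, ring):
    return owner_of_token(token_for(partition_key), ring)


ring = sorted([
    (0, "node-a"),
    (RING_SIZE // 4, "node-b"),
    (RING_SIZE // 2, "node-c"),
    (3 * RING_SIZE // 4, "node-d"),
])

print(owner_of_token(1, ring))              # node-b
print(owner_of_token(RING_SIZE - 1, ring))  # node-a (wraps around the ring)
print(primary_node("user:42", ring))        # always the same node for this key
```

Because the node is derived only from the key's hash, any node can compute where a row belongs without consulting a central directory.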
The replication factor is the total number of replicas across the cluster.
Replication factor 1 ===> one copy of each row on one node.
Replication factor 2 ===> two copies of each row, each placed on a different node.
Replica Placement Strategy:
There are two strategies.
1. Simple Strategy
Use this when you have a single data center. This strategy places the first replica on the node selected by the partitioner and the remaining replicas on the next nodes in the clockwise direction around the ring.
2. Network Topology Strategy
Use this when the cluster is deployed across more than one data center. This strategy places multiple replicas in each data center, and the number of replicas can be set per data center.
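The two placement strategies can be sketched as follows. This is a simplified illustration under stated assumptions: the function names, node names, and data center names are hypothetical, the ring is given as a list already in clockwise order, and the multi-data-center version ignores rack awareness.

```python
def simple_strategy(ring, start, rf):
    """ring: node names in clockwise order; start: index chosen by the partitioner."""
    rf = min(rf, len(ring))  # cannot place more replicas than there are nodes
    return [ring[(start + i) % len(ring)] for i in range(rf)]


def network_topology_strategy(ring, start, rf_per_dc):
    """ring: (node, data_center) pairs in clockwise order.
    rf_per_dc: number of replicas wanted in each data center."""
    placed = {dc: [] for dc in rf_per_dc}
    # Walk the ring clockwise once, filling each data center up to its quota.
    for i in range(len(ring)):
        node, dc = ring[(start + i) % len(ring)]
        if dc in placed and len(placed[dc]) < rf_per_dc[dc]:
            placed[dc].append(node)
    return placed


single_dc = ["n1", "n2", "n3", "n4"]
print(simple_strategy(single_dc, start=3, rf=2))
# ['n4', 'n1']

multi_dc = [("n1", "dc1"), ("n2", "dc2"), ("n3", "dc1"), ("n4", "dc2")]
print(network_topology_strategy(multi_dc, start=1, rf_per_dc={"dc1": 2, "dc2": 1}))
# {'dc1': ['n3', 'n1'], 'dc2': ['n2']}
```

In the first call the walk starts at n4 and wraps around to n1. In the second, the walk starts at n2 and skips n4 because dc2 already has its one replica.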