Question-11: What do you mean by replication factor?

Answer: To answer this question, we need to understand how a Cassandra database is designed. In an ideal setup, a Cassandra cluster spans multiple data centers and keeps 3 copies of the data distributed across the nodes in the cluster. The number of copies is determined by the replication factor. Each node holds at most one copy of a given piece of data, and the remaining two copies are kept on two other nodes in the cluster, which can be in a different rack or a different data center. So even if one of your nodes goes down, two copies of the data are still available. Cassandra does not rebuild the third copy by itself the moment a node fails. The other nodes learn about the failure through the gossip protocol, writes that the failed node missed are stored as hints and replayed when it comes back, and a repair (or replacing the node) restores any copies that are still missing.
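To make the effect of the replication factor concrete, here is a minimal Python sketch that places each key on that many distinct nodes by walking clockwise around a token ring, roughly in the spirit of Cassandra's SimpleStrategy. The node names, the MD5-based token function, and the helpers replicas_for and survivors are assumptions made for illustration; they are not Cassandra's real partitioner or API.

```python
import hashlib
from bisect import bisect_left
from typing import List, Set


def token(value: str) -> int:
    """Hash a string to a position on the ring (illustrative, not Murmur3)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def replicas_for(key: str, nodes: List[str], replication_factor: int) -> List[str]:
    """Return the nodes that would hold a copy of `key`.

    The first replica is the node whose token is the first one at or after
    the key's token; the others are the next distinct nodes clockwise.
    """
    if replication_factor > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    ring = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, token(key)) % len(ring)  # wrap around the ring
    return [ring[(start + i) % len(ring)][1] for i in range(replication_factor)]


def survivors(replicas: List[str], down: Set[str]) -> List[str]:
    """Replicas that are still reachable when the nodes in `down` have failed."""
    return [n for n in replicas if n not in down]


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # hypothetical names
    owners = replicas_for("user:42", nodes, replication_factor=3)
    print("replicas:", owners)
    print("after one failure:", survivors(owners, down={owners[0]}))
```

With a replication factor of 3, losing any single replica still leaves two nodes that can serve the key, which is the availability argument made in the answer above.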

Question-12: What is the purpose of the gossip protocol in a Cassandra cluster?

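Answer: In a Cassandra cluster, each node needs to be aware of the other nodes in the cluster. The gossip protocol is used to quickly share information about node state and other cluster metadata: each node periodically exchanges what it knows with a few other nodes, so news about any node soon reaches the whole cluster.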
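The idea can be sketched in a few lines of Python. In this toy model each node keeps a view that maps node names to the highest heartbeat version it has seen, and in every round each node bumps its own heartbeat and merges views with one peer. The function names, the single-peer exchange, and the heartbeat counter are simplifying assumptions for illustration, not Cassandra's actual gossip implementation.

```python
import random
from typing import Callable, Dict, List

# A node's view: node name -> highest heartbeat version seen for that node.
View = Dict[str, int]


def new_cluster(names: List[str]) -> Dict[str, View]:
    """Every node starts out knowing only about itself."""
    return {name: {name: 0} for name in names}


def exchange(a: View, b: View) -> None:
    """Merge two views in place, keeping the newest version of each entry."""
    merged = {n: max(a.get(n, -1), b.get(n, -1)) for n in set(a) | set(b)}
    a.clear()
    a.update(merged)
    b.clear()
    b.update(merged)


def random_peer(name: str, names: List[str]) -> str:
    """Pick any node other than `name` at random."""
    return random.choice([n for n in names if n != name])


def gossip_round(cluster: Dict[str, View],
                 pick_peer: Callable[[str, List[str]], str] = random_peer) -> None:
    """One round: each node bumps its heartbeat and gossips with one peer."""
    names = sorted(cluster)
    for name in names:
        cluster[name][name] += 1
        peer = pick_peer(name, names)
        exchange(cluster[name], cluster[peer])


def converged(cluster: Dict[str, View]) -> bool:
    """True once every node has an entry for every node in the cluster."""
    return all(set(view) == set(cluster) for view in cluster.values())


if __name__ == "__main__":
    cluster = new_cluster(["n1", "n2", "n3", "n4", "n5"])  # hypothetical nodes
    rounds = 0
    while not converged(cluster):
        gossip_round(cluster)
        rounds += 1
    print("all nodes know each other after", rounds, "round(s)")
```

A heartbeat that stops increasing is what lets the other nodes suspect that a peer is down, without any central coordinator.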

Question-13: What is the role of the commit log in Cassandra?

Answer: The commit log always resides on disk. Whenever data is written to a particular node, it is first appended sequentially to the commit log. This makes the write durable: even if the node crashes, the data can be replayed from the commit log.

Question-14: Is there a single commit log for each table?

Answer: No. Each node has one commit log, and data written to any table goes into that same commit log on that node. The purpose of the commit log is to keep the data durable in case the node goes down; once the node is back, the log can be replayed.
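A minimal Python sketch of this behaviour is shown below: one append-only log per node, shared by all tables, that can be replayed after a restart. The CommitLog class, the JSON-lines record format, and the recover helper are hypothetical and exist only for illustration. Calling fsync on every append is also a simplification, since how often Cassandra syncs its commit log to disk is configurable.

```python
import json
import os
from typing import Dict, Iterator, Tuple


class CommitLog:
    """Append-only log shared by every table on one node (illustrative)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, table: str, key: str, value: str) -> None:
        """Append one mutation and force it to disk before returning."""
        record = json.dumps({"table": table, "key": key, "value": value})
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(record + "\n")
            f.flush()
            os.fsync(f.fileno())  # simplification: sync on every write

    def replay(self) -> Iterator[Tuple[str, str, str]]:
        """Yield every logged mutation in the order it was written."""
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    r = json.loads(line)
                    yield r["table"], r["key"], r["value"]


def recover(log: CommitLog) -> Dict[str, Dict[str, str]]:
    """Rebuild the in-memory state of every table from the single log."""
    tables: Dict[str, Dict[str, str]] = {}
    for table, key, value in log.replay():
        tables.setdefault(table, {})[key] = value  # later writes win
    return tables
```

Because every record carries its table name, a single sequential file is enough to restore all tables on the node, and appending to one file keeps disk writes sequential.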

Question-15: What are Memtable and SSTable, and how are they related?

Answer: Whenever data is written to a Cassandra cluster, the following 3 storage structures are involved:

  1. Commit log 
  2. Memtable
  3. SSTable

Once a write request is received, Cassandra first writes the data to the commit log file, which is persisted on disk. In case of failure, the data can be replayed from the commit log, which saves you from data loss. Once the data is persisted in the commit log, Cassandra also writes it to the Memtable, which is an in-memory data structure. Each table has its own Memtable, where data is kept sorted by partition key (and, within a partition, by clustering columns). Once the size of the Memtable reaches the configured limit, Cassandra flushes its data to an SSTable. An SSTable is a file on disk that holds the same data in the same sorted order as the Memtable, and it is immutable once written. SSTable stands for sorted string table.
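The write path described above can be sketched as a toy Python class. The TableStore name, the entry-count flush threshold, and the use of in-memory lists in place of real files are assumptions made to keep the example short; Cassandra's actual thresholds are size-based and its on-disk formats are far more involved.

```python
from typing import Dict, List, Optional, Tuple


class TableStore:
    """Toy write path for one table: commit log -> Memtable -> SSTables."""

    def __init__(self, memtable_limit: int = 3) -> None:
        self.memtable_limit = memtable_limit         # flush threshold, in entries
        self.commit_log: List[Tuple[str, str]] = []  # stands in for the on-disk log
        self.memtable: Dict[str, str] = {}           # in-memory, one per table
        self.sstables: List[Tuple[Tuple[str, str], ...]] = []  # immutable, sorted

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # 1. record the write for durability
        self.memtable[key] = value            # 2. apply it in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. spill to an SSTable when full

    def flush(self) -> None:
        """Write the Memtable out as a new sorted, immutable SSTable."""
        if not self.memtable:
            return
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs to be replayed

    def read(self, key: str) -> Optional[str]:
        """Check the Memtable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            for k, v in sstable:
                if k == key:
                    return v
        return None
```

Keeping SSTables immutable means a flush never rewrites an existing file; a newer value for the same key simply lives in a newer SSTable (or in the Memtable) and is found first on read.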