Question-96: Can the same node participate in two different Cassandra clusters?

Answer: No. A Cassandra node can belong to only one cluster. The cluster name must be defined explicitly with the cluster_name property in the cassandra.yaml file, and nodes with different cluster names refuse to communicate with each other. This ensures that a node does not accidentally join another cluster that happens to be on the same network.
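The rule can be illustrated with a small Python sketch. It assumes a cassandra.yaml that contains a single top-level `cluster_name` line. The helpers `read_cluster_name` and `accepts_peer` are hypothetical: they only model the check and are not part of Cassandra, which performs the real comparison internally when nodes communicate.

```python
import re


def read_cluster_name(yaml_text: str) -> str:
    """Extract the top-level cluster_name value from cassandra.yaml text.

    Hypothetical, simplified parser: it only understands a single
    `cluster_name: value` line, optionally quoted and optionally
    followed by a `#` comment.
    """
    match = re.search(
        r"^cluster_name:[ \t]*['\"]?([^'\"\n#]+?)['\"]?[ \t]*(?:#.*)?$",
        yaml_text,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("cluster_name not found")
    return match.group(1).strip()


def accepts_peer(local_cluster: str, peer_cluster: str) -> bool:
    """Model of the rule: a node only talks to peers with the same cluster name."""
    return local_cluster == peer_cluster
```

Because the comparison is an exact string match, two otherwise identical nodes configured with `Prod Cluster` and `Test` would never form a single cluster.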

Question-97: Which directories should be configured in the cassandra.yaml file?

Answer: Several directories need to be configured, each for a different purpose:

  • Data file directories (data_file_directories): where the table data is stored on disk.
  • Commit log directory (commitlog_directory): where the commit logs are stored. For better performance, it is recommended to place the commit log on a separate disk partition or, ideally, on a separate physical device from the data file directories. Because the commit log is append-only, a hard disk drive (HDD) is acceptable for it.
  • CDC raw directory (cdc_raw_directory): where commit log segments containing change data capture (CDC) data are moved when the memtable is flushed.
  • Hints directory (hints_directory): where the hints are stored.
  • Saved caches directory (saved_caches_directory): where the table key and row caches are saved.
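As a rough illustration, the Python sketch below maps each directory to its cassandra.yaml property name and checks whether the commit log sits on a different device from the data directories. The paths are hypothetical examples (actual defaults depend on how Cassandra was installed), and `commitlog_on_separate_device` is a hypothetical helper. Comparing `os.stat().st_dev` only detects separate filesystems, not whether they are backed by separate physical disks.

```python
import os
from typing import Sequence

# Property names as they appear in cassandra.yaml; the paths are
# hypothetical examples, not the defaults of every installation.
EXAMPLE_DIRECTORIES = {
    "data_file_directories": ["/var/lib/cassandra/data"],
    "commitlog_directory": "/var/lib/cassandra/commitlog",
    "cdc_raw_directory": "/var/lib/cassandra/cdc_raw",
    "hints_directory": "/var/lib/cassandra/hints",
    "saved_caches_directory": "/var/lib/cassandra/saved_caches",
}


def commitlog_on_separate_device(commitlog_dir: str, data_dirs: Sequence[str]) -> bool:
    """Return True if the commit log directory lives on a different
    filesystem device than every data file directory (st_dev comparison)."""
    commitlog_dev = os.stat(commitlog_dir).st_dev
    return all(os.stat(d).st_dev != commitlog_dev for d in data_dirs)
```

On a real node one would pass the configured `commitlog_directory` and `data_file_directories` values; a result of `False` means the commit log shares a filesystem with at least one data directory.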

Question-98: Does the Cassandra storage engine use a read-before-write strategy?

Answer: No. Read-before-write is typical of relational databases, where the data must be fully consistent immediately after a write. Cassandra is a distributed system in which consistency is tuned to the requirement, so read-before-write is not needed, and it would significantly increase latency. As discussed previously, the storage engine groups inserts and updates in memory and writes the data to disk at intervals. Once data has been written to an SSTable, that SSTable is immutable and is never overwritten.

Hence, reading the data requires combining the immutable SSTables to return the correct result for a query. If you need to check the state of the data before writing, you can use lightweight transactions (LWT), but this is generally discouraged because of their performance cost.
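A minimal Python sketch of the read side, assuming each memtable and SSTable is modeled as a dict from a key to a `(timestamp, value)` pair, with `None` standing in for a deletion (tombstone). Because writes never read existing data, an update simply adds a newer cell, and the read path reconciles all sources by keeping the cell with the highest timestamp. This is a simplification, not Cassandra's code: real Cassandra reconciles per column and uses structures such as Bloom filters to skip SSTables that cannot contain the key.

```python
from typing import Dict, List, Optional, Tuple

# (write timestamp, value); a value of None models a tombstone (deletion).
Cell = Tuple[int, Optional[str]]


def read(key: str, memtable: Dict[str, Cell], sstables: List[Dict[str, Cell]]) -> Optional[str]:
    """Return the newest value for key across the memtable and all SSTables.

    Last write wins: the cell with the highest timestamp is kept.
    If the newest cell is a tombstone, the key reads as missing (None).
    """
    newest: Optional[Cell] = None
    for source in [memtable, *sstables]:
        cell = source.get(key)
        if cell is not None and (newest is None or cell[0] > newest[0]):
            newest = cell
    return None if newest is None else newest[1]
```

An older SSTable may still hold a stale value for a key, but it is simply outvoted by the newer cell at read time; nothing had to be read or rewritten when the update arrived.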

Question-99: What are the four steps followed when writing data in Cassandra?

Answer: When data is written to Cassandra, it goes through the following four stages until it is persisted on disk.

  • The data is logged in the commit log, an append-only log that sequentially records every write request on that node and is shared by all tables on the node.
  • The data is then written to the memtable, which is an in-memory data structure.
  • When the memtable reaches its threshold, it is flushed to disk.
  • After the flush, the data is stored permanently on disk in an SSTable. During compaction, new SSTables can be created by merging existing SSTables.
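The four steps can be modeled with a toy Python class. It is a hypothetical sketch, not Cassandra's implementation: the flush threshold is counted in keys rather than memory, timestamps come from a local counter, and the whole commit log is cleared on flush because everything in it has just been persisted (Cassandra instead discards commit log segments once all their data has been flushed).

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, str]  # (timestamp, value)


class ToyNode:
    """Hypothetical, greatly simplified model of the Cassandra write path."""

    def __init__(self, flush_threshold: int = 3):
        self.flush_threshold = flush_threshold         # assumption: number of keys
        self.commit_log: List[Tuple[str, str, int]] = []  # append-only
        self.memtable: Dict[str, Cell] = {}
        self.sstables: List[Dict[str, Cell]] = []      # never modified once written
        self._clock = 0

    def write(self, key: str, value: str) -> None:
        self._clock += 1
        # 1. Append to the commit log (sequential write, used for crash recovery).
        self.commit_log.append((key, value, self._clock))
        # 2. Apply the write to the in-memory memtable.
        self.memtable[key] = (self._clock, value)
        # 3. Flush once the memtable reaches its threshold.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        if not self.memtable:
            return
        # 4. Persist the memtable as a new SSTable sorted by key, then reset it.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        # Everything logged so far is now on disk, so it no longer needs replay.
        self.commit_log.clear()

    def compact(self) -> None:
        """Merge all SSTables into one new SSTable, keeping the newest cell per key."""
        merged: Dict[str, Cell] = {}
        for table in self.sstables:
            for key, cell in table.items():
                if key not in merged or cell[0] > merged[key][0]:
                    merged[key] = cell
        self.sstables = [dict(sorted(merged.items()))] if merged else []
```

With `flush_threshold=2`, writing `a`, `b`, `a`, `c` produces two SSTables, and `compact()` merges them into one in which `a` holds its newer value.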

Question-100: What options are available for flushing tables manually?

Answer: There are two options for flushing the data manually, both available through the nodetool command:

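  • nodetool flush: flushes the memtables of the specified keyspace and tables (or of all keyspaces if none is given) to SSTables on disk.
  • nodetool drain: flushes all memtables to disk, and the node stops listening for connections from clients and other nodes.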

 

If you plan to restart a node, it is recommended that you always flush the tables before restarting it, because this reduces the commit log replay time when the node starts up.
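If you script these commands, a small Python helper can build the argument lists. The function names are hypothetical; the commands follow the `nodetool flush [keyspace [table ...]]` and `nodetool drain` forms, and `run` assumes that `nodetool` is on the PATH and can reach the local node.

```python
import subprocess
from typing import List, Optional, Sequence


def flush_command(keyspace: Optional[str] = None, tables: Sequence[str] = ()) -> List[str]:
    """Build `nodetool flush [keyspace [table ...]]`.

    With no keyspace, nodetool flushes all keyspaces.
    """
    if keyspace is None and tables:
        raise ValueError("tables can only be given together with a keyspace")
    cmd = ["nodetool", "flush"]
    if keyspace is not None:
        cmd.append(keyspace)
        cmd.extend(tables)
    return cmd


def drain_command() -> List[str]:
    """Build `nodetool drain`: flush all memtables and stop accepting connections."""
    return ["nodetool", "drain"]


def run(cmd: List[str]) -> None:
    # Assumes nodetool is installed and the node is reachable;
    # raises subprocess.CalledProcessError if the command fails.
    subprocess.run(cmd, check=True)
```

For example, `flush_command("shop", ["orders"])` yields `["nodetool", "flush", "shop", "orders"]`, where `shop` and `orders` are made-up names. Before a planned restart, one would typically call `run(drain_command())` and then stop the Cassandra service.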