Question-171: When is a minor compaction triggered?

Answer: A minor compaction is usually triggered whenever an SSTable is added to the node, through either a flush or streaming. Cassandra also checks for new minor compactions every 5 minutes.

Question-172: Why is merging SSTables considered an efficient process?

Answer: Merging SSTables is the responsibility of the compaction process. Because the partitions in each SSTable are sorted by the hash of the partition key, separate SSTables can be merged efficiently. The content of each partition is also sorted, so each partition can be merged efficiently as well.
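
The idea can be illustrated with a minimal Python sketch. It is not Cassandra's actual implementation: the row layout `(token, partition_key, value, write_timestamp)` and the function name `merge_sstables` are assumptions made for illustration. Because every input is already sorted by token, the merge is a single streaming pass with no global re-sort.

```python
import heapq

def merge_sstables(*sstables):
    """Merge SSTable-like runs that are each sorted by token.

    Each row is a hypothetical (token, partition_key, value, write_timestamp)
    tuple. When the same partition appears in several runs, the row with the
    newest timestamp is kept.
    """
    merged = []
    # heapq.merge streams the already-sorted inputs in token order.
    for row in heapq.merge(*sstables, key=lambda r: r[0]):
        token, key, _value, ts = row
        if merged and merged[-1][0] == token and merged[-1][1] == key:
            # Same partition seen in another run: keep the latest write.
            if ts > merged[-1][3]:
                merged[-1] = row
        else:
            merged.append(row)
    return merged

older = [(10, "a", "v1", 1), (30, "c", "v1", 1)]
newer = [(10, "a", "v2", 2), (20, "b", "v1", 2)]
result = merge_sstables(older, newer)
```

Here `result` holds one row per partition, in token order, with partition "a" resolved to its newer value.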

Question-173: What is a Tombstone in Cassandra?

Answer: When Cassandra receives a delete request, it does not actually remove the data from the underlying store. Instead, it writes a special piece of data known as a Tombstone. The Tombstone represents the delete and causes all values written before it to not appear in query results. This approach is used for removing values because of the distributed nature of the database.
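
As a rough illustration of how a Tombstone hides earlier values at read time, consider the simplified Python model below. The `TOMBSTONE` marker, the `(key, value, timestamp)` record shape, and the `read` function are all hypothetical and only mimic the last-write-wins behaviour described above.

```python
TOMBSTONE = object()  # hypothetical marker standing in for a delete record

def read(key, writes):
    """Return the visible value for `key`, or None if deleted or absent.

    `writes` is a list of (key, value, timestamp) records; a value of
    TOMBSTONE records a delete issued at that timestamp.
    """
    relevant = [w for w in writes if w[0] == key]
    if not relevant:
        return None
    # Last write wins: the record with the highest timestamp decides.
    _, value, _ = max(relevant, key=lambda w: w[2])
    return None if value is TOMBSTONE else value

writes = [("k", "v1", 1), ("k", TOMBSTONE, 2)]
visible_after_delete = read("k", writes)
visible_after_rewrite = read("k", writes + [("k", "v3", 3)])
```

The delete never touches the record written at timestamp 1; the newer Tombstone simply shadows it, while a write newer than the Tombstone becomes visible again.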

Question-174: What is the relation between the “gc_grace_seconds” parameter and Tombstone removal?

Answer: The table-level “gc_grace_seconds” parameter controls how long Cassandra retains a Tombstone through compactions before finally removing it. This duration should directly reflect the amount of time a user expects to allow for recovering a failed node. Once “gc_grace_seconds” has expired, the Tombstone may be removed. However, a Tombstone can live in one SSTable while the data it covers lives in another, so a compaction must include both SSTables for the Tombstone to be removed. In practice, the following must all be true for a Tombstone to be dropped:

  • The Tombstone must be older than “gc_grace_seconds”.
  • If partition X contains the Tombstone, the SSTable containing the partition plus all SSTables that contain data for X older than the Tombstone must be included in the same compaction.
  • If the option “only_purge_repaired_tombstones” is enabled, tombstones are only removed if the data has also been repaired.
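
The three conditions above can be sketched as a single predicate. This is a simplified model under stated assumptions, not Cassandra's code: `SSTableInfo`, `can_drop_tombstone`, and the `data_repaired` flag are hypothetical names, timestamps are plain seconds, and the SSTable holding the Tombstone is assumed to be part of the compaction.

```python
from dataclasses import dataclass

@dataclass
class SSTableInfo:            # hypothetical, simplified view of an SSTable
    name: str
    has_partition_x: bool     # does it contain partition X?
    min_timestamp: int        # oldest write timestamp it holds, in seconds

def can_drop_tombstone(tombstone_ts, now, gc_grace_seconds, sstables, compacting,
                       only_purge_repaired_tombstones=False, data_repaired=False):
    # 1. The tombstone must be older than gc_grace_seconds.
    if now - tombstone_ts <= gc_grace_seconds:
        return False
    # 2. Every SSTable that holds partition X and may hold data older than
    #    the tombstone must take part in this compaction.
    for s in sstables:
        may_hold_older_data = s.has_partition_x and s.min_timestamp < tombstone_ts
        if may_hold_older_data and s.name not in compacting:
            return False
    # 3. With only_purge_repaired_tombstones, the data must also be repaired.
    if only_purge_repaired_tombstones and not data_repaired:
        return False
    return True

sstables = [
    SSTableInfo("a", True, 100),
    SSTableInfo("b", True, 50),
    SSTableInfo("c", True, 500),   # only holds data newer than the tombstone
]
droppable = can_drop_tombstone(200, 2000, 1000, sstables, {"a", "b"})
blocked = can_drop_tombstone(200, 2000, 1000, sstables, {"a"})
```

In this example SSTable "c" can be left out of the compaction because all of its data is newer than the Tombstone, whereas leaving out "b" blocks the removal.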

Question-175: What happens to deleted data if a node is down or disconnected for longer than “gc_grace_seconds”?

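Answer: If a node remains down or disconnected for longer than “gc_grace_seconds”, its deleted data will be repaired back to the other nodes and reappear in the cluster. This happens because the other nodes may already have purged the Tombstone, so nothing remains to mark that data as deleted.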

Question-176: What is TTL for a Cassandra record or data?

Answer: Data in Cassandra can have an additional property called time to live (TTL), which is used to automatically drop data once its expiry time is reached. Once the TTL has expired, the data is converted to a Tombstone, which stays around for at least “gc_grace_seconds”.
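
The lifecycle of a TTL'd value can be sketched as a small timeline function. This is an illustrative model only: `cell_state` and its state names are hypothetical, times are plain seconds, and "purgeable" means only that the Tombstone has become eligible for removal by a later compaction, not that it is gone.

```python
def cell_state(write_time, ttl_seconds, gc_grace_seconds, now):
    """Classify a TTL'd value at time `now` (all arguments in seconds)."""
    expires_at = write_time + ttl_seconds
    if now < expires_at:
        return "live"        # still returned by queries
    if now < expires_at + gc_grace_seconds:
        return "tombstone"   # expired, kept for at least gc_grace_seconds
    return "purgeable"       # a later compaction may remove it

GC_GRACE = 864000  # 10 days, Cassandra's default gc_grace_seconds

before_expiry = cell_state(0, 60, GC_GRACE, now=59)
just_expired = cell_state(0, 60, GC_GRACE, now=60)
after_grace = cell_state(0, 60, GC_GRACE, now=60 + GC_GRACE)
```

With a 60-second TTL, the value is live for the first minute, is then treated as a Tombstone, and only becomes eligible for removal once a further “gc_grace_seconds” has passed.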