Question-116: What is the time window compaction strategy, and when should it be used?

Answer: The time window compaction strategy (TWCS) is well suited to time series data, where records are written in time order. Stock market data is a typical example, since it is naturally stored by the business date on which each transaction happened. With this strategy, SSTables whose data falls in the same time window are merged together, so each window ends up in a single SSTable.
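
The grouping step can be illustrated with a small sketch. This is not Cassandra's implementation: the `SSTable` class, the `group_by_window` helper, and the one-day window are hypothetical names and values chosen for illustration, and the sketch assumes each SSTable is bucketed by the timestamp of its newest write.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an on-disk SSTable."""
    name: str
    max_timestamp: int  # newest write in the file, in seconds


def group_by_window(sstables, window_seconds):
    """Bucket SSTables by the time window that contains their newest write."""
    buckets = defaultdict(list)
    for table in sstables:
        # Round the timestamp down to the start of its window.
        window_start = table.max_timestamp - (table.max_timestamp % window_seconds)
        buckets[window_start].append(table.name)
    return dict(buckets)


DAY = 86_400  # assumed window size: one day
tables = [SSTable("a", 10), SSTable("b", 5_000), SSTable("c", DAY + 30)]
windows = group_by_window(tables, DAY)
# "a" and "b" share the first window and would be compacted together;
# "c" belongs to the next day's window and stays separate.
```

Every bucket holding more than one SSTable is a candidate for being merged into a single SSTable.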

 

Question-117: Why can a duplicate, outdated record still exist on one node even though compaction has run on another node?

Answer: Compaction runs independently on each node, so compacting one node's SSTables has no effect on the SSTables of any other node. The SSTables that were compacted hold only the latest version of each record, while another node may still hold outdated versions until its own compaction runs.
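
A toy model makes the point concrete. It is only a sketch: each node is modelled as a list of dictionaries standing in for SSTables, and `compact` is a hypothetical helper, not a Cassandra API.

```python
def compact(sstables):
    """Merge one node's SSTables, keeping the newest version of each key."""
    merged = {}
    for sstable in sstables:
        for key, (value, write_time) in sstable.items():
            if key not in merged or write_time > merged[key][1]:
                merged[key] = (value, write_time)
    return [merged]


# Both nodes start with the same two SSTables: an old and a new version of k1.
node_a = [{"k1": ("old", 1)}, {"k1": ("new", 2)}]
node_b = [{"k1": ("old", 1)}, {"k1": ("new", 2)}]

# Compaction runs on node A only.
node_a = compact(node_a)

# node_a now stores a single SSTable with the latest version,
# while node_b still stores the outdated version on disk.
```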

Question-118: What is the use of the time to live (TTL) property?

Answer: When a row or column has a time to live (TTL) attached, the data expires once the specified time has elapsed. The storage engine then marks it with a Tombstone and handles it like any other Tombstone record.
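
The expiry rule can be sketched as follows. The `Cell` class and `read` function are hypothetical and only model the behaviour described above: once the TTL has elapsed, the value is no longer returned to readers, just as if it had been deleted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    """Hypothetical model of a stored value with an optional TTL in seconds."""
    value: str
    write_time: int
    ttl: Optional[int] = None

    def is_expired(self, now: int) -> bool:
        # A cell without a TTL never expires.
        return self.ttl is not None and now >= self.write_time + self.ttl


def read(cell: Cell, now: int) -> Optional[str]:
    """Return the value, or None once the cell has expired."""
    return None if cell.is_expired(now) else cell.value


cell = Cell("session-token", write_time=100, ttl=60)
before = read(cell, now=159)  # still within the 60 second TTL
after = read(cell, now=160)   # TTL elapsed, treated as deleted
```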

Question-119: What is the purpose of the Tombstone marker?

Answer: A Tombstone is a marker on a record or column indicating that it has been marked for deletion. The data is not removed immediately. It is dropped from the SSTable during a later compaction cycle, once the table's grace period (gc_grace_seconds) has passed.
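
A minimal sketch of the purge decision, assuming Cassandra's default gc_grace_seconds of 864000 (10 days). The `purge_expired_tombstones` function and the `(kind, timestamp)` cell layout are hypothetical simplifications, not the storage engine's actual data structures.

```python
GC_GRACE_SECONDS = 864_000  # Cassandra's default grace period: 10 days


def purge_expired_tombstones(cells, now, gc_grace=GC_GRACE_SECONDS):
    """Return the cells that survive compaction.

    cells maps a key to (kind, timestamp), where kind is "data" or "tombstone".
    A tombstone is dropped only after its grace period has passed.
    """
    survivors = {}
    for key, (kind, timestamp) in cells.items():
        if kind == "tombstone" and now - timestamp >= gc_grace:
            continue  # grace period over: the tombstone is purged
        survivors[key] = (kind, timestamp)
    return survivors


cells = {
    "k1": ("tombstone", 0),        # deleted long ago
    "k2": ("tombstone", 500_000),  # deleted recently
    "k3": ("data", 0),             # live data
}
result = purge_expired_tombstones(cells, now=900_000)
# k1 is purged, k2 is kept because its grace period is not over, k3 is untouched.
```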

Question-120: How does the storage engine handle a Tombstone record that is replicated across multiple nodes?

Answer: If all replica nodes are live when the record is deleted, each of them receives the Tombstone and the record is eventually removed from every node that holds it. Now suppose one node holding the record is down during the deletion. If it comes back after the Tombstone has already been purged from the online nodes, nothing is left to show that the record was deleted, so the copy on the recovered node is treated as a new record and is propagated to the rest of the nodes in the cluster. Such a deleted but persistent record is called a Zombie record.
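
The scenario can be reproduced in a toy simulation. Everything here is a hypothetical model: nodes are plain dictionaries, `repair` simply copies the newest known version of each key to every node, and the grace period of 10 time units is arbitrary.

```python
TOMBSTONE = None  # a cell whose value is None represents a tombstone


def delete(nodes, key, now):
    """Write a tombstone for key on every node that is currently reachable."""
    for node in nodes:
        node[key] = (TOMBSTONE, now)


def compact(node, now, gc_grace):
    """Purge tombstones whose grace period has passed."""
    expired = [k for k, (v, ts) in node.items()
               if v is TOMBSTONE and now - ts >= gc_grace]
    for key in expired:
        del node[key]


def repair(nodes):
    """Copy the newest known version of each key to every node."""
    for key in {k for node in nodes for k in node}:
        newest = max((node[key] for node in nodes if key in node),
                     key=lambda cell: cell[1])
        for node in nodes:
            node[key] = newest


def run(downtime, gc_grace=10):
    a, b, c = ({"k": ("v", 1)} for _ in range(3))
    delete([a, b], "k", now=5)  # node c is down and misses the tombstone
    now = 5 + downtime
    for node in (a, b):
        compact(node, now, gc_grace)
    repair([a, b, c])           # node c is back online
    return a.get("k")


short_outage = run(downtime=3)   # tombstone still present, the delete wins
long_outage = run(downtime=20)   # tombstone already purged, the value returns
```

With the short outage the tombstone reaches node c and the delete holds. With the long outage the old value from node c spreads back to nodes a and b, which is the Zombie record described above.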