Question-111: What do you mean by consistency?
Answer: Consistency describes how closely synchronized the replicas of a piece of data in the cluster are at any given moment. If all replicas are synchronized immediately, the database is called strongly consistent. Most RDBMS databases fall into this category: they generally keep only one copy of the data, so there is nothing to synchronize. In databases such as Cassandra, the replicas might not be synchronized immediately, and you can tune this behavior by setting a consistency level. After some time all replicas do come into sync, which is why Cassandra is known as an eventually consistent database.
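To make the idea of tuning the consistency level concrete, here is a minimal sketch of the usual replica arithmetic. It assumes the common rule of thumb that a read is guaranteed to see the latest write when the number of replicas written plus the number of replicas read exceeds the replication factor. The function names are illustrative and not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a majority (a quorum)."""
    return replication_factor // 2 + 1


def reads_see_latest_write(write_replicas: int, read_replicas: int,
                           replication_factor: int) -> bool:
    """True when every set of read replicas overlaps every set of written replicas."""
    return write_replicas + read_replicas > replication_factor


rf = 3
q = quorum(rf)                               # 2 replicas out of 3
print(reads_see_latest_write(q, q, rf))      # QUORUM writes + QUORUM reads: True
print(reads_see_latest_write(1, 1, rf))      # ONE writes + ONE reads: False
```

With a replication factor of 3, quorum reads and writes always overlap on at least one replica, while reading and writing at level ONE relies on the replicas converging later.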
Question-112: How does Cassandra keep the compaction process efficient and highly performant?
Answer: Compaction is the process of merging SSTables. It collects all versions of each unique row and assembles one complete row, using the latest timestamp to decide which version of the data wins. Merging SSTables is highly performant because rows are sorted by partition key within each SSTable, so the merge can read its inputs sequentially and does not need random I/O.
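The merge described above can be sketched as a single sequential pass over already sorted inputs. This is a simplified toy model, not Cassandra's implementation: each "SSTable" is a Python list of `(partition_key, write_timestamp, value)` tuples, the whole row is treated as one value, and deletions (tombstones) are ignored.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def compact(*sstables):
    """Merge sorted runs in one pass; the newest timestamp wins per key."""
    # heapq.merge reads each input front to back, so no random access is needed.
    merged = heapq.merge(*sstables, key=itemgetter(0))
    result = []
    for _key, versions in groupby(merged, key=itemgetter(0)):
        # All versions of one key arrive together because the inputs are sorted.
        result.append(max(versions, key=itemgetter(1)))
    return result


older = [("a", 1, "v1"), ("c", 1, "v1")]
newer = [("a", 5, "v2"), ("b", 3, "v1")]
compacted = compact(older, newer)
print(compacted)  # [('a', 5, 'v2'), ('b', 3, 'v1'), ('c', 1, 'v1')]
```

Because both inputs are sorted by key, the versions of key `a` meet during the merge and only the one with timestamp 5 survives.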
Question-113: Does compaction have any performance effect on disk?
Answer: Compaction causes a temporary spike in disk usage, as well as in disk I/O, while the old and new SSTables coexist. How large the effect is depends on which compaction strategy is used.
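A rough way to reason about the temporary spike is to add the size of the SSTables being compacted to the size of the SSTable being written. The sketch below is only back-of-the-envelope arithmetic. The `overlap_ratio` parameter is a hypothetical knob for the fraction of input data that is superseded and dropped during the merge.

```python
def peak_disk_usage_mb(input_sizes_mb, overlap_ratio=0.0):
    """Disk space used while the old and new SSTables coexist."""
    inputs = sum(input_sizes_mb)
    # The output shrinks by whatever fraction of the input was stale.
    output = inputs * (1.0 - overlap_ratio)
    return inputs + output


worst_case = peak_disk_usage_mb([100, 100, 100, 100])          # nothing dropped
with_overlap = peak_disk_usage_mb([100, 100, 100, 100], 0.25)  # 25% dropped
print(worst_case, with_overlap)
```

In the worst case, when nothing can be dropped, the compaction briefly needs as much free space as the SSTables it is merging.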
Question-114: What is the size-tiered compaction strategy?
Answer: This strategy suits write-intensive workloads, but it can hold on to stale data for a long time, and the amount of memory it needs increases over time. Compaction is triggered when the node has a set number of similarly sized SSTables. For example, when four SSTables of the same or similar size exist on a Cassandra node, compaction starts and merges them into one larger SSTable. Read performance can suffer, because a large SSTable is not compacted again until other SSTables of a similar size have been created.
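The trigger condition can be sketched as grouping SSTables into buckets of similar size and compacting a bucket once it holds enough members. This is a simplified model under stated assumptions: the parameter names `bucket_low`, `bucket_high` and `min_threshold` and their values (0.5, 1.5 and 4) mirror the documented size-tiered options, but the bucketing loop itself is illustrative, not Cassandra's code.

```python
def bucket_by_size(sizes_mb, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets whose members are of similar size."""
    buckets = []
    for size in sorted(sizes_mb):
        for bucket in buckets:
            average = sum(bucket) / len(bucket)
            if bucket_low * average <= size <= bucket_high * average:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough in size: start a new one.
            buckets.append([size])
    return buckets


def pick_compaction(sizes_mb, min_threshold=4):
    """Return the first bucket with enough SSTables to compact, else None."""
    for bucket in bucket_by_size(sizes_mb):
        if len(bucket) >= min_threshold:
            return bucket
    return None


candidate = pick_compaction([10, 11, 9, 10, 400])
print(candidate)  # [9, 10, 10, 11]
```

The 400 MB SSTable sits alone in its bucket, which illustrates why large SSTables can wait a long time before being compacted again.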
Question-115: What is the leveled compaction strategy?
Answer: Leveled compaction (LCS) has several advantages: disk requirements are easy to predict, read latency is more predictable, and stale data is evicted more frequently.
The disadvantage of this strategy is much higher I/O utilization. The strategy organizes SSTables into a series of levels. A few points about how the levels work:
- First, data from the Memtable is flushed to an SSTable. This first level is called L0.
- If larger SSTables already exist in level L1, an SSTable created at L0 is merged with them.
- How the SSTables are merged depends on the configuration you have defined.
- With leveled compaction, an SSTable can in some cases grow very large, even into terabytes, and then has to be split back into smaller SSTables. This is done with the sstablesplit tool.
- During node bootstrap, compaction is bypassed if LCS is configured. The incoming data is moved directly to the correct level, because there is no existing data and therefore no partition overlap within a level.
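The predictable disk requirement comes from each level having a fixed capacity. The sketch below assumes the commonly documented defaults of a 160 MB target SSTable size and a fanout of 10, so that each level holds ten times as much as the one before it. Both values are configurable, and the helper names are illustrative.

```python
def level_capacity_mb(level, sstable_size_mb=160, fanout=10):
    """Approximate capacity of level L1 and above."""
    if level < 1:
        raise ValueError("L0 has no fixed capacity in this sketch")
    return sstable_size_mb * fanout ** level


def levels_needed(data_mb, sstable_size_mb=160, fanout=10):
    """Smallest number of levels whose combined capacity holds data_mb."""
    level, remaining = 0, data_mb
    while remaining > 0:
        level += 1
        remaining -= level_capacity_mb(level, sstable_size_mb, fanout)
    return level


print(level_capacity_mb(1))   # 1600
print(level_capacity_mb(2))   # 16000
print(levels_needed(10_000))  # 2
```

Under these assumptions about 10 GB of data fits into two levels, which bounds how many SSTables a single-partition read may have to touch.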