Question-136: What is the Bloom filter?
Answer: Each SSTable has an associated Bloom filter, which tells you whether the SSTable is likely to contain the requested partition data. A Bloom filter is a probabilistic data structure: it can return false positives but never false negatives. If it says the data is in this SSTable, the data may or may not actually be there. If it says the data is not in this SSTable, the data is guaranteed to be absent from that SSTable.
The Bloom filter is stored in off-heap memory, and can grow up to 2 gigabytes per billion partitions.
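The "possibly present, definitely absent" behaviour can be illustrated with a toy Bloom filter. This is a minimal sketch, not Cassandra's implementation: the class name, the bit-array size, the number of hashes, and the use of SHA-256 are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array plus k salted hash functions (illustrative only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive k bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True only means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


# Hypothetical partition keys written to one SSTable.
sstable_filter = BloomFilter()
for partition_key in ("user:1", "user:2", "user:3"):
    sstable_filter.add(partition_key)
```

Every key that was added always gets True from `might_contain`, so there are no false negatives. A key that was never added usually gets False, but can get True if its bit positions happen to collide with bits set by other keys; that is the false positive case.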
Question-137: What is the partition index?
Answer: The partition index maps each partition key to the position of that partition's rows in the SSTable data file.
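One way to picture the partition index is as a lookup from partition key to a byte offset in the data file. The sketch below is a simplified illustration under assumed conventions (one partition per line, a `|` separator, hypothetical function names); it does not reflect Cassandra's actual on-disk format.

```python
# Hypothetical, simplified layout: each partition is one line in a "data file".
data_file = bytearray()
partition_index = {}  # partition key -> byte offset where its data starts


def write_partition(key, payload):
    # Record where this partition begins, then append its data.
    partition_index[key] = len(data_file)
    data_file.extend(f"{key}|{payload}\n".encode())


def read_partition(key):
    # Jump straight to the recorded offset instead of scanning the whole file.
    offset = partition_index.get(key)
    if offset is None:
        return None
    end = data_file.index(b"\n", offset)
    return data_file[offset:end].decode()


write_partition("user:1", "alice")
write_partition("user:2", "bob")
```

Here `read_partition("user:2")` seeks directly to the stored offset, which is the point of keeping an index alongside the data.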
Question-138: With respect to the CAP theorem, where does Cassandra fit?
Answer: Cassandra is a highly available and partition-tolerant database. Its consistency is tunable: you can set the consistency level based on your configuration and requirements.
Question-139: What do you mean by tunable consistency?
Answer: Consistency means that all replicas are synchronized and hold the same data; data in that state is called highly consistent. As discussed previously, Cassandra does not guarantee that a write reaches every replica immediately. Bringing replicas back in line is generally handled by a background process known as NodeSync (a DataStax Enterprise feature). NodeSync tries to keep all replicas in sync so that they eventually hold the same, latest data across all nodes in the cluster, even across datacenters. Tunable consistency means you choose how much consistency an operation requires; if you need highly consistent data, latency will be affected.
Question-140: How can consistency be controlled in Cassandra?
Answer: You can use consistency levels, which can be defined globally for a cluster or a datacenter. The consistency level can also vary for individual read or write operations, so that the data returned is more or less consistent depending on what the client application requires. In this way Cassandra can behave like a CP or an AP system, based on the needs of your application.
There is a tradeoff between latency and consistency: higher consistency means higher latency, and lower consistency means lower latency. It is not possible to tune a distributed database into a fully CA (consistent and highly available) system.
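A common rule of thumb for reasoning about this tradeoff is that a read is guaranteed to see the latest acknowledged write when the number of replicas contacted for the read plus the number contacted for the write exceeds the replication factor. The sketch below only does that arithmetic; the function names are hypothetical, and the quorum formula shown is the usual floor(RF / 2) + 1.

```python
def quorum(replication_factor):
    # Majority of replicas: floor(RF / 2) + 1.
    return replication_factor // 2 + 1


def replicas_overlap(read_replicas, write_replicas, replication_factor):
    # True when every read set must share at least one replica with every write set.
    return read_replicas + write_replicas > replication_factor


rf = 3
# QUORUM reads with QUORUM writes: 2 + 2 > 3, so reads and writes overlap.
quorum_overlaps = replicas_overlap(quorum(rf), quorum(rf), rf)
# ONE reads with ONE writes: 1 + 1 is not > 3, so a read may miss the latest write.
one_overlaps = replicas_overlap(1, 1, rf)
```

Waiting for more replicas on each operation gives the stronger guarantee but costs latency, which is the tradeoff described above.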