Cassandra Administrator and Developer Interview Questions-27

Question-131: Why would you create secondary indexes?

Answer: Secondary indexes let you filter a table on data stored in a non-primary-key column. However, if you need to query by different key columns, a materialized view is the recommended choice. A non-primary-key column plays no role in ordering the data at the storage level (unless a materialized view or a new table is created with the new key columns), so a query that relies only on a secondary index on such a column cannot be directed to one partition and has to scan all the partitions. Imagine the read latency if every partition has to be scanned.

Secondary indexes can be built for a column in a table. These indexes are stored locally on each node in a hidden table and are built by a background process. If a query includes both a partition key condition and a secondary index column condition, the query will be successful because it can be directed to a single partition.
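The effect of node-local indexes on query routing can be illustrated with a small toy model. This is only a sketch in plain Python, not Cassandra code: the ToyCluster and Node classes, the "pk" and "city" field names, and the placement function are all hypothetical (a real cluster places partitions with a token ring). It shows that a query carrying a partition key contacts one node, while a query on the indexed column alone has to ask every node.

```python
from collections import defaultdict


class Node:
    """One node holding its own partitions plus a local index on one column."""

    def __init__(self):
        self.partitions = defaultdict(list)   # partition key -> rows
        self.local_index = defaultdict(list)  # indexed value -> rows on this node

    def write(self, row, indexed_column):
        self.partitions[row["pk"]].append(row)
        self.local_index[row[indexed_column]].append(row)


class ToyCluster:
    def __init__(self, node_count, indexed_column):
        self.nodes = [Node() for _ in range(node_count)]
        self.indexed_column = indexed_column

    def _owner(self, pk):
        # Hypothetical, deterministic placement; Cassandra uses a token ring.
        return self.nodes[sum(pk.encode()) % len(self.nodes)]

    def insert(self, row):
        self._owner(row["pk"]).write(row, self.indexed_column)

    def query(self, value, pk=None):
        """Return (matching rows, number of nodes contacted)."""
        targets = [self._owner(pk)] if pk is not None else self.nodes
        rows = []
        for node in targets:
            for row in node.local_index.get(value, []):
                if pk is None or row["pk"] == pk:
                    rows.append(row)
        return rows, len(targets)


cluster = ToyCluster(node_count=4, indexed_column="city")
cluster.insert({"pk": "user1", "city": "Pune"})
cluster.insert({"pk": "user2", "city": "Pune"})
cluster.insert({"pk": "user3", "city": "Delhi"})

# Partition key + indexed column: only the owning node is asked.
rows_one, contacted_one = cluster.query("Pune", pk="user1")

# Indexed column only: every node must check its local index.
rows_all, contacted_all = cluster.query("Pune")
```

In this model contacted_one is 1 and contacted_all equals the number of nodes, which is the reason the second kind of query becomes more expensive as the cluster grows.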

Question-132: What is the ALLOW FILTERING option when querying the database?

Answer: If a secondary index is used in a query that is not restricted to a particular partition key, the query will have prohibitive read latency because all nodes will be queried. Such a query is allowed only if the query option “ALLOW FILTERING” is used. This option is not recommended in a production environment.

Question-133: What sequence does a read request follow in the Cassandra storage engine to return data?

Answer: The database processes data at several stages on a read request. First it has to discover the final copy of the data that needs to be returned, so it checks the following places in order:

Start with the memtable.
Then check the row cache.
If the data is not found, go to the Bloom filter.
If it is still not found, check the partition index and try to find the partition offset in the chunk cache.
If the required data is not present in the cache, pull it from disk.
Read the data from the uncompressed chunk cache.
If the required data is not present in the cache:

Locate the data on disk using the compression offset map.
Fetch the data from the SSTable on the disk and place it in the chunk cache.
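The lookup order above can be sketched as a single function. This is a simplified model under stated assumptions, not Cassandra's implementation: every structure is a plain Python dict or set, the function name read_partition is hypothetical, and the sketch stops at the first hit, whereas a real read merges memtable and SSTable data by timestamp. Compression is left out, so the compression offset map is represented only by the offset lookup.

```python
def read_partition(key, memtable, row_cache, bloom_filter,
                   partition_index, chunk_cache, sstable):
    """Return (value, trace), where trace lists the places that were checked.

    All arguments are toy stand-ins: dicts for the memtable, row cache,
    partition index (key -> offset), chunk cache and SSTable (offset -> data),
    and a set for the Bloom filter.
    """
    trace = ["memtable"]
    if key in memtable:
        return memtable[key], trace

    trace.append("row cache")
    if key in row_cache:
        return row_cache[key], trace

    trace.append("bloom filter")
    if key not in bloom_filter:
        # A Bloom filter has no false negatives: the key is not in the SSTable.
        return None, trace

    trace.append("partition index")
    offset = partition_index.get(key)
    if offset is None:
        # The Bloom filter gave a false positive.
        return None, trace

    trace.append("chunk cache")
    if offset in chunk_cache:
        return chunk_cache[offset], trace

    trace.append("disk")
    value = sstable[offset]
    chunk_cache[offset] = value  # keep the chunk for later reads
    return value, trace


memtable = {"k1": "fresh write"}
row_cache = {"k2": "cached row"}
bloom_filter = {"k3"}
partition_index = {"k3": 0}
chunk_cache = {}
sstable = {0: "row on disk"}

args = (memtable, row_cache, bloom_filter, partition_index, chunk_cache, sstable)
first_value, first_trace = read_partition("k3", *args)
second_value, second_trace = read_partition("k3", *args)
missing_value, missing_trace = read_partition("k9", *args)
```

The first read of "k3" walks all the way to disk and fills the chunk cache, so the second read of the same key ends at the chunk cache. The read of "k9" ends at the Bloom filter without touching the index or the disk.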

Question-134: What is row cache?

Answer: If the row cache is enabled, Cassandra can keep a subset of the partition data stored on disk in memory. The row cache is not stored on the Java heap; it lives entirely in off-heap memory, which relieves garbage collection pressure on the JVM. Enabling the row cache is a good idea for a read-intensive database, for example one where about 95% of the load is reads and 5% is writes. The row cache size is configurable.

Question-135: What happens when data is stored in the row cache and a new write request is received for the same data?

Answer: The row cache is not write-through. If a write request comes in for a row that is stored in the row cache, the cached entry is invalidated and the new data is not cached until the row is read again. Similarly, if a partition is updated, the entire partition is removed from the cache.
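The invalidate-on-write behaviour described above can be shown with a small toy cache. This is a minimal sketch in plain Python, not Cassandra's row cache: the ToyRowCache class, its store dict (standing in for data on disk) and the disk_reads counter are hypothetical names used only for illustration.

```python
class ToyRowCache:
    """A cache that is filled on reads and invalidated (not updated) on writes."""

    def __init__(self, store):
        self.store = store    # stand-in for data on disk: key -> value
        self.cache = {}       # cached rows
        self.disk_reads = 0   # how many reads had to go to the store

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        self.disk_reads += 1
        value = self.store[key]
        self.cache[key] = value  # rows enter the cache only on a read
        return value

    def write(self, key, value):
        self.store[key] = value
        # Not write-through: drop the stale entry instead of caching the new value.
        self.cache.pop(key, None)


table = ToyRowCache({"row1": "v1"})
table.read("row1")                          # miss: goes to the store
table.read("row1")                          # hit: served from the cache
reads_before_write = table.disk_reads

table.write("row1", "v2")
cached_after_write = "row1" in table.cache  # the entry was invalidated

value_after_write = table.read("row1")      # miss again, then cached
reads_after_write = table.disk_reads
```

After the write the row is absent from the cache, so the next read goes back to the store, returns the new value, and caches it again.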

Details: Category: Cassandra Database Administrator & Developer; Last Updated: 24 April 2021
