Question-101: Explain what the row key is.

Answer: The row key is defined by the application. Because the combined key is prefixed by the row key, the application can control the sort order of its data. The row key also allows logical grouping of cells and ensures that all cells with the same row key are co-located on the same server.
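As a rough illustration of how a row key design controls sort order, the sketch below sorts keys by their raw bytes, which is how HBase orders rows. The key layout (user id, then a reversed timestamp) and the names make_row_key and MAX_TS are hypothetical choices for this example, not part of HBase.

```python
# Toy model: rows are ordered by the raw bytes of the row key.
# The layout "<user_id>#<reversed_timestamp>" is a hypothetical design.

MAX_TS = 10**13  # assumed upper bound, used only to reverse timestamps


def make_row_key(user_id: str, timestamp_ms: int) -> bytes:
    # Zero-padding keeps numeric order and byte order in agreement.
    reversed_ts = MAX_TS - timestamp_ms
    return f"{user_id}#{reversed_ts:013d}".encode("utf-8")


events = [
    ("bob", 1_700_000_000_000),
    ("alice", 1_700_000_000_000),
    ("alice", 1_700_000_500_000),
    ("bob", 1_600_000_000_000),
]

# Sorting the byte strings mimics the order in which the rows would be stored.
row_keys = sorted(make_row_key(user, ts) for user, ts in events)
for key in row_keys:
    print(key.decode("utf-8"))
```

With this layout, all rows for one user sit next to each other, and within a user the newest event sorts first.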

Question-102: Explain deletion in HBase. What are the three types of tombstone markers in HBase?

Answer: When you delete a cell in HBase, the data is not removed immediately. Instead, a tombstone marker is written, which makes the deleted cells invisible to reads. The deleted cells are physically removed later, during compaction. There are three types of tombstone markers:

  • Version delete marker: marks a single version of a column for deletion
  • Column delete marker: marks all versions of a column for deletion
  • Family delete marker: marks all columns of a column family for deletion
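The scope of the three marker types can be sketched with a small in-memory model. This is not HBase code: Cell, Tombstone, is_masked and visible are hypothetical names, and the model assumes that column and family markers cover every version at or below the marker's timestamp.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    family: str
    qualifier: str
    timestamp: int
    value: str


@dataclass(frozen=True)
class Tombstone:
    kind: str                  # "version", "column", or "family"
    family: str
    qualifier: Optional[str]   # None for a family marker
    timestamp: int


def is_masked(cell: Cell, marker: Tombstone) -> bool:
    if cell.family != marker.family:
        return False
    if marker.kind == "family":
        # Hides every column of the family (assumed: at or below the marker).
        return cell.timestamp <= marker.timestamp
    if cell.qualifier != marker.qualifier:
        return False
    if marker.kind == "column":
        # Hides all versions of one column (assumed: at or below the marker).
        return cell.timestamp <= marker.timestamp
    if marker.kind == "version":
        # Hides exactly one version of one column.
        return cell.timestamp == marker.timestamp
    raise ValueError(f"unknown marker kind: {marker.kind}")


def visible(cells: List[Cell], markers: List[Tombstone]) -> List[Cell]:
    return [c for c in cells if not any(is_masked(c, m) for m in markers)]


cells = [
    Cell("info", "name", 1, "a"),
    Cell("info", "name", 2, "b"),
    Cell("info", "name", 3, "c"),
    Cell("info", "email", 2, "e"),
    Cell("meta", "tag", 2, "x"),
]

print(visible(cells, [Tombstone("version", "info", "name", 2)]))
print(visible(cells, [Tombstone("column", "info", "name", 3)]))
print(visible(cells, [Tombstone("family", "info", None, 3)]))
```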

Question-103: Explain how HBase actually deletes a row.

Answer: In HBase, whatever you write is first held in RAM and then flushed to disk, and these disk writes are immutable, barring compaction. A normal delete therefore only writes a tombstone marker; the data that the marker covers is physically removed during compaction. Major compactions process delete markers, while minor compactions don't. Also, if you delete data and then add more data with a timestamp earlier than the tombstone's timestamp, subsequent Gets will be masked by the tombstone and will not return the inserted value. Puts with such timestamps take effect again only after a major compaction has removed the tombstone.
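The masking behaviour described above can be sketched with a toy model of a single column. ToyColumn and its methods are hypothetical names for illustration only; the model treats every delete as a column delete marker and assumes that a major compaction drops both the tombstones and every cell they mask.

```python
class ToyColumn:
    """Toy model of one column: versions plus column delete markers."""

    def __init__(self):
        self.cells = {}        # timestamp -> value
        self.tombstones = []   # timestamps of delete markers

    def put(self, timestamp, value):
        self.cells[timestamp] = value

    def delete(self, timestamp):
        # Nothing is removed here; only a marker is recorded.
        self.tombstones.append(timestamp)

    def _masked(self, timestamp):
        return any(timestamp <= marker for marker in self.tombstones)

    def get(self):
        # Return the newest version that no tombstone hides.
        live = [ts for ts in self.cells if not self._masked(ts)]
        return self.cells[max(live)] if live else None

    def major_compact(self):
        # Drop masked cells, then drop the markers themselves.
        self.cells = {ts: v for ts, v in self.cells.items()
                      if not self._masked(ts)}
        self.tombstones = []


col = ToyColumn()
col.put(10, "old")
col.delete(20)                 # tombstone covering timestamps <= 20
before_put = col.get()         # hidden by the tombstone

col.put(15, "late")            # written after the delete, but timestamp < 20
masked_put = col.get()         # still hidden by the tombstone

col.major_compact()            # tombstone and masked cells are dropped
col.put(15, "late again")
after_compaction = col.get()   # the put is visible now

print(before_put, masked_put, after_compaction)
```

Note that in this model the put made before the compaction is dropped together with the tombstone; only puts issued after the compaction become visible.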

Question-104: Explain what happens if you alter the block size of a column family on an already populated database.

Answer: When you alter the block size of a column family, new data is written with the new block size while existing data stays in files that use the old block size. New files are written with the new block size as they are flushed, and existing data continues to be read correctly. Old data is rewritten with the new block size when it is compacted, so after the next major compaction all data should use the new block size.
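A minimal sketch of this behaviour, using a toy store in which every file remembers the block size it was written with. ToyStore and its methods are hypothetical names, and the two block sizes are just example values.

```python
class ToyStore:
    """Toy model of a column family's store files."""

    def __init__(self, block_size):
        self.block_size = block_size   # current column family setting
        self.files = []                # list of (block_size_used, rows)

    def alter_block_size(self, new_size):
        # Only the setting changes; existing files are left untouched.
        self.block_size = new_size

    def flush(self, rows):
        # A flushed file is written with the current setting.
        self.files.append((self.block_size, list(rows)))

    def major_compact(self):
        # All files are rewritten into one file using the current setting.
        merged = sorted(row for _, rows in self.files for row in rows)
        self.files = [(self.block_size, merged)]

    def block_sizes_in_use(self):
        return sorted({size for size, _ in self.files})


store = ToyStore(block_size=64 * 1024)
store.flush(["row1", "row2"])
store.alter_block_size(128 * 1024)
store.flush(["row3"])
mixed = store.block_sizes_in_use()      # old and new sizes coexist

store.major_compact()
uniform = store.block_sizes_in_use()    # only the new size remains

print(mixed, uniform)
```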

Question-105: What is a Bloom filter and how does it help?

Answer: HBase supports Bloom filters to improve the overall throughput of the cluster. An HBase Bloom filter is a space-efficient mechanism to test whether a StoreFile contains a specific row or row-column cell. Without a Bloom filter, the only way to decide whether a row key is contained in a StoreFile is to check the StoreFile's block index, which stores the start row key of each block in the StoreFile. The row key being looked up will very likely fall between two block start keys, in which case HBase has to load the block and scan from the block's start key to find out whether that row key actually exists. With a Bloom filter, HBase can skip StoreFiles that definitely do not contain the key and so avoid those block reads.
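The idea can be sketched with a small, generic Bloom filter. This is not HBase's implementation: the BloomFilter class, its sizes, the use of SHA-256 and the files_to_read helper are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)   # one byte per bit, for clarity

    def _positions(self, key: bytes):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))


def build_store_file(row_keys):
    # A toy "StoreFile": its row keys plus a Bloom filter over them.
    bloom = BloomFilter()
    for key in row_keys:
        bloom.add(key)
    return {"rows": set(row_keys), "bloom": bloom}


store_files = {
    "file_a": build_store_file([b"row-001", b"row-002", b"row-003"]),
    "file_b": build_store_file([b"row-100", b"row-200"]),
}


def files_to_read(row_key: bytes):
    # Only files whose filter answers "maybe" need their blocks loaded.
    return [name for name, f in store_files.items()
            if f["bloom"].might_contain(row_key)]


print(files_to_read(b"row-002"))
```

A "maybe" answer can still be a false positive, so the file must then be checked as usual; a "no" answer is definitive, which is what lets the read skip that file.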