Hadoop Interview Questions-15

Question-71: What is the BlockCache?

Answer: Alongside the MemStore, HBase uses a cache that keeps the most frequently used data in the JVM heap. The BlockCache is designed to keep frequently accessed data from the HFiles in memory so as to avoid disk reads. Whether a column family's blocks are cached is configured per column family, while the cache itself is shared by all regions on a RegionServer. The “Block” in BlockCache is the unit of data that HBase reads from disk in a single pass. The HFile is physically laid out as a sequence of blocks plus an index over those blocks. This means reading a block from HBase requires only looking up that block’s location in the index and retrieving it from disk. The block is the smallest indexed unit of data and the smallest unit of data that can be read from disk.
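To make the idea concrete, here is a minimal sketch of a least-recently-used block cache, assuming blocks are identified by a hypothetical (hfile_name, block_offset) pair and that capacity is counted in blocks rather than bytes. HBase's default on-heap cache is also LRU-based, but the class and function names below (ToyBlockCache, fake_disk_read) are illustrative inventions, not HBase's API.

```python
from collections import OrderedDict


class ToyBlockCache:
    """Hypothetical LRU cache of HFile blocks, keyed by (hfile_name, block_offset).

    Capacity is counted in blocks rather than bytes, and read_block_from_disk is a
    caller-supplied stand-in for real disk I/O. This is not HBase's API.
    """

    def __init__(self, capacity_blocks, read_block_from_disk):
        self.capacity = capacity_blocks
        self.read_block_from_disk = read_block_from_disk
        self.blocks = OrderedDict()  # least recently used block first
        self.hits = 0
        self.misses = 0

    def get_block(self, hfile_name, offset):
        key = (hfile_name, offset)
        if key in self.blocks:
            # Cache hit: no disk read; mark the block as most recently used.
            self.blocks.move_to_end(key)
            self.hits += 1
            return self.blocks[key]
        # Cache miss: read the whole block from disk, then keep it in memory.
        self.misses += 1
        block = self.read_block_from_disk(hfile_name, offset)
        self.blocks[key] = block
        if len(self.blocks) > self.capacity:
            # Over capacity: evict the least recently used block.
            self.blocks.popitem(last=False)
        return block


disk_reads = []


def fake_disk_read(hfile_name, offset):
    """Hypothetical stand-in for disk I/O that records every simulated read."""
    disk_reads.append((hfile_name, offset))
    return f"{hfile_name}@{offset}"


cache = ToyBlockCache(capacity_blocks=2, read_block_from_disk=fake_disk_read)
cache.get_block("hfile-1", 0)       # miss: read from disk
cache.get_block("hfile-1", 0)       # hit: served from memory
cache.get_block("hfile-1", 65536)   # miss
cache.get_block("hfile-2", 0)       # miss; evicts ("hfile-1", 0)
cache.get_block("hfile-1", 0)       # miss again, because it was evicted
print(cache.hits, cache.misses)     # 1 4
```

The point of the sketch is that a hit costs no disk read at all, while a miss always reads a whole block, which is why the block size discussed in the next questions matters.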

Question-72: At what level is the BlockSize configured?

Answer: The block size is configured per column family (the BLOCKSIZE attribute), and the default value is 64 KB (65536 bytes). You may want to make this value larger or smaller depending on your use case.

Question-73: If your requirement is to read data randomly from an HBase User table, what block size would you prefer?

Answer: Smaller.

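A random read usually needs only a small part of a block, but HBase always reads and caches the whole block, so smaller blocks mean less unneeded data is read from disk and less BlockCache space is wasted. The trade-off is that smaller blocks create a larger index and thereby consume more memory. If you frequently perform sequential scans, reading many blocks at a time, you can afford a larger block size. This saves memory, because larger blocks mean fewer index entries and thus a smaller index.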
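The trade-off can be put into rough numbers. The sketch below is a back-of-the-envelope model under stated assumptions, not HBase internals: one index entry per block, no compression or block headers, and a random Get whose cell fits in one block that is not yet cached. The function name block_size_tradeoff is hypothetical.

```python
import math

GB = 1024 ** 3


def block_size_tradeoff(hfile_bytes, block_bytes):
    """Rough figures for one HFile under a given block size.

    Simplifying assumptions (not HBase internals): one index entry per block,
    no compression or block headers, and a random Get whose cell fits in one
    uncached block, so the Get reads exactly one whole block from disk.
    """
    index_entries = math.ceil(hfile_bytes / block_bytes)
    bytes_read_per_random_get = block_bytes
    return index_entries, bytes_read_per_random_get


for block_kb in (8, 64, 256):
    entries, per_get = block_size_tradeoff(1 * GB, block_kb * 1024)
    print(f"block={block_kb:>3} KB  index entries={entries:>6}  bytes read per random get={per_get}")
```

For a 1 GB HFile this gives 131,072 index entries at 8 KB blocks, 16,384 at 64 KB and 4,096 at 256 KB, while each uncached random Get reads 8 KB, 64 KB or 256 KB respectively.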

Question-74: What is a block in the BlockCache?

Answer: The “Block” in BlockCache is the unit of data that HBase reads from disk in a single pass. The HFile is physically laid out as a sequence of blocks plus an index over those blocks.

This means reading a block from HBase requires only looking up that block’s location in the index and retrieving it from disk. The block is the smallest indexed unit of data and the smallest unit of data that can be read from disk. The block size is configured per column family, and the default value is 64 KB. You may want to make this value larger or smaller depending on your use case.
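The index lookup can be sketched as a binary search over the first row key of each block. This is a simplified model, assuming a hypothetical HFileSketch class that splits sorted rows into blocks by row count (real HFiles split by bytes and use their own on-disk index format): the search picks the one block whose first key is not greater than the requested key, and only that block is read and scanned.

```python
import bisect


class HFileSketch:
    """Hypothetical model of an HFile: a list of blocks plus an index of
    each block's first row key. Not HBase's real on-disk format."""

    def __init__(self, sorted_rows, rows_per_block):
        # Split the sorted (key, value) rows into blocks; real HBase splits by bytes.
        self.blocks = [sorted_rows[i:i + rows_per_block]
                       for i in range(0, len(sorted_rows), rows_per_block)]
        # The index holds the first key of every block.
        self.index = [block[0][0] for block in self.blocks]

    def find_block(self, row_key):
        """Return the position of the only block that could hold row_key, or None."""
        pos = bisect.bisect_right(self.index, row_key) - 1
        return pos if pos >= 0 else None

    def get(self, row_key):
        pos = self.find_block(row_key)
        if pos is None:
            return None
        # Only this one block is "read from disk"; it is then scanned in memory.
        for key, value in self.blocks[pos]:
            if key == row_key:
                return value
        return None


rows = [(f"row{i:03d}", f"value{i}") for i in range(10)]
hfile = HFileSketch(rows, rows_per_block=4)
print(hfile.index)                 # ['row000', 'row004', 'row008']
print(hfile.find_block("row005"))  # 1
print(hfile.get("row005"))         # value5
```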

Question-75: When reading data from HBase, which three places are reconciled before a value is returned?

Answer: Reading a row from HBase requires first checking the MemStore for any pending modifications.

Then the BlockCache is examined to see if the block containing this row has been recently accessed.

Finally, the relevant HFiles on disk are accessed.
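The three places can be traced with a small sketch. It is a hypothetical model, not HBase code: plain dictionaries stand in for the MemStore, the BlockCache and the HFiles, a "block" is simplified to one row's cells, and the `row_key not in contents` check stands in for the index lookups (and, in real HBase, optional Bloom filters) that let a read skip HFiles. The function only collects the pieces and records where each came from; merging them is a separate step.

```python
def read_row(row_key, memstore, block_cache, hfiles):
    """Hypothetical sketch of the three places a read consults.

    memstore:    {row_key: cells}               pending, unflushed writes
    block_cache: {(hfile_name, row_key): cells}  blocks read recently
    hfiles:      {hfile_name: {row_key: cells}}  flushed data on disk
    Returns the pieces found and where each one came from.
    """
    pieces, sources = [], []
    # 1. MemStore: modifications not yet flushed to an HFile.
    if row_key in memstore:
        pieces.append(memstore[row_key])
        sources.append("memstore")
    for name, contents in hfiles.items():
        if row_key not in contents:
            continue  # this HFile holds nothing for the row
        cache_key = (name, row_key)
        if cache_key in block_cache:
            # 2. BlockCache: the block was accessed recently, no disk read.
            pieces.append(block_cache[cache_key])
            sources.append(f"blockcache:{name}")
        else:
            # 3. HFile on disk; cache the block for the next read.
            block_cache[cache_key] = contents[row_key]
            pieces.append(contents[row_key])
            sources.append(f"disk:{name}")
    return pieces, sources


memstore = {"user1": {"email": (9, "ann@example.com")}}
hfiles = {
    "hfile-1": {"user1": {"name": (1, "Ann")}},
    "hfile-2": {"user1": {"city": (5, "Bergen")}},
}
block_cache = {}
print(read_row("user1", memstore, block_cache, hfiles)[1])
# ['memstore', 'disk:hfile-1', 'disk:hfile-2']
print(read_row("user1", memstore, block_cache, hfiles)[1])
# ['memstore', 'blockcache:hfile-1', 'blockcache:hfile-2']
```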

Note that HFiles contain a snapshot of the MemStore at the point when it was flushed. Data for a complete row can be stored across multiple HFiles.

To read a complete row, HBase must read across all HFiles that might contain information for that row and compose the complete record from them.
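Because each MemStore flush writes a new HFile, one column of a row may have an old value in an earlier HFile and a newer value in a later one. The sketch below composes a row from such pieces by keeping the cell with the newest timestamp per column. It is a hypothetical simplification (compose_row is not an HBase function) that ignores deletes (tombstones) and multiple stored versions.

```python
def compose_row(pieces):
    """Merge per-column cells into one row; the newest timestamp wins.

    Each piece is {column: (timestamp, value)} from the MemStore or one HFile.
    Hypothetical sketch: ignores deletes (tombstones) and multiple versions.
    """
    newest = {}
    for cells in pieces:
        for column, (ts, value) in cells.items():
            if column not in newest or ts > newest[column][0]:
                newest[column] = (ts, value)
    return {column: value for column, (_, value) in newest.items()}


older_hfile = {"name": (1, "Ann"), "city": (1, "Oslo")}
newer_hfile = {"city": (5, "Bergen")}
memstore = {"email": (9, "ann@example.com")}
print(compose_row([memstore, older_hfile, newer_hfile]))
# {'email': 'ann@example.com', 'name': 'Ann', 'city': 'Bergen'}
```

Comparing timestamps rather than relying on the order in which sources are visited keeps the result the same no matter which HFile happens to be read first.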

Details: Category: Hadoop; Last Updated: 24 April 2021
