Question-96: What is the difference between HBase and Hadoop/HDFS?
Answer: HDFS is a distributed file system that is highly fault-tolerant and designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is well suited to applications with large data sets and to the storage of large files. Its documentation states, however, that it is not a general-purpose file system and does not provide fast individual record lookups in files. HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables. This layering can be a point of conceptual confusion: HBase internally stores your data in indexed StoreFiles that live on HDFS, and it is these indexes that make high-speed lookups possible.
Assumptions and Goals of HDFS:
- Hardware Failure
- Streaming Data Access
- Large Data Sets
- Simple Coherency Model
- Moving Computation is Cheaper than Moving Data
- Portability Across Heterogeneous Hardware and Software Platforms
HDFS has been designed to be easily portable from one platform to another. This facilitates widespread adoption of HDFS as a platform of choice for a large set of applications.
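To make the contrast in the answer to Question-96 concrete, here is a minimal sketch of why an indexed, sorted file supports fast record lookups while a plain file has to be scanned. It is a toy model in plain Python, not HBase or HDFS code: the names SortedStoreFile and scan_lookup are hypothetical, and real StoreFiles use an on-disk block index rather than an in-memory list.

```python
import bisect


class SortedStoreFile:
    """Toy stand-in for an indexed StoreFile: rows are kept sorted by key."""

    def __init__(self, rows):
        # rows: dict mapping row key -> value
        self._keys = sorted(rows)
        self._values = [rows[k] for k in self._keys]

    def get(self, key):
        # Binary search over the sorted keys: O(log n) comparisons.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None


def scan_lookup(records, key):
    """Unindexed lookup: read records in order until the key is found."""
    for k, v in records:
        if k == key:
            return v
    return None


rows = {f"row{i:05d}": i * i for i in range(10_000)}
store = SortedStoreFile(rows)
```

Both store.get(key) and scan_lookup(list(rows.items()), key) return the same value, but the scan touches records one by one while the sorted lookup needs only a handful of comparisons.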
Question-97: What is the maximum recommended cell size?
Answer: A rough rule of thumb, with little empirical validation, is to keep the data in HDFS and store pointers to the data in HBase if you expect the cell size to be consistently above 10 MB. If you do expect large cell values and you still plan to use HBase for the storage of cell contents, you'll want to increase the block size and the maximum region size for the table to keep the index size reasonable and the split frequency acceptable.
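The rule of thumb above can be sketched as a small design-time check. This is only an illustration under stated assumptions: "10 MB" is read as 10 * 1024 * 1024 bytes, "consistently above" is interpreted as the median expected cell size exceeding the threshold, and the function names and the /data/blobs directory are hypothetical.

```python
from statistics import median

# Assumption: the 10 MB rule of thumb expressed in bytes.
LARGE_CELL_THRESHOLD_BYTES = 10 * 1024 * 1024


def store_pointer_in_hbase(expected_cell_sizes):
    """Return True if cells are expected to be consistently above the threshold.

    "Consistently" is modelled here as the median of the expected sizes,
    which is an assumption of this sketch, not a rule from HBase itself.
    """
    return median(expected_cell_sizes) > LARGE_CELL_THRESHOLD_BYTES


def pointer_cell_value(row_key, hdfs_dir="/data/blobs"):
    """Build the small value kept in HBase: a path to the data in HDFS."""
    return f"{hdfs_dir}/{row_key}".encode("utf-8")


MIB = 1024 * 1024
video_sizes = [50 * MIB, 64 * MIB, 12 * MIB]   # mostly large cells
profile_sizes = [1024, 2048, 20 * MIB]         # mostly small cells
```

With these sample sizes, the video table would keep its data in HDFS and store only pointer_cell_value(row_key) in HBase, while the profile table would store cell contents in HBase directly.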
Question-98: What happens if we change the block size of a column family on an already populated database?
Answer: When we change the block size of a column family, newly written data uses the new block size, while existing data keeps the old block size until it is rewritten. Once a major compaction occurs, the old data takes the new block size as well. “New files, as they are flushed, will have the new block size, whereas existing data will continue to be read correctly. After the next major compaction, all data should be converted to the new block size.”
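The behaviour described in this answer can be modelled in a few lines. The sketch below is a toy simulation, not HBase code: the StoreFile and ColumnFamily classes, their methods, and the block sizes used are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class StoreFile:
    cells: list
    block_size: int  # block size in effect when this file was written


@dataclass
class ColumnFamily:
    block_size: int
    memstore: list = field(default_factory=list)
    store_files: list = field(default_factory=list)

    def put(self, cell):
        self.memstore.append(cell)

    def flush(self):
        # A flushed file is written with the block size configured right now.
        if self.memstore:
            self.store_files.append(StoreFile(self.memstore, self.block_size))
            self.memstore = []

    def major_compact(self):
        # Rewrite every file into one file that uses the current block size.
        merged = [c for f in self.store_files for c in f.cells]
        self.store_files = [StoreFile(merged, self.block_size)] if merged else []

    def read_all(self):
        # Files stay readable whatever block size they were written with.
        return [c for f in self.store_files for c in f.cells]


cf = ColumnFamily(block_size=64 * 1024)
cf.put("a")
cf.flush()                       # written with the old block size
cf.block_size = 128 * 1024       # change the setting on populated data
cf.put("b")
cf.flush()                       # written with the new block size
sizes_before = [f.block_size for f in cf.store_files]
readable_before = cf.read_all()
cf.major_compact()
sizes_after = [f.block_size for f in cf.store_files]
```

Before the compaction the two files carry different block sizes and both remain readable; afterwards a single file with the new block size holds all the data.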
Question-99: Explain what WAL and HLog are in HBase.
Answer: The WAL (Write-Ahead Log) is similar to the MySQL binary log (binlog): it records every change made to the data. HLog is the name of the class that implemented the WAL in older HBase releases. The log is a standard Hadoop SequenceFile that stores HLogKey instances. Its entries carry a sequence number along with the actual data, and they are used to replay data that had not yet been persisted when a server crashed. So, in case of a server failure, the WAL acts as a lifeline for recovering data that would otherwise be lost.
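The write-ahead and replay idea can be illustrated with a small simulation. This is a toy model in plain Python, not the HBase implementation: the ToyRegionServer class, its fields, and its methods are hypothetical, and the "durable" log is just a list that survives the simulated crash.

```python
class ToyRegionServer:
    def __init__(self):
        self.wal = []          # durable log of (sequence_number, key, value)
        self.memstore = {}     # in-memory edits, lost on a crash
        self.store = {}        # data already persisted by a flush
        self.next_seq = 1
        self.flushed_seq = 0   # highest sequence number that was persisted

    def put(self, key, value):
        # Write-ahead: append to the log first, then apply in memory.
        self.wal.append((self.next_seq, key, value))
        self.memstore[key] = value
        self.next_seq += 1

    def flush(self):
        # Persist in-memory edits and remember how far the log is covered.
        self.store.update(self.memstore)
        self.memstore.clear()
        self.flushed_seq = self.next_seq - 1

    def crash(self):
        # Everything that was only in memory is gone.
        self.memstore = {}

    def recover(self):
        # Replay only the log entries that were not yet persisted.
        for seq, key, value in self.wal:
            if seq > self.flushed_seq:
                self.memstore[key] = value

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        return self.store.get(key)


server = ToyRegionServer()
server.put("r1", "v1")
server.flush()                  # r1 is persisted
server.put("r2", "v2")          # r2 exists only in the log and in memory
server.crash()
lost = server.get("r2")         # r2 is not visible right after the crash
server.recover()                # replaying the log brings r2 back
```

The sequence number is what lets recovery skip entries that a flush already persisted and replay only the rest.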
Question-100: What are column families in HBase?
Answer: Column families are the basic unit of physical storage in HBase; features such as compression are applied per column family.
 
											