About Apache Hive

You can store data in HDFS and process it with the MapReduce framework. However, data scientists, data analysts, and data engineers cannot always write a MapReduce program just to fetch data from HDFS. They need a simpler way to inspect sample data, compute aggregated values, or apply analytic functions such as max, mean, standard deviation, and variance to the stored data, without having to change the file format or write parsing code.

Apache Hive addresses this need. It provides a SQL dialect (HiveQL) for querying data stored in HDFS. Many people already know SQL, and anyone with a computer science or information technology background is likely to be familiar with it. Hive's syntax is similar to ANSI SQL, but it does not fully conform to the standard.
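To give a feel for the kind of aggregate query described here, the sketch below runs a plain SQL statement with MAX and AVG. It is only an illustration: it uses Python's built-in sqlite3 module as a stand-in, because running Hive needs a Hadoop cluster. The table name employees, its columns, and the sample rows are made up for the example. The query text itself uses only basic SQL that Hive should also accept.

```python
import sqlite3

# In-memory SQLite database, used purely as a stand-in for a Hive table.
# The table name, columns, and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("eng", 100.0), ("eng", 140.0), ("ops", 90.0), ("ops", 70.0)],
)

# A declarative aggregate query: no map or reduce code is written by hand.
QUERY = """
SELECT dept, MAX(salary) AS max_salary, AVG(salary) AS mean_salary
FROM employees
GROUP BY dept
ORDER BY dept
"""

rows = conn.execute(QUERY).fetchall()
for dept, max_salary, mean_salary in rows:
    print(dept, max_salary, mean_salary)

conn.close()
```

With Hive, the same style of query would run against a table whose data lives in HDFS instead of an in-memory database.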

Apache Hive translates the SQL query you write into MapReduce jobs that process the data stored in HDFS. Hive serves as a data warehouse solution for data stored in a Hadoop cluster. Keep in mind that Hive works best with static data that does not change frequently. If the data changes frequently, consider another approach, such as a streaming solution. Also, do not expect results to come back immediately, even for small datasets stored in HDFS: a Hive query takes time to start up and process the data.
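To make the translation idea concrete, here is a minimal conceptual sketch of how a query such as SELECT dept, MAX(salary) FROM employees GROUP BY dept can be expressed as map, shuffle, and reduce steps. This is not the code Hive generates; it is a single-process Python illustration, and the function names and sample records are invented for the example.

```python
from collections import defaultdict

# Hypothetical sample records: (dept, salary) pairs standing in for rows in HDFS.
records = [("eng", 100.0), ("eng", 140.0), ("ops", 90.0), ("ops", 70.0)]


def map_phase(record):
    """Emit a (key, value) pair per row: the GROUP BY column is the key."""
    dept, salary = record
    yield dept, salary


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Apply the aggregate function (here MAX) to each group's values."""
    return key, max(values)


mapped = [pair for record in records for pair in map_phase(record)]
grouped = shuffle(mapped)
result = dict(reduce_phase(key, values) for key, values in grouped.items())
print(result)
```

In a real cluster the map and reduce steps run in parallel on many machines, which is part of why even a small query has noticeable startup cost.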

Question: Is Hive a database solution?

Answer: No. Hive is not a database; it is an engine for querying data stored in a Hadoop cluster, that is, in the HDFS file system.

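Question: Can you insert, update, or delete individual records stored in HDFS using a Hive query?

Answer: No, classic Hive cannot insert, update, or delete individual records stored in HDFS. Newer releases (Hive 0.14 and later) add row-level INSERT, UPDATE, and DELETE, but only for tables set up as transactional (ACID) tables.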

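Question: Can I use transactions with Hive?

Answer: No, classic Hive does not support transactions. Newer releases (Hive 0.14 and later) add ACID transaction support, but only for tables set up as transactional.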

Question: Can you give an example of a use case for which Hive is best suited?

Answer: Hive is best suited to data warehouse applications where the volume of data is very large and you want to mine it. You can also build data pipelines with Apache Hive.

The first three questions above make it clear that Hive is not a database solution. You can, however, write queries to fetch the data stored in HDFS.
