Question 61: What can you do with the SQLContext object?

Answer: SQLContext is the entry point for Spark SQL functionality; you create a SQLContext object using a SparkContext. Using SQLContext, you can create a DataFrame from an RDD, a Hive table, or an external data source.
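
For illustration, here is a minimal PySpark sketch (Spark 1.x API) that creates a SQLContext from a SparkContext and builds a DataFrame from an RDD. The application name and the sample rows are hypothetical, not from the question:

from pyspark import SparkContext
from pyspark.sql import SQLContext, Row

sc = SparkContext(appName="SQLContextExample")  # hypothetical app name
sqlContext = SQLContext(sc)

# Build a DataFrame from an RDD of Row objects (sample data is made up)
rdd = sc.parallelize([Row(name="alice", age=30), Row(name="bob", age=25)])
df = sqlContext.createDataFrame(rdd)
df.show()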

Question 62: Should we use SQLContext or HiveContext for SQL functionality?

Answer: You should use HiveContext, whether or not you are accessing data from Hive or Impala. HiveContext provides a superset of the SQLContext functionality, and with it you can access Hive and Impala tables represented in the metastore database.
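
As a minimal sketch of reading a metastore table through a HiveContext (the table name employees is an assumption for illustration, not something defined in the question):

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="HiveContextExample")  # hypothetical app name
sqlContext = HiveContext(sc)

# Read a Hive metastore table into a DataFrame ('employees' is hypothetical)
df = sqlContext.table("employees")
df.show()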

Question 63: Can Hive SQL syntax be used with Spark SQL?

Answer: Hive and Impala tables and the related SQL syntax are interchangeable in most cases. Because Spark SQL uses the underlying Hive infrastructure, you can run DDL statements, DML statements, and queries written in HiveQL syntax. For interactive query performance, you can access the same tables through Impala using impala-shell or the Impala JDBC and ODBC interfaces.
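
A minimal sketch of HiveQL statements running through a HiveContext (reusing the sqlContext created above; the table name and columns are hypothetical):

# HiveQL DDL: create a table in the Hive metastore
sqlContext.sql("CREATE TABLE IF NOT EXISTS employees (id INT, name STRING)")

# HiveQL query against the same table
result = sqlContext.sql("SELECT name FROM employees WHERE id > 100")
result.show()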

Question 64: In spark-shell, how do you access the HiveContext?

Answer: In spark-shell, a HiveContext is already created for you and made available as the variable sqlContext (note the variable name; the object is a HiveContext, not a plain SQLContext). However, in your own Spark application you have to create the HiveContext object explicitly from the SparkContext, as below.

from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)  # sc is an existing SparkContext
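
Once created, this HiveContext behaves the same as the shell's sqlContext; for example, a quick check that the metastore is reachable:

sqlContext.sql("SHOW TABLES").show()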

Question 65: What is the minimum requirement for the node from which a Spark application is submitted, if you are using Cloudera infrastructure?

Answer: If you are using a CDH cluster, then every host from which you submit applications or run spark-shell or pyspark must have the Hive Gateway role defined in Cloudera Manager and the client configurations deployed. Also remember that if a Spark application reads data through a Hive view, it must have permission to read the underlying data; otherwise it will get an empty result.
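
To illustrate the second point, a hedged sketch (the view name sales_view is hypothetical): if the application lacks read permission on the data behind the view, the query below comes back empty rather than failing.

# 'sales_view' is a hypothetical Hive view; reading it requires permission
# on the underlying data, or the result is empty
df = sqlContext.sql("SELECT * FROM sales_view")
df.show()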