Question 11: How many SparkContext objects should you create per JVM?

Answer: Only one. Spark is designed around a single active SparkContext per JVM, so if you have already created one and need another, you must first stop the existing one by calling its stop() method. As we have mentioned previously, when you start the spark shell (REPL), a SparkContext object is created by default and assigned to the variable named sc, so you should not create a new SparkContext object in that shell; if you truly need a fresh one there, stop sc first, as in the sketch below.
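For illustration, here is a minimal sketch of that spark-shell workflow; the application name and the local master URL are placeholder values, not anything specified by the question:

```scala
// In spark-shell, `sc` is the pre-created SparkContext.
// Stop it first: only one SparkContext may be active per JVM.
sc.stop()

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical replacement context; app name and master are placeholders.
val conf = new SparkConf()
  .setAppName("ReplacementContext")
  .setMaster("local[*]")
val newSc = new SparkContext(conf)
```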

Question 12: Which object represents the primary entry point into the Spark 2.0 system?

Answer: SparkSession is the primary entry point you should use from Spark 2.0 onwards, because it combines the capabilities of the earlier entry-point objects: SQLContext, HiveContext, and SparkContext. In the REPL (the spark-shell command-line utility), it is available as a variable named spark.
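As a sketch of how this looks in code (the application name and master URL below are placeholders), a SparkSession is built with its builder and exposes the older entry points through itself:

```scala
import org.apache.spark.sql.SparkSession

// Unified Spark 2.0+ entry point, built (or reused) via the builder.
val spark = SparkSession.builder()
  .appName("ExampleApp")        // placeholder application name
  .master("local[*]")           // placeholder master URL
  // .enableHiveSupport()       // optional: HiveContext-style features
  //                            // (requires Hive classes on the classpath)
  .getOrCreate()

// Capabilities of the older objects are reachable through the session:
val sc = spark.sparkContext     // the underlying SparkContext
spark.sql("SELECT 1").show()    // SQLContext-style SQL execution
```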

Question 13: When do you create a new SparkContext object?

Answer: You should create a new SparkContext object whenever you write a standalone Spark application in Java, Scala, Python, or R, because outside the shell no context is pre-created for you. A minimal skeleton follows.
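Here is a sketch of a standalone Scala application; the object name and input path are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical standalone application: no REPL has created a context,
// so the application constructs its own SparkContext.
object WordCountApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCountApp")
    val sc = new SparkContext(conf)
    try {
      val counts = sc.textFile("input.txt")   // placeholder input path
        .flatMap(_.split("\\s+"))
        .map(word => (word, 1L))
        .reduceByKey(_ + _)
      counts.take(10).foreach(println)
    } finally {
      sc.stop()                               // always release cluster resources
    }
  }
}
```

(In Spark 2.0 and later, the same application would typically build a SparkSession instead and obtain the context from spark.sparkContext.)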

Question 14: Which script will you use to submit your standalone Spark application?

Answer: The spark-submit script, found in Spark's bin directory, is used to submit a Spark application to the cluster. A sample invocation follows.
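For illustration, a typical invocation looks like this; the class name, master URL, jar name, and argument are all hypothetical placeholders:

```bash
spark-submit \
  --class com.example.WordCountApp \
  --master spark://master-host:7077 \
  wordcount-app.jar input.txt
```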

Question 15: When do you use client mode, and when cluster mode, to submit your application?

Answer: There are two deploy modes for a Spark application (compared in the sketch after this list):

  • Client Mode: Use this mode when your gateway machine is co-located with the worker machines. In this mode the driver is launched inside the spark-submit process itself and acts as a client to the cluster.
  • Cluster Mode: Use this mode when you submit from a machine far from the worker machines, such as your own laptop, which is not part of the Spark cluster. In this mode the driver is launched on one of the machines inside the cluster, which keeps driver-executor traffic within the cluster.
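To make the difference concrete, here is a sketch of the two submissions; only the --deploy-mode flag changes, and the host, class, and jar names are placeholders:

```bash
# Client mode (the default): the driver runs inside this spark-submit
# process, so the submitting machine should sit close to the workers.
spark-submit --master spark://master-host:7077 --deploy-mode client \
  --class com.example.WordCountApp wordcount-app.jar

# Cluster mode: the driver is launched on a machine inside the cluster,
# which suits submissions from a distant machine such as a laptop.
spark-submit --master spark://master-host:7077 --deploy-mode cluster \
  --class com.example.WordCountApp wordcount-app.jar
```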