Question 81: When you submit a Spark application with configuration properties, what precedence order is followed?

Answer: Configuration properties take precedence in the following order, from highest to lowest:

  1. Properties passed to SparkConf.
  2. Arguments passed to spark-submit, spark-shell, or pyspark.
  3. Properties set in spark-defaults.conf.
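This precedence can be illustrated with a small, hypothetical sketch in plain Python (no Spark required): properties from each layer are merged so that spark-submit arguments override spark-defaults.conf, and values set on SparkConf override both. The property values here are made up for illustration.

```python
# Hypothetical illustration of Spark's configuration precedence.
# Layers are merged from lowest to highest priority, so later
# updates overwrite earlier ones -- SparkConf wins.

spark_defaults = {            # 3. spark-defaults.conf (lowest)
    "spark.executor.memory": "1g",
    "spark.eventLog.enabled": "true",
}
submit_args = {               # 2. --conf flags on spark-submit
    "spark.executor.memory": "2g",
}
spark_conf = {                # 1. SparkConf set in application code (highest)
    "spark.executor.memory": "4g",
}

effective = {}
for layer in (spark_defaults, submit_args, spark_conf):  # low -> high
    effective.update(layer)

print(effective["spark.executor.memory"])   # "4g" -- SparkConf wins
print(effective["spark.eventLog.enabled"])  # "true" -- falls through from defaults
```

A property left unset at a higher layer (like spark.eventLog.enabled here) simply falls through to the value from the lower layer.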

Question 82: While submitting your Spark application, you get a “Task not serializable” exception. What is the reason, and how do you resolve it?

Answer: Because of a limitation in the way Scala compiles code, some applications with nested definitions running in an interactive shell may encounter a Task not serializable exception. Cloudera recommends submitting these applications with spark-submit instead of running them in the interactive shell.
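The underlying issue can be mimicked in plain Python, no Spark needed: just as Spark must serialize a closure to ship it to executors, pickle must serialize a function and everything it captures, and a function defined inside another function cannot be pickled by the standard pickle module. The function names here are illustrative, not Spark API.

```python
import pickle

def top_level(x):
    return x * 2

def make_nested():
    def nested(x):  # defined inside another function, like a nested
        return x * 2  # definition typed into an interactive shell
    return nested

# A module-level function pickles fine.
pickle.dumps(top_level)

# A nested definition does not -- analogous to Spark's
# "Task not serializable" when a shipped closure drags in
# something the serializer cannot handle.
try:
    pickle.dumps(make_nested())
    serialized = True
except Exception:
    serialized = False

print("nested function picklable?", serialized)  # False
```

Moving the definition to the top level (or into a submitted application file) is the same spirit of fix as moving shell code into an application run with spark-submit.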

Question 83: What are the advantages of running Spark on YARN cluster manager?

Answer: There are several advantages to running Spark on the YARN cluster manager:

  1. You can dynamically share and centrally configure the same pool of cluster resources among all frameworks that run on YARN.
  2. You can use all the features of YARN schedulers for categorizing, isolating, and prioritizing workloads.
  3. You choose the number of executors to use; in contrast, Spark Standalone requires each application to run an executor on every host in the cluster.
  4. Spark can run against Kerberos-enabled Hadoop clusters and use secure authentication between its processes.

Question 84: What steps are followed when you submit a Spark application to the YARN cluster manager?

Answer: Spark orchestrates its operations through the driver program. When the driver program is run, the Spark framework initializes executor processes on the cluster hosts that process your data. The following occurs when you submit a Spark application to a cluster:

  1. The driver is launched and invokes the main method in the Spark application.
  2. The driver requests resources from the cluster manager to launch executors.
  3. The cluster manager launches executors on behalf of the driver program.
  4. The driver runs the application. Based on the transformations and actions in the application, the driver sends tasks to executors.
  5. Tasks are run on executors to compute and save results.
  6. If dynamic allocation is enabled, after executors are idle for a specified period, they are released.
  7. When the driver's main method exits or calls SparkContext.stop, it terminates any outstanding executors and releases resources from the cluster manager.
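Steps 2 through 5 can be sketched with a toy analogy in plain Python: a thread pool stands in for the executors that the cluster manager launches, and the "driver" submits tasks over partitions of the data and collects the results. This is only an illustration; real Spark scheduling is far more involved.

```python
from concurrent.futures import ThreadPoolExecutor

def task(partition):
    # Step 5: a task computes a result on one partition of the data.
    return sum(partition)

partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Steps 2-3: executors are launched on behalf of the driver
# (here, worker threads stand in for executor processes).
with ThreadPoolExecutor(max_workers=2) as executors:
    # Step 4: the driver sends tasks to executors and gathers results.
    results = list(executors.map(task, partitions))

# Step 7: leaving the with-block shuts the pool down, releasing
# the "executors", much as the driver releases cluster resources.
print(results)  # [6, 15, 24]
```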

Question 85: What happens when you run a Spark application in YARN cluster mode?

Answer: When you submit a Spark application to YARN in cluster mode, the Spark driver program runs in the ApplicationMaster on a cluster host. A single process in a YARN container is responsible for both driving the application and requesting resources from YARN. The client that launches the application does not need to run for the lifetime of the application.