Question 76: When you submit a Spark application with configuration properties, what precedence order is followed?
Answer: Configuration properties are resolved in the following order, from highest to lowest precedence (a short sketch follows the list):
- Properties set directly on SparkConf.
- Arguments passed to spark-submit, spark-shell, or pyspark.
- Properties set in spark-defaults.conf.
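As a minimal sketch (the application name and property values are hypothetical), a value set directly on SparkConf wins over the same key passed on the spark-submit command line or set in spark-defaults.conf:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Hypothetical example: even if the application is launched with
// `--conf spark.executor.memory=2g`, or spark-defaults.conf sets 2g,
// the value set directly on SparkConf takes precedence.
val conf = new SparkConf()
  .setAppName("PrecedenceDemo")
  .set("spark.executor.memory", "4g") // highest precedence

val spark = SparkSession.builder().config(conf).getOrCreate()
println(spark.conf.get("spark.executor.memory")) // prints 4g
```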
Question 77: While submitting your Spark application, you get a “Task not serializable” exception. What is the reason, and how do you resolve it?
Answer: Because of a limitation in the way Scala compiles code, some applications with nested definitions running in an interactive shell may encounter a Task not serializable exception. The recommended resolution is to submit these as packaged applications, so that the entire JAR can be shipped to the worker nodes.
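Beyond the shell case, a common way to hit this exception in any Spark application is a closure that captures a non-serializable enclosing object. A minimal sketch (the Multiplier class and its field are hypothetical) showing the problem and the usual fix of copying the needed field into a local val:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper class: not Serializable, so capturing it in a
// task closure forces Spark to try (and fail) to serialize it.
class Multiplier(val factor: Int) {
  def scale(spark: SparkSession): Unit = {
    val rdd = spark.sparkContext.parallelize(1 to 10)

    // Throws "Task not serializable": the closure references `this.factor`,
    // dragging the whole non-serializable Multiplier into the task.
    // rdd.map(_ * factor).collect()

    // Fix: copy the needed value into a local val so that only the Int
    // (which is serializable) is captured by the closure.
    val f = factor
    println(rdd.map(_ * f).collect().mkString(", "))
  }
}
```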
Question 78: What are the advantages of running Spark on the YARN cluster manager?
Answer: There are several advantages to running Spark on the YARN cluster manager, as below; a small configuration sketch follows the list.
- Sharing cluster resources dynamically: You can dynamically share and centrally configure the same pool of cluster resources among all frameworks that run on YARN.
- Scheduling: You can use all the features of YARN schedulers for categorizing, isolating, and prioritizing workloads.
- Executors: You choose the number of executors to use; in contrast, Spark Standalone requires each application to run an executor on every host in the cluster.
- Security: Spark can run against Kerberos-enabled Hadoop clusters and use secure authentication between its processes.
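As a minimal sketch (the queue name and executor count are hypothetical), the scheduling and executor points above map directly onto YARN-specific configuration keys:

```scala
import org.apache.spark.SparkConf

// Hypothetical values: assign the job to a YARN scheduler queue for
// workload isolation, and choose an explicit executor count -- unlike
// Spark Standalone, YARN does not run an executor on every host.
val conf = new SparkConf()
  .setMaster("yarn")
  .set("spark.yarn.queue", "analytics")
  .set("spark.executor.instances", "10")
```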
Question 79: What steps are followed when you submit a Spark application to the YARN cluster manager?
Answer: Spark orchestrates its operations through the driver program. When the driver program runs, the Spark framework initializes executor processes on the cluster hosts to process your data. The following occurs when you submit a Spark application to a cluster; a minimal driver sketch follows these steps.
- Launch Main Method: The driver is launched and invokes the main method in the Spark application.
- Resource Request: The driver requests resources from the cluster manager to launch executors.
- Launch Executors: The cluster manager launches executors on behalf of the driver program.
- Submitting tasks: The driver runs the application. Based on the transformations and actions in the application, the driver sends tasks to executors.
- Task Execution: Tasks are run on executors to compute and save results.
- Effect of dynamic allocation: If dynamic allocation is enabled, after executors are idle for a specified period, they are released.
- Stop SparkContext: When the driver's main method exits or calls SparkContext.stop, it terminates any outstanding executors and releases resources from the cluster manager.
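A minimal sketch of a driver program that walks through this lifecycle (the input path is hypothetical): building the SparkSession triggers the resource request and executor launch, the action submits tasks, and stop() releases resources:

```scala
import org.apache.spark.sql.SparkSession

object WordCountApp {
  def main(args: Array[String]): Unit = {
    // Launch main method / resource request / launch executors:
    // creating the session makes the driver ask the cluster manager
    // (YARN here) to start executors on its behalf.
    val spark = SparkSession.builder()
      .appName("WordCountApp")
      .getOrCreate()

    // Submitting tasks / task execution: the transformations define the
    // work; the action collect() makes the driver send tasks to executors.
    val counts = spark.sparkContext
      .textFile("hdfs:///tmp/input.txt") // hypothetical input path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collect()

    counts.foreach(println)

    // Stop SparkContext: terminates outstanding executors and releases
    // resources from the cluster manager.
    spark.stop()
  }
}
```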
Question 80: What happens when you run a Spark application in YARN cluster mode?
Answer: When you submit a Spark application to YARN in cluster mode, the Spark driver program runs in the ApplicationMaster on a YARN cluster host. A single process in a YARN container is responsible for both driving the application and requesting resources from YARN. The client that launches the application does not need to run for the lifetime of the application.
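As a minimal sketch (launched, by assumption, with spark-submit using `--master yarn --deploy-mode cluster`), the driver's main method below executes inside the ApplicationMaster on a cluster host rather than on the submitting client; reading the runtime configuration confirms the deploy mode:

```scala
import org.apache.spark.sql.SparkSession

object ClusterModeApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ClusterModeApp").getOrCreate()

    // In cluster mode this should report deployMode = cluster, and this
    // println runs on a YARN host, not on the machine that submitted the job.
    println(s"master     = ${spark.conf.get("spark.master")}")
    println(s"deployMode = ${spark.conf.get("spark.submit.deployMode")}")

    spark.stop()
  }
}
```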