Question 86: When is cluster mode not suitable for a Spark application?

Answer: Cluster mode is not well suited for applications that need to be interactive, for example applications that require user input, such as spark-shell and pyspark. In these cases the driver program must run as part of the client process that initiates the Spark application.
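As a concrete illustration, Spark refuses to launch an interactive shell in cluster mode (the exact error text may vary between Spark versions), so the shell must be started in client mode:

```shell
# Interactive shells cannot use cluster deploy mode; spark-submit rejects it.
spark-shell --master yarn --deploy-mode cluster
# Error: Cluster deploy mode is not applicable to Spark shells.

# The driver must run inside the client process instead:
spark-shell --master yarn --deploy-mode client
```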

Question 87: What is the client deployment mode?

Answer: When an application is submitted in this mode, the Spark driver runs on the host where the job is submitted. The ApplicationMaster is responsible only for requesting executor containers from YARN. After the containers start, the client communicates with them to schedule work.
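A typical client-mode submission might look like the following sketch; the application JAR, main class, and executor count are placeholders, not values from the source:

```shell
# Driver runs on this host; the YARN ApplicationMaster only
# requests executor containers on the application's behalf.
# com.example.MyApp and myapp.jar are hypothetical placeholders.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class com.example.MyApp \
  --num-executors 4 \
  myapp.jar
```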

Question 88: How do you run Spark shell on YARN?

Answer: To run spark-shell or pyspark on YARN, pass --master yarn --deploy-mode client when you start the application.
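For example, either shell can be started against YARN like this:

```shell
# Scala shell on YARN (shells always run the driver in client mode)
spark-shell --master yarn --deploy-mode client

# Python shell on YARN
pyspark --master yarn --deploy-mode client
```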

Question 89: How can you monitor and debug the Spark application submitted over YARN?

Answer: To obtain information about a Spark application's behavior, consult the YARN logs and the Spark web UI.
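A minimal sketch of retrieving YARN logs from the command line; the application ID below is a placeholder, and aggregated logs are only available when YARN log aggregation is enabled:

```shell
# List applications to find the application ID
yarn application -list

# Fetch the aggregated container logs for a given application
# (application_1234567890123_0001 is a placeholder ID)
yarn logs -applicationId application_1234567890123_0001
```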

Question 90: What advantages do you get when you choose Java/Scala for your Spark application, rather than PySpark?

Answer: Accessing Spark with Java and Scala offers many advantages: platform independence by running inside the JVM, self-contained packaging of code and its dependencies into JAR files, and higher performance because Spark itself runs in the JVM. You lose these advantages when using the Spark Python API.
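The self-contained packaging point can be sketched by contrasting the two submission styles; the file names below are hypothetical:

```shell
# JVM application: code and dependencies bundled into one self-contained JAR
spark-submit --master yarn --class com.example.MyApp myapp-assembly.jar

# Python application: the script and its Python dependencies
# must be shipped alongside it with --py-files
spark-submit --master yarn --py-files deps.zip my_app.py
```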