Question 81: When is cluster mode not suitable for a Spark application?

Answer: Cluster mode is not well suited for applications that need to be interactive, that is, applications that require user input, such as spark-shell and pyspark. In these cases the driver program must run as part of the client process that initiates the Spark application.
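As a quick illustration (assuming a YARN cluster is available), Spark itself refuses to launch an interactive shell in cluster mode, because the shell's driver must stay on the client machine:

```shell
# Attempting to run the interactive shell in cluster mode (requires a YARN cluster)
spark-shell --master yarn --deploy-mode cluster
# spark-submit rejects this with an error along the lines of:
# "Error: Cluster deploy mode is not applicable to Spark shells."
```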

Question 82: What is the client deployment mode?

Answer: When an application is submitted in client mode, the Spark driver runs on the host where the job is submitted. The ApplicationMaster is responsible only for requesting executor containers from YARN. After the containers start, the client communicates with them to schedule work.
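A minimal sketch of a client-mode submission on YARN; the application class and JAR names here are placeholders, not part of the original text:

```shell
# Submit in client mode: the driver runs on this host,
# while YARN only provides the executor containers
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class com.example.MyApp \
  myapp.jar
```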

Question 83: How do you run Spark shell on YARN?

Answer: To run spark-shell or pyspark as a client on YARN, pass --master yarn --deploy-mode client when you start the application.
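Concretely, the invocations look like the following (assuming Spark is installed and a YARN cluster is reachable):

```shell
# Interactive Scala shell on YARN; the driver runs locally in client mode
spark-shell --master yarn --deploy-mode client

# The same for the interactive Python shell
pyspark --master yarn --deploy-mode client
```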

Question 84: How can you monitor and debug the Spark application submitted over YARN?

Answer: To obtain information about a Spark application's behavior, consult the YARN logs for logging output and the Spark web UI for runtime details such as jobs, stages, and executors.
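For example, YARN's log aggregation can be queried from the command line; the application ID below is a placeholder for the ID YARN assigned to your job. While the application is running, the Spark web UI is typically served by the driver on port 4040:

```shell
# Fetch aggregated logs for a finished YARN application
# (application ID shown is a placeholder)
yarn logs -applicationId application_1234567890123_0001
```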

Question 85: What advantages do you get from running your Spark application in Java/Scala rather than PySpark?

Answer: Scala runs on the JVM just like Java, so both share the same advantages; moreover, the Spark framework itself is written in Scala. Running Spark with Java or Scala therefore offers several benefits: platform independence by running inside the JVM, self-contained packaging of the code and its dependencies into a single JAR file, and higher performance because Spark itself runs in the JVM. You lose these advantages when using the Spark Python API.
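The self-contained packaging point can be sketched as follows, assuming an sbt project with the sbt-assembly plugin; the project, class, and JAR names are placeholders:

```shell
# Build a single "fat" JAR containing the application and its dependencies
sbt assembly

# Submit the self-contained JAR to the cluster
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  target/scala-2.12/myapp-assembly-1.0.jar
```

Because everything the application needs ships in one JAR, the executors need no separate dependency distribution step, which is one of the conveniences PySpark lacks.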