Question 16: What is the advantage of broadcasting values across Spark Cluster?

Answer: Spark transfers the broadcast value to each executor only once; all tasks running on that executor then share the cached copy instead of re-fetching it over the network every time it is needed.
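A minimal sketch of the idea in local mode (the app name and lookup data are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("broadcast-demo").setMaster("local[*]"))

    // A lookup table every task needs.
    val countryNames = Map("US" -> "United States", "IN" -> "India")

    // Shipped to each executor once; tasks read bcast.value locally.
    val bcast = sc.broadcast(countryNames)

    val names = sc.parallelize(Seq("US", "IN", "US"))
      .map(code => bcast.value.getOrElse(code, "unknown"))
      .collect()

    println(names.mkString(", "))
    sc.stop()
  }
}
```

Without the broadcast, the `countryNames` map would be captured in the task closure and re-serialized with every task.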



Question 17: Can we broadcast an RDD?

Answer: Technically yes, but you should not broadcast an RDD for use in tasks; Spark will warn you, though it will not stop you. Broadcast the RDD's materialized data instead.
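A sketch of the recommended pattern, assuming local mode and illustrative data: collect the small RDD on the driver first, then broadcast the resulting plain collection.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastRddDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("broadcast-rdd-demo").setMaster("local[*]"))

    val lookup = sc.parallelize(Seq(("US", "United States"), ("IN", "India")))

    // Don't: sc.broadcast(lookup) — that ships the RDD handle, not its data,
    // and Spark warns against it.
    // Do: materialize on the driver, then broadcast the plain collection.
    val bcast = sc.broadcast(lookup.collectAsMap())

    val result = sc.parallelize(Seq("US", "IN"))
      .map(code => bcast.value.getOrElse(code, "unknown"))
      .collect()

    println(result.mkString(", "))
    sc.stop()
  }
}
```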


Question 18: How can we distribute JARs to workers?

Answer: The JAR you specify with SparkContext.addJar is copied to all worker nodes, making its classes available to tasks running there.
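A minimal sketch; the JAR path is hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AddJarDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("addjar-demo").setMaster("local[*]"))

    // Ships the JAR to every worker node so its classes can be loaded
    // by tasks. (Path below is illustrative.)
    sc.addJar("/opt/jobs/extra-udfs.jar")

    sc.stop()
  }
}
```

The same effect can be achieved at launch time with the `--jars` option of `spark-submit`.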


Question 19: How can you stop SparkContext and what is the impact if stopped?

Answer: You can stop a SparkContext with the SparkContext.stop() method. Stopping it shuts down the Spark Runtime Environment and effectively ends the entire Spark application; a stopped context cannot be reused.
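A common pattern, sketched in local mode, is to stop the context in a `finally` block so cluster resources are released even if the job fails:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object StopDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("stop-demo").setMaster("local[*]"))
    try {
      val total = sc.parallelize(1 to 100).sum()
      println(total)
    } finally {
      sc.stop() // shuts down the runtime environment; context is unusable after this
    }
  }
}
```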


Question 20: Which scheduler is used by SparkContext by default?

Answer: By default, SparkContext uses DAGScheduler, though you can develop your own custom DAGScheduler implementation.