Question 6: Which daemon processes are part of the standalone cluster manager?

Answer: As we have mentioned, the standalone cluster manager is not suitable for production use cases; you should consider it only for testing and proof-of-concept (POC) purposes. When you start Spark with the standalone cluster manager, it creates two daemon processes:

  • Master daemon
  • Worker daemon
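
As a minimal sketch (not part of the original answer, and assuming the Master daemon listens on the default spark://<host>:7077 URL, with "master-host" as a placeholder host name), this is how an application would attach to a standalone cluster:

  import org.apache.spark.sql.SparkSession

  object StandaloneDemo {
    def main(args: Array[String]): Unit = {
      // Connect to the standalone Master daemon; host name and port 7077 are assumptions.
      val spark = SparkSession.builder()
        .appName("standalone-demo")
        .master("spark://master-host:7077")
        .getOrCreate()

      // The Worker daemons on the cluster nodes launch executors for this application.
      println(spark.sparkContext.parallelize(1 to 10).sum())

      spark.stop()
    }
  }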

Question 7: How do you define a Worker Node in a Spark cluster?

Answer: You can say that worker nodes are the slave nodes; the actual processing of your data happens on the worker nodes. A worker node always communicates with the Master node and reports the availability of its resources. Generally, one worker is started on each node in your cluster, and it is responsible for running your application on that particular node and for monitoring it.

Question 8: What are executors in Spark?

Answer: Each Spark application you submit has its own executor processes, and those executors exist only for the lifetime of that application. The driver program uses these executors to run tasks on the worker nodes; executors also keep data in memory and, when required, spill it to disk.
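
As an illustrative sketch only (the property values below are placeholders, not recommendations), executor size and the number of task slots per executor are usually requested through the application's configuration; setMaster("local[2]") is used here only so the sketch runs on its own, whereas on a real cluster these settings shape the executors launched on the workers:

  import org.apache.spark.{SparkConf, SparkContext}

  object ExecutorConfigDemo {
    def main(args: Array[String]): Unit = {
      // Each submitted application gets its own executor processes.
      // spark.executor.memory sizes the executor JVM heap;
      // spark.executor.cores sets how many tasks an executor runs concurrently.
      val conf = new SparkConf()
        .setAppName("executor-demo")
        .setMaster("local[2]")
        .set("spark.executor.memory", "2g")
        .set("spark.executor.cores", "2")

      val sc = new SparkContext(conf)

      // Cached data is kept in executor memory and can spill to disk when needed.
      val data = sc.parallelize(1 to 1000000).cache()
      println(data.count())

      sc.stop()
    }
  }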

Question 9: What is a Task in Spark?

Answer: A task is the unit of work that is sent to an executor running on a worker node. It is a command sent to the executor by the driver program, created by serializing your function object. The executor process is responsible for deserializing this function object and executing it on the data of one RDD partition that resides on that node.
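
A small assumed example (local mode, four partitions) to make this concrete: the function passed to map is serialized by the driver and shipped to the executors, and each partition is processed by its own task:

  import org.apache.spark.{SparkConf, SparkContext}

  object TaskDemo {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(
        new SparkConf().setAppName("task-demo").setMaster("local[4]"))

      // Four partitions -> four tasks per stage; every task applies the same
      // (serialized) function to its own partition of the data.
      val rdd = sc.parallelize(1 to 100, numSlices = 4)

      val doubled = rdd.map(_ * 2)      // this lambda is serialized and sent to executors
      println(doubled.getNumPartitions) // 4, so this stage runs as 4 tasks
      println(doubled.sum())            // 10100.0

      sc.stop()
    }
  }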

Question 10: What are some uses of the SparkContext object?

Answer: It is one of the entry points to your Spark cluster. Once you have hold of the SparkContext, you can create new RDDs from existing data, and you can create accumulators and broadcast variables on that cluster. Since Spark 2.0, the more general entry point to the Spark system is the SparkSession object.
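
A compact sketch (names and values are illustrative) showing these uses together: obtaining the SparkContext from a SparkSession, then creating an RDD, an accumulator, and a broadcast variable:

  import org.apache.spark.sql.SparkSession

  object SparkContextUsage {
    def main(args: Array[String]): Unit = {
      // Since Spark 2.0, SparkSession is the unified entry point;
      // the underlying SparkContext is still available from it.
      val spark = SparkSession.builder()
        .appName("sparkcontext-demo")
        .master("local[*]")
        .getOrCreate()
      val sc = spark.sparkContext

      // Create an RDD from existing in-memory data.
      val rdd = sc.parallelize(Seq(1, 2, 3, 4, 5))

      // Accumulator: a write-only shared variable that executors can add to.
      val counter = sc.longAccumulator("processed")

      // Broadcast variable: a read-only value shipped once to every executor.
      val factor = sc.broadcast(10)

      val result = rdd.map { x =>
        counter.add(1)
        x * factor.value
      }.collect()

      println(result.mkString(", "))           // 10, 20, 30, 40, 50
      println(s"processed = ${counter.value}") // 5

      spark.stop()
    }
  }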