Question 111: How do you define a cluster manager?

Answer: A cluster manager is an external service that acquires resources (such as CPU cores and memory) on the cluster for a Spark application. Examples include Spark's own standalone manager and YARN.
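Spark decides which cluster manager to contact from the master URL it is given (the `--master` option of `spark-submit` or the `spark.master` setting). The sketch below is a hypothetical helper, `cluster_manager_for`, that maps a few well-known master URL forms to the manager they select. It is plain Python for illustration only, not part of Spark's API, and it deliberately covers only local mode, standalone, YARN, and Kubernetes.

```python
def cluster_manager_for(master: str) -> str:
    """Hypothetical helper: name the cluster manager implied by a Spark master URL."""
    if master == "local" or master.startswith("local["):
        # Local mode runs everything in one process; no cluster manager is involved.
        return "local (no cluster manager)"
    if master.startswith("spark://"):
        return "Spark standalone"
    if master == "yarn":
        return "YARN"
    if master.startswith("k8s://"):
        return "Kubernetes"
    raise ValueError(f"unrecognized master URL: {master!r}")


for url in ["local[*]", "spark://host:7077", "yarn", "k8s://https://host:6443"]:
    print(url, "->", cluster_manager_for(url))
```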

Question 112: Which compression techniques are commonly used in Apache Hadoop?

Answer: The compression formats most commonly used with Apache Hadoop are gzip, bzip2, Snappy, and LZO.
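To see the basic trade-off between two of these formats, the sketch below compresses the same sample bytes with gzip and bzip2 using Python's standard `gzip` and `bz2` modules and checks that decompression returns the original data. It does not use Hadoop's codec classes, and Snappy and LZO are left out because they need third-party bindings outside the standard library. The sample data is made up and deliberately repetitive, so the ratios it prints say nothing about real workloads.

```python
import bz2
import gzip

# Made-up, highly repetitive sample input (log-like lines compress well).
data = b"INFO task finished successfully\n" * 1000

sizes = {}
for name, module in [("gzip", gzip), ("bzip2", bz2)]:
    compressed = module.compress(data)
    # Both formats are lossless: decompressing must give back the exact input.
    assert module.decompress(compressed) == data
    sizes[name] = len(compressed)
    print(f"{name}: {len(data)} -> {len(compressed)} bytes "
          f"(ratio {len(compressed) / len(data):.3f})")
```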

Question 113: How do you define a Dataset in Spark?

Answer: This question may already have been covered earlier, but this answer adds to it.

A Dataset is a collection of records, similar to a relational database table. Records are similar to table rows, but the columns can contain not only strings or numbers but also nested data structures such as lists, maps, and other records.
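In Spark's Scala API, a typed Dataset is usually described by a case class (for example `Dataset[Customer]`). The sketch below mimics that row shape in plain Python with dataclasses, without Spark, to show what "columns containing lists, maps, and other records" looks like. The `Customer` and `Address` types and the sample rows are hypothetical, and the list comprehension only stands in for a filter-and-select query on nested fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Address:
    # A nested record that can sit inside a column.
    city: str
    zip_code: str


@dataclass
class Customer:
    # One "row"; each attribute plays the role of a column.
    name: str                                               # string column
    age: int                                                # numeric column
    tags: List[str] = field(default_factory=list)           # list column
    scores: Dict[str, float] = field(default_factory=dict)  # map column
    address: Optional[Address] = None                       # nested record column


rows = [
    Customer("Ana", 34, ["vip"], {"q1": 0.9}, Address("Lisbon", "1000-001")),
    Customer("Ben", 28, [], {}, Address("Porto", "4000-001")),
]

# Rough analogue of filtering rows and selecting a nested field.
cities_over_30 = [c.address.city for c in rows if c.age > 30 and c.address]
print(cities_over_30)
```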

Question 114: How do you define the driver in Spark?

Answer: In Apache Spark, the driver is the process that represents an application session. The driver is responsible for converting the application into a directed graph of individual steps to execute on the cluster. There is one driver per application.
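The idea of turning an application into a directed graph of steps can be illustrated with Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The sketch below is a toy, not Spark's scheduler: the step names are hypothetical, and Spark additionally groups steps into stages at shuffle boundaries. It shows that once dependencies form an acyclic graph, a scheduler can release each step as soon as everything it depends on has finished, and that independent steps become ready together.

```python
from graphlib import TopologicalSorter

# Hypothetical application steps, each mapped to the steps it depends on.
steps = {
    "read_input": set(),
    "parse": {"read_input"},
    "filter": {"parse"},
    "read_lookup": set(),
    "join": {"filter", "read_lookup"},
    "write_output": {"join"},
}

sorter = TopologicalSorter(steps)
sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic

waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # steps whose dependencies are all done
    waves.append(ready)
    sorter.done(*ready)                 # pretend these steps have now finished

for i, wave in enumerate(waves, start=1):
    print(f"wave {i}: {wave}")
```

In this toy graph, `read_input` and `read_lookup` have no dependencies, so they appear together in the first wave.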

Question 115: How do you define the executor process in Apache Spark?

Answer: An executor is a process that serves a Spark application by running its tasks. An executor runs many tasks over its lifetime and can run several tasks concurrently. A host may run several Spark executors, and each application typically has executors on many hosts.
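Spark executors run their tasks on a thread pool, and the number of tasks one executor can run at the same time is bounded by `spark.executor.cores` divided by `spark.task.cpus`. The sketch below is only an analogy in plain Python: a `ThreadPoolExecutor` with a hypothetical `EXECUTOR_CORES` of 4 plays the role of one executor, and the hypothetical `run_task` function stands in for a task. Over its lifetime the pool runs ten tasks, but never more than four at once, which the code checks by tracking peak concurrency.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

EXECUTOR_CORES = 4  # hypothetical: concurrent task slots in one executor

active = 0          # tasks currently running
peak = 0            # highest number of tasks seen running at once
lock = threading.Lock()


def run_task(task_id: int) -> int:
    """Hypothetical task: records concurrency, simulates work, returns a result."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.01)  # stand-in for real work
    with lock:
        active -= 1
    return task_id * task_id


# One "executor" runs ten tasks over its lifetime, at most EXECUTOR_CORES at a time.
with ThreadPoolExecutor(max_workers=EXECUTOR_CORES) as pool:
    results = list(pool.map(run_task, range(10)))

print(results, "peak concurrency:", peak)
```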