Question-7: I don't see RDDs mentioned in the syllabus, so are they not part of the certification?

Answer: The syllabus published for the Databricks Spark certification is quite abstract; it does not spell out exactly what will be asked in the exam. We expect a good number of questions based on RDDs, including programming questions on various RDD APIs. RDDs remain in focus because whether you use Spark 1.x or Spark 2.x, the underlying processing engine still works on RDDs. Hence, your RDD concepts must be clear. If you want to apply custom optimizations or do performance tuning, you should understand how RDDs work. Even when you use distributed shared variables such as broadcast variables and accumulators, you will often be working with RDDs.
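As a minimal sketch of the shared-variable point above (the object name, the lookup map, and the `local[*]` master are illustrative assumptions, not exam material), broadcast variables and accumulators are typically exercised through RDD transformations like `map`:

```scala
import org.apache.spark.sql.SparkSession

object SharedVariablesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SharedVariablesSketch")
      .master("local[*]") // local mode assumed, for illustration only
      .getOrCreate()
    val sc = spark.sparkContext

    // Broadcast: ship a small lookup map to every executor once,
    // instead of serializing it with every task.
    val countryCodes = sc.broadcast(Map("IN" -> "India", "US" -> "United States"))

    // Accumulator: count records with an unknown code across executors.
    val unknownCount = sc.longAccumulator("unknownCount")

    val codes = sc.parallelize(Seq("IN", "US", "XX", "IN"))
    val names = codes.map { code =>
      countryCodes.value.getOrElse(code, {
        unknownCount.add(1) // incremented on the executors, read on the driver
        "Unknown"
      })
    }

    names.collect().foreach(println)
    println(s"Unknown codes seen: ${unknownCount.value}")

    spark.stop()
  }
}
```

Note that the broadcast value is read inside an RDD `map`, which is exactly why RDD-level understanding matters even when your main API is DataFrames.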

You can convert a DataFrame to an RDD, and an RDD to a DataFrame, with simple APIs. Hence, you should also have good hands-on experience with Spark RDD programming; it is expected knowledge when working with Spark.
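A minimal sketch of that round trip (the object name, column names, and sample rows are illustrative assumptions): `df.rdd` yields an `RDD[Row]`, and `toDF` turns an RDD of tuples back into a DataFrame.

```scala
import org.apache.spark.sql.SparkSession

object DfRddConversionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DfRddConversionSketch")
      .master("local[*]") // local mode assumed, for illustration only
      .getOrCreate()
    import spark.implicits._

    // DataFrame -> RDD: .rdd exposes the underlying RDD[Row].
    val df = Seq(("alice", 34), ("bob", 29)).toDF("name", "age")
    val ages = df.rdd.map(row => row.getAs[Int]("age"))
    println(s"Max age: ${ages.max()}")

    // RDD -> DataFrame: toDF() on an RDD of tuples (or case classes).
    val pairRdd = spark.sparkContext.parallelize(Seq(("carol", 41), ("dave", 25)))
    val df2 = pairRdd.toDF("name", "age")
    df2.show()

    spark.stop()
  }
}
```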