Question-23: Are there any other particular sections you want me to focus on?

Answer: These are the common areas you must keep in mind:

  • You will not get many questions on RDD programming, but expect 2 to 4 questions on RDDs for sure.
  • You must know partitioning and shuffling, and how to avoid shuffling. What is a narrow transformation, and what is a wide transformation? Expect 3-4 questions from this section as well.
  • You must know how User Defined Functions (UDFs) affect performance when written in Python versus Scala, and what you must take care of when writing a UDAF (User Defined Aggregate Function).
  • We have a detailed chapter on how to create UDFs and UDAFs; you will surely get 3-4 questions on this section. This SparkSQL training will help you with it.
  • Know how to read data formats such as Parquet, ORC, CSV, JSON, XML and Avro. You must know the various options the DataFrameReader object offers for reading such data.
  • Once you have read and processed the data, you must know how to save it to HDFS, S3 or the local file system. There are various options and syntax variations, and you must know them.
  • For reading and writing, assume you will get 6-8 questions in total.
  • Expect 2 questions on GraphFrame for sure. This (Scala and Python Spark) simulator covers the GraphFrame questions very well.
  • Spark Machine Learning Library (MLlib): you will get around 2 questions from this section.
  • Major updates were made to the following two components in Spark 2.x
  • You must know these two sections in detail; 4-6 questions are expected on them.
  • Learn what an Encoder is used for (serialization and deserialization) to answer Spark questions.