HadoopExam Learning Resources


BigData | DataScience | IOT | Cloud | DevOps | ITRisk | AI | BlockChain 



    25000+ Learners upgraded/switched career    Testimonials

All certification preparation material is for renowned vendors like Cloudera, MapR, EMC, Databricks, SAS, Datastax, Oracle, NetApp etc., whose certifications carry more value, reliability and recognition in the industry than any training institute's certifications.
Note : You can choose more than one product from below to have a custom package created; send an email to hadoopexam@gmail.com to get a discount.

Do you know?
Hadoop Annual Subscription

Please note : A new version of this certification is available; please check this link (CRT020 : Databricks Certified Associate Developer for Spark 2.4 and Python-3 Assessment)

 CRT020 : PySpark Databricks Certification 

Page Redirect

Latest Certification

You will be redirected to the latest Spark Certification page in a few seconds


Databricks Certified Developer Apache Spark 2.x for Python (Cert No : PR000005) : PySpark : This certification is retired; for the new version check the link above.
  • 270+ Questions and Answers for the real exam
  • The simulator has around 65% programming questions
Over the last three years, one of the fastest growing technologies in the BigData world has certainly been Spark. Every BigData solution provider has had to adopt Spark on its platform, whether it is Cloudera, Hortonworks, MapR, IBM etc. All these companies know the power of Spark and the way it has changed the BigData, Analytics and Data Science industry. At the same time, Spark itself has changed a lot to make itself the gold-standard BigData technology, and one big driver behind that is Databricks. Databricks has now launched its own certification exam, conducted on the Spark 2.x (PySpark for Python) platform. This certification exam focuses on 7 main topics of the Spark 2 platform, which are listed below. It may seem there are only 7 topics to prepare, and at a macro level that is true; but as you dive into each individual topic, you will find more granular things that can be asked in the exam. We also understand that the exam puts more focus on the coding section: expect roughly 80% coding and 20% fundamental concepts of Spark 2. Some core fundamentals have not changed between Spark 1 and Spark 2, so a lot of questions will be based on those fundamentals as well. Hence, to prepare you for the Spark 2.x (Python) certification, HadoopExam brings 270+ questions with more focus on coding while also covering the fundamental concepts well. By completing these 270 questions, you will not only prepare yourself for the Databricks PySpark certification but also make yourself a master of the Spark 2.x framework. So without waiting further, subscribe to the Spark 2.x Python certification simulator. This certification is going to be in the limelight for 2018-2019, and it is very useful for Developers, Data Analysts, Data Scientists and BigData testers. 
As many developers already know Python, and Python has a much richer library ecosystem for data analytics, data scientists can use PySpark with their existing knowledge and tools.

Below is the best combo for this certification:

1. Apache PySpark (Python) Professional Training (Core Spark and Fundamentals)
2. PySpark(Python) Structured Streaming  Professional Training with HandsOn
3.  Databricks Certified Developer Apache Spark 2.x for Python (Cert No : PR000005)

PySpark : HandsOn Professional Training +PySpark Structured Streaming+ Databricks PySpark 2.x (Python Spark) Certification Exam   

Regular Price : $400
Offer Price: $209.00
 (Save Flat 50%) + Additional $41 discount for next 3 days = $159  

Note: If you have trouble paying by credit card, please create a PayPal account and pay through it.
India Bank Transfer
Regular Price: 16949 INR
Offer Price: 8475INR (Save flat 50%) +18%GST  = 9999INR       Additional discount (2000INR) for next 3 days = 7999INR only
Click Below ICICI Bank Acct. Detail
Indian credit and Debit Card(PayuMoney)  

Subscribe for the full version of Databricks Certified Developer Apache Spark 2.x for Python (PySpark) (Cert No : PR000005) only 

Databricks Certified Developer Apache Spark 2.x for Python (Cert No : PR000005)  

Download Trial Version

Contact Us After Buying To Download or Get Full Version  

Phone : 022-42669636
Mobile : +91-8879712614

Regular Price : $179
Offer Price: $89.00
 (Save Flat 50% )  
$69 (Limited Period only)
Note: If you have trouble paying by credit card, please create a PayPal account and pay through it.
India Bank Transfer
Regular Price: 8999 INR
Offer Price: 3299INR (Save flat 50%) +18%GST = 3893INR  = 2999INR  (Limited Period only)
Click Below ICICI Bank Acct. Detail
Indian credit and Debit Card(PayuMoney)

Most subscribed Annual Package 

Hadoop Annual Subscription


Simulator Benefits Exam Syllabus
  • 2 Full length mock exams (270+ Questions)
  • Objective based Practice Tests
  • Exhaustive explanations for every question that requires one
  • 100% Syllabus covered: All exam objectives
  • Gain confidence and reduce study time
  • Learn to manage your exam time effectively
  • Confirm that you are improving with every simulation
  • Learn to apply effective test-taking strategies
  • Created by Hadoop and Spark certified professionals
  • Detailed explanations for all required answers
  • Retake all exams as many times as you like
  • Based on the most current Databricks Spark 2.x PySpark Certification Guide
Below are some sample topics that are being asked. This list is not exhaustive; there is much more in the HadoopExam simulator.
  1. Spark Basics : RDD, Core Spark etc.
  2. Spark Streaming : Processing real-time data
  3. Spark Architecture : How Spark is built end to end, and its various optimization engines
  4. Spark ML : Machine Learning concepts
            • Classification
            • Clustering
            • Dimensionality Reduction
            • Collaborative filtering
            • Supervised/Unsupervised learning etc.
  5. Spark Performance and Debugging : Optimizing the Spark execution engine with various parameters
  6. Spark SQL : Working with structured data like
    • DataFrame
    • Dataset
  7. GraphFrames : Creating graphs, finding subgraphs, finding patterns, motif expressions and much more
    • Graph algorithm use cases like:
    • Breadth-first search (BFS)
    • Connected components
    • Strongly connected components
    • Label Propagation Algorithm (LPA)
    • PageRank
    • Shortest paths
    • Triangle count

Note : This product is tested only on Windows Operating System

* Please read faq section carefully.

Important things to know about Databricks Spark Certifications

Question-1: What are the major changes in the latest Databricks Spark certification?

Answer: Databricks has changed a lot in the new release of the Apache Spark certification. The two major changes in the exam are:

  • There are separate exams for Spark Scala and Spark Python. In the future, the same is expected to be available in the Java and R languages.
  • Databricks has upgraded the certification and will test on the Spark 2.x platform of Apache Spark.
Question-2: Is the Databricks test based on the Databricks enterprise platform or on Apache Spark?

Answer: Databricks asks questions based on Apache Spark, not on any commercial platform.
Question-3: Should I prefer the Scala-based Spark certification, because Spark is written in Scala and I have heard that Spark Scala is faster than PySpark?

Answer: You should select the certification based on your programming language skills. If you are from a Java/Scala background, go for the Scala-based Spark certification; if you know Python, go for the Python Spark certification.

With regard to performance: that was the case on older versions of Spark, where Scala Spark performed better than PySpark. In Spark 2.x this is no longer the case (because of the optimizer); whichever language you use, Java/Scala/Python/R, performance is the same. The only exception is User Defined Functions.

Question-4: What is the name of current version of Spark certification?

Answer: Databricks Certified Developer, Apache Spark 2.x
Question-5: What is the duration of the exam and number of questions?

Answer: The total duration of the exam is 180 minutes (3 hours), and the number of questions varies from 40 to 80 based on their difficulty level.
Question-6: What topics are asked in the Databricks Spark certification exam?

Answer: You will be asked questions on the following topics:

  • Basic concepts of Apache Spark
  • Spark Structured Streaming (Concepts + Programming questions)
  • Architecture of Apache Spark which include following topics
    • Driver Program
    • Cluster Manager
    • Client Mode vs Cluster Mode
    • Executors and Tasks
    • SparkSession
  • Spark performance optimization and debugging performance issues
    • You must understand concepts such as stages and tasks
    • You should be able to decide where to use caching
    • What checkpointing is used for
  • Spark SQL
    • You should be well versed in writing SQL queries using the Spark SQL platform
    • You should be able to use the DataFrame/Dataset API
    • You should be able to select the correct query or code snippet for an expected result
    • You should be able to select the correct output for a given code snippet
  • GraphFrame:
    • You should be able to answer various graph problems.
    • Many such questions are already included in the https://www.hadoopexam.com certification simulator
  • Deploying applications in Cluster
Question-7: I don’t see RDDs mentioned in the syllabus; are they not part of the certification?

Answer: The syllabus given for the Databricks Spark certification is very abstract, and it does not detail what will be asked in the exam. We expect a good number of questions based on RDDs, including programming questions on the various RDD APIs. RDDs are still in focus because whether you use Spark 1.x or Spark 2.x, the underlying processing engine works on RDDs, so RDD concepts must be clear. If you want to apply custom optimizations or do performance tuning, you should know how RDDs work. Even if you use distributed shared variables like broadcast variables and accumulators, you will be working with RDDs.

You can convert your DataFrame to an RDD, and an RDD to a DataFrame, with a simple API. Hence, you must have good experience with Spark RDD programming as well; it is expected knowledge when working with Spark.

Question-8: The syllabus mentions only Streaming; will questions be asked on DStreams or on Structured Streaming?

Answer: You should give more focus to Structured Streaming; DStreams had limitations, which is the reason Structured Streaming was created. However, you should know how DStreams work, because the concepts of processing real-time streaming data are the same. We do not expect too many questions based on DStreams; the focus will be on Structured Streaming.
Question-9: What programming languages questions are expected in real exam?

Answer: As mentioned earlier, questions are asked in whichever programming language you have chosen for the certification. So before purchasing the certification, be sure which programming language you will be using:

  • Python
  • Scala
Question-10: Is there any book available for preparing for the Spark certification in either Python or Scala?

Answer: As of now there is not a single book available that focuses on these Spark certifications. We highly recommend that you use the study material below to prepare for the respective certification. This material is regularly updated and new content is added.

Spark Scala

Python Spark (PySpark)

Question-11: What about Spark Machine Learning Questions?

Answer: Yes, there will be a few questions on Machine Learning, so you should have some Machine Learning knowledge to answer them.
Question-12: I have feedback and information about the Databricks Spark certification that should be updated here for the benefit of other learners. What should I do?

Answer: We always believe in helping each other grow. Please send whatever feedback you have to hadoopexam@gmail.com or admin@hadoopexam.com and we will update this page accordingly.
Question-13: There are two libraries for processing graph data, GraphX and GraphFrame. Which one do they ask about in the exam?

Answer: They are focusing on Spark 2.x, which has a much better solution for graph data processing in GraphFrame. Hence, the focus will be on GraphFrame.
Question-14: Should I know Machine Learning in depth to clear the Spark certification exam?

Answer: This exam is not targeted at data scientists, so you will not be asked in-depth Machine Learning questions. We have enough coverage of Machine Learning questions in the HadoopExam certification simulator; you should have the basic Machine Learning concepts clear.
Question-15: Why is Spark technology so much in the news?

Answer: It is one of the most actively developed Apache frameworks. In recent years BigData, real-time data processing, Artificial Intelligence and many other fields have grown rapidly, and all of them need an engine that can process data efficiently. Even Hadoop MapReduce, which had become popular quite suddenly, is being replaced by the Spark computation engine. There are more than 1000 contributors on this open source platform.

After Spark 2.0, it is very easy to learn. Its API is very intuitive, and if you are good at SQL queries it becomes easier still. If you are a programmer, the DataFrame/Dataset API will help you a lot when working with Spark.

Many organizations have pushed Spark applications into production, which proves the quality and reliability of the Spark framework.

Companies that already have a Hadoop cluster do not have to create a separate Spark cluster; they can run Spark jobs on the same existing cluster, whether written in Java, Scala, Python or R.

Knowledge of new technologies always gives you the opportunity to draw a higher salary, with less chance of job loss. If you want to switch your career, Spark is certainly one technology worth switching to.

Question-16: I have good knowledge of Spark and 3+ years’ experience working with it. Why should I go for certification?

Answer: There is a myth in the IT industry that certification does not help your career. This is not true at all. Having a certification certainly helps in the following ways:

  • You will learn all the hidden features of a technology when you prepare for its certification.
  • It shows your career focus.
  • It is given priority during resume shortlisting (the first shortlisting is done by the recruitment team, who do not have deep technology knowledge and therefore look for credentials in the resume).
  • It makes a good first impression on the interviewer.
  • The interviewer will focus on the things you have written in your resume.
  • You will be placed in a separate category of candidates.
  • It gives you confidence during the interview and while working in the organization.
  • Avoid people who think negatively about learning; learning is never costly or a waste of time.
  • It is certainly an additional feather in your cap.
  • There are many other latent benefits of certification.
Question-17: Who else conducts Spark certifications?

Answer: Various other vendors conduct certifications on Spark, as below:

  • Cloudera Hadoop and Spark certification
  • Hortonworks Spark certification
  • MapR Spark Scala Certification
Question-18: Do you give priority to a specific vendor?

Answer: No, we don't give priority to any vendor. The right choice varies based on many factors.

  • If you want to get certified in both Hadoop and Spark, go for the Cloudera Hadoop and Spark certification. You will also need to know how to use the Cloudera platform.
  • If you are working on the MapR platform, you can go for the MapR Spark certification. One advantage is that it is not as lengthy as the Databricks Spark certification, so you can prepare in much less time. Weighing the pros and cons, Databricks is the most involved with Spark, and its certification is really the toughest among all the Spark certifications.
  • Hortonworks Spark certification: this is again a hands-on Spark certification, with a limited syllabus and specific objectives given.
Question-19: Can I use the Learning Spark book to prepare for the Databricks Spark certification?

Answer: It was a good book in the Spark 1.x days. You should not use it anymore, because it is outdated and not a good fit for the current Spark 2.x based certification.
Question-20: I know neither Scala nor Python; which programming language would you recommend?

Answer: That is a tricky question to answer. We recommend learning both; they are beautiful languages to work with. But based on the following career paths you can choose the respective language.


Scala Spark:
  • Java programmers should go for this
  • If you want to become a Data Engineer, go for this
  • If you want to work on data cleaning and data collection, go for this
  • If you already know Java/Scala, go for this


PySpark:
  • If you know Python, go for PySpark.
  • If you are in a Business Analytics profile, go for PySpark.
  • If you want to become a Data Scientist, you can use either PySpark or Scala Spark.

The choice should not be based on the fact that Spark is written in Scala; preferring Scala Spark for that reason is not at all justified after Spark 2.x.

Question-21: What is the current passing score?

Answer: Currently Databricks does not tell you the exact passing score, but you can expect that 70%+ correct answers should clear the exam. Each question carries a different score based on its difficulty level.
Question-22: What is the fee for the Databricks certification, and how many attempts do we get?

Answer: The fee is a little high at $300, but as of now it includes two attempts to clear the exam. For the latest information, always check the Databricks website.
Question-23: Are there any other particular sections you want me to focus on?

Answer: These are the common areas you must keep in mind:

  • You will not get too many questions on RDD programming, but you will surely get 2 to 4 questions on RDDs.
  • You must know partitioning and shuffling, and how to avoid shuffling. What is a narrow transformation? What is a wide transformation? Expect 3-4 questions from this section as well.
  • You must know the performance impact of User Defined Functions written in Python versus Scala, and what must be taken care of for UDAFs (User Defined Aggregate Functions) etc.
  • We have a detailed chapter on how to create UDFs and UDAFs; expect a sure 3-4 questions on this section. This SparkSQL training would help for this.
  • How to read data formats like Parquet, ORC, CSV, JSON, XML and AVRO. You must know the various options available for reading such data using the DataFrameReader object.
  • Once you have read and processed the data, how would you save it, for example to HDFS, S3 or the local file system? There are various options and syntaxes, and you must know them.
  • For reading and writing, assume you will get 6-8 questions in total.
  • Expect a sure 2 questions on GraphFrame. In this (Scala and Python Spark) simulator we have covered GraphFrame questions very well.
  • You will get around 2 questions on the Spark Machine Learning library.
  • Major updates were made to two components in Spark 2.x; you must know those two sections in detail, and 4-6 questions are expected on them.
  • What is the use of Encoders (serialization and deserialization)? Learn this to answer Spark questions.

Note : You can choose more than one product from below to have a custom package created; send an email to hadoopexam@gmail.com to get a discount. 

Click to View What Learners Say about us : Testimonials

We have training subscribers from TCS, IBM, INFOSYS, ACCENTURE, APPLE, HEWITT, Oracle, NetApp, Capgemini etc.

Books on Spark or PDF to read : Machine Learning with Spark, Fast Data Processing with Spark (Second edition), Mastering Apache Spark, Learning Hadoop 2, Learning Real-time Processing with Spark Streaming, Apache Spark in Action, Apache Spark CookBook, Learning Spark, Advanced Analytics with Spark Download.

WhatsApp |  Call Us | Have a Query ?  |  Subscribe