This book is included as part of the Premium & Pro Subscription, as well as in the certification package below.

Please visit the links below for subscription details.

You can always create a custom package that combines multiple products from all available products and get a discount: send your requirements to hadoopexam@gmail.com.

Premium & Pro Subscription  | All Products |   CRT020 : Databricks Spark Scala Certification


The following topics are covered across the book's 200 pages.

Topic-1: Spark Architecture Components

Candidates are expected to be familiar with the following architectural components and their relationship to each other (a short sketch follows the list):

  • Driver
  • Executor
  • Cores/Slots/Threads
  • Partitions
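
A minimal sketch of how these pieces fit together at runtime, assuming a spark-shell or Databricks notebook where spark (a SparkSession) is already defined: the driver coordinates the work, each partition becomes one task, and each task occupies one core/slot on an executor.

    // Minimal sketch: how partitions relate to cores/slots and tasks.
    // Assumes `spark: SparkSession` is predefined (spark-shell / notebook).
    val df = spark.range(0, 1000000)                 // a distributed Dataset[Long]
    println(spark.sparkContext.defaultParallelism)   // total cores/slots visible to the driver
    println(df.rdd.getNumPartitions)                 // partitions => one task each per stage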

Topic-2: Spark Execution

Candidates are expected to be familiar with Spark’s execution model and how it breaks down into the following elements (see the sketch after the list):

  • Jobs
  • Stages
  • Tasks
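
A short sketch of that breakdown, again assuming spark is predefined: one action triggers one job, each shuffle boundary starts a new stage, and each stage runs one task per partition.

    import spark.implicits._

    val counts = spark.range(0, 100000)
      .withColumn("bucket", $"id" % 10)   // narrow work: stays in the first stage
      .groupBy("bucket").count()          // shuffle boundary: starts a second stage
    counts.collect()                      // the action: triggers exactly one job
    // The Spark UI (default port 4040) shows this job, its two stages,
    // and the per-partition tasks inside each stage.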

Topic-3: Spark Concepts

Candidates are expected to be familiar with the following concepts, each touched on in the sketch after the list:

  • Caching
  • Shuffling
  • Partitioning
  • Wide vs. Narrow Transformations
  • DataFrame Transformations vs. Actions vs. Operations
  • High-level Cluster Configuration
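
The sketch below touches several of these concepts; it assumes spark is predefined, and the column expressions and numbers are illustrative only.

    import org.apache.spark.storage.StorageLevel
    import spark.implicits._                             // assumes `spark` is predefined

    val df     = spark.range(0, 10000)
    val narrow = df.filter($"id" % 2 === 0)              // narrow: no data movement
    val wide   = narrow.groupBy($"id" % 100).count()     // wide: requires a shuffle

    narrow.persist(StorageLevel.MEMORY_AND_DISK)         // caching (memory + disk)
    val repartitioned = wide.repartition(8)              // explicit repartitioning (a shuffle)
    narrow.unpersist()                                   // release the cache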

DataFrames API

Candidates are expected to have a command of the following APIs:

Topic-4: SparkContext

Candidates are expected to know how to use the SparkContext to control basic configuration settings such as spark.sql.shuffle.partitions.
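A brief sketch, assuming spark is predefined. Note that spark.sql.shuffle.partitions is a runtime SQL setting changed through the session's conf, while the SparkContext exposes the cluster-level configuration the application was started with:

    // Sketch of basic configuration access; assumes `spark` is predefined.
    val sc = spark.sparkContext
    println(sc.getConf.get("spark.app.name"))        // a SparkContext-level setting
    // spark.sql.shuffle.partitions is a runtime SQL conf, set through
    // the session's conf rather than through SparkContext itself:
    spark.conf.set("spark.sql.shuffle.partitions", "8")
    println(spark.conf.get("spark.sql.shuffle.partitions"))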

Topic-5: SparkSession

Candidates are expected to know how to do the following (all four are shown in the sketch after the list):

  • Create a DataFrame/Dataset from a collection (e.g. list or set)
  • Create a DataFrame for a range of numbers
  • Access the DataFrameReaders
  • Register User Defined Functions (UDFs).
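
A minimal sketch of all four tasks, assuming spark is predefined; the UDF name doubled and the sample data are hypothetical.

    import spark.implicits._                      // assumes `spark` is predefined

    val fromList = List(("a", 1), ("b", 2)).toDF("key", "value")  // from a collection
    val nums     = spark.range(1, 101)                            // a range of numbers
    val reader   = spark.read                                     // the DataFrameReader
    spark.udf.register("doubled", (x: Int) => x * 2)              // register a UDF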

Topic-6: DataFrameReader

Candidates are expected to know how to do the following; a sketch follows the list:

  • Read data in the “core” data formats (CSV, JSON, JDBC, ORC, Parquet, text, and tables)
  • Configure options for specific formats
  • Read data from non-core formats using format() and load()
  • Specify a DDL-formatted schema
  • Construct and specify a schema using the StructType classes
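
A sketch of these reading patterns. The path /data/people.csv and the schema are hypothetical, and spark is assumed to be predefined:

    import org.apache.spark.sql.types._

    val ddlSchema = "name STRING, age INT"                 // a DDL-formatted schema
    val structSchema = StructType(Seq(                     // the same schema via StructType
      StructField("name", StringType, nullable = true),
      StructField("age",  IntegerType, nullable = true)))

    val people = spark.read
      .option("header", "true")                            // a CSV-specific option
      .schema(structSchema)                                // or .schema(ddlSchema)
      .csv("/data/people.csv")

    val generic = spark.read.format("csv")                 // format() + load() style,
      .option("header", "true")                            // as used for non-core formats
      .load("/data/people.csv")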

Topic-7: DataFrameWriter

Candidates are expected to know how to do the following (see the sketch after the list):

  • Write data to the “core” data formats (CSV, JSON, JDBC, ORC, Parquet, text, and tables)
  • Overwrite existing files
  • Configure options for specific formats
  • Write a data source to a single file or to N separate files
  • Write partitioned data
  • Bucket data by a given set of columns
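
A sketch covering these writing patterns; the output paths, the derived year column, and the table name bucketed_tbl are hypothetical, and spark is assumed to be predefined:

    import spark.implicits._

    val df = spark.range(0, 1000)
      .withColumn("year", ($"id" % 3 + 2020).cast("int"))

    df.write.mode("overwrite").parquet("/out/parquet")     // overwrite existing files
    df.coalesce(1).write.csv("/out/single")                // one partition => one file
    df.repartition(8).write.json("/out/many")              // eight separate files
    df.write.partitionBy("year").parquet("/out/by_year")   // partitioned output
    df.write.bucketBy(4, "id").sortBy("id")                // bucketing requires
      .saveAsTable("bucketed_tbl")                         // saving as a table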

Topic-8: DataFrame

Candidates are expected to do the following; a combined sketch follows the list:

  • Have a working understanding of every action, such as take(), collect(), and foreach()
  • Have a working understanding of the various transformations and how they work, such as producing a distinct set, filtering data, repartitioning and coalescing, performing joins and unions, and producing aggregates
  • Know how to cache data to disk, memory, or both
  • Know how to uncache previously cached data
  • Know how to convert a DataFrame to a global or temporary view
  • Know how to apply hints
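
A combined sketch of these operations, assuming spark is predefined; the data and view names are illustrative:

    import spark.implicits._

    val a = spark.range(0, 100)
    val b = spark.range(50, 150)

    val combined = a.union(b).distinct()                   // union + distinct set
      .filter($"id" > 10)                                  // filtering
      .repartition(4)                                      // repartitioning (coalesce shrinks without a full shuffle)
    val joined = a.join(b, Seq("id"))                      // join on a shared column
    val agg    = combined.groupBy(($"id" % 10).as("k")).count()  // aggregation

    combined.persist()                                     // cache (memory and disk by default for DataFrames)
    combined.take(5)                                       // actions: take, collect, foreach
    combined.unpersist()                                   // uncache
    combined.createOrReplaceGlobalTempView("gv")           // global view (query as global_temp.gv)
    combined.createOrReplaceTempView("tv")                 // session-scoped temp view
    val hinted = a.hint("broadcast").join(b, Seq("id"))    // apply a hint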

Topic-9: Row & Column

Candidates are expected to know how to work with rows and columns to extract data from a DataFrame, as in the sketch below.
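
A minimal sketch, assuming spark is predefined and using made-up data:

    import org.apache.spark.sql.functions.col
    import spark.implicits._

    val df = Seq(("alice", 34), ("bob", 28)).toDF("name", "age")

    val firstRow = df.first()                        // an org.apache.spark.sql.Row
    val name = firstRow.getAs[String]("name")        // extract by field name
    val age  = firstRow.getInt(1)                    // extract by position
    val projected = df.select(col("age"), $"name")   // two ways to reference a Column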

Topic-10: Spark SQL Functions

When instructed, candidates are expected to be able to employ the multitude of Spark SQL functions; a sketch follows the list. Examples include, but are not limited to:

  • Aggregate functions: getting the first or last item from a group or computing the min and max values of a column
  • Collection functions: testing if an array contains a value, exploding or flattening data
  • Date/time functions: parsing strings into timestamps or formatting timestamps into strings
  • Math functions: computing the cosine, floor, or log of a number
  • Misc functions: computing the crc32, md5, sha1, or sha2 hash of a value
  • Non-aggregate functions: creating an array; testing if a column is null, not null, NaN, etc.
  • Sorting functions: sorting data in descending order, ascending order, and with proper null handling
  • String functions: applying a provided regular expression, trimming strings, and extracting substrings
  • UDF functions: employing a user-defined function (UDF)
  • Window functions: computing the rank or dense rank
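
The sketch below exercises one or two functions from each family, with made-up data and assuming spark is predefined:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._
    import spark.implicits._

    val df = Seq((1, "2020-01-15", Seq("a", "b")),
                 (2, "2020-02-20", Seq("c"))).toDF("id", "d", "arr")

    df.agg(min($"id"), max($"id")).show()                            // aggregate
    df.select(array_contains($"arr", "a"), explode($"arr")).show()   // collection
    df.select(to_timestamp($"d", "yyyy-MM-dd")).show()               // date/time parsing
    df.select(cos(lit(0.0)), floor(lit(1.9)), log(lit(10.0))).show() // math
    df.select(md5(lit("x")), sha1(lit("x"))).show()                  // misc hashes
    df.select(array(lit(1), lit(2)), $"id".isNull,
              isnan(lit(Double.NaN))).show()                         // non-aggregate
    df.orderBy($"id".desc_nulls_last).show()                         // sort with null handling

    val shout = udf((s: String) => s.toUpperCase)                    // a UDF
    df.select(shout(lit("hi"))).show()
    df.select(rank().over(Window.orderBy($"id"))).show()             // window function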

Hadoop Annual Subscription

      Recommended Package for Certification with the Training

      See what learners say about us: Testimonials

      We have training subscribers from TCS, IBM, Infosys, Accenture, Apple, Hewitt, Oracle, NetApp, Capgemini, and more.


      One of the testimonials from a training subscriber:

      I really enjoy all the training you provide. Do you have any training on Data Science? I searched the website but could not find one; I would be happy to join if you send me the link.

      Thanks,
      A**tha

      An email from a repeat customer:
      Hi

      I have gone through the Apache Scala and Spark training videos. The concepts are explained very well and in depth. I would like to know the following details:
      1. I am interested in a training module for Pig and Hive. While checking, I found that the "Hadoop Professional Training" covers the Pig and Hive modules, but they are not offered separately. Can I get access to the Pig and Hive modules only, or do I need to go for the complete "Hadoop Professional Training"?
      2. In addition, I need your input: I want to pursue a Cloudera certification, but I found that the CCD410 "Hadoop Developer" exam is obsolete. If I go for the "MapR Hadoop Developer Certification" instead, what is its market value, and is it a good exam to take? If so, I am also interested in the "MapR Hadoop Developer Certification" simulator.
      I would like to know the cost for items 1 and 2 above.

      Thanks
      Vip*l P*tel

      Read all the testimonials, in learners' own voices:
      Testimonials