Question 12: You need a large EMR cluster of almost 200 nodes every day to run and complete a MapReduce job within an hour. The cluster should be as cost-effective as possible, but the job must still finish within the hour. Which of the options below is the best solution for this requirement?

A. You will bid the lowest price every day and create the cluster using Spot Instances.

B. To save cost, you will reserve instances for the maximum possible period and create the EMR cluster using these reserved nodes.

C. You will use the instance fleet configuration to create the EMR cluster.

D. You will always use 100 Reserved Instances and 100 Spot Instances, so that the average cost is maintained.

E. You will use a combination of On-Demand and Spot Instances for the core and task nodes.

1. A,B

2. B,C

3. C,D

4. C,E

5. A,E

Correct Answer: 4

Explanation: The cluster must be set up so that the cost is minimal, and it only needs to run for about one hour each day. Reserving instances for such a short daily workload unnecessarily increases the cost, because a reservation is paid for over the whole term whether the instances are used or not. Hence, option B cannot be correct.
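The cost argument against reserving instances can be checked with some quick arithmetic. The sketch below is only an illustration: the hourly rate and the reservation discount are made-up assumptions, not AWS prices, and the function names are hypothetical.

```python
# Hypothetical numbers for illustration only; they are not AWS prices.
NODES = 200
ON_DEMAND_RATE = 0.20      # assumed USD per instance-hour
RESERVED_DISCOUNT = 0.60   # assumed discount versus the On-Demand rate
HOURS_PER_YEAR = 24 * 365


def annual_on_demand_cost(nodes, hourly_rate, hours_per_day):
    """Pay only for the hours the cluster actually runs."""
    return nodes * hourly_rate * hours_per_day * 365


def annual_reserved_cost(nodes, hourly_rate, discount):
    """A reservation is paid for every hour of the term, used or not."""
    return nodes * hourly_rate * (1 - discount) * HOURS_PER_YEAR


on_demand = annual_on_demand_cost(NODES, ON_DEMAND_RATE, hours_per_day=1)
reserved = annual_reserved_cost(NODES, ON_DEMAND_RATE, RESERVED_DISCOUNT)
print(f"On-Demand, 1 hour/day: ${on_demand:,.0f} per year")
print(f"Reserved, full term:   ${reserved:,.0f} per year")
```

Under these assumed figures the reservation costs several times more, because the cluster sits idle for 23 of every 24 reserved hours.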

You cannot count on obtaining that many instances at the lowest bid price every day. Hence, option A is not an ideal solution. Similarly, for option D, there is no reason to reserve 100 instances, which would only increase the cost. Hence, option D cannot be correct.

The remaining options are C and E. Consider the following concept from the AWS documentation.

With the instance fleet configuration for an EMR cluster, you can choose from a variety of provisioning options for EC2 instances. For each fleet, you specify a target capacity for On-Demand Instances and a target capacity for Spot Instances. When the cluster launches, Amazon EMR provisions instances until the specified targets are fulfilled. You can specify up to five EC2 instance types per fleet for Amazon EMR to use when fulfilling the targets. You can also select multiple subnets for different Availability Zones. When Amazon EMR launches the cluster, it looks across those subnets to find the instances and purchasing options you specify.

While a cluster is running, if Amazon EC2 reclaims a Spot Instance because of a price increase, or an instance fails, Amazon EMR tries to replace the instance with any of the instance types that you specify. This makes it easier to regain capacity during a spike in Spot pricing. Instance fleets allow you to develop a flexible and elastic resourcing strategy for each node type. For example, within specific fleets, you can have a core of On-Demand capacity supplemented with less-expensive Spot capacity if available, and then switch to On-Demand capacity if Spot isn't available at your price.
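To make the fleet idea concrete, here is a minimal sketch that builds a fleet definition as a plain Python dictionary. The key names are modeled on the `Instances` argument of boto3's `run_job_flow`; the helper `make_fleet`, the capacity targets, the weights (taken here as vCPU counts), and the subnet IDs are assumptions for illustration, and nothing is sent to AWS.

```python
MAX_TYPES_PER_FLEET = 5  # limit stated in the text above


def make_fleet(name, fleet_type, instance_types,
               on_demand_units=0, spot_units=0, spot_spec=None):
    """Build one fleet entry (hypothetical helper).

    instance_types is a list of (EC2 instance type, weighted capacity) pairs.
    """
    if len(instance_types) > MAX_TYPES_PER_FLEET:
        raise ValueError(
            f"{name}: at most {MAX_TYPES_PER_FLEET} instance types per fleet")
    fleet = {
        "Name": name,
        "InstanceFleetType": fleet_type,  # "MASTER", "CORE" or "TASK"
        "InstanceTypeConfigs": [
            {"InstanceType": itype, "WeightedCapacity": weight}
            for itype, weight in instance_types
        ],
    }
    # Only include the targets that are actually used.
    if on_demand_units:
        fleet["TargetOnDemandCapacity"] = on_demand_units
    if spot_units:
        fleet["TargetSpotCapacity"] = spot_units
        if spot_spec:
            fleet["LaunchSpecifications"] = {"SpotSpecification": spot_spec}
    return fleet


# Fall back to On-Demand if Spot capacity is not provisioned in time.
SPOT_SPEC = {"TimeoutDurationMinutes": 10,
             "TimeoutAction": "SWITCH_TO_ON_DEMAND"}

# Assumed worker types, weighted by vCPU count.
WORKER_TYPES = [("m5.xlarge", 4), ("m5.2xlarge", 8),
                ("r5.xlarge", 4), ("r5.2xlarge", 8)]

instances = {
    "InstanceFleets": [
        # The master fleet target is always one.
        make_fleet("master", "MASTER", [("m5.xlarge", 1)], on_demand_units=1),
        # Core fleet: On-Demand base supplemented with Spot (assumed split).
        make_fleet("core", "CORE", WORKER_TYPES,
                   on_demand_units=160, spot_units=160, spot_spec=SPOT_SPEC),
        # Task fleet: Spot only, with the On-Demand fallback on timeout.
        make_fleet("task", "TASK", WORKER_TYPES,
                   spot_units=480, spot_spec=SPOT_SPEC),
    ],
    # Placeholder subnet IDs in different Availability Zones.
    "Ec2SubnetIds": ["subnet-EXAMPLE-A", "subnet-EXAMPLE-B"],
}
```

Listing several instance types and subnets gives Amazon EMR more places to look for capacity, which is what helps a large daily cluster start reliably.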

Summary of Key Features:

One instance fleet, and only one, per node type (master, core, task).

Up to five EC2 instance types specified for each fleet.

Amazon EMR chooses any or all of the five EC2 instance types to provision with both Spot and On-Demand purchasing options.

Establish target capacities for Spot and On-Demand Instances for the core fleet and task fleet. Use vCPU or a generic unit assigned to each EC2 instance that counts toward the targets. Amazon EMR provisions instances until each target capacity is totally fulfilled. For the master fleet, the target is always one.

Choose one subnet (Availability Zone) or a range. Amazon EMR provisions capacity in the Availability Zone that is the best fit.

When you specify a target capacity for Spot Instances:

For each instance type, specify a maximum Spot price. Amazon EMR provisions Spot Instances if the Spot price is below the maximum Spot price. You pay the Spot price, not necessarily the maximum Spot price.

Optionally, specify a defined duration (also known as a Spot block) for each fleet. Spot Instances terminate only after the defined duration expires.

For each fleet, define a timeout period for provisioning Spot Instances. If Amazon EMR can't provision Spot capacity, you can terminate the cluster or switch to provisioning On-Demand capacity instead.
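The capacity and Spot rules listed above can be mimicked with a small toy model. This is only a sketch of the described behavior, not Amazon EMR's actual provisioning logic: the function names are hypothetical, and real provisioning also depends on whether Spot capacity is available, not just on price.

```python
import math

TIMEOUT_ACTIONS = ("SWITCH_TO_ON_DEMAND", "TERMINATE_CLUSTER")


def instances_needed(target_units, weight):
    """Instances of one type needed to fulfil a target given in capacity units."""
    return math.ceil(target_units / weight)


def spot_decision(spot_price, max_price, minutes_waited,
                  timeout_minutes, timeout_action):
    """Decide what happens to a pending Spot request (simplified)."""
    if timeout_action not in TIMEOUT_ACTIONS:
        raise ValueError(f"unknown timeout action: {timeout_action}")
    # Spot Instances are provisioned while the Spot price is below the maximum.
    if spot_price < max_price:
        return "LAUNCH_SPOT"
    # Otherwise keep waiting until the provisioning timeout expires.
    if minutes_waited < timeout_minutes:
        return "WAIT"
    # After the timeout, apply the configured fallback.
    return timeout_action


def hourly_charge(spot_price, max_price):
    """You pay the Spot price, not necessarily the maximum Spot price."""
    return spot_price if spot_price < max_price else None


print(instances_needed(480, 8))   # 60 instances weighted at 8 units each
print(spot_decision(0.05, 0.10, 0, 10, "SWITCH_TO_ON_DEMAND"))
print(spot_decision(0.12, 0.10, 10, 10, "SWITCH_TO_ON_DEMAND"))
```

For a job with a hard one-hour deadline, choosing the switch to On-Demand capacity rather than terminating the cluster is what keeps the run on schedule when Spot is too expensive.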

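An instance fleet (option C) lets you combine On-Demand and Spot capacity for the core and task nodes (option E), which keeps the cost low while the On-Demand fallback protects the one-hour deadline. Hence, options C and E are correct.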

Details: Category: AWS Certified Big Data - Specialty
