Question-15: You are working for a company that runs an online clothing store (ClothEra.com) with more than a million customers who regularly buy clothes from it. As a Big Data solution architect, you have been assigned the task of improving the overall customer experience by collecting clickstream data from the website in real time. The data is generated in high volume and across various categories. Once received, the data must remain available in its raw format for at least 24 hours, after which the transformed data can be stored in a persistent store for historical analysis. With more than 5,000 customers online on average, the site generates almost 100 events per second, and each event is less than 1 KB in size. Select the correct options from below for the given requirement.
- You would be creating a single shard in a Kinesis Data Stream to collect the data
- [Option B text is missing from the source]
- You would be using Kinesis Firehose to collect all the clickstream data and transform it in real time
- You can have more than one stream created in a single shard.
- A data record is the unit of data stored in a Kinesis data stream. Data records are composed of a sequence number, a partition key, and a data blob, which is an immutable sequence of bytes.
- You would be setting the data stream retention period to 24 hours once it is created.
- You would be storing the data in DynamoDB
- You would be using ElastiCache to cache the data for 24 hours.
Answer: B, E, F, G
Exp: In the given question, the requirement is to collect about 100 event logs per second, and the obvious choice for this is Kinesis Data Streams. Let’s first understand the shard.
A shard is a uniquely identified sequence of data records in a stream. A stream is composed of one or more shards, each of which provides a fixed unit of capacity. Each shard can support up to 5 transactions per second for reads, up to a maximum total data read rate of 2 MB per second and up to 1,000 records per second for writes, up to a maximum total data write rate of 1 MB per second (including partition keys). The data capacity of your stream is a function of the number of shards that you specify for the stream. The total capacity of the stream is the sum of the capacities of its shards.
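The capacity rules above reduce to a simple sizing calculation: compute the demand against each per-shard limit and round the largest one up. The sketch below is illustrative only; it assumes provisioned capacity, evenly distributed partition keys, and a single consumer that reads every record once. The function name `shards_needed` is hypothetical, and 1 MB is treated as 1,048,576 bytes.

```python
import math

# Per-shard limits quoted above (provisioned capacity mode).
# 1 MB is treated as 1,048,576 bytes in this sketch.
WRITE_RECORDS_PER_SEC = 1_000
WRITE_BYTES_PER_SEC = 1_048_576
READ_CALLS_PER_SEC = 5
READ_BYTES_PER_SEC = 2_097_152


def shards_needed(records_per_sec: float, avg_record_bytes: float,
                  read_calls_per_sec: float = 0.0) -> int:
    """Smallest shard count whose summed capacity covers every per-shard limit.

    Assumes evenly distributed partition keys and a single consumer that
    reads every record once (so read bytes equal write bytes).
    """
    write_bytes_per_sec = records_per_sec * avg_record_bytes
    demands = [
        records_per_sec / WRITE_RECORDS_PER_SEC,       # write record limit
        write_bytes_per_sec / WRITE_BYTES_PER_SEC,     # write throughput limit
        read_calls_per_sec / READ_CALLS_PER_SEC,       # read transaction limit
        write_bytes_per_sec / READ_BYTES_PER_SEC,      # read throughput limit
    ]
    # A stream always has at least one shard.
    return max(1, math.ceil(max(demands)))


# About 100 clickstream events/s, each under 1 KB (1,024 bytes used as the upper bound).
print(shards_needed(100, 1_024))                          # → 1 (write side only)
print(shards_needed(100, 1_024, read_calls_per_sec=100))  # → 20 (one read call per event)
```

Because the stream's total capacity is the sum of its shards' capacities, the binding constraint is whichever limit yields the largest ratio; here that is the 5 read transactions per second, and only if every event is fetched by its own read call.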
If your data rate changes, you can increase or decrease the number of shards allocated to your stream (resharding).
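Resharding works by redistributing the 128-bit hash key space across shards: each record is stored on the shard whose hash key range contains the MD5 hash of its partition key. The sketch below splits the space evenly, similar in spirit to uniform scaling, and shows that a key's shard can change when the shard count changes. The function names are hypothetical, and reading the MD5 digest as a big-endian integer is an assumption of this sketch.

```python
import hashlib

# Partition keys are MD5-hashed to 128-bit integers; each shard owns a
# contiguous range of that key space.
HASH_KEY_SPACE = 2 ** 128


def even_hash_ranges(shard_count: int) -> list[tuple[int, int]]:
    """Split [0, 2**128 - 1] into shard_count contiguous, near-equal ranges."""
    if shard_count < 1:
        raise ValueError("a stream needs at least one shard")
    step = HASH_KEY_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last range absorbs any remainder so the whole space is covered.
        end = HASH_KEY_SPACE - 1 if i == shard_count - 1 else start + step - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key: str, ranges: list[tuple[int, int]]) -> int:
    """Index of the range containing MD5(partition_key), read big-endian (assumed)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("ranges do not cover the hash key space")


# The same key always hashes to the same value, but the shard index it lands
# in can differ before and after resharding.
key = "customer-42"
print(shard_for_key(key, even_hash_ranges(2)), shard_for_key(key, even_hash_ranges(4)))
```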
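Hence, if each of the 100 events per second is fetched by its own read transaction, we need at least 20 shards (20 × 5 = 100) to support 100 read transactions per second. On the write side alone, 100 records per second of under 1 KB each (roughly 100 KB/s) would fit within a single shard’s limits of 1,000 records per second and 1 MB per second. In a Kinesis data stream, the data record is the unit of data stored. Data records are composed of a sequence number, a partition key, and a data blob, which is an immutable sequence of bytes. This makes options 2 and 5 correct.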
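The three-part structure of a data record can be modelled as an immutable value. The sketch below is a hypothetical model, not an AWS SDK type: `DataRecord` is a frozen dataclass whose `data` field is `bytes`, and it checks the documented limit that partition keys are Unicode strings of at most 256 characters. The sequence number shown is a placeholder; real ones are assigned by the stream.

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRecord:
    """Hypothetical model of a Kinesis data record; fields cannot be reassigned."""
    sequence_number: str  # assigned by the stream when the record is written
    partition_key: str    # chosen by the producer; decides which shard stores it
    data: bytes           # opaque blob; `bytes` is itself immutable

    def __post_init__(self) -> None:
        # Partition keys are Unicode strings of at most 256 characters.
        if not 1 <= len(self.partition_key) <= 256:
            raise ValueError("partition key must be 1-256 characters")


record = DataRecord("1", "customer-42", b'{"page": "/shirts"}')  # placeholder sequence number
try:
    record.data = b"changed"  # rejected: the dataclass is frozen
except dataclasses.FrozenInstanceError:
    print("record is immutable")
```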
Data retention: The retention period is the length of time that data records are accessible after they are added to the stream. A stream’s retention period defaults to 24 hours when it is created, and it can be increased (originally up to 168 hours, i.e. 7 days; currently up to 8,760 hours, i.e. 365 days). Because the raw data remains in the stream for 24 hours, you don’t need a separate caching solution. This makes option 6 correct and option 8 wrong.
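The retention rule can be expressed as a simple time-window check. This is a simplified sketch: it assumes the current bounds of 24 to 8,760 hours and treats the window boundary as exact, whereas in practice expired records are trimmed by the service rather than at a precise instant. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION_HOURS = 24   # applied when the stream is created
MAX_RETENTION_HOURS = 8_760    # 365 days; older limit was 168 hours (7 days)


def validate_retention(hours: int) -> int:
    """Reject retention periods outside the assumed 24-8,760 hour range."""
    if not DEFAULT_RETENTION_HOURS <= hours <= MAX_RETENTION_HOURS:
        raise ValueError(
            f"retention must be {DEFAULT_RETENTION_HOURS}-{MAX_RETENTION_HOURS} hours")
    return hours


def still_readable(added_at: datetime, now: datetime,
                   retention_hours: int = DEFAULT_RETENTION_HOURS) -> bool:
    """Simplified check: a record is readable until its retention window ends."""
    return now - added_at < timedelta(hours=validate_retention(retention_hours))


added = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(still_readable(added, added + timedelta(hours=23)))  # → True
print(still_readable(added, added + timedelta(hours=25)))  # → False
```

With the default 24-hour window, the raw clickstream data stays readable from the stream for the full period the question requires, without a separate cache.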
For persistence, the transformed data can be stored permanently in DynamoDB for historical analysis.
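A consumer that moves data from the stream into DynamoDB typically reshapes each raw event into a table item. The sketch below is purely illustrative: the table schema (partition key `customer_id`, sort key `event_ts`) and the event field names are hypothetical assumptions, not part of the question.

```python
import json


def to_history_item(raw_blob: bytes, sequence_number: str) -> dict:
    """Flatten one raw clickstream event into an item for a hypothetical DynamoDB table.

    Hypothetical schema: partition key "customer_id", sort key "event_ts".
    The event field names ("customer_id", "timestamp", "type", "page") are assumed.
    """
    event = json.loads(raw_blob.decode("utf-8"))
    return {
        "customer_id": event["customer_id"],
        # Appending the stream sequence number helps keep sort keys distinct
        # when two events from the same customer share a timestamp.
        "event_ts": f'{event["timestamp"]}#{sequence_number}',
        "event_type": event["type"],
        "page": event.get("page", "unknown"),
    }


raw = b'{"customer_id": "c-1001", "timestamp": "2024-01-01T10:00:00Z", "type": "view", "page": "/shirts"}'
print(to_history_item(raw, "495903"))
```

The resulting dict could then be written with boto3's `Table.put_item(Item=item)`; the table name and key schema remain assumptions of this sketch.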