Question-3: You are working with a company whose e-commerce website continuously generates web clickstream data. You want to store this data in a Redshift cluster, but before sending it there you want to process it with your own complex custom logic. Which of the following is the better solution for this requirement?
- Use Kinesis Firehose to deliver the data to the Redshift cluster.
- Use Lambda (for the custom logic) and Kinesis Firehose to deliver the data to the Redshift cluster.
- Use a Kinesis stream and the Kinesis Client Library, and apply the custom logic to the data before sending it to the Redshift cluster.
- Use Amazon SQS to send the data to the Redshift cluster and AWS Lambda to process the data.
Exp: To answer this question you need to understand the difference between Kinesis Firehose and Kinesis Data Streams. Both the Kinesis Client Library and Kinesis Firehose help in ingesting data into S3, Redshift, Elasticsearch, EMR, and AWS Lambda. So what exactly is the difference, and which one should we use in which scenario? A few differences are listed below; use them to select the correct answer.
- Firehose is fully managed and scales automatically, whereas a stream has to be managed manually.
- With a Kinesis stream, applications are built using the Kinesis Producer Library, which puts the data into the stream. The data is then processed by an application that uses the Kinesis Client Library, and the Kinesis Connector Library sends the processed data to S3, Redshift, DynamoDB, etc.
- Kinesis Firehose is simpler: you create a delivery stream and send the data to S3, Redshift, etc. To send the data you use the Kinesis Agent or the API.
- A Kinesis data stream retains data for 24 hours by default, and retention can be extended to 7 days or longer; hence it can be used as short-term storage as well. This helps with custom processing before ingesting data into S3, Redshift, or Elasticsearch.
- A Kinesis data stream is open-ended at both ends. On the producer side, you configure a data producer to write data into the stream; the service stores your data continuously, retains its order (within a shard), and lets you replay it. On the consumer side, you configure a data consumer to read the data out of the stream and process it with a custom application. A Kinesis data stream is a data storage system: it is more flexible, you can build your custom application as you want, and you have full control over how to partition your data and how many shards your stream has.
- Kinesis Firehose is open-ended on one side only. You configure a data producer to continuously push data into the Firehose delivery stream; on the other side, you do not read the data from the delivery stream and you do not write any application for that. Firehose automatically delivers the data to your destination, such as S3, Elasticsearch, or a Redshift cluster.
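The contrast between the two services can be sketched in a few lines of Python. This is a minimal illustration, not a production consumer: it assumes boto3-style clients are passed in, and the function names (`send_clicks_to_firehose`, `process_shard`, `tag_device`) and the click fields are hypothetical. With Firehose, only the producer side is written; with a data stream, the consumer that applies the custom logic is our own code (the Kinesis Client Library would normally handle the polling and checkpointing shown here in simplified form).

```python
import json
from typing import Any, Callable, Dict, Iterable, List


def send_clicks_to_firehose(firehose: Any, delivery_stream: str,
                            clicks: Iterable[Dict[str, Any]]) -> int:
    """Firehose path: we only push records; delivery is handled by the service.

    `firehose` is assumed to behave like a boto3 Firehose client.
    Returns the number of records reported as failed.
    """
    # One newline-terminated JSON document per click event.
    records = [{"Data": (json.dumps(c) + "\n").encode("utf-8")} for c in clicks]
    failed = 0
    # PutRecordBatch accepts at most 500 records per call, so send in chunks.
    for start in range(0, len(records), 500):
        response = firehose.put_record_batch(
            DeliveryStreamName=delivery_stream,
            Records=records[start:start + 500],
        )
        failed += response.get("FailedPutCount", 0)
    return failed


def process_shard(kinesis: Any, stream: str, shard_id: str,
                  custom_logic: Callable[[Dict[str, Any]], Dict[str, Any]],
                  max_polls: int = 10) -> List[Dict[str, Any]]:
    """Data Streams path: we read one shard ourselves and apply custom logic.

    `kinesis` is assumed to behave like a boto3 Kinesis client.
    """
    # TRIM_HORIZON starts at the oldest record still retained in the shard.
    iterator = kinesis.get_shard_iterator(
        StreamName=stream, ShardId=shard_id, ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]
    processed = []
    for _ in range(max_polls):
        if iterator is None:  # no further iterator was returned for this shard
            break
        response = kinesis.get_records(ShardIterator=iterator, Limit=100)
        for record in response["Records"]:
            click = json.loads(record["Data"])  # Data arrives as bytes
            processed.append(custom_logic(click))
        iterator = response.get("NextShardIterator")
    return processed


def tag_device(click: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical custom logic: flag clicks that come from mobile browsers."""
    return {**click, "is_mobile": "Mobile" in click.get("user_agent", "")}
```

With real clients (assuming boto3 is installed and credentials are configured), `firehose` would be `boto3.client("firehose")` and `kinesis` would be `boto3.client("kinesis")`; any stream name and shard ID passed in are placeholders. The processed records returned by `process_shard` would still need to be loaded into Redshift, which is the step the Kinesis Connector Library covers in the explanation above.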