Question 26: You have developed a JEE (Java Enterprise Edition) web application hosted on a total of 5 EC2 instances. The website uses NGINX as well as the WebLogic application server. To run analytics and look for vulnerabilities, your downstream team needs the log files generated by this application. Some of the analysis is done in real time, so the logs must be streamed in real time. Which of the options below would be an ideal design for collecting the log data, and also a good solution for storing it?

1. You will write a custom shell script that runs on a schedule (every minute) to read data from the log files and deliver it to a Kinesis Data Stream, from where it will be stored in DynamoDB.

2. You will write a Lambda function that runs on a schedule (every minute) to read the data from the log files and send it to a Kinesis Data Stream, from where it will be stored in DynamoDB.

3. You will write a custom shell script that runs on a schedule (every minute) to read the data from the log files and send it to a Kinesis Data Stream, from where it will be stored in a MySQL database.

4. You will use the AWS Kinesis Agent on the EC2 instances where the log files are generated; it will send batches (every 1,000 log events) to the Kinesis Data Stream, and from there the data will be stored in an AWS DynamoDB table for analytics.

5. You will use the AWS Kinesis Agent on the EC2 instances where the log files are generated; it will send log events to the Kinesis Data Stream as they are generated, and from there the data will be stored in AWS Redshift for further analysis.

Correct Answer: 5. Exp: The question simply asks how you would collect, in real time, the log data generated by the web application hosted on the EC2 instances.

AWS has a solution for exactly this: you install the Kinesis Agent, a stand-alone Java application that collects log data and sends it to a Kinesis Data Stream (the agent acts as a data producer). The agent continuously monitors a set of files and sends new data to the stream, handling file rotation, checkpointing, and retries on failure, so the data is delivered reliably and in a timely manner. It also emits CloudWatch metrics, which you can use to monitor and troubleshoot the streaming process.
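The agent is driven by a small JSON configuration file, typically /etc/aws-kinesis/agent.json. Below is a minimal sketch; the file pattern /var/log/webapp/app.log* and the stream name web-app-logs are placeholders chosen for this scenario, not values from the question (the JSON format itself does not permit comments, so the assumptions are stated here):

{
  "cloudwatch.emitMetrics": true,
  "flows": [
    {
      "filePattern": "/var/log/webapp/app.log*",
      "kinesisStream": "web-app-logs",
      "partitionKeyOption": "RANDOM"
    }
  ]
}

With this in place, starting the agent service (sudo service aws-kinesis-agent start) is enough for new log lines to flow into the stream as they are written.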

Next is the storage layer. Since the data volume is quite high, either DynamoDB or a Redshift cluster could handle it, but because you also need to run analytics on the data, the Redshift cluster is the recommended store for the log stream: it provides a rich set of built-in analytical functions.
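One way to land the stream in Redshift is Redshift's streaming-ingestion feature (a sketch only; the schema name, the stream name web-app-logs, and the IAM role ARN are all placeholders, and a suitably scoped role is assumed to exist already):

-- Expose the Kinesis stream to Redshift via an external schema.
CREATE EXTERNAL SCHEMA kinesis_logs
FROM KINESIS
IAM_ROLE 'arn:aws:iam::<account-id>:role/<redshift-streaming-role>';

-- Materialized view that captures each log event as it arrives;
-- AUTO REFRESH keeps the view close to real time.
CREATE MATERIALIZED VIEW web_app_log_events
AUTO REFRESH YES
AS
SELECT approximate_arrival_timestamp,
       partition_key,
       shard_id,
       from_varbyte(kinesis_data, 'utf-8') AS log_line
FROM kinesis_logs."web-app-logs";

Analysts can then query web_app_log_events with ordinary SQL and Redshift's analytical functions. An alternative design is to put Kinesis Data Firehose between the stream and Redshift and let it perform the COPY loads.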
