Question-7: You work at a web hosting company that manages more than 10,000 web servers supporting various websites. Every hour, these servers generate web server logs, which are stored in Amazon S3. You want to write an ETL job that produces partitioned data, and then run SQL queries every 3 hours to detect whether any hacking activity is taking place on any of the web servers; if so, a report should be generated. Given that all the servers together generate around 300 GB of logs every hour, which of the following is a suitable solution?
- You would be using Kinesis Data Firehose and Kinesis Data Analytics
- You would be using EMR, S3 and Apache Hive
- You would be using Kinesis Data Stream and Kinesis Data Analytics
- You would be using Redshift cluster, Lambda and DynamoDB
Ans: B
Exp: There is no need for real-time data processing, so we can safely rule out the Kinesis-based options (Kinesis Data Firehose, Kinesis Data Streams and Kinesis Data Analytics). The processing can be done in batches every 3 hours, and given the data volume, EMR is a good fit; Apache Hive on EMR can even query the data directly in Amazon S3. So we can use EMR as the compute engine, S3 as the data storage, and Hive to query the data stored in S3. If any transformation is needed, such as the ETL step that produces the partitioned data, an EMR MapReduce job can handle that as well.
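To make the batch approach more concrete, here is a minimal Python sketch that only generates HiveQL strings; you would still need to run them with Hive on the EMR cluster (for example, as an EMR step). Everything beyond the question's numbers is an assumption: the S3 prefix `s3://example-weblogs/access_logs`, the `web_logs` table and its columns, the tab-delimited log format, Hive-style `dt=`/`hr=` partition folders written by the hourly ETL, and treating many 401/403 responses from one client IP as a stand-in for "hacking activity". The volume figures come from the question: 300 GB per hour means each 3-hour run covers about 900 GB, or 7,200 GB per day.

```python
from datetime import datetime, timedelta

# Hypothetical S3 location and schema: the question names neither.
LOG_ROOT = "s3://example-weblogs/access_logs"

HOURLY_VOLUME_GB = 300  # from the question
WINDOW_HOURS = 3        # the report runs every 3 hours

# Assumed layout: tab-delimited logs under Hive-style dt=/hr= prefixes.
CREATE_TABLE = f"""
CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
  server_id STRING,
  client_ip STRING,
  request   STRING,
  status    INT
)
PARTITIONED BY (dt STRING, hr STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
LOCATION '{LOG_ROOT}/'
"""


def hour_partitions(end: datetime, hours: int = WINDOW_HOURS) -> list[tuple[str, str]]:
    """(dt, hr) pairs for the `hours` complete hours before `end`, oldest first."""
    end = end.replace(minute=0, second=0, microsecond=0)
    starts = [end - timedelta(hours=h) for h in range(hours, 0, -1)]
    return [(t.strftime("%Y-%m-%d"), t.strftime("%H")) for t in starts]


def add_partition_sql(dt: str, hr: str) -> str:
    """Registers one hour of logs (written by the ETL step) as a table partition."""
    return (
        f"ALTER TABLE web_logs ADD IF NOT EXISTS PARTITION (dt='{dt}', hr='{hr}') "
        f"LOCATION '{LOG_ROOT}/dt={dt}/hr={hr}/'"
    )


def suspicious_ips_sql(end: datetime, threshold: int = 100) -> str:
    """HiveQL for the 3-hourly report.

    Filtering on dt/hr lets Hive read only the window's partitions instead of
    the whole log history. The 401/403 heuristic and threshold are assumptions.
    """
    window = " OR ".join(
        f"(dt = '{dt}' AND hr = '{hr}')" for dt, hr in hour_partitions(end)
    )
    return (
        "SELECT server_id, client_ip, COUNT(*) AS denied\n"
        "FROM web_logs\n"
        f"WHERE ({window}) AND status IN (401, 403)\n"
        "GROUP BY server_id, client_ip\n"
        f"HAVING COUNT(*) > {threshold}\n"
        "ORDER BY denied DESC"
    )


if __name__ == "__main__":
    run_at = datetime(2024, 3, 10, 1, 15)  # example run time
    print(f"Data per run: {HOURLY_VOLUME_GB * WINDOW_HOURS} GB")
    print(f"Data per day: {HOURLY_VOLUME_GB * 24} GB")
    print(suspicious_ips_sql(run_at))
```

Partitioning by hour is the design choice that matters most here: each 3-hour run scans only three hourly partitions (roughly 900 GB) rather than every log ever written to S3.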