KPL (Kinesis Producer Library) and consumed by the Kinesis Client Library (KCL). The KCL application loads this data into the Redshift cluster, from where the Data Analytics team reads the log data to apply machine learning
to it. Here, the KCL application is configured to poll the stream every 100 milliseconds. After a few days, new servers were added to the farm and the KCL applications started getting many
"ProvisionedThroughputExceededException" errors. What is the issue here?
1. You have to decrease the polling interval.
2. You have to increase the polling interval.
3. You have to increase the memory of the KCL-based application.
4. You have to use bigger EC2 instances for the KCL-based application.
Correct Answer : 2 Exp : A Kinesis data stream supports at most 5 GetRecords calls per second, per shard. Polling every 100 milliseconds means up to about 10 calls per second against each shard, which is above this limit. If you set
the IdleTimeBetweenReadsInMillis property below 200 ms, the application can get "ProvisionedThroughputExceededException" errors. If there are too many of these exceptions, they can trigger back-offs and increase the
latency of processing the streamed log messages.
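The arithmetic behind this answer can be sketched in a few lines. This is only an illustrative upper bound, assuming every poll results in exactly one GetRecords call per shard and ignoring request latency; the function names `calls_per_second` and `exceeds_limit` are hypothetical and not part of the KCL.

```python
# Limit stated in the explanation above: 5 GetRecords calls per second, per shard.
GET_RECORDS_LIMIT_PER_SHARD = 5


def calls_per_second(idle_ms: int, consumer_apps: int = 1) -> float:
    """Upper bound on GetRecords calls per second against a single shard.

    Assumes each consumer application issues one call per polling interval.
    """
    if idle_ms <= 0:
        raise ValueError("idle_ms must be positive")
    return consumer_apps * 1000 / idle_ms


def exceeds_limit(idle_ms: int, consumer_apps: int = 1) -> bool:
    """True when the estimated call rate is above the per-shard limit."""
    return calls_per_second(idle_ms, consumer_apps) > GET_RECORDS_LIMIT_PER_SHARD


if __name__ == "__main__":
    for idle_ms in (100, 200, 1000):
        rate = calls_per_second(idle_ms)
        print(idle_ms, "ms ->", rate, "calls/s, over limit:", exceeds_limit(idle_ms))
```

Under these assumptions a 100 ms interval is over the limit for a single application, while a 200 ms interval sits exactly at it, which is why increasing the polling interval is the fix.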
Let's look at some related concepts.
Propagation Delay: This is the end-to-end latency from the moment a record is written to the stream until it is read by a consumer application. This delay depends on many factors, but it is primarily affected by the
polling interval of the consumer application.
AWS recommends polling each shard once per second per application. This lets multiple consumer applications process a stream concurrently without hitting the Kinesis Data Streams limit of 5
GetRecords calls per second per shard. In addition, processing larger batches of data tends to be more efficient at reducing network and other downstream latencies in your application.
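A paced polling loop along these lines might look like the sketch below. It is not how the KCL is implemented; it only illustrates reading one batch per interval and backing off when throttled. `ThrottledError`, `poll_shard`, `fetch` and `handle` are hypothetical names, and `fetch` stands in for whatever call actually reads a batch from one shard.

```python
import time
from typing import Callable, List


class ThrottledError(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""


def poll_shard(
    fetch: Callable[[], List[str]],
    handle: Callable[[List[str]], None],
    polls: int,
    idle_seconds: float = 1.0,
    max_backoff: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll one shard `polls` times, pausing `idle_seconds` between reads."""
    backoff = idle_seconds
    for _ in range(polls):
        try:
            records = fetch()
        except ThrottledError:
            # Throttled: wait longer before the next attempt, up to max_backoff.
            backoff = min(backoff * 2, max_backoff)
            sleep(backoff)
            continue
        backoff = idle_seconds  # a successful read resets the back-off
        if records:
            handle(records)  # process the whole batch in one go
        sleep(idle_seconds)
```

The `sleep` parameter is injectable so the loop can be exercised without real waiting. With the default one-second interval, up to five such applications could read the same shard and stay within the limit described above.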
Kinesis Data Stream records are available to be read immediately after they are written.