Question 131: You are accessing data stored in AWS S3 from Spark, and TLS (encryption of data in transit) is enabled on the bucket. What do you need to do?
Answer: If the S3 bucket is TLS enabled and you are using a custom jssecacerts truststore, make sure that your truststore includes the root Certificate Authority (CA) certificates that signed the Amazon S3 certificate.
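A minimal sketch of what this looks like in practice, assuming a hypothetical CA file /tmp/s3-root-ca.pem and a custom truststore at /etc/pki/java/jssecacerts; the keytool import and the Spark truststore settings in the comments are standard JDK and Spark options, shown here only as an illustration.

// Import the Amazon S3 root CA into the custom truststore (run on each node):
//   keytool -importcert -trustcacerts -alias s3-root-ca \
//     -file /tmp/s3-root-ca.pem -keystore /etc/pki/java/jssecacerts -storepass changeit
//
// Then point both driver and executor JVMs at that truststore when submitting the job:
//   spark-submit \
//     --conf "spark.driver.extraJavaOptions=-Djavax.net.ssl.trustStore=/etc/pki/java/jssecacerts" \
//     --conf "spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=/etc/pki/java/jssecacerts" \
//     ...

import org.apache.spark.sql.SparkSession

object ReadFromTlsBucket {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("read-from-tls-enabled-bucket")
      .getOrCreate()

    // s3a talks to S3 over HTTPS; the JVM validates the server certificate
    // against the truststore configured above.
    val df = spark.read.text("s3a://my-tls-enabled-bucket/data/") // hypothetical bucket
    println(df.count())

    spark.stop()
  }
}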
Question 132: Your Spark cluster is installed on EC2 instances and needs to access data stored in an S3 bucket, but you do not want to supply credentials with each job. How can you do that?
Answer: Because both EC2 and S3 are Amazon services, we can leverage IAM roles. This mode of operation associates the authorization with individual EC2 instances rather than with each Spark application or with the entire cluster.
Run the EC2 instances with instance profiles associated with IAM roles that have the permissions you want. Requests from a machine with such a profile authenticate without credentials, as in the sketch below.
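A minimal sketch, assuming the EC2 instances already carry an instance profile with S3 permissions; the bucket name my-data-bucket is a placeholder. With s3a, the default credential chain usually falls back to instance-profile credentials on its own, so the explicit provider setting is shown only to make the intent visible.

import org.apache.spark.sql.SparkSession

object ReadWithInstanceProfile {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("read-s3-with-iam-role")
      // No access key or secret key anywhere; credentials come from the
      // instance profile attached to the EC2 instances.
      .config("spark.hadoop.fs.s3a.aws.credentials.provider",
              "com.amazonaws.auth.InstanceProfileCredentialsProvider")
      .getOrCreate()

    val df = spark.read
      .option("header", "true")
      .csv("s3a://my-data-bucket/events/") // hypothetical bucket and prefix

    df.show(10)
    spark.stop()
  }
}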
Question 133: Cloudera also provides a way to store AWS bucket credentials so as to grant system-wide AWS access to a single predefined bucket. What is that?
Answer: Cloudera recommends that you use the Hadoop Credential Provider to set up AWS access because it provides system-wide AWS access to a single predefined bucket, without exposing the secret key in a configuration file or having to specify it at runtime.
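A minimal sketch of how the credential provider might be created and referenced, assuming a hypothetical JCEKS store at jceks://hdfs/user/spark/aws-creds.jceks; the hadoop credential commands in the comments are the standard Hadoop CLI for this, not something specific to this question.

// Create the credential store once (the CLI prompts for the secret values):
//   hadoop credential create fs.s3a.access.key -provider jceks://hdfs/user/spark/aws-creds.jceks
//   hadoop credential create fs.s3a.secret.key -provider jceks://hdfs/user/spark/aws-creds.jceks

import org.apache.spark.sql.SparkSession

object ReadWithCredentialProvider {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("read-s3-with-credential-provider")
      // Point s3a at the JCEKS store instead of putting keys in a configuration file.
      .config("spark.hadoop.hadoop.security.credential.provider.path",
              "jceks://hdfs/user/spark/aws-creds.jceks")
      .getOrCreate()

    val df = spark.read.parquet("s3a://predefined-bucket/table/") // hypothetical bucket
    df.printSchema()
    spark.stop()
  }
}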
Question 134: What are the ways by which AWS bucket access can be controlled?
Answer: AWS access for users can be set up in two ways. You can either provide a global credential provider file that will allow all Spark users to submit S3 jobs, or have each user submit their own credentials every time they submit a job.
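To make the second option concrete, here is a sketch of how a user might pass their own credentials only for the lifetime of a single job; the key variables and bucket name are placeholders. Note that passing keys at submission time keeps access per-user but can expose them in process listings, which is the trade-off the global credential provider file avoids.

// Per-user submission: the user supplies their own keys with the job, e.g.
//   spark-submit \
//     --conf spark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID \
//     --conf spark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY \
//     --class PerUserS3Job per-user-s3-job.jar

import org.apache.spark.sql.SparkSession

object PerUserS3Job {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("per-user-s3-job").getOrCreate()

    // The keys set at submission time apply only to this application,
    // not system-wide, so each user is authorized independently.
    val df = spark.read.json("s3a://team-bucket/logs/") // hypothetical bucket
    println(s"records: ${df.count()}")

    spark.stop()
  }
}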