Q31. How does the provenance repository provide search capability?

Ans: The provenance repository indexes its events with an embedded Apache Lucene search engine, which makes them searchable.

Q32. Can we re-process a FlowFile that has already been processed, and if so, how?

Ans: Yes, a FlowFile that has already been processed can be re-processed from the Data Provenance repository. A provenance event offers a Replay button that lets the user re-insert the FlowFile into the flow and re-process it from exactly the point at which the event happened. This is a very powerful mechanism: we can modify the flow in real time, re-process a FlowFile, and then view the results. If they are not as expected, we can modify the flow again and re-process the FlowFile again. We can repeat this iterative development until the flow processes the data exactly as intended.

Q33. What is the use of Flow Controller?

Ans: The flow controller is the brains of the operation. It provides threads for extensions to run on, and manages the schedule of when extensions receive resources to execute. The Flow Controller acts as the engine dictating when a particular processor is given a thread to execute.
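
The scheduling role described above can be illustrated with a toy sketch. This is not NiFi code: the names Processor, on_trigger and run_schedule are hypothetical, and a plain Python thread pool stands in for the threads the Flow Controller hands out.

```python
from concurrent.futures import ThreadPoolExecutor


class Processor:
    """Hypothetical stand-in for a NiFi processor (an 'extension')."""

    def __init__(self, name):
        self.name = name
        self.runs = 0

    def on_trigger(self):
        # Real processors would pull FlowFiles from a queue here.
        self.runs += 1
        return self.name


def run_schedule(processors, rounds, max_threads=2):
    """Toy 'flow controller': owns the thread pool and decides
    when each processor is given a thread to execute."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        for _ in range(rounds):
            # Each processor gets one scheduled run per round.
            futures = [pool.submit(p.on_trigger) for p in processors]
            for future in futures:
                future.result()  # wait before scheduling the next round


procs = [Processor("GetFile"), Processor("UpdateAttribute"), Processor("PutFile")]
run_schedule(procs, rounds=3)
```

The point of the sketch is only that the processors never create threads themselves; the component that owns the pool decides who runs and when.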

Q34. What is a process group in NiFi?

Ans: A process group lets you build a sub-dataflow that you can add to your main dataflow. Data enters a process group through an input port and leaves it through an output port. In other words, a process group is a composition of NiFi components that forms a sub-dataflow.

Q35. What is the difference between the FlowFile Repository and the Content Repository in NiFi?

Ans: The FlowFile Repository is where NiFi keeps track of the state of what it knows about a given FlowFile that is presently active in the flow.

The Content Repository is where the actual content bytes of a given FlowFile live.

Q36. How do you define the NiFi content repository?

Ans: As we mentioned previously, contents are not stored in the FlowFile. They are stored in the content repository and referenced by the FlowFile. This allows the contents of FlowFiles to be stored independently and efficiently based on the underlying storage mechanism.
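
The separation between a FlowFile and its content can be sketched as follows. This is a simplified illustration, not NiFi's implementation: ContentClaim, FlowFile and ContentRepository are hypothetical classes, and an in-memory dictionary stands in for the on-disk repository.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ContentClaim:
    """Hypothetical pointer to a byte range in the content repository."""
    claim_id: str
    offset: int
    length: int


@dataclass
class FlowFile:
    """Holds attributes and a content pointer, never the bytes themselves."""
    attributes: Dict[str, str]
    claim: ContentClaim


class ContentRepository:
    """Toy in-memory stand-in for the on-disk content repository."""

    def __init__(self) -> None:
        self._claims: Dict[str, bytes] = {}

    def write(self, claim_id: str, data: bytes) -> ContentClaim:
        self._claims[claim_id] = data
        return ContentClaim(claim_id, 0, len(data))

    def read(self, claim: ContentClaim) -> bytes:
        data = self._claims[claim.claim_id]
        return data[claim.offset:claim.offset + claim.length]


repo = ContentRepository()
claim = repo.write("claim-1", b"hello,nifi")
original = FlowFile({"filename": "a.csv"}, claim)

# Changing an attribute produces new metadata but reuses the same claim,
# so the content bytes are not copied.
renamed = FlowFile({**original.attributes, "filename": "b.csv"}, original.claim)
```

Because only the small metadata record changes, attribute-only operations stay cheap regardless of how large the content is.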

Q37. Does NiFi work as a master-slave architecture?

Ans: No. Since NiFi 1.0, clustering follows a Zero-Master philosophy, and every node in the NiFi cluster is the same. The cluster is coordinated through Apache ZooKeeper, which elects a single node as the Cluster Coordinator and handles failover automatically. All cluster nodes report heartbeat and status information to the Cluster Coordinator, which is responsible for disconnecting and connecting nodes. Additionally, every cluster has one Primary Node, also elected by ZooKeeper.

Q38. If you are working as a DataFlow Manager on a clustered NiFi setup, which node will you use to create a dataflow?

Ans: As a DataFlow Manager, you can interact with the NiFi cluster through the user interface (UI) of any node. Any change you make is replicated to all nodes in the cluster, allowing for multiple entry points.

Q39. How does NiFi guarantee the delivery of messages?

Ans: NiFi guarantees delivery through effective use of a purpose-built persistent write-ahead log and content repository.
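
The general write-ahead log idea can be sketched as below. This is a minimal illustration of the technique under stated assumptions, not NiFi's actual log format: the WriteAheadLog class, the JSON record layout and the queue names are all hypothetical.

```python
import json
import os
import tempfile


class WriteAheadLog:
    """Toy write-ahead log: every state change is persisted before it is
    considered done, so state can be rebuilt after a crash or restart."""

    def __init__(self, path):
        self.path = path

    def append(self, record):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the record onto disk

    def recover(self):
        """Replay the log from the start to rebuild the current state."""
        state = {}
        if not os.path.exists(self.path):
            return state
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                if rec["op"] == "delete":
                    state.pop(rec["id"], None)
                else:
                    state[rec["id"]] = rec["queue"]
        return state


log_path = os.path.join(tempfile.mkdtemp(), "flowfile.wal")
wal = WriteAheadLog(log_path)
wal.append({"op": "update", "id": "ff-1", "queue": "to-PutFile"})
wal.append({"op": "update", "id": "ff-2", "queue": "to-PutFile"})
wal.append({"op": "delete", "id": "ff-1"})  # ff-1 was delivered

# Simulate a restart: a fresh object rebuilds state from the log alone.
recovered = WriteAheadLog(log_path).recover()
```

After the simulated restart, only the FlowFile that was still in flight is recovered, which is the property that lets work resume without losing or repeating completed deliveries.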

Q40. If you need to do a site-to-site deployment, which files would you configure?

Ans: To create a site-to-site deployment, you have to configure the following files.

1. state-management.xml : to reflect your ZooKeeper instances

2. nifi.properties : for site-to-site properties and cluster properties

3. zookeeper.properties and authorizers.xml : to reflect the hostnames of all nodes