
Question: How is Structured Streaming implemented internally so that it can offer the same API as batch processing?

Answer: The key point is how the programming model is implemented in Spark Structured Streaming. Structured Streaming treats live data as a table that is continuously appended to, an unbounded table. The code you write looks like a batch query, and Spark executes that query incrementally on the unbounded table. Each run of the query behaves like a query on static data, and every new piece of data can be thought of as new rows appended to the existing unbounded table.
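The idea can be illustrated with a small pure-Python model. This is only a sketch of the concept, not how Spark implements it: the class name IncrementalWordCount and the helper batch_word_count are made up for this example. It shows that updating a result with only the newly appended rows gives the same answer as a batch query over the whole table.

```python
from collections import Counter


class IncrementalWordCount:
    """Toy model of a running count over an ever-growing table of words."""

    def __init__(self):
        self.result = Counter()  # result table kept between query runs

    def append(self, new_rows):
        # Only the newly arrived rows are processed; old rows are not rescanned.
        self.result.update(new_rows)
        return dict(self.result)


def batch_word_count(all_rows):
    """The equivalent batch query over the full table."""
    return dict(Counter(all_rows))


query = IncrementalWordCount()
query.append(["cat", "dog"])           # first batch of rows arrives
latest = query.append(["dog", "owl"])  # more rows are appended later

# The incremental result matches a batch query over all rows seen so far.
assert latest == batch_word_count(["cat", "dog", "dog", "owl"])
```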

Question: What is event-time data, and what is it used for?

Answer: The data you receive may or may not carry a timestamp embedded in the message contents. If it does, it is called event-time data: the timestamp records when the event was generated, not when the system received it. Suppose you want to calculate how many events were generated in the last 5 minutes. Events may arrive out of order or as duplicates, and sometimes an event arrives quite late, for example because of a network failure. The system can use the event time embedded in the message to determine exactly when each event occurred. This is very useful in the IoT world.
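As a rough illustration, the 5-minute count can be modelled in plain Python. This is a conceptual sketch, not Spark code; the event ids, timestamps and function names are invented for the example. Events are grouped by the time embedded in the message, so late and duplicate deliveries do not distort the counts.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW_SECONDS = 5 * 60
EPOCH = datetime(1970, 1, 1)


def window_start(event_time):
    """Round an event time down to the start of its 5-minute window."""
    seconds = int((event_time - EPOCH).total_seconds())
    return EPOCH + timedelta(seconds=seconds - seconds % WINDOW_SECONDS)


def count_by_event_time(events):
    """Count events per window using the embedded event time, not arrival order."""
    seen_ids = set()
    counts = Counter()
    for event_id, event_time in events:
        if event_id in seen_ids:  # duplicate delivery, skip it
            continue
        seen_ids.add(event_id)
        counts[window_start(event_time)] += 1
    return dict(counts)


# Events listed in arrival order (hypothetical data).
arrivals = [
    ("e1", datetime(2024, 1, 1, 12, 1)),
    ("e3", datetime(2024, 1, 1, 12, 7)),
    ("e2", datetime(2024, 1, 1, 12, 4)),  # arrived late and out of order
    ("e3", datetime(2024, 1, 1, 12, 7)),  # duplicate of an earlier event
]
counts = count_by_event_time(arrivals)
```

Here e2 is still counted in the 12:00 window even though it arrived after e3, because the grouping key is the embedded event time.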

Question: What is the use of Watermarking in structured streaming?

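Answer: As discussed in the previous question, events can arrive in any order, and the time may be embedded in the event itself. Since Spark 2.1 you can specify a watermark, which is a threshold on how late an event is allowed to be. If a message/event is older than the latest event time seen so far minus this threshold, it is considered too late and can be discarded. This also lets Spark drop old intermediate state instead of keeping it forever.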
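The rule can be sketched in plain Python as follows. This is a simplified model with invented names (WatermarkFilter, accept): it moves the watermark after every event, whereas Spark advances it between micro-batches. In Spark itself the threshold is declared on a streaming DataFrame with withWatermark, giving the event-time column and a delay such as "10 minutes".

```python
from datetime import datetime, timedelta


class WatermarkFilter:
    """Toy model: reject events older than (max event time seen - delay)."""

    def __init__(self, delay):
        self.delay = delay
        self.max_event_time = None  # latest event time seen so far

    def accept(self, event_time):
        if self.max_event_time is not None:
            watermark = self.max_event_time - self.delay
            if event_time < watermark:
                return False  # too late, discard
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        return True


def at(minute):
    """Hypothetical helper: a timestamp at 12:<minute> on a fixed day."""
    return datetime(2024, 1, 1, 12, minute)


late_filter = WatermarkFilter(delay=timedelta(minutes=10))
decisions = [
    late_filter.accept(at(0)),   # first event
    late_filter.accept(at(20)),  # moves the watermark to 12:10
    late_filter.accept(at(12)),  # late, but still inside the threshold
    late_filter.accept(at(5)),   # older than the watermark
]
```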

Question: How can we use the DataFrame/Dataset API with Structured Streaming?

Answer: From Spark 2.0 onwards, the DataFrame and Dataset APIs have been enhanced so that streaming (unbounded) data can be handled in the same way as static (bounded) data. Using the common entry point, SparkSession, you can work on streaming data by applying the same operations/API.
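A minimal PySpark sketch of this point is shown here. The function names, the "device" column and the JSON input are assumptions made for the example; each function expects an existing SparkSession to be passed in. The same transformation is applied to a static and to a streaming DataFrame, and only the reader differs.

```python
def count_by_device(df):
    # The same DataFrame operations apply to static and streaming DataFrames.
    # "device" is a hypothetical column name.
    return df.groupBy("device").count()


def static_counts(spark, path):
    # Bounded data: read once through spark.read.
    return count_by_device(spark.read.json(path))


def streaming_counts(spark, path, schema):
    # Unbounded data: read through spark.readStream.
    # Streaming file sources need an explicit schema by default.
    return count_by_device(spark.readStream.schema(schema).json(path))
```

To actually run the streaming version, the returned DataFrame still has to be started with writeStream, for example with the console sink and the "complete" output mode.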

Question: Can you give some examples of streaming data sources?

Answer: Spark provides built-in data sources for components that are popular and widely used, for example:
- File: Any new file that arrives in a directory is treated as part of a stream of data.
- Kafka: Reads data from the Kafka messaging engine.
- Socket: Reads text data from a socket (supports only UTF-8 data). Avoid using it in production, because it does not provide end-to-end fault tolerance.
- Rate: Generates a fixed number of rows every second.
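The sources listed above could be opened in PySpark roughly as follows. This is a sketch under assumptions: the function names are made up, each function expects an existing SparkSession, the file example assumes JSON files, and the Kafka source additionally needs the Spark Kafka connector package on the classpath.

```python
def file_stream(spark, directory, schema):
    # File source: new JSON files dropped into the directory become new rows.
    return spark.readStream.format("json").schema(schema).load(directory)


def kafka_stream(spark, bootstrap_servers, topic):
    # Kafka source: subscribe to one topic.
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .load()
    )


def socket_stream(spark, host, port):
    # Socket source: UTF-8 text lines; meant for testing, not production.
    return (
        spark.readStream.format("socket")
        .option("host", host)
        .option("port", port)
        .load()
    )


def rate_stream(spark, rows_per_second=5):
    # Rate source: generates the given number of rows every second.
    return (
        spark.readStream.format("rate")
        .option("rowsPerSecond", rows_per_second)
        .load()
    )
```

The rate source is a convenient starting point for experiments, since it needs no external system.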


