Question 31: What is the use of the sync() and seek() methods of a SequenceFile?
Answer: Using the seek() method, the reader can position itself at a given byte offset in the file. However, note that if that position is not a record boundary, the next() call that follows will fail. Hence, you must always position the reader on a record boundary.
The sync() method is another way to find a record boundary: when you call sync(position) on a SequenceFile reader, it advances to the next sync point after that position, from which records can be read safely (Learn more from Module-7).
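A minimal sketch of both calls, assuming a sequence file with Text keys and IntWritable values at a placeholder HDFS path:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{IntWritable, SequenceFile, Text}

    val conf = new Configuration()
    // Placeholder path and key/value types, used only for illustration.
    val path = new Path("hdfs://hadoopexam.com:8020/data/he_file.seq")
    val reader = new SequenceFile.Reader(conf, SequenceFile.Reader.file(path))
    val key = new Text()
    val value = new IntWritable()

    // seek(): remember a known record boundary, read past it, then jump back to it.
    val firstRecord = reader.getPosition()
    reader.next(key, value)
    reader.seek(firstRecord)   // legal: firstRecord is a record boundary
    reader.next(key, value)    // re-reads the same record
    // seek(firstRecord + 1) followed by next() would fail, because that
    // offset is not a record boundary.

    // sync(): move the reader to the first sync point after the given offset,
    // so the next() calls that follow are always safe.
    reader.sync(1024L)
    while (reader.next(key, value)) {
      println(s"$key -> $value")
    }
    reader.close()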
Question 32: Which API method is available in Spark to load a sequence file?
Answer: You can use SparkContext's sequenceFile(path, keyClass, valueClass) method to load a sequence file. Both the key and value data types must be subclasses of Hadoop's Writable interface (or types that Spark can convert to and from Writables). Similarly, to save a sequence file you can use the pair RDD method rdd.saveAsSequenceFile("path_to_hadoopexam_sequence_file").
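A minimal round-trip sketch (assuming Spark 1.3+ implicit Writable conversions) that saves a pair RDD of String keys and Int values to the placeholder path from the answer and loads it back:

    import org.apache.hadoop.io.{IntWritable, Text}
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("SeqFileDemo").setMaster("local[*]"))

    // Save a pair RDD as a sequence file; Spark converts String/Int to Text/IntWritable.
    val rdd = sc.parallelize(Seq(("course1", 1), ("course2", 2)))
    rdd.saveAsSequenceFile("path_to_hadoopexam_sequence_file")

    // Load it back, naming the Writable key and value classes explicitly ...
    val loaded = sc.sequenceFile("path_to_hadoopexam_sequence_file", classOf[Text], classOf[IntWritable])
                   .map { case (k, v) => (k.toString, v.get) }
    // ... or let Spark pick the Writable converters from the type parameters.
    val loaded2 = sc.sequenceFile[String, Int]("path_to_hadoopexam_sequence_file")

    loaded2.collect().foreach(println)
    sc.stop()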
Question 33: What is Kryo?
Answer: Kryo is a serialization framework. Java's default serialization mechanism is quite slow, so Spark supports plugging in alternative serialization frameworks, and Kryo is one of them: it is both faster and more compact than Java serialization. So when you work with object files (and, more generally, whenever Spark has to serialize data for shuffles or caching), you should consider using Kryo.
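A minimal sketch of switching Spark to Kryo; HECourse is a hypothetical class used only to show class registration:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical class, shown only to illustrate registration with Kryo.
    case class HECourse(id: Int, name: String)

    val conf = new SparkConf()
      .setAppName("KryoDemo")
      .setMaster("local[*]")
      // Replace the default Java serializer with Kryo.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Registering classes avoids writing the full class name with every object.
      .registerKryoClasses(Array(classOf[HECourse]))

    val sc = new SparkContext(conf)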
Question 34: Which file systems are supported by Spark out of the box?
Answer: HDFS, Amazon S3, and the local file system.
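The same API calls work for all of them; only the URI scheme in the path changes. A sketch with placeholder host, bucket and file names (assuming sc is an existing SparkContext and the S3 connector for your Hadoop version is on the classpath):

    // Local file system
    val localRdd = sc.textFile("file:///home/hadoopexam/he_file.txt")
    // HDFS
    val hdfsRdd = sc.textFile("hdfs://hadoopexam.com:8020/data/he_file.txt")
    // Amazon S3 (s3a:// on recent Hadoop versions, s3n:// on older ones)
    val s3Rdd = sc.textFile("s3a://he-bucket/data/he_file.txt")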
Question 35: While connecting to the HDFS file system, which information do you need for the Spark API?
Answer: We need two main pieces of information: the NameNode URL (host name) and its port. Combined with the file path, they form the HDFS URI passed to the Spark API:
- NameNode host, port and file path, for example "hdfs://hadoopexam.com:8020/data/he_file.txt"
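A minimal sketch that builds the URI from those pieces (the host, port and path are the placeholders from the example above, and sc is assumed to be an existing SparkContext):

    val nameNodeHost = "hadoopexam.com"
    val nameNodePort = 8020
    val filePath     = "/data/he_file.txt"

    val hdfsUri = s"hdfs://$nameNodeHost:$nameNodePort$filePath"
    val lines   = sc.textFile(hdfsUri)
    println(lines.count())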