Question-15: You receive a dataset as a feed from one of your data vendors, and you want to write an ETL job for that data. The data comes from an eCommerce website, and your company provides Machine Learning and Data Science solutions to the customer. The incoming data is saved in an S3 bucket. To create (discover) the schema for this data, you defined an AWS Glue crawler and also provided your own custom classifier. When the crawler completed, it was unable to generate the schema. What could be the possible reason?
- Your custom classifier returned a certainty of 0, and the crawler is not using the default classifier.
- You have not defined the output bucket.
- Your data is in an unstructured format.
Ans: A
Exp: A classifier reads the data in a data store. If it recognizes the format of the data, it generates a schema. The classifier also returns a certainty number to indicate how certain the format recognition was.
AWS Glue provides a set of built-in classifiers, but you can also create custom classifiers. AWS Glue invokes custom classifiers first, in the order that you specify in your crawler definition. Depending on the results that are returned from custom classifiers, AWS Glue might also invoke built-in classifiers. If a classifier returns certainty=1.0 during processing, it indicates that it's 100 percent certain that it can create the correct schema. AWS Glue then uses the output of that classifier.
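If no classifier returns certainty=1.0, AWS Glue uses the output of the classifier that has the highest certainty. If no classifier returns a certainty greater than 0.0, AWS Glue returns the default classification string of UNKNOWN.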
You use classifiers when you crawl a data store to define metadata tables in the AWS Glue Data Catalog. You can set up your crawler with an ordered set of classifiers. When the crawler invokes a classifier, the classifier determines whether the data is recognized. If the classifier can't recognize the data or is not 100 percent certain, the crawler invokes the next classifier in the list to determine whether it can recognize the data.
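As a rough sketch of the setup described in the question, the following builds the keyword arguments you might pass to the boto3 Glue client's `create_classifier` and `create_crawler` calls. The classifier name, crawler name, grok pattern, S3 path, IAM role and database name are all hypothetical placeholders. The point is the `Classifiers` list on the crawler: it names the custom classifiers that the crawler tries, in order, before falling back to the built-in ones.

```python
from __future__ import annotations

from typing import Any


def build_glue_requests(
    bucket_path: str,
    role_arn: str,
    database: str,
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (create_classifier kwargs, create_crawler kwargs).

    Every name and the grok pattern below are hypothetical placeholders.
    """
    classifier_name = "ecommerce-orders-grok"  # hypothetical classifier name
    classifier_request = {
        "GrokClassifier": {
            "Classification": "ecommerce_orders",
            "Name": classifier_name,
            # Hypothetical record layout: "<order_id>,<timestamp>,<amount>"
            "GrokPattern": "%{INT:order_id},%{TIMESTAMP_ISO8601:order_ts},%{NUMBER:amount}",
        }
    }
    crawler_request = {
        "Name": "ecommerce-feed-crawler",  # hypothetical crawler name
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": bucket_path}]},
        # Custom classifiers are invoked in this order before the built-in ones.
        "Classifiers": [classifier_name],
    }
    return classifier_request, crawler_request
```

With AWS credentials configured, you could then pass these to `boto3.client("glue").create_classifier(**classifier_request)`, then `create_crawler(**crawler_request)`, and finally `start_crawler(Name=...)`. If the grok pattern does not match the feed, this custom classifier contributes a certainty of 0.0, which is the situation the question describes.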
If AWS Glue doesn't find a custom classifier that fits the input data format with 100 percent certainty, it invokes the built-in classifiers in a fixed order (listed in the AWS Glue documentation). The built-in classifiers return a result to indicate whether the format matches (certainty=1.0) or does not match (certainty=0.0). The first classifier that returns certainty=1.0 provides the classification string and schema for a metadata table in your Data Catalog.
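The selection rules described here can be modeled in a few lines. This is a simplified, hypothetical simulation and not AWS Glue's implementation: `classify`, the toy classifiers and the string-prefix JSON check are illustrative assumptions. It also applies the documented fallback of using the highest-certainty result when no classifier reaches 1.0. The model shows why the crawler in the question ends up with UNKNOWN when every classifier, custom and built-in, returns 0.0.

```python
from typing import Callable, List, Tuple

# A classifier inspects a data sample and returns (classification, certainty).
Classifier = Callable[[str], Tuple[str, float]]


def classify(sample: str, custom: List[Classifier], built_in: List[Classifier]) -> str:
    """Simplified model of how a crawler picks a classification (assumption, not Glue's code)."""
    best_name, best_certainty = "UNKNOWN", 0.0
    # Custom classifiers run first, in the order configured on the crawler;
    # the built-in classifiers follow in their fixed order.
    for clf in custom + built_in:
        name, certainty = clf(sample)
        if certainty == 1.0:
            return name  # 100 percent certain: use this classifier's output
        if certainty > best_certainty:
            best_name, best_certainty = name, certainty
    # No classifier returned 1.0: use the highest certainty seen,
    # or UNKNOWN if nothing scored above 0.0.
    return best_name


def json_builtin(sample: str) -> Tuple[str, float]:
    # Toy stand-in for a built-in classifier: binary match (1.0) or no match (0.0).
    return ("json", 1.0 if sample.lstrip().startswith("{") else 0.0)


def orders_custom(sample: str) -> Tuple[str, float]:
    # Hypothetical custom classifier that never recognizes the feed.
    return ("ecommerce_orders", 0.0)
```

For example, `classify('{"order_id": 1}', [orders_custom], [json_builtin])` falls through the custom classifier and returns `"json"` from the built-in one, while `classify("random blob", [orders_custom], [json_builtin])` returns `"UNKNOWN"` because nothing scored above 0.0.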