Question-46: What do you mean by Feature Engineering?

Answer: Feature engineering is the step where you convert your data into a form that Machine Learning algorithms can understand, for example by converting raw data features into numerical features. This needs to be done carefully. Feature engineering includes the following steps:

  • Data Normalization.
  • Adding variables to represent the interactions of other variables.
  • Manipulating categorical variables.
  • Converting the data into a format that the Machine Learning model can accept.

To manipulate your data in these ways, you use various statistical techniques.
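The steps above can be sketched in plain Python. This is an illustrative sketch only: the field names (`age`, `income`, `city`), the choice of min-max scaling for normalization, and the single age-times-income interaction term are all hypothetical choices made for the example, not a prescribed recipe.

```python
from typing import Dict, List


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale values into [0, 1]; a constant column maps to all 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def one_hot(categories: List[str]) -> Dict[str, List[float]]:
    """Turn one categorical column into a 0.0/1.0 column per distinct category."""
    levels = sorted(set(categories))
    return {f"is_{lvl}": [1.0 if c == lvl else 0.0 for c in categories] for lvl in levels}


def engineer(rows: List[dict]) -> List[List[float]]:
    """Hypothetical pipeline: normalize, add an interaction, encode, then assemble."""
    ages = min_max_normalize([float(r["age"]) for r in rows])
    incomes = min_max_normalize([float(r["income"]) for r in rows])
    # Interaction term: product of the two normalized numeric features.
    interaction = [a * i for a, i in zip(ages, incomes)]
    encoded = one_hot([r["city"] for r in rows])
    # Assemble every feature into one numeric vector per row.
    columns = [ages, incomes, interaction] + [encoded[k] for k in sorted(encoded)]
    return [list(vec) for vec in zip(*columns)]


rows = [
    {"age": 20, "income": 30000, "city": "Paris"},
    {"age": 40, "income": 90000, "city": "Berlin"},
    {"age": 30, "income": 60000, "city": "Paris"},
]
for vec in engineer(rows):
    print(vec)
```

Each row comes out as five numbers: normalized age, normalized income, their product, and one indicator column each for Berlin and Paris. The third row, for instance, becomes `[0.5, 0.5, 0.25, 0.0, 1.0]`.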

Question-47: In Apache Spark MLlib (the Machine Learning Library), in what format should input be provided?

Answer: In Apache Spark MLlib, all variables usually have to be provided as vectors of doubles. Hence, you have to convert your data features accordingly.
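As a rough sketch of that conversion, the plain-Python helper below turns one raw record into a `(label, features)` pair in which every value is a float (Python's double). The shape loosely mirrors MLlib's `LabeledPoint` (a double label plus a feature vector), but the helper itself, its parameters, and the record fields are hypothetical. In PySpark itself, the resulting list could be wrapped with `pyspark.ml.linalg.Vectors.dense`, or `VectorAssembler` from `pyspark.ml.feature` can build the vector column directly.

```python
from typing import Dict, List, Sequence, Tuple


def to_labeled_vector(
    record: dict,
    label_field: str,
    numeric_fields: Sequence[str],
    categorical_levels: Dict[str, Sequence[str]],
) -> Tuple[float, List[float]]:
    """Hypothetical helper: return (label, features) with every value cast to float."""
    label = float(record[label_field])
    # Numeric fields may arrive as ints or strings; cast them all to doubles.
    features = [float(record[name]) for name in numeric_fields]
    # Categorical fields become one 0.0/1.0 entry per known level.
    for name, levels in categorical_levels.items():
        features.extend(1.0 if record[name] == level else 0.0 for level in levels)
    return label, features


record = {"is_spam": 1, "num_links": "3", "length": 120, "sender_domain": "example.com"}
label, features = to_labeled_vector(
    record,
    label_field="is_spam",
    numeric_fields=["num_links", "length"],
    categorical_levels={"sender_domain": ["example.com", "other"]},
)
print(label, features)  # 1.0 [3.0, 120.0, 1.0, 0.0]
```

Fixing the list of categorical levels up front keeps every record's vector the same length and in the same order, which any vector-based learner requires.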

Question-48: What do you do during the Model Training step?

Answer: During model training, the parameters inside the model change according to how well the model performs on the input data. Suppose you are building a spam classification model: the algorithm you use is likely to find that certain words are better predictors of spam than others, and it therefore weights the parameters associated with those words more heavily. Once the model is trained, we find that those words have more influence because they are more consistently associated with spam emails than other words.

The output of this step is a trained Machine Learning model, which you then use for future predictions.
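To make the spam example concrete, here is a minimal sketch that assumes logistic regression over word presence, trained by per-example gradient descent; the algorithm choice, the learning rate, the epoch count, and the four toy emails are all assumptions for illustration. It shows the idea from the answer above: words that consistently appear in spam end up with larger weights.

```python
import math
from typing import Dict, List, Tuple


def train_spam_model(
    emails: List[Tuple[str, int]], epochs: int = 200, lr: float = 0.1
) -> Tuple[Dict[str, float], float]:
    """Logistic regression over word presence (label 1 = spam), trained per example."""
    weights: Dict[str, float] = {}
    bias = 0.0
    for _ in range(epochs):
        for text, label in emails:
            words = set(text.lower().split())
            z = bias + sum(weights.get(w, 0.0) for w in words)
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of spam
            error = label - p  # > 0 if spam was under-predicted, < 0 if over-predicted
            # Nudge each present word's weight in the direction that reduces the error.
            for w in words:
                weights[w] = weights.get(w, 0.0) + lr * error
            bias += lr * error
    return weights, bias


def spam_probability(weights: Dict[str, float], bias: float, text: str) -> float:
    """Score new text with the trained parameters."""
    z = bias + sum(weights.get(w, 0.0) for w in set(text.lower().split()))
    return 1.0 / (1.0 + math.exp(-z))


emails = [
    ("win free money now", 1),
    ("claim your free prize now", 1),
    ("meeting agenda for monday", 0),
    ("lunch on monday", 0),
]
weights, bias = train_spam_model(emails)
# Print the three words the model treats as the strongest spam signals.
for word in sorted(weights, key=weights.get, reverse=True)[:3]:
    print(word, round(weights[word], 2))
```

The returned `weights` and `bias` are the trained model: `spam_probability` applies them to emails the model has not seen.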

Question-49: How do you know whether the model we trained is good or not?

Answer: For that, you use the Model Tuning and Model Evaluation steps.
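A small sketch of both steps, assuming a binary classifier (1 = positive class). Evaluation is shown with accuracy, precision, and recall; tuning is illustrated by choosing a decision threshold on a held-out validation set. The scores, labels, and candidate thresholds are made up for the example, and threshold selection is only one of many possible tuning knobs.

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision and recall for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def tune_threshold(
    scores: Sequence[float], y_true: Sequence[int], candidates: Sequence[float]
) -> float:
    """Pick the decision threshold with the best validation accuracy (first wins ties)."""
    def accuracy_at(t: float) -> float:
        preds = [1 if s >= t else 0 for s in scores]
        return evaluate(y_true, preds)["accuracy"]
    return max(candidates, key=accuracy_at)


# Tuning: hypothetical validation-set scores from an already trained model.
val_scores = [0.9, 0.45, 0.3, 0.8, 0.2]
val_labels = [1, 1, 0, 1, 0]
best = tune_threshold(val_scores, val_labels, [0.3, 0.4, 0.5])
print(best)  # 0.4

# Evaluation: hypothetical predictions on a separate test set.
test_labels = [1, 1, 0, 0, 1, 0]
test_preds = [1, 0, 0, 1, 1, 0]
print(evaluate(test_labels, test_preds))
```

Keeping the tuning data (validation set) separate from the final evaluation data (test set) matters: a score measured on data that was used to pick settings tends to overstate how well the model will do on new data.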

Question-50: What is the general reason a Machine Learning model performs poorly?

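Answer: Poor performance of a Machine Learning model is generally caused by either overfitting or underfitting the data. Overfitting means the model learns the training data too closely, including its noise, and so generalizes poorly to new data; underfitting means the model is too simple to capture the underlying pattern, so it performs poorly even on the training data.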
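A common rule of thumb for telling the two apart is to compare training error with validation error. The sketch below encodes that heuristic; the two thresholds (`acceptable_error` and `max_gap`) are hypothetical defaults that would depend on the problem, not standard values.

```python
def diagnose(
    train_error: float,
    val_error: float,
    acceptable_error: float = 0.1,  # hypothetical: highest training error we tolerate
    max_gap: float = 0.05,  # hypothetical: largest tolerated validation-minus-training gap
) -> str:
    """Rough heuristic for spotting underfitting vs. overfitting from two error rates."""
    if train_error > acceptable_error:
        # The model cannot even fit the data it was trained on.
        return "underfitting"
    if val_error - train_error > max_gap:
        # The model fits the training data but does much worse on unseen data.
        return "overfitting"
    return "good fit"


for train_err, val_err in [(0.30, 0.32), (0.01, 0.25), (0.04, 0.06)]:
    print(train_err, val_err, diagnose(train_err, val_err))
```

High error on both sets points to underfitting (try a more expressive model or better features); low training error with much higher validation error points to overfitting (try more data, regularization, or a simpler model).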