Question-9: You are working with a company that runs an online portal for commercial properties. Some elite customers, who invest heavily in commercial property to rent it out to corporates, have partnered with the company and want a feature on the website that can predict the price of a commercial property submitted to the site. You are using various algorithms to predict property prices, one of them being the Decision Tree algorithm. However, during the model training phase you have found that the model is overfitted: it is not able to generalize to data it has not yet seen and does not predict prices accurately, which premium subscribers of your website have also reported. How can you handle this situation?
- You would collect more data to train your model and manually select the data for better accuracy.
- You would collect sample prices from a competitor's website and use them to price the properties on your portal.
- You would use another machine learning algorithm, such as K-means, to find the anomalies in your training data, and then use only the valid data with the decision tree to improve the prediction.
- Increase the memory available to your models.
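The K-means option above can be sketched as a two-step pipeline: cluster the training rows, treat the rows farthest from any centroid as anomalies, and train the decision tree only on the remaining data. The synthetic data, the 3-cluster choice, and the 95th-percentile cutoff below are illustrative assumptions, not a prescribed configuration.

```python
# Sketch: filter anomalous training rows with K-means, then fit a decision tree.
# Data, cluster count, and the distance cutoff are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(50, 500, size=(200, 1))            # hypothetical property size feature
y = 30 * X[:, 0] + rng.normal(0, 200, 200)         # price with mild noise
y[:5] = rng.uniform(5e5, 2e6, 5)                   # inject a few corrupted price labels

# Cluster feature + target together so badly priced rows sit far from any centroid.
data = np.column_stack([X[:, 0], y])
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data)
dist = np.min(km.transform(data), axis=1)          # distance to the nearest centroid
keep = dist < np.percentile(dist, 95)              # drop the farthest 5% as suspected anomalies

# Train the decision tree only on the retained ("valid") rows.
tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X[keep], y[keep])
```

In practice you would scale the features before clustering and tune the cutoff; the point is only that the anomaly filter runs before, and independently of, the decision tree.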
Decision Tree vs Random Forest
The decision tree algorithm is used for both classification and regression problems. It falls under supervised, non-parametric machine learning algorithms. Decision trees are highly sensitive to how you split the data at each decision node.
In a decision tree, we first split on the feature with the highest information gain, a recursive process that continues until all child nodes are pure or the information gain becomes zero. The decision tree algorithm has some disadvantages:
- Decision trees are prone to overfitting.
- Greedy splitting gives a locally optimal solution, but not a globally optimal one.
- Decision trees do not have the same predictive accuracy as other regression and classification models.
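The overfitting problem in the scenario can be reproduced in a few lines: an unrestricted tree memorizes its training set (near-perfect training score) but scores noticeably worse on held-out data, while limiting the depth (a simple form of pre-pruning) trades some training fit for better generalization. The synthetic price data below is an assumption for illustration.

```python
# Sketch: a fully grown decision tree memorizes noisy training data (overfitting),
# while a depth-limited tree generalizes better. Synthetic data, illustrative only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(50, 500, size=(300, 1))            # hypothetical property size feature
y = 30 * X[:, 0] + rng.normal(0, 1500, 300)        # noisy prices

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

deep = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)           # no depth limit
pruned = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_tr, y_tr)

print(f"deep   train R2={deep.score(X_tr, y_tr):.2f}  test R2={deep.score(X_te, y_te):.2f}")
print(f"pruned train R2={pruned.score(X_tr, y_tr):.2f}  test R2={pruned.score(X_te, y_te):.2f}")
```

The gap between the deep tree's training and test scores is the overfitting reported by the premium subscribers in the question.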
To handle such issues, we can use the Random Forest algorithm.
Random forest increases the predictive power of the algorithm, helps prevent overfitting, and can be used for both classification and regression. It is an ensemble of randomized decision trees. For classification, each decision tree votes for the predicted class and the forest chooses the prediction with the most votes; for regression, the forest averages the trees' predictions.
The Random Forest algorithm offers:
- Efficiency on large datasets
- High predictive accuracy.
- Ability to handle many input features without needing feature deletion
- Prediction is based on input features considered important for classification.
- Robustness to missing data while still giving better predictive accuracy
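The points above can be sketched with scikit-learn's `RandomForestRegressor`: averaging many randomized trees typically beats a single unpruned tree on held-out data, and the fitted forest exposes per-feature importances (the basis for the "input features considered important" point). The two-feature synthetic dataset and the 200-tree size are assumptions for illustration.

```python
# Sketch: random forest (ensemble of randomized trees) vs. a single decision tree,
# plus the forest's feature importances. Synthetic data, illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)
X = rng.uniform(50, 500, size=(400, 2))            # e.g. size and a weaker second feature
y = 30 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 1500, 400)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print("tree   test R2:", round(tree.score(X_te, y_te), 2))
print("forest test R2:", round(forest.score(X_te, y_te), 2))
print("feature importances:", forest.feature_importances_)  # feature 0 dominates here
```

Because each tree sees a bootstrap sample and a random subset of features at each split, averaging their predictions reduces the variance that makes a single tree overfit.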
Disadvantages of random forest:
- Not easily interpretable
- Random forests can still overfit on noisy classification or regression tasks