Fundamentals of Machine Learning for NHS using R
Asking whether the character is male is the best first question, as it eliminates the most options. Hair colour is a weaker question: there are multiple hair colours, and not all the characters have hair!
In this case, however, there are only two options, male or female, and each character can be assigned to one or the other.
Here is the entire decision tree for Guess Who?
The plot shows the number of days each person spends doing a hobby, the number of days they work, and whether they are happy or sad. What yes or no questions would you ask to determine if someone is happy or sad?
Question 1: Do you spend fewer than 3 days doing a hobby?
Do you think we could use the model to classify whether they are happy or sad? Y is the number of days spent in work. X is the number of days spent doing a hobby.
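As a minimal sketch of how a single yes or no question becomes a classifier, the R snippet below applies the "fewer than 3 days doing a hobby" question as a rule. The data frame `people`, the function `classify_mood`, and the direction of the rule (fewer hobby days means "sad") are all assumptions made for illustration; they are not taken from the plotted data.

```r
# Hypothetical data: days per week spent on a hobby (X) and in work (Y).
people <- data.frame(
  hobby_days = c(1, 2, 4, 5, 0, 3),
  work_days  = c(6, 5, 3, 2, 7, 4)
)

# One yes/no question written as a rule.
# Assumption: fewer than `threshold` hobby days is classified as "sad".
classify_mood <- function(hobby_days, threshold = 3) {
  ifelse(hobby_days < threshold, "sad", "happy")
}

# Apply the rule to every person and show the result.
people$predicted <- classify_mood(people$hobby_days)
print(people)
```

A full decision tree simply chains several rules like this one, with each answer deciding which question is asked next.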
For some additional data, collected in the same way, can our model predict whether these people were happy or sad?
Our model would classify these people as happy or sad in the following way.
Estimated groupings based on the model
Actual groupings from collected data
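One way to compare the estimated groupings with the actual groupings is a confusion table and an overall accuracy figure. The sketch below uses base R only; the vectors `estimated` and `actual` are made-up values for illustration, not the data shown in the plots.

```r
# Hypothetical labels: what the model estimated vs. what was collected.
estimated <- c("happy", "sad", "sad",   "happy", "happy", "sad")
actual    <- c("happy", "sad", "happy", "happy", "sad",   "sad")

# Fix the level order so the table is always 2 x 2,
# even if one label happens to be missing from a vector.
lv <- c("happy", "sad")
confusion <- table(
  estimated = factor(estimated, levels = lv),
  actual    = factor(actual, levels = lv)
)
print(confusion)

# Accuracy: correct classifications (the diagonal) over all cases.
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```

The off-diagonal cells of the table show where the model and the collected data disagree.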
In our example, “Do you have a beard?” is no longer a question that gives any information; that is, it brings us no closer to our answer.
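The idea of a question "giving no information" can be made concrete with a small calculation. The sketch below scores a yes or no question by the entropy (in bits) of its answers across the remaining characters: if everyone left would give the same answer, the score is zero. The characters, the column names, and the helper `question_information` are hypothetical and chosen only for illustration.

```r
# Information (in bits) gained from a yes/no question, given the
# TRUE/FALSE answers of every character still in play.
question_information <- function(answers) {
  p_yes <- mean(answers)
  probs <- c(p_yes, 1 - p_yes)
  probs <- probs[probs > 0]   # treat 0 * log2(0) as 0
  -sum(probs * log2(probs))
}

# Hypothetical remaining characters: nobody left has a beard.
remaining <- data.frame(
  name      = c("A", "B", "C", "D"),
  has_beard = c(FALSE, FALSE, FALSE, FALSE),
  has_hat   = c(TRUE, FALSE, TRUE, FALSE)
)

beard_info <- question_information(remaining$has_beard)  # all answers alike
hat_info   <- question_information(remaining$has_hat)    # even split
print(c(beard = beard_info, hat = hat_info))
```

Here the beard question scores 0 bits because every remaining character answers "no", while the hat question splits the group evenly and scores 1 bit.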
Boosting samples data in the same way as Bagging, except that instead of each tree having an equal vote: \[ \text{Vote power} = \text{Vote} \times f(\text{accuracy}) \] This means that decision trees with a higher overall accuracy have a higher overall vote score.
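The vote power formula above can be sketched in a few lines of R. The slide does not specify the function f, so the code assumes an AdaBoost-style log-odds weight, f(accuracy) = 0.5 * log(accuracy / (1 - accuracy)); that choice, the three trees, and their votes and accuracies are all illustrative assumptions.

```r
# Hypothetical votes from three trees for one person:
# +1 means "happy", -1 means "sad".
votes    <- c(1, -1, -1)
accuracy <- c(0.90, 0.60, 0.55)

# Bagging: every tree has an equal vote.
bagging_score <- sum(votes)

# Boosting-style weighting. Assumption: f is an AdaBoost-like
# log-odds weight; the slide leaves f unspecified.
f <- function(acc) 0.5 * log(acc / (1 - acc))
vote_power     <- votes * f(accuracy)
boosting_score <- sum(vote_power)

# Turn a total score into a label.
label <- function(score) if (score > 0) "happy" else "sad"

print(label(bagging_score))
print(label(boosting_score))
```

With equal votes the two weaker trees outvote the accurate one, whereas with weighted votes the single accurate tree carries the decision.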
Questions?