Feature Engineering
Fundamentals of Machine Learning for NHS using R
Today’s Plan
- Feature Engineering
- Feature engineering techniques including, but not limited to: transformations, feature extraction, feature reduction, and feature selection.
Feature Engineering
Categorical variables
- Encoding categorical variables is often referred to as creating dummy variables.
- Categorical features are replaced by one or more new features that take numeric values.
- Any number of categories can be represented by introducing one new feature per category (one-hot encoding), as in the sketch below.
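A minimal sketch in base R of the difference between the two encodings, using a hypothetical `quality` column; `model.matrix()` is one of several ways to do this:

```r
# Hypothetical data frame with one categorical feature.
df <- data.frame(quality = factor(c("Good", "Premium", "Good", "Fair")))

# Dummy encoding: model.matrix() drops one reference level,
# so k categories become k - 1 indicator columns (plus the intercept).
dummy <- model.matrix(~ quality, data = df)

# One-hot encoding: removing the intercept keeps all k levels,
# i.e. one new 0/1 feature per category.
one_hot <- model.matrix(~ quality - 1, data = df)
```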
When to one-hot encode and dummy encode
- Both types of encoding can be used to encode ordinal and nominal categorical variables.
- However, if you want to preserve the natural order of an ordinal categorical variable, you can use label encoding instead.
Advantages & disadvantages
- One advantage of label encoding is that it does not expand the feature space: category names are simply replaced with numbers.
- The major disadvantage of label encoding is that machine learning algorithms may infer relationships between the encoded categories that do not exist.
- For example, an algorithm may interpret Premium (2) as twice as good as Good (1), as in the sketch below.
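A minimal sketch of label encoding in base R, assuming a hypothetical ordinal variable with levels Fair < Good < Premium:

```r
quality <- c("Good", "Premium", "Fair", "Good")

# Setting the factor levels explicitly preserves the natural order;
# as.integer() then maps Fair -> 1, Good -> 2, Premium -> 3.
quality_encoded <- as.integer(factor(quality,
                                     levels = c("Fair", "Good", "Premium")))

# Caveat: a model may now treat these codes as equally spaced numbers,
# which is exactly the disadvantage described above.
```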
Feature Engineering
Continuous variables
- Binning takes a numeric predictor and pre-categorises or “bins” it into two or more groups.
- Binning can be used to create new features or can be used to simply categorise features as they are.
- Binning continuous data used as features for ML models has drawbacks: categorising the predictors can cause a loss of precision in the predictions. A sketch of binning follows this list.
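A minimal sketch of binning a continuous predictor with base R's `cut()`; the ages and cut points here are hypothetical:

```r
age <- c(23, 37, 45, 61, 78)

# Pre-categorise ("bin") the numeric values into labelled groups.
# right = FALSE makes each interval closed on the left, e.g. [30, 50).
age_band <- cut(age,
                breaks = c(0, 30, 50, 70, Inf),
                labels = c("under 30", "30-49", "50-69", "70+"),
                right = FALSE)

table(age_band)
```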
Feature selection
- Removing predictors - fewer predictors mean less computational time and complexity.
- Collinearity - the situation where a pair of predictor variables have a substantial correlation with each other.
- A common rule of thumb: if two predictors are highly correlated, discard one of them.
- We shouldn't just blindly follow the correlation rule; highly correlated features can instead be combined to create new features. A sketch of a correlation filter follows this list.
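A minimal sketch of a correlation filter using `caret::findCorrelation()`; the built-in `mtcars` data stands in for a real predictor set, and the 0.9 cutoff is an assumption:

```r
library(caret)

cor_matrix <- cor(mtcars)

# findCorrelation() flags columns to remove so that no remaining pair
# of predictors has an absolute correlation above the cutoff.
high_cor <- findCorrelation(cor_matrix, cutoff = 0.9)

# Guard against the empty case: -integer(0) would drop every column.
filtered <- if (length(high_cor) > 0) mtcars[, -high_cor] else mtcars
```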
Feature reduction
- These methods reduce the data by generating a smaller set of predictors.
- They capture the majority of the information in the original variables.
- Fewer variables can be used that provide reasonable fidelity to the original data.
- For most data reduction techniques, the new predictors are functions of the original predictors, as in the PCA sketch below.
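A minimal sketch of data reduction via principal component analysis, using base R's `prcomp()`; `mtcars` again stands in for a real predictor set, and keeping three components is an arbitrary choice:

```r
# Centre and scale the predictors before extracting components.
pca <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Each principal component is a linear function of the original
# predictors; the first few typically capture most of the variance.
summary(pca)

# Keep, say, the first three components as the new, smaller predictor set.
reduced <- pca$x[, 1:3]
```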