In this post, I will share my understanding of the feature selection process:
In Theory, feature selection can help reduce dimensionality, leading to simpler models that are easier to understand and interpret. Models with a large number of features are also more prone to overfitting, especially when the dataset is not large enough to support that complexity or when some of those features contribute nothing to the model's predictive power. Simplifying the feature set can therefore help build more generalizable models, and models with fewer features are generally faster to train and require fewer computational resources.
But what about in Practice?
"The curse of dimensionality isn't meaningful in practice because our space isn't just a bunch of meaningless Cartesian coordinates. We create structure, using trees, neural nets, etc. We regularize using bagging, weight decay, dropout, etc. We find that therefore we actually can add lots of columns without seeing problems in practice." - Jeremy Howard
More complex models with a large number of features can capture more intricate patterns in the data, and modern machine learning techniques have developed various ways to mitigate the issues that arise in high-dimensional spaces. Modern algorithms don't treat the feature space as a simple Cartesian space; they learn non-linear, complex boundaries within it, which lets them handle high-dimensional data more effectively than traditional statistical models. Ensemble methods like bagging help prevent overfitting and reduce the model's effective complexity, making it less sensitive to the curse of dimensionality. The assumption here, though, is a sufficiently large and diverse dataset to train on.
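To make this concrete, here is a minimal sketch of my own (scikit-learn on synthetic data, not taken from the quote above) comparing a single decision tree against a bagged ensemble on a dataset where only 20 of 500 features carry signal; the ensemble typically holds up much better despite the many noise columns.

```python
# Minimal sketch: a bagged ensemble vs. a single tree on high-dimensional,
# mostly-noisy data. Dataset and hyperparameters are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data: 500 features, only 20 of which are informative.
X, y = make_classification(
    n_samples=5000, n_features=500, n_informative=20, random_state=0
)

single_tree = DecisionTreeClassifier(random_state=0)
bagged_trees = RandomForestClassifier(n_estimators=200, random_state=0)

print("single tree :", cross_val_score(single_tree, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(bagged_trees, X, y, cv=5).mean())
```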
While theoretically a large number of features might pose challenges, in practice, if a model with many features consistently performs better on a well-constructed validation set, that is a strong argument in favor of using more features. Domain knowledge can also guide which features are likely to be relevant, something that may not be apparent from algorithmic feature selection techniques alone.
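As a rough illustration of letting the validation set decide, here is a hypothetical sketch using scikit-learn's breast cancer dataset: it scores a model trained on all features against one trained on a small, hand-picked subset. The subset here is just an illustrative guess standing in for real domain knowledge.

```python
# Sketch: compare "all features" vs. a domain-guided subset on a held-out
# validation split. Dataset, model, and subset are assumptions for illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

def validation_accuracy(columns):
    """Fit on the chosen columns and report accuracy on the validation split."""
    model = LogisticRegression(max_iter=5000).fit(X_tr[columns], y_tr)
    return accuracy_score(y_val, model.predict(X_val[columns]))

all_features = list(X.columns)
subset = ["mean radius", "mean texture", "mean concavity"]  # illustrative pick

print("all features:", validation_accuracy(all_features))
print("subset      :", validation_accuracy(subset))
```

Whichever feature set wins on the validation split becomes the working choice, which is exactly the empirical argument made above.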
Performing Experiments for Feature Selection:
This is beneficial when you suspect that not all features contribute equally to the model's predictive power, or when you need to build a simpler, more interpretable model. In practice, a common approach is to start with a model that uses all available features and then iteratively refine the feature set based on model performance and domain knowledge. This iterative approach helps you understand the contribution of different features and find a balance between model complexity and performance.
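One possible way to realize this "start with everything, then prune" loop is recursive feature elimination with cross-validation. The sketch below uses scikit-learn's RFECV with a random forest; the estimator and dataset are my own choices for illustration, not a prescribed recipe.

```python
# Sketch: begin with all features, repeatedly drop the least important one,
# and keep the feature set with the best cross-validated score.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    step=1,               # drop one feature per iteration
    cv=5,                 # judge each candidate feature set by CV score
    scoring="accuracy",
)
selector.fit(X, y)

print("number of features kept:", selector.n_features_)
print("features kept:", list(X.columns[selector.support_]))
```

The cross-validated score plays the role of the "model performance" signal in the iterative refinement described above; domain knowledge can then veto or reinstate individual features.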
In summary, while theoretical considerations about feature selection and the curse of dimensionality are important, practical machine learning often involves empirical testing and the application of advanced techniques to mitigate these issues.