Rafah El-Khatib – Data Scientist at ING

Summary: Feature Selection Best Practices – LOFO and a Survey of Key Feature Importance Packages. Selecting predictive features as model inputs is key to keeping the input data free of noise, and to keeping modeling time-effective when the original number of features or the dataset is large. In this talk, I will survey key feature importance packages and explain their strengths and weaknesses. I will also present LOFO (leave-one-feature-out), an in-house open-source feature importance package, and its fast approximation, FLOFO (Fast LOFO). LOFO calculates the importance of each feature in a set, for a model and a metric of choice, by iteratively removing each feature from the set and evaluating the cross-validated performance of the model on the chosen metric.
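
The leave-one-feature-out procedure described in the summary can be illustrated with a minimal sketch. This is not the LOFO package's own code or API: the function name lofo_importance, the synthetic dataset, the logistic regression model, and the ROC AUC metric are all assumptions chosen for illustration, built on scikit-learn's cross_val_score. Importance is taken here as the drop in mean cross-validated score when a feature is removed, which is one reasonable reading of the description.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def lofo_importance(model, X, y, scoring="roc_auc", cv=4):
    """Hypothetical helper: drop in mean CV score when each feature is left out."""
    # Baseline: cross-validated score with every feature present.
    baseline = cross_val_score(model, X, y, scoring=scoring, cv=cv).mean()
    importances = {}
    for j in range(X.shape[1]):
        # Remove column j and re-evaluate the same model and metric.
        X_reduced = np.delete(X, j, axis=1)
        score = cross_val_score(model, X_reduced, y, scoring=scoring, cv=cv).mean()
        # Positive value: the score got worse without the feature.
        importances[j] = baseline - score
    return importances


# Synthetic data, assumed purely for demonstration.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=2, n_redundant=0, random_state=0
)
scores = lofo_importance(LogisticRegression(max_iter=1000), X, y)

# Print features from most to least important.
for j, imp in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"feature {j}: {imp:+.4f}")
```

In this sketch a value near zero or below suggests the model does no worse without the feature, so it is a candidate for removal. The cost is one full cross-validation per feature, which is presumably the motivation for a fast approximation such as FLOFO.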