Key Concepts
Automated feature engineering can improve downstream predictive performance by automatically creating new features that capture complex interactions between existing features. The proposed IIFE algorithm uses interaction information to efficiently identify and combine feature pairs that synergize well in predicting the target.
Summary
The paper introduces a new automated feature engineering (AutoFE) algorithm called IIFE (Interaction Information Based Automated Feature Engineering). The key idea behind IIFE is to use interaction information, a measure of synergy between two features and the target, to guide the feature engineering process.
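Interaction information for discrete variables can be estimated from empirical entropies. The sketch below assumes the sign convention in which positive values mean synergy, I(X;Y;Z) = I(X,Y;Z) - I(X;Z) - I(Y;Z), and uses plain-count (plug-in) entropy estimates. The function names are hypothetical, and continuous features would first need to be discretized. It is an illustration of the quantity, not the paper's estimator.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Plug-in Shannon entropy, in bits, of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x: Sequence[Hashable], z: Sequence[Hashable]) -> float:
    """I(X;Z) = H(X) + H(Z) - H(X,Z)."""
    return entropy(x) + entropy(z) - entropy(list(zip(x, z)))


def interaction_information(x, y, z) -> float:
    """I(X;Y;Z) = I(X,Y;Z) - I(X;Z) - I(Y;Z); positive means X and Y are synergistic for Z."""
    joint_xy = list(zip(x, y))  # treat the pair (X, Y) as one joint variable
    return mutual_information(joint_xy, z) - mutual_information(x, z) - mutual_information(y, z)


# XOR target: neither bit alone says anything about it, but together they determine it.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
target = [a ^ b for a, b in zip(x, y)]
print(round(interaction_information(x, y, target), 6))  # 1.0
```

In the XOR case each feature alone carries zero information about the target, so the full bit of joint information counts as synergy. Passing the same feature in as both X and Y instead yields -1.0, which signals redundancy rather than synergy.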
The algorithm works as follows:
1. Compute the interaction information for all pairs of features.
2. Combine the feature pairs with the highest interaction information using a set of bivariate functions.
3. Evaluate each newly engineered feature using cross-validation.
4. Add the best-performing engineered feature to the feature pool.
5. Repeat steps 1-4, with the feature pool now including the new engineered feature.
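The iterative procedure above can be sketched in plain Python as follows. This is a minimal illustration under several assumptions that are not taken from the paper: features are discretized with equal-width bins before estimating interaction information, the operator set is a hypothetical {+, -, *, /}, and each candidate is scored by a k-fold one-variable least-squares fit instead of the downstream model trained on the whole feature set. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2

# Hypothetical bivariate operators; the paper's actual operator set may differ.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else 0.0,
}


def discretize(values, bins=4):
    """Equal-width binning so entropies can be estimated from counts."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    return [min(int((v - lo) / width), bins - 1) for v in values]


def entropy(values):
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def mi(x, z):
    return entropy(x) + entropy(z) - entropy(list(zip(x, z)))


def interaction_information(x, y, z):
    return mi(list(zip(x, y)), z) - mi(x, z) - mi(y, z)


def cv_r2(feature, target, folds=5):
    """Mean out-of-fold R^2 (vs. the training mean) of a one-variable least-squares fit."""
    n, scores = len(target), []
    for k in range(folds):
        test = set(range(k, n, folds))
        tr = [i for i in range(n) if i not in test]
        mx = sum(feature[i] for i in tr) / len(tr)
        my = sum(target[i] for i in tr) / len(tr)
        var = sum((feature[i] - mx) ** 2 for i in tr)
        cov = sum((feature[i] - mx) * (target[i] - my) for i in tr)
        slope = cov / var if var else 0.0
        sse = sum((target[i] - (my + slope * (feature[i] - mx))) ** 2 for i in test)
        sst = sum((target[i] - my) ** 2 for i in test) or 1.0
        scores.append(1 - sse / sst)
    return sum(scores) / folds


def iife_sketch(features, target, iterations=2, top_k=3, bins=4):
    pool = dict(features)
    t_bins = discretize(target, bins)
    for _ in range(iterations):
        binned = {name: discretize(col, bins) for name, col in pool.items()}
        # Step 1: rank every feature pair by interaction information with the target.
        ranked = sorted(
            combinations(pool, 2),
            key=lambda p: interaction_information(binned[p[0]], binned[p[1]], t_bins),
            reverse=True,
        )
        # Steps 2-3: combine only the top-k pairs and score each candidate by CV.
        best_name, best_col, best_score = None, None, float("-inf")
        for a, b in ranked[:top_k]:
            for sym, op in OPS.items():
                name = f"({a}{sym}{b})"
                if name in pool:
                    continue
                col = [op(u, v) for u, v in zip(pool[a], pool[b])]
                score = cv_r2(col, target)
                if score > best_score:
                    best_name, best_col, best_score = name, col, score
        # Step 4: keep the winner; the next pass reuses the enlarged pool (step 5).
        if best_name is None:
            break
        pool[best_name] = best_col
    return pool


x1 = [i % 7 + 1 for i in range(60)]
x2 = [(3 * i) % 5 + 1 for i in range(60)]
x3 = [(i * i) % 11 for i in range(60)]
y = [a * b for a, b in zip(x1, x2)]
pool = iife_sketch({"x1": x1, "x2": x2, "x3": x3}, y, iterations=1)
print(list(pool))
```

On this toy target, defined as the product of x1 and x2, one iteration should add the product feature to the pool. With only three base features, `top_k=3` covers every pair; the saving comes from keeping `top_k` small when the pool holds thousands of features.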
This iterative process allows IIFE to build increasingly complex features by combining the most synergistic pairs of features, while avoiding the combinatorial explosion of the feature space.
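A quick count makes the combinatorial explosion concrete. The sketch assumes a hypothetical set of four operators and counts each operator once per unordered pair. Non-commutative operators such as - and / would roughly double the exhaustive numbers. Here `top_k` stands for the number of highest-interaction pairs that are actually expanded.

```python
from math import comb

N_OPS = 4  # hypothetical operator set, e.g. {+, -, *, /}


def exhaustive_candidates(n_features: int, n_ops: int = N_OPS) -> int:
    """One round that expands every unordered feature pair with every operator."""
    return comb(n_features, 2) * n_ops


def top_k_candidates(n_features: int, top_k: int, n_ops: int = N_OPS) -> int:
    """One round that expands only the top_k pairs ranked by interaction information."""
    return min(comb(n_features, 2), top_k) * n_ops


for n in (5, 6, 1000):
    print(n, exhaustive_candidates(n), top_k_candidates(n, top_k=10))
# 5 40 40
# 6 60 40
# 1000 1998000 40

# A second exhaustive round starts from the already expanded pool.
pool_after_round_1 = 1000 + exhaustive_candidates(1000)
print(exhaustive_candidates(pool_after_round_1))  # about 8.0e12
```

Exhaustive expansion grows quadratically per round and compounds across rounds. Expanding only a fixed number of top-ranked pairs keeps the per-round cost constant and moves the remaining work into computing the pairwise interaction information.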
The authors demonstrate that IIFE outperforms existing AutoFE algorithms on a variety of public datasets and a large-scale proprietary dataset. They also show that interaction information can be used to accelerate other expand-reduce style AutoFE algorithms by reducing the search space.
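Using interaction information to shrink the search space of another expand-reduce algorithm can be pictured as a pruning step placed in front of its expansion phase. The sketch below is a hypothetical helper, not the authors' integration: it ranks all pairs of discrete columns by interaction information and returns only the top ones, which the host algorithm would then expand and reduce as usual.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(v):
    n = len(v)
    return -sum(c / n * log2(c / n) for c in Counter(v).values())


def interaction_information(x, y, z):
    def mi(a, b):
        return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

    return mi(list(zip(x, y)), z) - mi(x, z) - mi(y, z)


def prune_pairs(columns, target, top_k):
    """Keep only the top_k feature pairs by interaction information (hypothetical helper).

    An expand-reduce algorithm would then expand just these pairs instead of
    all C(n, 2) of them before running its own reduction step.
    """
    scored = [
        (interaction_information(columns[a], columns[b], target), a, b)
        for a, b in combinations(columns, 2)
    ]
    scored.sort(reverse=True)
    return [(a, b) for _, a, b in scored[:top_k]]


cols = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1],
    "b": [0, 1, 0, 1, 0, 1, 0, 1],
    "c": [0, 0, 0, 0, 1, 1, 1, 1],
}
t = [p ^ q for p, q in zip(cols["a"], cols["b"])]  # XOR of a and b; c is irrelevant
print(prune_pairs(cols, t, top_k=1))  # [('a', 'b')]
```

Only the synergistic pair survives the pruning; pairs involving the irrelevant column score zero and are never expanded.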
Additionally, the authors identify and address several issues with the experimental setups in the existing AutoFE literature, such as reporting cross-validation scores instead of scores on held-out test sets, and the use of transductive learning in the OpenFE algorithm.
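The general principle behind both fixes is that test rows are held out from every fitting decision and used only for the final score. The sketch below illustrates that principle with one example of a fitted statistic, discretization bin edges, which are learned from training rows only (inductive) instead of from training and test rows together (transductive). It does not reproduce the paper's protocol. The split fraction, seed, and function names are hypothetical.

```python
import random
from bisect import bisect_right


def split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle row indices once and hold out a test set that feature engineering never sees."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = n - round(n * test_fraction)
    return idx[:cut], idx[cut:]


def fit_bin_edges(train_values, bins=4):
    """Equal-frequency bin edges estimated from training rows only (inductive)."""
    ordered = sorted(train_values)
    return [ordered[len(ordered) * k // bins] for k in range(1, bins)]


def apply_bins(values, edges):
    """Map any rows, including test rows, onto edges that were fixed during training."""
    return [bisect_right(edges, v) for v in values]


values = [float(i) for i in range(100)]
train_idx, test_idx = split_indices(len(values))
edges = fit_bin_edges([values[i] for i in train_idx])
train_bins = apply_bins([values[i] for i in train_idx], edges)
test_bins = apply_bins([values[i] for i in test_idx], edges)
```

Computing `edges` from `values` as a whole would let test rows shape the engineered feature, which is the kind of leakage that makes transductive scores optimistic.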
Statistics
The large-scale proprietary dataset has on the order of thousands of features and hundreds of thousands of samples.
The Jungle Chess dataset has 44,819 samples and 6 features.
The Airfoil dataset has 1,503 samples and 5 features.
Quotes
"Automated feature engineering attempts to automate the feature engineering process and allow general data science practitioners to benefit without requiring expert domain knowledge and time-consuming manual feature creation and testing."
"Interaction information is a way to calculate how well different feature pairs synergize in predicting a target."
"We demonstrate that interaction information can be successfully incorporated into other expand-reduce AutoFE algorithms to accelerate these algorithms while maintaining similar or better downstream test scores."