pudu: A Python library for agnostic feature selection and explainability of Machine Learning spectroscopic problems

Platform-ZERO is very pleased to have part of its research published in JOSS, the Journal of Open Source Software.

The paper is led by IREC, the Catalonia Energy Research Institute and coordinator of Platform-ZERO. It introduces pudu, a Python library for agnostic feature selection and explainability of Machine Learning spectroscopic problems.

pudu is a Python library that quantifies the effect of changes in spectral features on the predictions of ML models and on the target instances. In other words, it perturbs the features in a predictable and deliberate way and evaluates them based on how the final prediction changes. Four main methods are included and defined for this purpose. Importance quantifies the relevance of the features according to the changes in the prediction; it is measured as a difference in probability for classification problems or in target value for regression problems. Speed quantifies how fast a prediction changes with perturbations in the features: the importance is calculated at different perturbation levels, a line is fitted to the obtained values, and its slope, the rate of change of importance, is taken as the speed. Synergy indicates how features complement each other in terms of prediction change after perturbations. Finally, re-activations count the units in a Convolutional Neural Network (CNN) whose activation value rises above the original activation criteria after perturbation. This last method applies only to CNNs, while the other three can be applied to any ML problem, including CNNs.
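To make the importance and speed ideas concrete, here is a minimal sketch written with NumPy only. It does not use pudu's actual API: the `predict` model, the window size, the multiplicative perturbation and the function names are all assumptions chosen for illustration, and the library itself may define these quantities differently.

```python
import numpy as np


def predict(spectrum):
    """Hypothetical stand-in model (not part of pudu): a logistic score
    that only responds to channels 40-59 of the spectrum."""
    weights = np.zeros(spectrum.shape[0])
    weights[40:60] = 0.05
    return 1.0 / (1.0 + np.exp(-(spectrum @ weights - 1.0)))


def importance(spectrum, window=10, delta=0.1):
    """Change in predicted probability when each window of channels
    is scaled by (1 + delta). The scaling rule is an assumption."""
    base = predict(spectrum)
    scores = []
    for start in range(0, spectrum.shape[0], window):
        perturbed = spectrum.copy()
        perturbed[start:start + window] *= 1.0 + delta
        scores.append(predict(perturbed) - base)
    return np.array(scores)


def speed(spectrum, window=10, deltas=(0.05, 0.1, 0.15, 0.2)):
    """Slope of a straight line fitted to importance versus
    perturbation level, computed separately for each window."""
    levels = np.array(deltas)
    # Rows are perturbation levels, columns are windows.
    imp = np.stack([importance(spectrum, window, d) for d in levels])
    # polyfit fits every column at once; row 0 holds the slopes.
    return np.polyfit(levels, imp, deg=1)[0]


spectrum = np.ones(100)          # synthetic flat spectrum, 100 channels
imp = importance(spectrum)       # one importance value per window
spd = speed(spectrum)            # one slope per window
print(np.round(imp, 4))
print(np.round(spd, 4))
```

In this toy setup only the two windows covering channels 40 to 59 should show a non-zero importance and a positive speed, because the stand-in model ignores every other channel.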

Read the full paper at https://doi.org/10.21105/joss.05873