Filtering of descriptors
From QSPR Wiki
Before descriptors are passed to the machine-learning algorithms, they can be filtered by various criteria.
Basic filtering parameters
The user can choose to:
- Eliminate descriptors that have fewer than a given number of unique values (by default 2)
- Merge pairs of correlated descriptors whose correlation coefficient exceeds a threshold (by default 0.95)
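The two basic filters can be illustrated with a minimal NumPy sketch. It is not the wiki's actual implementation: the function name `filter_descriptors` and its parameters are hypothetical, the correlation is assumed to be the absolute Pearson coefficient, and "merging" is approximated by keeping the first descriptor of each correlated group as its representative.

```python
import numpy as np

def filter_descriptors(X, min_unique=2, max_corr=0.95):
    """Return the column indices that survive both basic filters (illustrative)."""
    X = np.asarray(X, dtype=float)

    # Filter 1: drop descriptors with fewer than `min_unique` distinct values.
    kept = [j for j in range(X.shape[1])
            if np.unique(X[:, j]).size >= min_unique]

    # Filter 2 (assumption): greedy pass that keeps a descriptor only if it is
    # not correlated above `max_corr` with any descriptor already selected.
    selected = []
    for j in kept:
        correlated = any(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) > max_corr
            for k in selected
        )
        if not correlated:
            selected.append(j)
    return selected

# Toy data: column 1 is constant, column 2 is a linear function of column 0.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X_demo = np.column_stack([a, np.full(5, 7.0), 2 * a + 1, [5.0, 1.0, 4.0, 2.0, 3.0]])
selected_demo = filter_descriptors(X_demo)
```

In this toy example the constant column is removed by the first filter and the linearly dependent column by the second, so only columns 0 and 3 remain.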
Principal component analysis (PCA)
This filter computes the principal components of the descriptors. Only the principal components are then used for model development, and only those whose standard deviation is not less than a predefined threshold (by default 0.95).
This filter can significantly reduce the number of variables used in a model, particularly when the descriptors are inter-correlated.
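The PCA filter can be sketched in the same way. The page does not say how the descriptors are scaled before the components are computed, so this example assumes they are standardized to unit variance first; under that assumption a threshold of 0.95 keeps roughly those components that carry at least as much variation as a single original descriptor. The name `pca_filter` and its parameters are hypothetical.

```python
import numpy as np

def pca_filter(X, min_std=0.95):
    """Project X onto principal components with std >= min_std (illustrative)."""
    X = np.asarray(X, dtype=float)

    # Assumption: standardize each descriptor (zero mean, unit variance).
    scale = X.std(axis=0, ddof=1)
    scale[scale == 0] = 1.0          # leave constant columns at zero after centering
    Z = (X - X.mean(axis=0)) / scale

    # PCA via singular value decomposition of the standardized matrix.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)

    # Standard deviation of the scores along each principal component.
    comp_std = s / np.sqrt(X.shape[0] - 1)

    # Keep only the components that meet the threshold.
    keep = comp_std >= min_std
    scores = U[:, keep] * s[keep]
    return scores, comp_std

# Toy data: three nearly collinear descriptors plus one independent descriptor.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 1))
X_pca = np.hstack([
    base,
    2 * base + 0.01 * rng.normal(size=(50, 1)),
    -base + 0.01 * rng.normal(size=(50, 1)),
    rng.normal(size=(50, 1)),
])
scores, comp_std = pca_filter(X_pca)
```

Here the four original descriptors collapse to at most two retained components, because the three nearly collinear columns share a single direction of variation.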