Filtering of descriptors

From QSPR Wiki

Before descriptors are passed to the machine-learning algorithms, they can be filtered by various criteria.

Basic filtering parameters

The user can choose to:

Principal component analysis (PCA)

Filtering by this criterion involves calculating the principal components of the descriptor set. Only the principal components are then used for model development, and of those, only the ones whose standard deviation is not less than a predefined threshold (0.95 by default).

This filter can significantly reduce the number of variables used in the model, particularly when the descriptors are inter-correlated.
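
As an illustration only, the PCA filter described above could be sketched in Python with NumPy roughly as follows. This is not the platform's actual implementation: the function name pca_filter, the std_threshold parameter and the autoscaling step (centering each descriptor and dividing by its standard deviation) are assumptions made for the example. The page states only that components with a standard deviation below the threshold are discarded.

```python
import numpy as np


def pca_filter(X, std_threshold=0.95):
    """Replace descriptors by principal-component scores and keep only
    components whose standard deviation is >= std_threshold.

    Hypothetical helper; the autoscaling below is an assumption.
    """
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]

    # Assumption: descriptors are autoscaled before PCA.
    mean = X.mean(axis=0)
    scale = X.std(axis=0, ddof=1)
    scale[scale == 0] = 1.0  # constant descriptors become all-zero columns
    Z = (X - mean) / scale

    # PCA via SVD: rows of Vt are the component directions,
    # singular values give the spread of the data along each of them.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    component_std = s / np.sqrt(n_samples - 1)

    # Keep components whose standard deviation reaches the threshold.
    keep = component_std >= std_threshold
    scores = Z @ Vt[keep].T
    return scores, component_std, keep


# Small deterministic example: the first two descriptors are almost
# identical, the third is uncorrelated with both.
a = np.linspace(-1.0, 1.0, 50)
X = np.column_stack([a, a + 0.01 * a**3, a**2])

scores, component_std, keep = pca_filter(X)
print(scores.shape, component_std, keep)
```

In this example the two nearly identical descriptors collapse into a single component, so the model would receive fewer variables than the original descriptor count. This is the reduction for inter-correlated descriptors mentioned above.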