
Introduction to Classification

The goal of classification is to make a statement about the membership of an unknown substance (represented by some physical properties, e.g. a spectrum) in a particular class (defined by the user). In the best case this assignment is accompanied by the probability of belonging to that class. As raw data are normally not well suited for establishing a classifier, a few pre-processing steps are needed before the creation of the classifier can start. In general, the creation of a classifier proceeds along the following lines:
Pre-Processing

In most cases raw data are not well suited for immediate use in classification. Depending on the type of spectra and the selected classification model, the data may have to be pre-processed. Among the many available pre-processing methods, standardization, baseline correction, smoothing and noise reduction are the most popular procedures.
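As an illustrative sketch (not the actual implementation of any particular software package), the three pre-processing steps mentioned above could be combined as follows, assuming NumPy; the function names and the crude endpoint-based baseline are hypothetical simplifications:

```python
import numpy as np

def snv(spectrum):
    """Standardization by the standard normal variate: zero mean, unit variance."""
    return (spectrum - spectrum.mean()) / spectrum.std()

def moving_average(spectrum, width=5):
    """Simple smoothing / noise reduction by a moving-average window."""
    kernel = np.ones(width) / width
    return np.convolve(spectrum, kernel, mode="same")

def subtract_linear_baseline(spectrum):
    """Crude baseline correction: subtract the straight line through the endpoints."""
    baseline = np.linspace(spectrum[0], spectrum[-1], len(spectrum))
    return spectrum - baseline

# Synthetic example: a Gaussian band on a sloping baseline with noise
x = np.linspace(0, 1, 200)
rng = np.random.default_rng(0)
raw = np.exp(-((x - 0.5) ** 2) / 0.002) + 2.0 * x + 0.05 * rng.normal(size=200)
clean = snv(moving_average(subtract_linear_baseline(raw)))
```

In practice the order of the steps and the choice of the smoothing method (e.g. Savitzky-Golay instead of a moving average) depend on the type of spectra.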
Augmenting the Feature Space

Information which can be used to classify samples may be distributed over a large number of variables of the input data space, or it may even be encoded indirectly by hidden patterns in the input vectors. Classification methods do not perform well in such situations. As has been shown previously [Lohninger 1987], the performance of classifiers can be significantly increased by adding features which encode certain properties of the raw data as extra variables appended to the input data space X'. This augmented feature space Xa is then subjected to feature selection in a subsequent step.

The introduction of additional features should be governed by spectroscopic and theoretical knowledge. For example, if we know the spectral pattern of a certain species, we can introduce a new variable which simply represents the correlation of the sample spectrum with the spectral pattern of that species. This new variable is sensitive to the species in question and might help to discriminate it from other substances. In contrast to this knowledge-based approach, the augmentation of the feature space can be driven to its limits by algorithmically generating a huge set of features followed by FFX (fast function extraction) [Lit.]. Another approach is the use of deep learning to generate new features [Helin 2021].

Feature Selection

The selection of features serves a two-fold purpose: (1) reducing the dimensionality of the input data space keeps training and application times low, and (2) selecting features allows one to choose a subspace which eases the task of discriminating the classes. The second aspect can be especially important if linear classifiers are to be used.
... to be completed ...
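The knowledge-based correlation feature described above can be sketched in a few lines of NumPy; the function names are hypothetical and only illustrate the idea of appending one correlation variable per known spectral pattern:

```python
import numpy as np

def correlation_feature(spectrum, pattern):
    """Pearson correlation of a sample spectrum with a known species pattern."""
    return np.corrcoef(spectrum, pattern)[0, 1]

def augment(X, patterns):
    """Append one correlation feature per known pattern to the data matrix X
    (rows = samples, columns = wavelengths), yielding the augmented space Xa."""
    extra = np.array([[correlation_feature(row, p) for p in patterns] for row in X])
    return np.hstack([X, extra])

# X: 4 samples x 50 wavelengths; two known spectral signatures
rng = np.random.default_rng(1)
X = rng.normal(size=(4, 50))
patterns = [rng.normal(size=50), rng.normal(size=50)]
Xa = augment(X, patterns)   # shape (4, 52)
```

A sample whose spectrum matches one of the patterns will show a correlation feature close to 1, which makes the corresponding class easy to separate even for a linear classifier.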
Training of the Classifier

The training of a classifier depends on the specific algorithm used for classification. Each algorithm uses a set of parameters which control the classification model. These can be simple weights (as in the case of PLS-based discriminant analysis) or a mixture of parameters each having its own specific meaning (as, for example, in the case of Gaussian process classifiers). Thus the training of a classifier can be seen as the parametrization of an algorithm such that the algorithm is able to correctly assign a spectrum to a binary class property.

In general terms, we have a p-dimensional space Rp on the input side and a q-dimensional space Sq on the output or target side. The dimensionality of the input space (p) is typically between 10 and 10000; the dimensionality of the target space (q) is 1 for a two-class discrimination and nc for a multi-class problem (with nc being the number of classes to be distinguished, see below for details). In order to establish a classifier we have to use a set of observations with known and correct class assignments which fill (in a geometric sense) both the R and the S space. This set of correct observations is commonly called the "ground truth".
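As a minimal sketch of what "parametrizing an algorithm" means, the following NumPy class implements a nearest-centroid classifier: here the parameters found by training are simply the class mean spectra. This is an illustrative stand-in, not one of the algorithms named above:

```python
import numpy as np

class NearestCentroid:
    """Toy classifier: training estimates one mean spectrum (centroid) per
    class; an unknown sample is assigned to the class whose centroid is
    nearest in the p-dimensional input space."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Distances of every sample to every centroid, shape (n_samples, n_classes)
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

# Ground truth: two well-separated classes in a 10-dimensional input space
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (20, 10)), rng.normal(5, 1, (20, 10))])
y = np.array([0] * 20 + [1] * 20)
clf = NearestCentroid().fit(X, y)
```

More powerful models (PLS-DA, Gaussian processes, ANNs) differ only in what their parameters are and how they are fitted; the overall scheme of mapping Rp to class labels is the same.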
The ground truth then has to be split into two or three non-overlapping subsets which serve different purposes: the training set, the test set and, sometimes, a monitoring set. The training set is, as the name implies, used to parametrize the classifier. The test set is used for the evaluation of the classifier performance, and the monitoring set is used with some classification methods to monitor the progress of the training (as, for example, with ANNs).

Please note that the quality of the ground truth dataset is of utmost importance. If the training dataset contains incorrect class assignments or problematic spectra, the resulting classifier will be inferior. However, there is a simple means of checking the ground truth dataset if the number of incorrect samples is low (given that the classification problem can be solved by the chosen classifier model): apply full cross validation and compare the estimated classes with the classes recorded in the ground truth dataset. Those observations whose estimated class values differ the most are likely candidates for re-inspection of the ground truth.
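The cross-validation check of the ground truth can be sketched as follows, assuming NumPy and using a simple nearest-centroid model as a stand-in for the chosen classifier; the function name is hypothetical:

```python
import numpy as np

def loo_flag_suspects(X, y):
    """Full (leave-one-out) cross validation: each sample is predicted by a
    nearest-centroid model trained on all other samples. Samples whose
    predicted class disagrees with the recorded class are returned as
    candidates for re-inspection of the ground truth."""
    n = len(y)
    suspects = []
    for i in range(n):
        mask = np.arange(n) != i
        Xi, yi = X[mask], y[mask]
        centroids = {c: Xi[yi == c].mean(axis=0) for c in np.unique(yi)}
        pred = min(centroids, key=lambda c: np.linalg.norm(X[i] - centroids[c]))
        if pred != y[i]:
            suspects.append(i)
    return suspects

# Two separated classes, with sample 0 deliberately mislabeled:
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (15, 8)), rng.normal(6, 1, (15, 8))])
y = np.array([1] + [0] * 14 + [1] * 15)   # sample 0 really belongs to class 0
```

On this synthetic example only the mislabeled sample is flagged; with real spectra the flagged observations are merely candidates and still need manual inspection.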
Now the art of creating a classifier lies in the parametrization of the selected classification model. --> parameters and hyperparameters
What to do in a multi-class case?
--> indicator variables
--> OVA and similar decision rules
... see hyperspectral -> Coimbra
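The indicator-variable encoding and the one-vs-all (OVA) decision rule mentioned in the notes above can be illustrated as follows; this is a hedged NumPy sketch with hypothetical function names, not a complete multi-class procedure:

```python
import numpy as np

def to_indicator(y, n_classes):
    """Encode class labels 0..n_classes-1 as indicator (one-hot) targets:
    one binary output column per class, so q equals nc."""
    Y = np.zeros((len(y), n_classes))
    Y[np.arange(len(y)), y] = 1.0
    return Y

def ova_decide(scores):
    """One-vs-all decision rule: each column of 'scores' holds the output of
    one binary 'class c vs. rest' classifier; every sample is assigned to
    the class with the highest score."""
    return scores.argmax(axis=1)

y = np.array([0, 2, 1])
Y = to_indicator(y, 3)     # 3 samples x 3 indicator columns
```

In a real multi-class setup the columns passed to the decision rule would be the (continuous) outputs of nc trained binary classifiers rather than the exact indicator targets.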
... to be completed ...

What are hyperparameters? Hyperparameters are parameters of the classification model which are not found by the training procedure but which nevertheless affect the model performance. They can be divided into two groups: hyperparameters which influence the quality and speed of the learning process but have no effect on the resulting classifier, and hyperparameters which influence the trained classifier as well. Examples are the number of hidden neurons of an ANN and the learning rate of the training algorithm: the former greatly influences the resulting classifier, while the latter does not, or only to a small extent. In general, hyperparameters make the creation of a classifier considerably more time-consuming, because the optimal hyperparameters can only be determined by scanning the hyperparameter space, repeating the training for each set of hyperparameters.

Evaluation

In order to evaluate a classifier we have to apply it to a subset of the ground truth dataset (the "test set") which does not overlap with the training set. The application of the classifier results in estimated class assignments Y-hat which have to be compared to the Y values of the test set. From this comparison a so-called confusion matrix can be calculated, which contains the counts of all four possible outcomes of the comparison (true positive, true negative, false positive and false negative).
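For the two-class case, the four counts of the confusion matrix can be computed directly from the recorded classes Y and the estimated classes Y-hat; this NumPy sketch uses hypothetical names and 0/1 labels:

```python
import numpy as np

def binary_confusion(y_true, y_hat):
    """Counts of the four possible outcomes for a two-class (0/1) problem."""
    tp = int(np.sum((y_true == 1) & (y_hat == 1)))  # true positives
    tn = int(np.sum((y_true == 0) & (y_hat == 0)))  # true negatives
    fp = int(np.sum((y_true == 0) & (y_hat == 1)))  # false positives
    fn = int(np.sum((y_true == 1) & (y_hat == 0)))  # false negatives
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# Recorded classes of the test set vs. classes estimated by the classifier
y_true = np.array([1, 1, 0, 0, 1, 0])
y_hat  = np.array([1, 0, 0, 1, 1, 0])
cm = binary_confusion(y_true, y_hat)
```

From these four counts the usual performance figures (accuracy, sensitivity, specificity, precision) follow directly.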