
See also: Increasing Selectivity by Spectral Descriptors
The Curse of Dimensionality

One particular aspect that is quite important in multivariate analysis is an effect called the "curse of dimensionality". Assume that you measure your data with a precision of 10%. Each variable can then be divided into 10 compartments, each reflecting a significant difference in information. Further assume that we record an image with a lateral resolution of 100 x 100 pixels (= 10,000 pixels). When looking at a single variable (i.e. the intensities at a particular wavelength), we have 10,000 values distributed over 10 compartments. If we add another variable, the same 10,000 observations are distributed over 100 compartments; a third variable creates 1,000 compartments, and in general, with $k$ variables, the data space consists of $10^{k}$ compartments.

In a typical case, for example a Raman spectrum recorded with a resolution of 3 wavenumbers, we get approximately 1,100 intensity values. This leads to a 1,100-dimensional space consisting of $10^{1100}$ compartments. This huge space is populated by only $10^{4}$ observations (pixels), which is next to nothing compared to the available space. Of course, this crude estimate has to be refined a little, since we did not take into account the correlation of the variables within individual bands of a spectrum. If we assume that typical Raman bands are about 40 wavenumbers wide, we still end up with about 90 variables, leading to a space of $10^{90}$ compartments, which is still almost empty because of its sheer size.

So what can we do against this curse of dimensionality? We basically have two options: (1) reduce the dimensionality of the space, or (2) increase the number of observations (pixels). The second option is not feasible for practical reasons: we cannot measure enough spectra to obtain a non-empty data space, nor could we process that much data (remember that the age of the universe is estimated at roughly $4 \times 10^{17}$ seconds). So the only way to improve this unfavorable situation is to reduce the dimensionality.
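The compartment arithmetic can be checked numerically. The following is a minimal Python sketch, assuming 10 levels per variable and a 100 x 100 image as in the text. Filling the compartments with independent, uniformly distributed random data is an assumption made purely for illustration (real spectral variables are correlated), and the helper names `log10_compartments` and `occupied_fraction` are hypothetical. The last line also assumes a hypothetical measurement rate of one spectrum per nanosecond.

```python
import math
import numpy as np

LEVELS = 10            # 10 % precision -> 10 distinguishable levels per variable
N_PIXELS = 100 * 100   # 100 x 100 image = 10,000 observations

def log10_compartments(k, levels=LEVELS):
    """log10 of the number of compartments spanned by k variables."""
    return k * math.log10(levels)

def occupied_fraction(k, n_obs=N_PIXELS, levels=LEVELS, seed=0):
    """Fraction of compartments hit by n_obs random observations.

    Assumes independent, uniformly distributed variables (illustration only).
    """
    rng = np.random.default_rng(seed)
    # each row holds the bin index of one observation in each of the k variables
    bins = rng.integers(0, levels, size=(n_obs, k))
    n_hit = np.unique(bins, axis=0).shape[0]   # distinct compartments occupied
    return n_hit / levels ** k

for k in range(1, 7):
    print(f"k={k}: 10^{log10_compartments(k):.0f} compartments, "
          f"{occupied_fraction(k):.4f} occupied")

# Raman example from the text: 1100 raw variables, about 90 after band correlation
for k in (1100, 90):
    gap = log10_compartments(k) - math.log10(N_PIXELS)
    print(f"k={k}: about 10^{gap:.0f} compartments per pixel")

# hypothetical rate of one spectrum per nanosecond over the age of the universe
print(f"spectra measurable: about 10^{math.log10(4e17 * 1e9):.1f}")
```

Under these assumptions the occupied fraction stays close to 1 for up to three variables, drops to roughly 0.63 for four, and cannot exceed 0.01 for six. For the Raman case, the gap is about $10^{1096}$ compartments per pixel for the raw spectrum and about $10^{86}$ after accounting for band correlation. Even the hypothetical nanosecond measurement rate yields only about $10^{26.6}$ spectra.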
The most common approach to reducing the dimensionality of a data space is variable selection. However, the information we are interested in might not be represented by individual variables but by a complex combination of several variables. Thus, in many cases variable selection will not deliver the optimum solution (in fact, there are situations, especially in mass spectrometry, where variable selection does not provide a solution at all). The way out of this difficult combination of the curse of dimensionality and the impossibility of finding an optimum set of variables for a particular problem is the introduction of chemical/physical knowledge. By introducing chemical knowledge, we implicitly transform the data space and reduce its dimensionality. How can this be achieved? Read on here.
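The point that information may live in a combination of variables rather than in any single one can be illustrated with a toy example. The sketch below uses synthetic data that are purely an assumption for demonstration: two hypothetical classes in which variable 1 has the same distribution in both, and variable 2 differs from variable 1 by +1 or -1 depending on the class. The helper `best_threshold_accuracy` is hypothetical. It scores the best single threshold rule on one feature, which stands in for "selecting" that variable.

```python
import numpy as np

def best_threshold_accuracy(feature, labels):
    """Best accuracy of a single threshold rule 'class 1 above t' (or below t)."""
    order = np.argsort(feature)
    y = labels[order]
    n = y.size
    # ones_left[i]: number of class-1 samples among the i smallest feature values
    ones_left = np.concatenate(([0], np.cumsum(y)))
    i = np.arange(n + 1)
    # predict class 0 for the i smallest values and class 1 for the rest
    correct = (i - ones_left) + (ones_left[-1] - ones_left)
    acc = correct / n
    # the reversed rule gets exactly the other samples right
    return float(np.max(np.maximum(acc, 1.0 - acc)))

rng = np.random.default_rng(1)
n = 2000
labels = rng.integers(0, 2, size=n)        # two hypothetical sample classes
x1 = rng.uniform(0.0, 100.0, size=n)       # variable 1: same distribution in both classes
# variable 2: differs from variable 1 by +1 (class 1) or -1 (class 0)
x2 = x1 + np.where(labels == 1, 1.0, -1.0)

acc_x1 = best_threshold_accuracy(x1, labels)
acc_x2 = best_threshold_accuracy(x2, labels)
acc_diff = best_threshold_accuracy(x2 - x1, labels)
print(f"x1 alone: {acc_x1:.3f}, x2 alone: {acc_x2:.3f}, x2 - x1: {acc_diff:.3f}")
```

Selecting either variable on its own gives an accuracy close to chance (0.5), while the difference `x2 - x1` separates the classes perfectly (1.000). No choice of a single variable recovers the information carried by the combination.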