This is work in progress... (c) 2021 Z. Gajarska and H. Lohninger



Standardization of Spectra

A simple and frequently used form of data preprocessing is the standardization of spectra. Its primary purpose is to reduce scattering effects (see also MSC and EMSC), but it also prepares the data for methods that require a well-defined and comparable range of the independent variables.

This is often referred to in the literature as "standard normal variate" (SNV). For each spectrum $v_{i,j}$ of an image, its mean value $m_{i,j}$ is subtracted from the spectrum and the difference is divided by its standard deviation $s_{i,j}$. This scales each spectrum so that its mean value is equal to zero and its standard deviation is equal to one:

$$v_{i,j,k} \leftarrow \frac{v_{i,j,k} - m_{i,j}}{s_{i,j}}$$

with

i,j ... lateral indices (pixel coordinates)
k ... wavelength index
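The formula translates directly into array operations. The following is a minimal NumPy sketch, assuming the hyperspectral image is held as a 3-D array `cube` with shape (rows, columns, wavelengths), so that `cube[i, j, k]` corresponds to $v_{i,j,k}$. The function name `snv` is hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not say which estimator is meant; the sketch is not intended to reproduce ImageLab's implementation.

```python
import numpy as np


def snv(cube: np.ndarray, ddof: int = 1) -> np.ndarray:
    """Standardize (SNV) every spectrum of a hyperspectral cube.

    cube: array of shape (rows, cols, wavelengths); cube[i, j, k] = v_{i,j,k}.
    ddof: delta degrees of freedom of the standard deviation
          (assumption: 1, i.e. the sample standard deviation).
    """
    cube = np.asarray(cube, dtype=float)
    m = cube.mean(axis=-1, keepdims=True)            # m_{i,j}: mean of each spectrum
    s = cube.std(axis=-1, ddof=ddof, keepdims=True)  # s_{i,j}: std. dev. of each spectrum
    # Broadcasting subtracts m_{i,j} and divides by s_{i,j} at every wavelength k.
    # A perfectly flat spectrum (s_{i,j} = 0) yields NaN values.
    return (cube - m) / s


# Usage: a random 4 x 5 pixel image with 50 wavelengths.
rng = np.random.default_rng(0)
cube = rng.uniform(0.1, 1.0, size=(4, 5, 50))
z = snv(cube)
print(np.allclose(z.mean(axis=-1), 0.0))         # True
print(np.allclose(z.std(axis=-1, ddof=1), 1.0))  # True
```

Operating along the last axis processes all pixels at once; if a cube is stored with the wavelength axis first, the `axis` arguments would have to change accordingly.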

The following example shows the effect of standardizing spectra: five spectra were selected in a hyperspectral image at various spots of an apple:

The corresponding spectra differ considerably because the intensities depend not only on the illumination but also on the scattering. Plotting the five spectra on top of one another shows this very clearly:

If these spectra are standardized so that each spectrum has a mean value of zero and a standard deviation of 1, the spectra fit together much better.
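Why the spectra fit together after standardization can be seen in a simplified model, assumed here for illustration only: if each spot multiplies a common spectral shape by its own gain $a > 0$ and adds its own offset $b$, then subtracting the mean removes $b$ and dividing by the standard deviation removes $a$. The sketch below builds five such spectra from a hypothetical reference spectrum (band position, gains and offsets are invented) and compares their spread before and after standardization.

```python
import numpy as np


def snv(spectra: np.ndarray) -> np.ndarray:
    """SNV along the last axis: one spectrum per row."""
    m = spectra.mean(axis=-1, keepdims=True)
    s = spectra.std(axis=-1, ddof=1, keepdims=True)
    return (spectra - m) / s


# Hypothetical reference spectrum: a single band on a sloping baseline.
wl = np.linspace(400, 1000, 121)  # wavelengths in nm (illustrative grid)
ref = 0.2 + 0.0003 * (wl - 400) + 0.5 * np.exp(-((wl - 680) / 40) ** 2)

# Assumed illumination/scattering model: one gain and one offset per spot.
gains = np.array([0.4, 0.7, 1.0, 1.3, 1.8])
offsets = np.array([0.00, 0.05, 0.10, 0.02, 0.15])
raw = gains[:, None] * ref + offsets[:, None]  # shape (5, 121)

std_spectra = snv(raw)

# Largest difference between the five spectra at any single wavelength.
spread_raw = (raw.max(axis=0) - raw.min(axis=0)).max()
spread_std = (std_spectra.max(axis=0) - std_spectra.min(axis=0)).max()
print(spread_raw)  # clearly non-zero
print(spread_std)  # only floating-point noise: the spectra coincide
```

Real spectra do not collapse this perfectly, because scattering generally varies with wavelength; handling such wavelength-dependent effects is one motivation for MSC and EMSC.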

Left: The image of an apple at 630 nm. The reflection of the light source is clearly visible, but there is barely any surface structure. Right: After standardizing the spectra, the reflections are significantly reduced and the marbling of the apple becomes visible.

The standardization of spectra can make spectra much more similar and reduce illumination effects, which can be helpful in exploratory data analysis and cluster analysis. However, one must be aware that standardization also discards information, namely the overall intensity level of each spectrum, so that differences between adjacent areas of an image may become less distinct. Furthermore, calculating a calibration model on spectra scaled in this way becomes practically impossible (unless an internal standard is available), because the absolute intensity that usually carries the quantitative information has been removed.
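The loss of quantitative information can be illustrated under a loudly stated assumption: the spectra are absorbance spectra that scale linearly with concentration (Beer-Lambert law, unit path length), so a sample of concentration $c$ has the spectrum $c\,\varepsilon_k$. Because $c$ is a positive factor common to all wavelengths, standardization cancels it, and spectra of different concentrations become identical. The absorptivity profile and the concentrations below are invented for illustration.

```python
import numpy as np


def snv(spectra: np.ndarray) -> np.ndarray:
    """SNV along the last axis: one spectrum per row."""
    m = spectra.mean(axis=-1, keepdims=True)
    s = spectra.std(axis=-1, ddof=1, keepdims=True)
    return (spectra - m) / s


# Hypothetical absorptivity profile of a single component (invented values).
wl = np.linspace(900, 1700, 81)  # nm
epsilon = np.exp(-((wl - 1200) / 60) ** 2) + 0.5 * np.exp(-((wl - 1450) / 80) ** 2)

# Assumption: absorbance = concentration * absorptivity (Beer-Lambert, unit path length).
conc = np.array([0.5, 1.0, 2.0, 4.0])
absorbance = conc[:, None] * epsilon  # one spectrum per concentration

# Raw peak heights are proportional to concentration, so a calibration is possible.
peak_raw = absorbance.max(axis=1)
print(peak_raw / conc)  # the same ratio for every sample

# After SNV all four spectra are identical: concentration can no longer be recovered.
standardized = snv(absorbance)
print(np.allclose(standardized, standardized[0]))  # True
```

The same cancellation explains the reduced contrast between adjacent areas: regions whose spectra differ only in overall intensity become indistinguishable after standardization.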

The following example shows how serious the consequences of an appropriate or an inappropriate scaling can be. The same data were subjected to k-means clustering: the top image shows the result without any scaling, the middle image the result after standardizing the spectra. Without standardization, the cluster analysis assigns the pixels primarily according to the brightness of the areas, whereas with standardized spectra the pixels are actually assigned according to the color of the fruits or of parts of the fruits. Standardization largely removes the differences in brightness, so that the clustering algorithm can recognize the color differences.

Top: the result of k-means (k = 8) for non-standardized data. Middle: the same for standardized data. Bottom: the photo of the fruit basket. Data source: T. Skauli and J. Farrell: A collection of hyperspectral images for imaging systems research. In Proc. SPIE 8660, Digital Photography IX, 86600C (February 4, 2013)
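The effect seen in the fruit basket can be reproduced qualitatively on synthetic data. The sketch below is an illustration under several assumptions, not the analysis behind the figure: it invents two spectral shapes (called "green" and "red", each a band on a constant baseline) at two brightness levels, uses k = 2 instead of 8, and implements plain k-means (Lloyd's algorithm) with a deterministic farthest-point initialization so that the result is reproducible. All function and variable names are hypothetical.

```python
import numpy as np


def snv(spectra: np.ndarray) -> np.ndarray:
    """SNV along the last axis: one spectrum per row."""
    m = spectra.mean(axis=-1, keepdims=True)
    s = spectra.std(axis=-1, ddof=1, keepdims=True)
    return (spectra - m) / s


def kmeans(X: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Plain k-means (Lloyd's algorithm); returns one cluster label per row of X."""
    # Deterministic farthest-point initialization (chosen here for reproducibility).
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each spectrum to its nearest center (squared Euclidean distance).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its members (keep it if a cluster is empty).
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def same_partition(a: np.ndarray, b: np.ndarray) -> bool:
    """True if two 2-cluster labelings agree up to swapping the labels."""
    return bool(np.array_equal(a, b) or np.array_equal(a, 1 - b))


# Synthetic "pixels": two hypothetical colors, each at a bright and a dark level.
rng = np.random.default_rng(42)
wl = np.linspace(400, 700, 31)  # nm
green = 0.5 + 0.3 * np.exp(-((wl - 550) / 40) ** 2)
red = 0.5 + 0.3 * np.exp(-((wl - 650) / 40) ** 2)
n = 25  # pixels per group
groups = [(green, 1.0), (red, 1.0), (green, 0.3), (red, 0.3)]  # (shape, brightness)
X = np.vstack([b * shape + rng.normal(0.0, 0.005, (n, wl.size)) for shape, b in groups])
is_red = np.repeat([0, 1, 0, 1], n)
is_dark = np.repeat([0, 0, 1, 1], n)

raw_labels = kmeans(X, 2)
snv_labels = kmeans(snv(X), 2)
print(same_partition(raw_labels, is_dark))  # True: raw data split by brightness
print(same_partition(snv_labels, is_red))   # True: standardized data split by color
```

With these invented numbers, the distance between bright and dark pixels is larger than the distance between the two colors, so on raw data the nearest-center assignment follows brightness; standardization removes the brightness factor and leaves only the difference in spectral shape for the clustering to find.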