4 minutes
Sammon Mapping Using Steepest-Descent Iterative Approach
Sammon mapping (SM), an algorithm for dimensionality reduction, follows non-linear projection based mapping of N-dimensional data to a lower dimension.
Why projecting?
Projection is one of the major step used during visualization and data analysis. We can easily picture data in 2D as well as 3D but anything higher than these dimensions, it is difficult for us to visualize. In such a case, we need to project whole datasets to a lower dimension (i.e 2D or 3D) by preserving their properties in order to analyze them. Some linear projection algorithms are used frequently for such cases.
We all have heard about PCA (Principal Component Analysis), if not then, PCA follows static orthogonal projection to get linearly uncorrelated points in lesser dimension. However, there are certain limitations of PCA which can be eliminated using the SM algorithm. Although PCA maximizes the original variance present in the transformed dataset, yet it shows problem while projecting some structured regular pattern in a curved manifold.
For example in figure 1, I tried to plot a bouquet of circles in 6d, each circle perpendicular to each other using PCA. As you can see, PCA overlaps each circle’s coordinate and converts it to a line. It didn’t preserve the structure of input datasets. As a result, we get:
Fig 1. Projection of Bouquet of Circles passing through the origin, where each circle is perpendicular to each other, using PCA.
So, Sammon mapping tries to eliminate such limitation.
What it does is that it takes the result of PCA with maximum variance and adds other features( what feature? explained later) to minimize the transformation of properties of data set. Subsequently, we have to remember a point that Sammon mapping does not give the complete solution but it rather tries to minimize the error and provides the optimal result.
It tries to minimize the difference between the distance of data-points with each other in transformed space and original space. Hence, It preserves the structure of the datasets as a whole.
The algorithm of Sammon mapping follows projecting datasets by using PCA and minimizing the projection error by backpropagating and changing transformed dataset.
Projection Error
The error is mainly defined as the difference between the overall distance between data-points on original space with transformed space. Here is the rough sketch of the algorithm in some mathematical structure:
Let us assume that we have the dataset V with ’n’ numbers of data-points in ‘p’ dimension.
So, V will be a matrix of dimension *n ** p
And,* V’* with ’n’ numbers of data-points in *‘q’ *dimension. Therefore, *V’* will be a matrix of dimension *n ** *q*
We will now project data from ‘p’ dimension to ‘q’ dimension.
Projection Error is given as :
Projection Error Minimization
Error minimization can be done through the various process. Whereas, I have tried the Gradient Descent iterative approach.
Which goes like:
- Calculate Error
1.1 Loop while error_old — error >threshold
2 For each data point, V
2.1 Calculate the first difference of error w.r.t a data point
2.2 Calculate the second differentiation of error w.r.t a data point
2.3 Calculate the delta given by,* delta = (first/second) * Magic Factor,*
Magic Factor can be either 0.3 or 0.4. This value is determined empirically. It controls how much a vector is adjusted with respect to error. It improves the convergence since we are taking the small step downwards.
2.4 V = V — delta
2.5 Update the error calculated
The formula for first differentiation with respect to a data point Vjp is :
The formula for second differentiation with respect to a data point Vjp is :
where dij = Euclidean distance between two data-point Vi and Vj in original space.
d’ij = Euclidean distance between two data-point V’i and* V’j* in reduced space.
After few iterations, Sammon Mapping algorithm will give the reduced vector.
The result is shown below:
Fig 2: Projection of Bouquet of Circles passing through the origin, where each circle is perpendicular to each other using Sammon Mapping.Here’s the GitHub link of the code I implemented and the datasets for Bouquet of Circle.
https://github.com/sarmilaupadhyaya/Sammon_Mapping-python3
References
[1]https://ieeexplore.ieee.org/document/1671271/references#references
[2] http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/AV0910
[3]https://en.wikipedia.org/wiki/Sammon_mapping
[4]https://www.codeproject.com/Articles/43123/Sammon-Projection