Perform Dimensionality Reduction Using Isomap – OMIQ

In cytometry, visualizing all the markers (i.e., features, which can be interpreted as dimensions) can be challenging because cells can express multiple markers at once. Dimensionality reduction is the process of taking high-dimensional data and projecting it into a low-dimensional space while retaining as much information as possible. It allows cells with similar marker expression to be visualized, normally in a 2D space, by placing cells with closely related marker expression near each other. This article shows how to set up an Isomap (Isometric Mapping) in OMIQ.

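Isomap embeds high-dimensional data in a low-dimensional space by preserving the geodesic distances between data points, i.e., distances measured along the curved structure of the data (approximated by shortest paths through a nearest-neighbor graph) rather than along straight lines.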

Tenenbaum, J., De Silva, V., and Langford, J. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science 290, 2319-2323 (2000). https://doi.org/10.1126/science.290.5500.2319
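To make the idea of a geodesic (along-the-curve) distance concrete, here is a minimal sketch using only the Python standard library. It is not OMIQ's implementation. The half-circle data, the helper names knn_graph and shortest_path_lengths, and the choice of k=2 are all assumptions made for illustration.

```python
import heapq
import math

def knn_graph(points, k):
    """Connect each point to its k nearest neighbors (undirected, Euclidean edge weights)."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph

def shortest_path_lengths(graph, source):
    """Dijkstra's algorithm: shortest graph distance from source to every reachable node."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return best

# Illustrative data: 50 points evenly spaced along a half circle of radius 1,
# i.e. a curved one-dimensional structure sitting in two dimensions.
n = 50
points = [(math.cos(math.pi * i / (n - 1)), math.sin(math.pi * i / (n - 1))) for i in range(n)]

graph = knn_graph(points, k=2)
geodesic = shortest_path_lengths(graph, 0)[n - 1]
euclidean = math.dist(points[0], points[-1])

print(f"straight-line distance between the two ends: {euclidean:.3f}")
print(f"graph (geodesic) distance between the two ends: {geodesic:.3f}")
```

The two ends of the half circle are 2 units apart in a straight line, but the path through the neighbor graph follows the curve and comes out close to the arc length (pi). Isomap tries to preserve this second kind of distance in the low-dimensional layout.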

1. Add an Isomap Task

Click Add new child task and select Isomap from the task selector. In this example, we have subsampled to live cells for our Isomap task.

Your exact workflow branch may look different than the example above. The important thing is that your workflow follows a logical ordering of tasks.

2. Set Up the Isomap Task

2.1 Select Files and Features

Select the Files you want to include for your Isomap.

Include all the files that you want to compare directly in the same Isomap run, because each run creates its own visualization and result.

Select the Features you want to use for the dimensionality reduction.

Each feature you select will affect how the algorithm computes the result. You do not necessarily have to include all features. Often, it will make sense to exclude certain markers if they will not help inform your results (input heterogeneity will equal output heterogeneity).
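The effect of feature selection can be illustrated with a toy example. The sketch below uses made-up expression values for three hypothetical cells and a hypothetical helper, nearest_cell. It only shows how including an uninformative marker can change which cells count as neighbors; it does not reflect how OMIQ computes anything internally.

```python
import math

# Hypothetical, made-up expression values for illustration only.
cells = {
    "cell_A": {"CD3": 5.0, "CD19": 0.2, "noisy_marker": 0.1},
    "cell_B": {"CD3": 4.8, "CD19": 0.3, "noisy_marker": 8.0},
    "cell_C": {"CD3": 0.3, "CD19": 4.9, "noisy_marker": 0.3},
}

def nearest_cell(name, features):
    """Return the other cell closest to `name`, using only the given features."""
    def distance(other):
        return math.dist(
            [cells[name][f] for f in features],
            [cells[other][f] for f in features],
        )
    return min((c for c in cells if c != name), key=distance)

informative = ["CD3", "CD19"]
everything = ["CD3", "CD19", "noisy_marker"]

print("neighbor of cell_A, informative markers only:", nearest_cell("cell_A", informative))
print("neighbor of cell_A, noisy marker included:", nearest_cell("cell_A", everything))
```

With only the two informative markers, cell_A is closest to cell_B, which has a similar CD3/CD19 profile. Once the noisy marker is included, cell_C becomes the nearest neighbor instead, so the neighbor graph that Isomap builds would change as well.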

2.2 Enter Isomap Settings

Feel free to change the default settings for your analysis goal. New to dimensionality reduction? Try out the default settings first and see how changing the hyperparameters below affects your result.

Num Results Component: Determines the number of output parameters (dimensions) the Isomap result will generate (isomap_1, isomap_2, isomap_3, etc.). Two Isomap parameters is the most traditional display.

Num Nearest Neighbors: Sets the number of nearest neighbors to consider for each data point when building the neighbor graph used to estimate distances.

Max Iter (leave blank for default): Maximum number of iterations used when arpack is the Eigen Solver. This setting is ignored when the Eigen Solver is dense.

Eigen Solver: Sets which eigensolver is used to compute the eigenvalues and eigenvectors during the embedding phase. You can choose auto, arpack, or dense.

Auto: Attempts to choose the most efficient solver for the problem.
Arpack: Uses the Arnoldi decomposition method to compute the eigenvalues and eigenvectors. Only the few largest eigenvalues and their eigenvectors are computed.
Dense: Uses a direct solver (LAPACK) to compute the eigenvalues and eigenvectors. All eigenvalues and eigenvectors are computed.
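The settings described so far closely resemble parameters of the Isomap class in scikit-learn. Whether OMIQ uses scikit-learn internally is an assumption, not something this article states, so treat the mapping in the comments below as a guess. The sketch assumes numpy and scikit-learn are installed and uses a synthetic helix in place of real cytometry data; it runs the same embedding with both explicit eigensolvers.

```python
import numpy as np
from sklearn.manifold import Isomap

# Synthetic stand-in for a cells-by-markers matrix: 300 points along a 3D helix.
t = np.linspace(0, 4 * np.pi, 300)
X = np.column_stack([np.cos(t), np.sin(t), 0.3 * t])

embeddings = {}
for solver in ("arpack", "dense"):
    model = Isomap(
        n_components=2,       # assumed analogue of "Num Results Component"
        n_neighbors=10,       # assumed analogue of "Num Nearest Neighbors"
        max_iter=None,        # assumed analogue of "Max Iter"; only used by arpack
        eigen_solver=solver,  # assumed analogue of "Eigen Solver"
    )
    embeddings[solver] = model.fit_transform(X)

for solver, emb in embeddings.items():
    # One row per input point, one column per requested component.
    print(solver, emb.shape)
```

Both solvers return an array with one row per point and one column per requested component. The practical difference is cost: arpack computes only the requested eigenpairs iteratively, while dense computes the full decomposition.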

Path Method: Sets the algorithm used to find the shortest path between points. You can choose auto, FW, or D.

Auto: Attempts to choose the best algorithm for the data.
FW: Uses the Floyd-Warshall algorithm. This computes the shortest distances between all pairs of points at once. Ideal for small datasets.
D: Uses Dijkstra's algorithm. This starts from one node and finds the shortest paths from that node to all other points. Ideal for larger datasets.
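The two path methods can be compared on a tiny graph. The sketch below is a plain-Python illustration with made-up edge weights and hypothetical function names, not the code OMIQ runs. Both algorithms produce the same shortest-path distances and differ only in how they get there.

```python
import heapq
import math

# Small weighted, undirected neighbor graph with made-up distances: (node, node, weight).
edges = [(0, 1, 1.0), (1, 2, 1.5), (2, 3, 1.0), (0, 3, 5.0), (3, 4, 2.0)]
n = 5

def floyd_warshall(n, edges):
    """All-pairs shortest paths by repeatedly allowing one more intermediate node."""
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

def dijkstra_all_sources(n, edges):
    """Run Dijkstra's single-source algorithm once from every node."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    result = []
    for source in range(n):
        dist = [math.inf] * n
        dist[source] = 0.0
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            for v, w in adj[u]:
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
        result.append(dist)
    return result

fw = floyd_warshall(n, edges)
dj = dijkstra_all_sources(n, edges)
print("same distances from both methods:", fw == dj)
print("shortest distance from node 0 to node 3:", fw[0][3])
```

The direct edge between nodes 0 and 3 has weight 5.0, but both methods find the shorter route through nodes 1 and 2. Floyd-Warshall always does work proportional to the cube of the number of points, whereas Dijkstra's cost depends on the number of edges, which is why it tends to scale better on large, sparse neighbor graphs.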

2.3 Run

Click Run Isomap. This takes you to the Status tab, where you can watch the progress. You are free to return to your workflow or do other work while the task runs in the cloud. The status is also shown in the Workflow itself, and you can have an email sent to you when the run completes.

3. Review the Results

Go to the Results tab to view the results of the Isomap and check that they are as expected.

The single plot above is only for a quick sanity check of results. You shouldn't use it for anything more than that.

For plotting and data visualization, go back out to the workflow and add a Figure as a child of this analysis. To learn more about this, please see our resource Dimension Reduction Visualization.