In cytometry, visualizing all the markers (i.e., features, which can be interpreted as dimensions) can be challenging because cells can express multiple markers at once. Dimensionality reduction is the process of taking high-dimensional data and projecting it into a low-dimensional space while retaining as much information as possible. This allows cells with similar marker expression to be visualized, normally in a 2D space, by placing them close to each other. This article shows how to set up FIt-SNE (Fast Fourier Transform-accelerated Interpolation-based t-SNE) in OMIQ.
FIt-SNE uses the Fast Fourier Transform to compute the relationship of each data point to every other data point on an equispaced grid, allowing for faster projection of high-dimensional data into a two-dimensional space. Because FIt-SNE uses an equispaced grid, large datasets can be handled more effectively.
Linderman, G.C., Rachh, M., Hoskins, J.G. et al. Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data. Nat Methods 16, 243–245 (2019). https://doi.org/10.1038/s41592-018-0308-4
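To sketch why the equispaced grid matters: on such a grid, the pairwise kernel matrix is Toeplitz, so the O(N²) kernel sums collapse to a convolution that an FFT evaluates in O(N log N). Below is a minimal 1-D numpy illustration of this trick (a generic sketch of the idea, not OMIQ's or FIt-SNE's actual implementation):

```python
import numpy as np

n = 64
grid = np.linspace(0.0, 1.0, n)              # equispaced interpolation grid
weights = np.random.default_rng(0).random(n)

# Direct O(n^2) evaluation: phi_i = sum_j K(y_i - y_j) * w_j,
# using the Cauchy kernel K(d) = 1 / (1 + d^2) that appears in t-SNE
kernel = 1.0 / (1.0 + (grid[:, None] - grid[None, :]) ** 2)
direct = kernel @ weights

# On an equispaced grid the kernel matrix is symmetric Toeplitz, so the
# product is a convolution: embed it in a circulant matrix of size 2n - 1
# and evaluate with an FFT in O(n log n)
t = 1.0 / (1.0 + (grid - grid[0]) ** 2)      # first column of the Toeplitz matrix
c = np.concatenate([t, t[-1:0:-1]])          # circulant embedding
w_pad = np.concatenate([weights, np.zeros(n - 1)])
conv = np.fft.irfft(np.fft.rfft(c) * np.fft.rfft(w_pad), n=2 * n - 1)
via_fft = conv[:n]                           # matches the direct computation
```

Both evaluations produce the same kernel sums; the FFT route is what makes large datasets tractable.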
1. Add a FIt-SNE Task
Click Add new child task and select FIt-SNE from the task selector. In this example, we have subsampled to live cells for our FIt-SNE task.
Your exact workflow branch may look different from the example above. The important thing is that your workflow follows a logical ordering of tasks.
2. Setup the FIt-SNE Task
2.1 Select Files and Features
Select the Files you want to include for your FIt-SNE.
Include all the files that you want to compare directly in the same FIt-SNE run, as each run creates a unique visualization and result.
Select the Features you want to use for the dimensionality reduction.
Each feature you select affects how the algorithm computes the result. You do not have to include all features. Often, it makes sense to exclude markers that will not help inform your results (input heterogeneity will equal output heterogeneity).
2.2 Enter FIt-SNE Settings
Feel free to change the default settings to suit your analysis goal. New to dimensionality reduction? Try the default settings first, then see how changing the hyperparameters below affects your result.
You can configure FIt-SNE with the optimized parameters of opt-SNE. To learn how to do this, please see our article How to Configure FIt-SNE With opt-SNE Style Parameters.
Number of dimensions: Determines the number of parameters the FIt-SNE result will generate (fitsne_1, fitsne_2, fitsne_3, etc.). Two FIt-SNE parameters is the most traditional display.
Perplexity: Though distinct, this has a similar effect to changing the number of nearest neighbors considered for each data point. The algorithm uses the perplexity to calculate how similar data points are in the high-dimensional space before projecting them into a low-dimensional space. Larger datasets may need a higher perplexity. Low perplexity emphasizes local structure (e.g., CCR7 levels between CD4+ T cells) while high perplexity emphasizes global structure (e.g., B cells compared to monocytes).
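The link between perplexity and an effective neighbor count can be sketched in a few lines: t-SNE defines perplexity as 2 raised to the entropy of each point's Gaussian conditional distribution, so a wider kernel bandwidth spreads probability over more neighbors and raises the perplexity. A toy numpy illustration (synthetic data, not cytometry measurements):

```python
import numpy as np

rng = np.random.default_rng(1)
points = rng.normal(size=(200, 10))  # toy high-dimensional data

def perplexity_for(dists_sq, sigma):
    """Perplexity of the Gaussian conditional distribution p_{j|i}
    for one point, as defined in t-SNE: perplexity = 2^entropy."""
    p = np.exp(-dists_sq / (2.0 * sigma ** 2))
    p /= p.sum()
    entropy = -np.sum(p * np.log2(p + 1e-12))
    return 2.0 ** entropy

# Squared distances from point 0 to every other point
d2 = np.sum((points[1:] - points[0]) ** 2, axis=1)

# A narrow kernel concentrates probability on the closest neighbors
# (low perplexity, local structure); a wide kernel spreads it across
# many neighbors (high perplexity, global structure).
local = perplexity_for(d2, sigma=0.5)
global_ = perplexity_for(d2, sigma=2.0)
```

In practice, t-SNE inverts this relationship: you choose the perplexity, and the algorithm searches for the bandwidth per point that achieves it.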
Theta: Controls how closely the Fast Fourier Transform implementation approximates the original t-SNE algorithm; a lower value is more similar to exact t-SNE but slower.
Random Seed: A number used to initialize the FIt-SNE operation; changing it is optional. FIt-SNE is stochastic, so setting a fixed seed makes the result reproducible: the same dataset, settings, and Random Seed value will produce the same result.
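The role of the seed can be illustrated with any stochastic initialization of the kind t-SNE-family methods start from (a generic sketch, not OMIQ's internal code):

```python
import numpy as np

def random_initialization(n_cells, seed):
    # Stochastic starting layout for an embedding: the seed fully
    # determines the random draw, so the same seed reproduces it exactly
    rng = np.random.default_rng(seed)
    return rng.normal(scale=1e-4, size=(n_cells, 2))

run_a = random_initialization(1000, seed=42)
run_b = random_initialization(1000, seed=42)  # identical to run_a
run_c = random_initialization(1000, seed=7)   # a different layout
```

With the initialization pinned, identical settings and data lead the deterministic optimization to the same embedding.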
Max Iterations: This is the maximum number of iterations that FIt-SNE will do to embed the data into low dimensional space.
Stop Early Exaggeration: The iteration at which the early exaggeration phase stops.
Learning Rate: The speed at which FIt-SNE optimizes the embedding. Machine learning algorithms learn from the data at a specific speed represented by the learning rate, which determines how much the algorithm adjusts its parameters at each step of the optimization phase. Although we recommend leaving the default automatic learning rate, you can set it manually by typing the desired number. If the learning rate is set too low or too high, the territories for the different cell types won't be properly separated. A higher learning rate means the algorithm takes bigger steps at each stage of learning but may overshoot the optimal solution; a lower learning rate means smaller steps, but the process may get stuck and never reach the optimal solution.
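The overshoot-versus-stall trade-off holds for gradient descent in general. A minimal sketch on the toy cost f(x) = x² (a hypothetical stand-in for the t-SNE cost, not the actual FIt-SNE optimizer):

```python
# Gradient descent on f(x) = x^2, whose gradient is 2x.
# The learning rate sets the step size taken at each iteration.
def descend(learning_rate, steps=50, x0=5.0):
    x = x0
    for _ in range(steps):
        x -= learning_rate * 2 * x  # step downhill along the gradient
    return x

moderate = descend(0.1)    # converges close to the minimum at 0
too_low = descend(0.001)   # barely moves in 50 steps (stuck)
too_high = descend(1.5)    # overshoots and diverges
```

A moderate rate reaches the minimum; too low a rate stalls far from it; too high a rate bounces past it with growing amplitude.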
Stop Early Exaggeration Factor: The factor used in the early exaggeration phase. Early exaggeration temporarily increases the attractive forces between similar data points, which improves convergence and helps create separation between clusters.
Use approximate (Annoy) method for nearest neighbors: If selected, FIt-SNE uses the ANNOY (Approximate Nearest Neighbors Oh Yeah) method to determine nearest neighbors. If not selected, FIt-SNE uses the VP-tree (vantage-point tree) method to determine nearest neighbors.
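Both options answer the same question: which k cells are nearest to each cell? The brute-force version of that computation, sketched below in numpy on synthetic data, is what exact methods such as VP-trees return; ANNOY returns an approximation of this neighbor set in exchange for speed on large datasets:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))  # toy data: 500 cells, 10 markers

def exact_knn(X, i, k):
    """Indices of the k exact nearest neighbors of row i (brute force)."""
    d = np.linalg.norm(X - X[i], axis=1)  # distances to every cell
    order = np.argsort(d)
    return order[1:k + 1]                 # skip the point itself at position 0

neighbors = exact_knn(X, 0, k=15)
```

Brute force is O(N²) over the whole dataset; VP-trees prune that search exactly, while ANNOY's random-projection trees prune it approximately.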
2.3 Run
Click Run FIt-SNE. This will take you to the Status tab and you can watch the progress. However, you are free to go back to your workflow or do whatever you please while this runs in the cloud. The status can also be seen in the Workflow itself, or you can have an email sent to you when it is completed.
3. Review the Results
You can view the results of the FIt-SNE by going to the Results tab and checking that they are as expected.
The single plot above is only for a quick sanity check of results. You shouldn't use it for anything more than that.
For plotting and data visualization, go back out to the workflow and add a Figure as a child of this analysis. To learn more about this, please see our resource Dimension Reduction Visualization.