In cytometry, visualizing all the markers (i.e., features, which can be interpreted as dimensions) can be challenging because cells can express multiple markers at once. Dimensionality reduction is the process of taking high-dimensional data and projecting it into a low-dimensional space while retaining as much information as possible. It allows cells with similar marker expression to be visualized, normally in a 2D space, by placing them close to each other. This article shows how to set up an opt-SNE (optimized t-SNE) run in OMIQ.
opt-SNE is a variant of the t-SNE algorithm that features several improvements on top of the traditional Barnes-Hut implementation of t-SNE, including the ability to monitor the rate of improvement of the Kullback-Leibler Divergence (KLD; functionally, a measure of how faithfully the low-dimensional projection represents the high-dimensional data, where lower is better) and then automatically stop the algorithm when it begins to suffer from diminishing returns in that metric.
Belkina, A.C., Ciccolella, C.O., Anno, R. et al. Automated optimized parameters for T-distributed stochastic neighbor embedding improve visualization and analysis of large datasets. Nat Commun 10, 5415 (2019). https://doi.org/10.1038/s41467-019-13055-y
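The stopping idea can be sketched in a few lines. This is not OMIQ's implementation; it is a minimal illustration with hypothetical KLD values and a hypothetical tolerance, showing how a run can halt once the relative KLD improvement becomes negligible.

```python
def should_stop(kld_history, rel_tol=1e-4):
    """Stop when the relative KLD improvement between checks drops below rel_tol."""
    if len(kld_history) < 2:
        return False
    prev, curr = kld_history[-2], kld_history[-1]
    # KLD decreases as the embedding improves, so (prev - curr) is the gain.
    improvement = (prev - curr) / max(prev, 1e-12)
    return improvement < rel_tol

# Hypothetical KLD values recorded during optimization:
kld = [5.2, 3.1, 2.4, 2.1, 2.0999, 2.09985, 2.0998]
history = []
for value in kld:
    history.append(value)
    if should_stop(history):
        break

print(len(history))  # stops after the 5th value, before exhausting all iterations
```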
1. Add an opt-SNE Task
Click Add new child task and select opt-SNE from the task selector. In this example, we have subsampled to live cells for our opt-SNE task.
Your exact workflow branch may look different than the example above. The important thing is that your workflow follows a logical ordering of tasks.
2. Setup the opt-SNE task
2.1 Select Files and Features
Select the Files you want to include for your opt-SNE.
Include all the files that you would want to directly compare in the same opt-SNE run as each run will create a unique visualization and result.
Select the Features you want to use for the dimensionality reduction.
Each feature you select will affect how the algorithm computes the result. You do not necessarily have to include all features. Often, it makes sense to exclude certain markers if they will not help inform your results, since the heterogeneity present in the input features drives the heterogeneity of the output map.
2.2 Enter opt-SNE Settings
Feel free to change the default settings to match your analysis goal. New to dimensionality reduction? Try the default settings first, then see how changing the hyperparameters below affects your result.
Max Iterations: This is the maximum number of iterations that opt-SNE will run to optimize the embedding of the data. opt-SNE will, however, automatically stop earlier once further iterations yield diminishing returns.
opt-SNE End: The constant used in the stopping criterion. Larger values result in longer runs before stopping.
Perplexity: Though distinct, this has a similar effect to adjusting the number of nearest neighbors considered for each data point. The algorithm uses the perplexity to calculate how similar data points are in the high-dimensional space before projecting them to the low-dimensional space. Larger datasets may need a higher perplexity. Low perplexity emphasizes local structure (e.g., CCR7 levels among CD4+ T cells) while high perplexity emphasizes global structure (e.g., B cells compared to monocytes).
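OMIQ runs opt-SNE in the cloud, but the effect of perplexity can be explored locally with any t-SNE implementation. Here is a sketch using scikit-learn's TSNE as a stand-in on a synthetic "cytometry" matrix; the cluster layout and perplexity values are purely illustrative.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Toy matrix: 150 cells x 8 markers, drawn as three well-separated clusters.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 8)) for c in (0.0, 2.0, 4.0)])

# Low perplexity emphasizes local neighborhoods; high perplexity, global layout.
local_view = TSNE(n_components=2, perplexity=5, random_state=42).fit_transform(X)
global_view = TSNE(n_components=2, perplexity=50, random_state=42).fit_transform(X)

print(local_view.shape, global_view.shape)  # (150, 2) (150, 2)
```

Plotting `local_view` and `global_view` side by side shows how the same cells arrange differently as perplexity changes.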
Theta: Theta controls how similar the Barnes-Hut implementation of t-SNE is to the original t-SNE algorithm (a lower value means it is more similar).
The Barnes-Hut implementation of t-SNE was created to allow the algorithm to be used on larger datasets (with more than a few thousand events total) with faster run times, so decreasing theta is generally not recommended. Changing the value of theta is recommended only in the very rare case where your opt-SNE runs fail or are canceled due to memory limitations from large numbers of events, channels, iterations, or perplexity. Please note that changing the value of theta may result in groups of events or observations being separated on the map that do not have meaningful differences in marker expression.
Components: Determines the number of parameters the opt-SNE result will generate (optsne_1, optsne_2, optsne_3, etc.). Two opt-SNE parameters (a 2D map) is the most traditional display.
Random Seed: A number used to initialize the opt-SNE run; changing it is optional. opt-SNE is stochastic, so to make a run reproducible, set a fixed seed. Rerunning the same dataset with the same settings and the same Random Seed value will produce the same result.
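The seed's role is the same in any t-SNE implementation. A sketch with scikit-learn on synthetic data: two runs with identical data, settings, and seed produce identical embeddings (the exact method is used here only to keep the small demo fully deterministic).

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 6))  # synthetic 100 events x 6 channels

# Same data, same settings, same seed: identical embeddings.
run1 = TSNE(n_components=2, perplexity=10, method="exact",
            random_state=42).fit_transform(X)
run2 = TSNE(n_components=2, perplexity=10, method="exact",
            random_state=42).fit_transform(X)

print(np.allclose(run1, run2))  # True
```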
Verbosity: How often the algorithm prints its progress. A value of 0 prints nothing, 1 prints progress for every iteration, 25 prints progress every 25th iteration, and so on.
Pre-init Embedding X (optional) and Pre-init Embedding Y (optional): Lets you choose two features to use as the initial embedding positions. If you have done a dimensionality reduction on a reference dataset (for example, a PCA before your opt-SNE), select its results as features and then add them as the Pre-init Embedding X and Pre-init Embedding Y here. The previous dimensionality reduction then serves as a scaffold for embedding the data points of this opt-SNE run.
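The PCA-as-scaffold idea can be sketched locally with scikit-learn, which accepts an array of starting coordinates as the `init` argument; the data here is synthetic and the parameter choices are illustrative, not OMIQ defaults.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 12))  # synthetic 150 events x 12 channels

# The first two principal components play the role of
# Pre-init Embedding X and Pre-init Embedding Y.
pca_coords = PCA(n_components=2, random_state=0).fit_transform(X)

# t-SNE refines the embedding starting from the PCA scaffold.
emb = TSNE(n_components=2, perplexity=20, init=pca_coords,
           random_state=0).fit_transform(X)
print(emb.shape)  # (150, 2)
```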
The combination of PCA and opt-SNE has been referenced as combining a global and local overview of your data. You can try it out for yourself and see how this affects your result.
2.3 Run
Click Run opt-SNE. This will take you to the Status tab and you can watch the progress. However, you are free to go back to your workflow or do whatever you please while this runs in the cloud. The status can also be seen in the Workflow itself, or you can have an email sent to you when it is completed.
3. Review the Results
You can view the results of the opt-SNE by going to the Results tab and checking that they are as expected.
The single plot above is only for a quick sanity check of results. You shouldn't use it for anything more than that.
For plotting and data visualization, go back out to the workflow and add a Figure as a child of this analysis. To learn more about this, please see our resource Dimension Reduction Visualization.