In cytometry, visualizing all the markers at once can be challenging: each marker is a feature (that is, a dimension), and cells can express many markers simultaneously. Dimensionality reduction takes high-dimensional data and projects it into a low-dimensional space while retaining as much information as possible. The result, normally a 2D plot, places cells with similar marker expression close to each other. This article shows how to set up a TriMap in OMIQ.
TriMap is a dimensionality reduction technique that helps preserve the overall global structure of the data. Instead of comparing pairs of data points, TriMap uses triplets: each triplet states that one point is closer to a second point than to a third.
Amid, E. and Warmuth, M. TriMap: Large-scale Dimensionality Reduction Using Triplets. arXiv:1910.00204. https://doi.org/10.48550/arXiv.1910.00204
1. Add a TriMap Task
Click Add new child task and select TriMap from the task selector. In this example, we have subsampled to live cells for our TriMap task.
Your exact workflow branch may look different than the example above. The important thing is that your workflow follows a logical ordering of tasks.
2. Set Up the TriMap Task
2.1 Select Files and Features
Select the Files you want to include for your TriMap.
Include all the files that you want to compare directly in the same TriMap run, because each run creates its own unique visualization and result.
Select the Features you want to use for the dimensionality reduction.
Each feature you select will affect how the algorithm computes the result. You do not necessarily have to include all features. Often, it will make sense to exclude certain markers if they will not help inform your results (input heterogeneity will equal output heterogeneity).
2.2 Enter TriMap Settings
Feel free to change the default settings to suit your analysis goal. New to dimensionality reduction? Try the default settings first, then see how changing the hyperparameters below affects your result.
Nearest Neighbors: Sets the number of nearest neighbors for the k-nearest neighbors graph used in triplet formation.
Num Outliers: Sets the number of outliers used in triplet formation with nearest neighbors.
Num Output Dimensions: Determines the number of parameters the TriMap result will generate (trimap_1, trimap_2, trimap_3, etc.). Two TriMap parameters, displayed as a 2D plot, is the most traditional choice.
Num Random Triplets per Point: Sets the number of random triplets considered per data point.
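To show how Nearest Neighbors, Num Outliers, and Num Random Triplets per Point work together, here is a minimal Python sketch of triplet formation based on our reading of the cited TriMap paper. It is not OMIQ's code: the function names, the small default values, and the brute-force neighbor search are assumptions chosen for readability.

```python
import math
import random

def nearest_neighbors(points, i, n_inliers):
    """Exact (brute-force) nearest neighbors of point i, closest first."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: (math.dist(points[i], points[j]), j))
    return others[:n_inliers]

def form_triplets(points, n_inliers=3, n_outliers=2, n_random=1, seed=0):
    """Return (i, j, k) triplets meaning 'i is closer to j than to k'."""
    rng = random.Random(seed)
    triplets = []
    for i in range(len(points)):
        neighbors = nearest_neighbors(points, i, n_inliers)
        excluded = set(neighbors) | {i}
        candidates = [k for k in range(len(points)) if k not in excluded]
        # Nearest-neighbor triplets: pair each neighbor j with sampled outliers k.
        for j in neighbors:
            for k in rng.sample(candidates, n_outliers):
                triplets.append((i, j, k))
        # Random triplets: pick two random points and order them so the closer one is j.
        for _ in range(n_random):
            j, k = rng.sample([p for p in range(len(points)) if p != i], 2)
            if math.dist(points[i], points[j]) > math.dist(points[i], points[k]):
                j, k = k, j
            triplets.append((i, j, k))
    return triplets

# Hypothetical data: 30 "cells" measured on 3 markers.
rng = random.Random(42)
cells = [(rng.random(), rng.random(), rng.random()) for _ in range(30)]
triplets = form_triplets(cells)
print(len(triplets))  # 30 * (3 * 2 + 1) = 210
```

In this sketch, each point contributes (nearest neighbors × outliers) + random triplets, so raising any of the three settings increases the number of triplets the optimization has to satisfy.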
Weight Adjustment: Determines the gamma value used in the log transformation of the triplet weights. This helps fine-tune the balance between global and local structure in the resulting embedding.
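To make the role of the weights more concrete, here is a minimal Python sketch of a single weighted triplet loss term, following our reading of the cited TriMap paper. The function names, the example gamma of 500, and the example points are illustrative assumptions rather than OMIQ settings or code. The paper also normalizes the raw weights before the log transformation, which this sketch leaves out.

```python
import math

def similarity(a, b):
    """Heavy-tailed similarity between two embedded points: 1 / (1 + squared distance)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / (1.0 + sq_dist)

def triplet_loss(y_i, y_j, y_k, weight=1.0):
    """Loss for a triplet in which i should sit closer to j than to k.

    The value is near 0 when the embedding satisfies the triplet
    and near `weight` when it violates the triplet.
    """
    s_ij = similarity(y_i, y_j)
    s_ik = similarity(y_i, y_k)
    return weight * s_ik / (s_ij + s_ik)

def adjust_weight(raw_weight, gamma=500.0):
    """Log transformation of a raw triplet weight (gamma value is an assumption)."""
    return math.log(1.0 + gamma * raw_weight)

anchor, near, far = (0.0, 0.0), (0.1, 0.0), (5.0, 0.0)
satisfied = triplet_loss(anchor, near, far)  # j is the nearby point: small loss
violated = triplet_loss(anchor, far, near)   # j is the distant point: large loss

# The log transformation compresses the spread between large and small weights.
raw_ratio = 1.0 / 0.01
adjusted_ratio = adjust_weight(1.0) / adjust_weight(0.01)
print(satisfied, violated, raw_ratio, adjusted_ratio)
```

Because the transformation compresses very large weights, no small group of triplets dominates the loss, which is one way to think about how gamma shifts the balance between local and global structure.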
Distance Metric: Controls how the distance is computed in the ambient space of the input data. You can choose between Euclidean, Manhattan, Angular, and Hamming.
Distance Metrics:
- Euclidean: Measures the straight-line distance between two points in space.
- Manhattan: Computes the sum of absolute differences along each dimension, often reflecting grid-like movement.
- Angular: Measures the angle between two vectors, so it depends on their direction rather than their magnitude.
- Hamming: Counts the number of positions at which two equal-length sequences differ.
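As a rough illustration of how the four metrics differ, the Python sketch below computes each one for the same pair of vectors. The function names and the example values are hypothetical, and the angular distance is written here as the plain angle in radians; libraries may use a different but related definition.

```python
import math

def euclidean(a, b):
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute differences along each dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))

def angular(a, b):
    """Angle between the two vectors, in radians."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))  # guard against rounding
    return math.acos(cosine)

def hamming(a, b):
    """Number of positions where the two sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

# Two hypothetical cells measured on three markers.
cell_a = [1.0, 0.0, 2.0]
cell_b = [4.0, 0.0, 6.0]

print(euclidean(cell_a, cell_b))  # 5.0
print(manhattan(cell_a, cell_b))  # 7.0
print(angular(cell_a, cell_b))    # small angle: the vectors point in similar directions
print(hamming(cell_a, cell_b))    # 2
```

Note that the two cells are far apart by Euclidean and Manhattan distance but close by angular distance, because their marker profiles have a similar shape at different intensities.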
Learning Rate: The speed at which TriMap optimizes the embedding.
Machine learning algorithms learn from the data at a speed set by the learning rate, which determines how much the algorithm adjusts its own parameters at each step of the optimization phase. We recommend leaving the default automatic learning rate, but you can set it manually by typing the desired number. If the learning rate is set too low or too high, the territories for the different cell types won’t be properly separated. A higher learning rate means the algorithm takes bigger steps in each stage of learning but may overshoot the optimal solution. A lower learning rate means the algorithm takes smaller steps, so it may get stuck and never reach the optimal solution.
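The effect of step size is easy to see on a toy problem. The Python sketch below runs plain gradient descent on f(x) = x², not on a TriMap embedding; the function name and the three example rates are assumptions picked only to show the three behaviors described above.

```python
def descend(learning_rate, steps=20, start=10.0):
    """Minimize f(x) = x**2 with plain gradient descent and return the final x."""
    x = start
    for _ in range(steps):
        gradient = 2.0 * x           # derivative of x**2
        x -= learning_rate * gradient
    return x

too_low = descend(0.01)     # tiny steps: still far from the minimum at 0
reasonable = descend(0.4)   # lands essentially on the minimum
too_high = descend(1.1)     # every step overshoots and the value grows

print(too_low, reasonable, too_high)
```

With the same number of steps, only the middle rate reaches the minimum. The low rate has not arrived yet, and the high rate moves further away with every step.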
Num Iterations: Sets the number of iterations that TriMap runs to optimize the embedding of the data.
Optimization Method: Sets the optimization method used in the embedding phase of TriMap to minimize triplet violations. You can choose from Delta-bar-Delta, Steepest Descent, and Gradient Descent w/ Momentum.
- Delta-bar-Delta: Adapts the learning rate to account for changes as the embedding proceeds.
- Steepest Descent: Uses a fixed learning rate and follows the direction of steepest decrease of the loss function in the embedding phase.
- Gradient Descent w/ Momentum: Uses a fixed learning rate and adjusts each step based on previous gradients of the loss function in the embedding phase.
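The three update rules can be compared on a toy loss. The Python sketch below is a simplified illustration under stated assumptions, not the TriMap or OMIQ implementation: the loss f(x, y) = 0.5 (x² + 10 y²), the function names, and the constants (kappa, phi, theta, momentum) are all hypothetical, and the delta-bar-delta rule follows the classic textbook form with one learning rate per coordinate.

```python
def gradient(point):
    """Gradient of the toy loss f(x, y) = 0.5 * (x**2 + 10 * y**2)."""
    x, y = point
    return [x, 10.0 * y]

def loss(point):
    x, y = point
    return 0.5 * (x ** 2 + 10.0 * y ** 2)

def steepest_descent(start, lr=0.05, steps=100):
    """Fixed learning rate, step directly against the gradient."""
    p = list(start)
    for _ in range(steps):
        g = gradient(p)
        p = [pi - lr * gi for pi, gi in zip(p, g)]
    return p

def momentum_descent(start, lr=0.05, momentum=0.5, steps=100):
    """Fixed learning rate, but each step keeps part of the previous step."""
    p = list(start)
    velocity = [0.0] * len(p)
    for _ in range(steps):
        g = gradient(p)
        velocity = [momentum * v - lr * gi for v, gi in zip(velocity, g)]
        p = [pi + v for pi, v in zip(p, velocity)]
    return p

def delta_bar_delta(start, lr=0.05, kappa=0.01, phi=0.5, theta=0.5, steps=100):
    """Per-coordinate learning rates that adapt as the optimization proceeds."""
    p = list(start)
    rates = [lr] * len(p)
    bar = [0.0] * len(p)  # running average of past gradients
    for _ in range(steps):
        g = gradient(p)
        for d in range(len(p)):
            if g[d] * bar[d] > 0:    # same direction as before: speed up
                rates[d] += kappa
            elif g[d] * bar[d] < 0:  # direction flipped: slow down
                rates[d] *= phi
            bar[d] = (1 - theta) * g[d] + theta * bar[d]
            p[d] -= rates[d] * g[d]
    return p

start = [10.0, 1.0]
steepest_loss = loss(steepest_descent(start))
momentum_loss = loss(momentum_descent(start))
dbd_loss = loss(delta_bar_delta(start))
print(steepest_loss, momentum_loss, dbd_loss)
```

On this toy loss, all three methods reduce the loss, and the two methods that use gradient history get closer to the minimum in the same number of steps. Results on real data depend on the dataset and the other settings.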
KNN Method: Determines the k-nearest neighbors method used in TriMap. You can choose from HNSW or ANNOY.
- HNSW (Hierarchical Navigable Small World) approximates the nearest neighbors and builds a multi-layer graph structure. Choose this for high dimensional, large datasets.
- ANNOY (Approximate Nearest Neighbors Oh Yeah) approximates the nearest neighbors through a tree-based index that allows for quick searches in high-dimensional space.
2.3 Run
Click Run TriMap. This will take you to the Status tab and you can watch the progress. However, you are free to go back to your workflow or do whatever you please while this runs in the cloud. The status can also be seen in the Workflow itself, or you can have an email sent to you when it is completed.
3. Review the Results
You can view the results of the TriMap by going to the Results tab and checking whether they are as expected.
The single plot above is only for a quick sanity check of results. You shouldn't use it for anything more than that.
For plotting and data visualization, go back out to the workflow and add a Figure as a child of this analysis. To learn more about this, please see our resource Dimension Reduction Visualization.