In cytometry, visualizing all the markers (i.e., features, which can be interpreted as dimensions) can be challenging because each cell can express multiple markers at once. Dimensionality reduction is the process of projecting high-dimensional data into a low-dimensional space while retaining as much information as possible. It allows cells with similar marker expression to be visualized together, normally in 2D, by placing them close to each other. This article shows how to set up a PHATE (Potential of Heat-diffusion for Affinity-based Trajectory Embedding) task in OMIQ.
PHATE is an algorithm that preserves both the local and global structure of high-dimensional data. It uses pairwise distances between cells to capture local structure and a heat-diffusion process to capture global structure. The result is then embedded into a low-dimensional space using multidimensional scaling (MDS) for visualization.
Moon, K., van Dijk, D., Wang, Z., et al. Visualizing Structure and Transitions in High-Dimensional Biological Data. Nature Biotechnology 37, 1482-1492 (2019). https://doi.org/10.1038/s41587-019-0336-3
1. Add a PHATE Task
Click Add new child task and select PHATE from the task selector. In this example, we have subsampled to live cells before the PHATE task.
Your workflow branch may look different from the example above. What matters is that your tasks follow a logical order.
2. Set Up the PHATE Task
2.1 Select Files and Features
Select the Files you want to include in your PHATE run.
Include all the files you want to compare directly in the same PHATE run. Each run creates its own unique embedding, so results from separate runs cannot be compared directly.
Select the Features you want to use for the dimensionality reduction.
Each feature you select affects how the algorithm computes the result. You do not have to include all features. It often makes sense to exclude markers that will not help inform your results, because the structure in the output can only reflect the heterogeneity present in the input features.
2.2 Enter PHATE Settings
Feel free to change the default settings to suit your analysis goal. New to dimensionality reduction? Try the default settings first, then see how changing the hyperparameters below affects your result.
Number of Components: Determines the number of dimensions (parameters) the PHATE result will generate (phate_1, phate_2, phate_3, etc.). Two components is the most common choice, since it gives a standard 2D plot.
Number of Nearest Neighbors: Sets the number of nearest neighbors that PHATE will use to build the initial kernel.
Increasing the number of nearest neighbors may help if the clusters appear disconnected from each other.
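To make the role of k concrete, here is a minimal plain-Python sketch of the neighbor step: for each point it finds the k nearest neighbors and records the distance to the k-th one, which the PHATE paper uses as that point's adaptive kernel bandwidth. It assumes Euclidean distance and brute-force search; the function name `knn_bandwidths` is hypothetical and not part of OMIQ or any PHATE library.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_bandwidths(points, k):
    """Return (neighbors, bandwidths) for every point.

    neighbors[i]  -> indices of the k points closest to point i
    bandwidths[i] -> distance from point i to its k-th nearest neighbor
    Brute force, O(n^2): fine for illustration, far too slow for real data.
    """
    neighbors, bandwidths = [], []
    for i, p in enumerate(points):
        # Distances to every other point, nearest first (ties broken by index).
        dists = sorted((euclidean(p, q), j) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        neighbors.append([j for _, j in nearest])
        bandwidths.append(nearest[-1][0])
    return neighbors, bandwidths


# Three nearby cells and one outlier on a single marker axis.
cells = [(0.0,), (1.0,), (2.0,), (10.0,)]
for k in (1, 2):
    _, eps = knn_bandwidths(cells, k)
    print(k, eps)
```

A larger k gives each point a wider bandwidth, so it reaches further when forming connections. This is why raising the number of neighbors can help join groups that appear disconnected.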
KNN Distance Metric: Determines the metric used to compute distances when finding each point's nearest neighbors. You can choose between Euclidean, Manhattan, Chebyshev, Cosine, and Canberra.
Distance Metrics:
- Euclidean: Measures the straight-line distance between two points in space.
- Manhattan: Computes the sum of absolute differences along each dimension, often reflecting grid-like movement.
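- Chebyshev: Measures the largest absolute difference between two points along any single dimension.
- Cosine: Measures the angle between two vectors (1 minus their cosine similarity), ignoring their magnitude.
- Canberra: Sums the absolute differences along each dimension, each divided by the sum of the absolute values of the two coordinates, so differences between small values are weighted more heavily.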
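The five metrics above can be written out in a few lines each. This plain-Python sketch follows the standard textbook definitions (the same ones SciPy uses) and is meant only to show how the metrics differ on the same pair of points; it is not OMIQ's implementation.

```python
import math


def euclidean(a, b):
    # Straight-line distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    # Sum of absolute differences along each dimension.
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    # Largest absolute difference along any single dimension.
    return max(abs(x - y) for x, y in zip(a, b))


def cosine(a, b):
    # 1 - cosine similarity; depends only on direction, not magnitude.
    # Undefined (division by zero) if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def canberra(a, b):
    # Each absolute difference is scaled by the size of the two values.
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom > 0:  # dimensions where both values are 0 contribute nothing
            total += abs(x - y) / denom
    return total


a, b = (1.0, 2.0, 3.0), (4.0, 0.0, 3.0)
for fn in (euclidean, manhattan, chebyshev, cosine, canberra):
    print(fn.__name__, round(fn(a, b), 4))
```

Note that `cosine((1, 0), (2, 0))` is 0 even though the points are far apart, because only the direction of the vectors matters.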
Alpha Decay: Sets the decay rate (alpha) of the kernel that converts k-nearest-neighbor distances into the affinity matrix. These affinities determine the diffusion probabilities between data points.
The default rarely needs to be changed. Increasing alpha makes affinities drop off more sharply with distance, which decreases the connectivity between points.
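For intuition about alpha, here is a plain-Python sketch of the alpha-decaying kernel described in the PHATE paper: the affinity between two points averages two exponentials, each scaled by one point's adaptive bandwidth (its k-th nearest-neighbor distance), and the rows are then normalized into diffusion probabilities. The matrix here is dense and untruncated, which real implementations avoid; the distances, bandwidths, and function names are hypothetical.

```python
import math


def alpha_decay_affinity(d, eps_x, eps_y, alpha):
    """Affinity between two points at distance d, given each point's
    adaptive bandwidth (distance to its k-th nearest neighbor)."""
    return (0.5 * math.exp(-((d / eps_x) ** alpha))
            + 0.5 * math.exp(-((d / eps_y) ** alpha)))


def diffusion_operator(dist, eps, alpha):
    """Dense affinity matrix, row-normalized into a Markov transition matrix."""
    n = len(dist)
    K = [[alpha_decay_affinity(dist[i][j], eps[i], eps[j], alpha) for j in range(n)]
         for i in range(n)]
    P = []
    for row in K:
        total = sum(row)
        P.append([v / total for v in row])  # each row now sums to 1
    return P


# At twice the bandwidth, larger alpha pushes the affinity toward 0.
for alpha in (2, 10, 40):
    print(alpha, alpha_decay_affinity(2.0, 1.0, 1.0, alpha))

# Three cells on a line at 0, 1 and 3; bandwidths are 1-NN distances.
dist = [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
eps = [1.0, 1.0, 2.0]
P = diffusion_operator(dist, eps, alpha=10)
```

With a large alpha, points inside the bandwidth get affinities close to 1 and points beyond it get affinities close to 0, so each cell effectively connects only to its nearest neighbors.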
Smoothing Multiplier (t): Sets the number of diffusion steps, which determines how much the diffusion process smooths the affinity matrix. Leave this blank to let the algorithm choose t automatically using the von Neumann entropy.
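The automatic choice of t can be sketched as follows: for each candidate t, compute the von Neumann entropy of the spectrum of the diffusion operator raised to the power t, then pick t at the knee of the resulting curve, as described in the PHATE paper. The eigenvalues below are made up, and the knee detector (farthest point from the chord joining the curve's endpoints) is a simple stand-in that is not necessarily the method PHATE or OMIQ uses.

```python
import math


def von_neumann_entropy(eigenvalues, t):
    """Entropy of the spectrum of P^t: raise |eigenvalue| to the power t,
    normalize to sum to 1, and take the Shannon entropy."""
    powered = [abs(lam) ** t for lam in eigenvalues]
    total = sum(powered)
    probs = [p / total for p in powered if p > 0]
    return -sum(p * math.log(p) for p in probs)


def knee_index(values):
    """Simple knee heuristic: the index whose point lies farthest from the
    straight line joining the first and last points of the curve."""
    n = len(values)
    x0, y0, x1, y1 = 0, values[0], n - 1, values[-1]

    def offset(i):
        # Unnormalized point-to-line distance; the constant norm does not change the argmax.
        return abs((y1 - y0) * i - (x1 - x0) * values[i] + x1 * y0 - y1 * x0)

    return max(range(n), key=offset)


# Hypothetical eigenvalues of a diffusion operator (the largest is always 1).
spectrum = [1.0, 0.95, 0.9, 0.7, 0.5, 0.3, 0.1]
ts = list(range(1, 51))
entropies = [von_neumann_entropy(spectrum, t) for t in ts]
best_t = ts[knee_index(entropies)]
print("chosen t:", best_t)
```

The entropy falls as t grows because diffusion concentrates weight on the few largest eigenvalues; the knee marks where extra diffusion steps stop removing much noise.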
Random Seed: A number used to initialize PHATE's random number generator. Changing it is optional. Because PHATE is stochastic, setting a fixed seed makes results reproducible: running the same dataset with the same settings and the same Random Seed produces the same result.
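The effect of a seed can be shown with any stochastic step. The sketch below uses a hypothetical `random_initial_layout` function standing in for a random part of an embedding algorithm; it is not OMIQ's or PHATE's code, but the principle is the same: the same seed yields identical random draws, and a different seed yields different ones.

```python
import random


def random_initial_layout(n_points, n_components, seed=None):
    """Hypothetical stochastic step: random starting coordinates for an embedding.

    A dedicated random.Random instance keeps the seed local to this call,
    so other code using the global random module cannot change the result.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_components)]
            for _ in range(n_points)]


run_a = random_initial_layout(5, 2, seed=42)
run_b = random_initial_layout(5, 2, seed=42)
run_c = random_initial_layout(5, 2, seed=7)
print(run_a == run_b, run_a == run_c)
```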
Number of Landmarks for fast PHATE (optional): Sets the number of points that will be used as representatives (landmarks) when running fast PHATE.
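Roughly speaking, fast PHATE summarizes diffusion between cells through a much smaller set of landmarks, so the expensive steps work on a landmark-by-landmark matrix instead of a cell-by-cell one. The back-of-the-envelope arithmetic below shows why that matters. It assumes dense 64-bit float storage and an illustrative 1,000,000 cells and 2,000 landmarks; real implementations use sparse matrices, so actual memory use differs.

```python
def dense_matrix_gib(rows, cols, bytes_per_value=8):
    """Memory for a dense matrix of 64-bit floats, in GiB."""
    return rows * cols * bytes_per_value / 1024 ** 3


n_cells = 1_000_000   # illustrative dataset size
n_landmarks = 2_000   # illustrative landmark count

full = dense_matrix_gib(n_cells, n_cells)                    # every cell vs every cell
cells_to_landmarks = dense_matrix_gib(n_cells, n_landmarks)  # cells vs landmarks
landmarks_only = dense_matrix_gib(n_landmarks, n_landmarks)  # landmarks vs landmarks

print(f"cell x cell:         {full:,.1f} GiB")
print(f"cell x landmark:     {cells_to_landmarks:,.1f} GiB")
print(f"landmark x landmark: {landmarks_only:,.3f} GiB")
```

Under these assumptions the full cell-by-cell matrix would need about 7,450 GiB, while the landmark-by-landmark matrix needs about 0.03 GiB. Fewer landmarks are cheaper but summarize the data more coarsely.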
Gamma: Sets the informational distance constant that PHATE uses to convert diffusion probabilities into potential distances. By default, this is 1, which gives the log potential; 0 gives a square-root potential. You can choose a value between -1 and 1.
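Gamma controls how diffused probabilities are transformed before distances between cells are computed. The sketch below follows how the open-source PHATE Python package documents this parameter (1 gives the log potential, 0 a square-root potential); treat it as an illustration rather than OMIQ's implementation. The epsilon offset and the example rows are assumptions, and the function names are hypothetical.

```python
import math


def potential(p, gamma, eps=1e-7):
    """Transform one diffused transition probability into a potential value.

    gamma = 1  -> log potential, -log(p + eps)  (eps avoids log(0); value assumed)
    gamma = 0  -> square-root potential, 2 * sqrt(p)
    gamma = -1 -> the probability itself
    """
    if not -1 <= gamma <= 1:
        raise ValueError("gamma must be between -1 and 1")
    if gamma == 1:
        return -math.log(p + eps)
    c = (1 - gamma) / 2
    return p ** c / c


def potential_distance(row_a, row_b, gamma):
    """Euclidean distance between two cells' transformed rows of P^t."""
    return math.sqrt(sum((potential(a, gamma) - potential(b, gamma)) ** 2
                         for a, b in zip(row_a, row_b)))


# Hypothetical rows of a diffused operator P^t for two cells.
cell_1 = [0.70, 0.20, 0.09, 0.01]
cell_2 = [0.60, 0.30, 0.09, 0.01]
for g in (1, 0, -1):
    print(g, potential_distance(cell_1, cell_2, g))
```

Compared with using the raw probabilities (gamma = -1), the log transform makes the distance more sensitive to differences between small probabilities.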
Distance Metric for MDS: Determines what distance metric is used in the embedding phase of PHATE. You can choose between Euclidean and Cosine.
MDS Algorithm: Determines what MDS algorithm is used by PHATE for low dimension embedding. You can choose between metric, classical, and non-metric.
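- Metric: Places the data points in low dimensions so that the embedded distances approximate the input distances as closely as possible (by minimizing a stress function). This is the PHATE default.
- Classical: Computes the embedding directly from an eigendecomposition of the distance matrix. Distances are reproduced exactly only if they are Euclidean and enough components are kept.
- Non-metric: Places the data points in low dimensions so that only the rank order of the distances is preserved, rather than their exact values.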
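Classical MDS (also known as principal coordinates analysis) is the simplest of the three to write down: double-center the squared distance matrix, then use its top eigenvectors as coordinates. This plain-Python sketch finds the eigenvectors by power iteration with deflation, assuming the top eigenvalues are positive; real implementations use a linear algebra library, and this is not the code OMIQ runs.

```python
import math
import random


def classical_mds(D, n_components=2, iterations=500, seed=0):
    """Classical MDS: double-center the squared distances, then take the top
    eigenvectors (found here by power iteration with deflation)."""
    n = len(D)
    sq = [[d * d for d in row] for row in D]
    row_means = [sum(row) / n for row in sq]
    col_means = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    # B = -1/2 * J D^2 J: the inner-product matrix of the centered coordinates.
    B = [[-0.5 * (sq[i][j] - row_means[i] - col_means[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * n_components for _ in range(n)]
    for comp in range(n_components):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iterations):
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit eigenvector v.
        lam = sum(v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][comp] = v[i] * scale
        # Deflate so the next power iteration finds the next eigenvector.
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Three points on a line at positions 0, 1 and 3, given only their distances.
D = [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
print(classical_mds(D, n_components=1))
```

Because these distances are Euclidean, the one-component embedding recovers the original positions, centered at zero and possibly mirrored.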
MDS Solver: Sets which solver is used for metric MDS. Only available if metric is chosen as the MDS algorithm. You can choose between sgd and smacof.
- SGD: Stochastic Gradient Descent - minimizes the stress function by estimating the gradient from randomly sampled points rather than the entire dataset at each iteration.
- SMACOF: Scaling by Majorizing a Complicated Function - minimizes the stress function by using the entire dataset at each iteration.
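Both solvers minimize the same stress function; they differ in how each update is computed. The plain-Python sketch below computes raw stress and runs unweighted SMACOF, where every update uses all pairwise distances. It assumes unit weights and a random starting layout, and it illustrates the technique rather than the solver OMIQ or PHATE runs. An SGD solver would instead nudge the coordinates of randomly sampled pairs of points at each step, which is cheaper per step on large datasets.

```python
import math
import random


def pairwise(X):
    """All pairwise Euclidean distances between embedded points."""
    return [[math.dist(a, b) for b in X] for a in X]


def stress(X, D):
    """Raw stress: sum over point pairs of (embedded distance - target distance)^2."""
    E = pairwise(X)
    n = len(D)
    return sum((E[i][j] - D[i][j]) ** 2 for i in range(n) for j in range(i + 1, n))


def smacof(D, n_components=2, iterations=100, seed=0):
    """Unweighted SMACOF: repeatedly apply the Guttman transform, which never
    increases the stress."""
    n = len(D)
    rng = random.Random(seed)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(n_components)] for _ in range(n)]
    history = [stress(X, D)]
    for _ in range(iterations):
        E = pairwise(X)
        # B(X): off-diagonal -D_ij / E_ij (0 if two points coincide);
        # the diagonal is set so that each row sums to 0.
        B = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i != j and E[i][j] > 0:
                    B[i][j] = -D[i][j] / E[i][j]
            B[i][i] = -sum(B[i])
        # Guttman transform with unit weights: X <- B(X) X / n, using all points.
        X = [[sum(B[i][k] * X[k][c] for k in range(n)) / n for c in range(n_components)]
             for i in range(n)]
        history.append(stress(X, D))
    return X, history


# Corners of a unit square, given only their pairwise distances.
s = math.sqrt(2)
square = [[0, 1, s, 1], [1, 0, 1, s], [s, 1, 0, 1], [1, s, 1, 0]]
layout, history = smacof(square)
print(f"stress: {history[0]:.4f} -> {history[-1]:.6f}")
```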
2.3 Run
Click Run PHATE. This takes you to the Status tab, where you can watch the progress. Because the task runs in the cloud, you are free to return to your workflow or continue other work while it runs. You can also check the status in the Workflow itself, or have an email sent to you when the task completes.
3. Review the Results
To view the results of the PHATE run, go to the Results tab and check that they look as expected.
The single plot shown there is only for a quick sanity check of the results; do not use it for anything more than that.
For plotting and data visualization, go back out to the workflow and add a Figure as a child of this analysis. To learn more about this, please see our resource Dimension Reduction Visualization.