In cytometry, visualizing all the markers (i.e., features, which can be interpreted as dimensions) can be challenging because cells can express multiple markers at once. Dimensionality reduction is the process of taking high-dimensional data and projecting it into a low-dimensional space while retaining as much information as possible. It lets you visualize cells with similar marker expression, normally in a 2D space, by placing cells with closely related marker expression near each other. This article shows how to set up a PCA (Principal Component Analysis) in OMIQ.
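PCA is one of the oldest dimensionality reduction tools available. It performs a linear dimensionality reduction to find the principal components: orthogonal directions in marker space, ordered by how much of the data's variance each one captures. It uses Singular Value Decomposition (SVD) to project the high-dimensional data onto these components in a lower-dimensional space.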
Tipping, M.E., and Bishop, C.M. Probabilistic Principal Component Analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 61(3), 611-622 (1999). https://doi.org/10.1111/1467-9868.00196
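For intuition, the core idea can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not OMIQ's implementation. It assumes the input is a matrix with one row per cell and one column per selected marker (already transformed). The function name `pca_svd` is hypothetical.

```python
import numpy as np

def pca_svd(X, n_components):
    """Project the rows of X (cells x markers) onto the top principal components."""
    # Center each marker (column) so the components describe variance around the mean
    Xc = X - X.mean(axis=0)
    # Thin SVD: Xc = U @ diag(S) @ Vt; the rows of Vt are the principal axes,
    # already sorted by decreasing singular value
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]          # loadings, shape (k, n_markers)
    scores = Xc @ components.T              # coordinates of each cell, shape (n_cells, k)
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, components, explained_variance

# Synthetic stand-in: 1,000 "cells" x 5 "markers"
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
X[:, 1] += 2 * X[:, 0]  # make two markers strongly correlated

scores, components, variance = pca_svd(X, n_components=2)
print(scores.shape)  # (1000, 2)
```

Each retained component is one new coordinate per cell. The variance along each component is the squared singular value divided by n - 1, which is why the first components carry the most information.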
1. Add a PCA Task
Click Add new child task and select PCA from the task selector. In this example, we have subsampled to live cells for our PCA task.
Your workflow branch may look different from this example. What matters is that your workflow follows a logical ordering of tasks.
2. Set Up the PCA Task
2.1 Select Files and Features
Select the Files you want to include for your PCA.
Include all the files that you want to compare directly in the same PCA run, because each run creates its own unique visualization and result.
Select the Features you want to use for the dimensionality reduction.
Each feature you select affects how the algorithm computes the result. You do not have to include all features. It often makes sense to exclude markers that will not help inform your results, because the heterogeneity captured in the output can only reflect the heterogeneity of the features you put in.
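The sketch below illustrates why feature choice matters. It builds a synthetic dataset in which two populations differ in only one marker, then adds a hypothetical uninformative channel with a large spread. PCA has no notion of which channels are biologically meaningful; it follows variance, so the noisy channel takes over the first component. The data, the channel names, and the helper `first_pc` are illustrative assumptions.

```python
import numpy as np

def first_pc(X):
    """Return the first principal axis (a unit vector of marker loadings) of X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(1)
n = 2000
# Two synthetic populations that differ only in marker_a
group = rng.integers(0, 2, size=n)
marker_a = 3.0 * group + rng.normal(scale=0.5, size=n)
marker_b = rng.normal(scale=0.5, size=n)
# Hypothetical channel with large spread but no relation to the populations
noise = rng.normal(scale=5.0, size=n)

informative = np.column_stack([marker_a, marker_b])
with_noise = np.column_stack([marker_a, marker_b, noise])

print(np.abs(first_pc(informative)).round(2))  # loading concentrated on marker_a
print(np.abs(first_pc(with_noise)).round(2))   # loading concentrated on the noise channel
```

In practice, how strongly a channel pulls on the components also depends on its transformation and scaling, because PCA on unscaled data favors high-variance columns.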
2.2 Enter PCA Settings
Num Result Components: Determines the number of new parameters (components) the PCA result will generate (pca_1, pca_2, pca_3, etc.).
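Each component becomes one new parameter per cell. The minimal sketch below uses scikit-learn's `PCA` as a stand-in, since this article does not describe OMIQ's backend. It shows how the number of components maps to the output shape and how the pca_N naming lines up with the columns. The data are synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in: 500 "cells" x 8 "markers"
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))

n_components = 3  # plays the role of "Num Result Components"
pca = PCA(n_components=n_components)
scores = pca.fit_transform(X)  # one new value per component for every cell

# Label the new columns using the article's pca_1, pca_2, ... convention
names = [f"pca_{i + 1}" for i in range(n_components)]
print(names)          # ['pca_1', 'pca_2', 'pca_3']
print(scores.shape)   # (500, 3)

# Fraction of total variance carried by each retained component
print(pca.explained_variance_ratio_)
```

Printing the explained variance ratio is one way to judge whether additional components carry much extra information.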
Method: Chooses how the PCA calculation is performed. You can choose between Randomized and Exact.
- Randomized: Runs a randomized Singular Value Decomposition calculation.
- Exact: Runs a full Singular Value Decomposition calculation.
The randomized method is based on the paper by Halko et al. See associated papers below:
Halko, N., Martinsson, P.G., and Tropp, J.A. Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions. SIAM Review 53(2), 217-288 (2011). https://doi.org/10.1137/090771806
Martinsson, P.G., Rokhlin, V., and Tygert, M. A Randomized Algorithm for the Decomposition of Matrices. Applied and Computational Harmonic Analysis 30(1), 47-68 (2011). https://doi.org/10.1016/j.acha.2010.02.003
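The difference between the two methods can be sketched in NumPy. The randomized route below follows the two-stage scheme described by Halko et al.:

1. Sketch the range of the data with a random Gaussian matrix, refining it with a few power iterations.
2. Run an exact SVD on the much smaller projected matrix.

This is an illustration, not OMIQ's code. The function name `randomized_svd` and the values of `oversample` and `n_iter` are our own assumptions, not OMIQ's settings.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, n_iter=4, seed=0):
    """Approximate rank-k SVD of A with randomized subspace iteration (Halko et al., 2011)."""
    rng = np.random.default_rng(seed)
    n_cols = A.shape[1]
    # Stage A: build an orthonormal basis Q whose span approximates A's dominant range
    omega = rng.normal(size=(n_cols, k + oversample))  # Gaussian test matrix
    Q, _ = np.linalg.qr(A @ omega)
    for _ in range(n_iter):
        # Power (subspace) iterations, re-orthonormalized for numerical stability
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)
    # Stage B: exact SVD of the small matrix B = Q^T A, then map U back up
    B = Q.T @ A
    U_small, S, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :k], S[:k], Vt[:k]

# Synthetic stand-in: 2,000 "cells" x 40 "markers" with low-rank structure plus noise
rng = np.random.default_rng(42)
latent = rng.normal(size=(2000, 5))
X = latent @ rng.normal(size=(5, 40)) + 0.1 * rng.normal(size=(2000, 40))
Xc = X - X.mean(axis=0)  # center before PCA

_, S_exact, _ = np.linalg.svd(Xc, full_matrices=False)  # "Exact": full SVD
U, S_rand, Vt = randomized_svd(Xc, k=3)                  # "Randomized"
print(S_exact[:3])
print(S_rand)
```

The randomized method only ever decomposes a small (k + oversample)-row matrix. This is why it scales well to many cells, and why its result depends on the random test matrix, and therefore on the seed.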
Random Seed: A number used to initialize the PCA operation. Changing it is optional. The Randomized method is stochastic, so set a fixed seed to make results reproducible: if you use the same dataset, the same settings, and the same Random Seed value, you will get the same result.
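The effect of a fixed seed can be illustrated with scikit-learn's `PCA`, again as a stand-in for OMIQ's implementation. With `svd_solver="randomized"`, passing the same `random_state` on the same data reproduces the same result. The dataset and the helper `run_pca` are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in: 5,000 "cells" x 20 "markers"
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))

def run_pca(seed):
    # Randomized SVD solver; random_state seeds its random test matrix
    pca = PCA(n_components=2, svd_solver="randomized", random_state=seed)
    return pca.fit_transform(X)

a = run_pca(seed=1234)
b = run_pca(seed=1234)
print(np.allclose(a, b))  # True: same data, same settings, same seed
```

With scikit-learn's `svd_solver="full"` (an exact SVD), the seed plays no role and repeated runs on the same data agree regardless.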
2.3 Run
Click Run PCA. This takes you to the Status tab, where you can watch the progress. You are free to go back to your workflow or do other work while the task runs in the cloud. You can also see the status in the workflow itself, or have an email sent to you when the task completes.
3. Review the Results
Go to the Results tab to view the PCA results and check that they are as expected.
The single plot shown in the Results tab is only a quick sanity check of the results. Don't use it for anything more than that.
For plotting and data visualization, go back out to the workflow and add a Figure as a child of this analysis. To learn more about this, please see our resource Dimension Reduction Visualization.