Clustering Using PhenoGraph – OMIQ

PhenoGraph clusters high-dimensional single-cell data by building a graph (network) in which each cell is connected to the cells most similar to it in phenotype. PhenoGraph then detects communities within this graph.

Levine, J.H., Simonds, E.F., Bendall, S.C., et al. Data-Driven Phenotypic Dissection of AML Reveals Progenitor-like Cells that Correlate with Prognosis. Cell. 162(1):184-97 (2015). https://doi.org/10.1016/j.cell.2015.05.047

1. Add a PhenoGraph Task

Click Add new child task. Select PhenoGraph from the task selector.

2. Set up PhenoGraph

2.1 Select Files and Features

Select the Files you want to include in the clustering. Select the Features you want to base the clustering method on.

2.2 Set PhenoGraph Settings

K Nearest Neighbors: The number of nearest neighbors the algorithm considers in the initial graph construction phase.
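
To make the role of K concrete, here is a minimal sketch of the graph construction phase. It assumes exact brute-force Euclidean search on a toy dataset and weights edges by the Jaccard similarity of shared neighbors, as described in the Levine et al. paper. The function names and data are hypothetical illustrations, not OMIQ's implementation.

```python
import math

def knn_graph(points, k):
    """Return {i: [indices of the k nearest other points]} using exact Euclidean search."""
    graph = {}
    for i, p in enumerate(points):
        # Distance from point i to every other point (brute force, O(n^2)).
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph[i] = [j for _, j in dists[:k]]
    return graph

def jaccard_weight(graph, i, j):
    """Jaccard similarity of the neighbor sets of cells i and j."""
    a, b = set(graph[i]), set(graph[j])
    return len(a & b) / len(a | b)

# Hypothetical toy data: two well-separated groups of three "cells" in 2-D marker space.
cells = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
graph = knn_graph(cells, k=2)

print(graph[0])                      # neighbors of cell 0 all lie in the first group
print(jaccard_weight(graph, 0, 1))   # cells 0 and 1 share one neighbor
print(jaccard_weight(graph, 0, 3))   # cells in different groups share none
```

A larger K connects each cell to more of its surroundings, which tends to merge small groups into fewer, larger clusters. A smaller K tends to produce more, finer clusters.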

Nearest Neighbors Algorithm: Choose the algorithm to use in searching for nearest neighbors. There are currently three available algorithms.

Annoy: ANNOY (Approximate Nearest Neighbors Oh Yeah) approximates nearest neighbors through a tree-based index that allows quick searches in high-dimensional space. You can choose the Euclidean, Manhattan, or Cosine distance metric for Annoy.
HNSW: HNSW (Hierarchical Navigable Small World) approximates nearest neighbors by building a multi-layer graph structure. Choose this for large, high-dimensional datasets. You can choose the Euclidean or Cosine distance metric for the HNSW algorithm.
k-d tree: k-d tree (k-dimensional tree) builds a binary tree that recursively partitions the data points in k-dimensional space. It is efficient in low-dimensional space (<20 dimensions). You can choose the Euclidean, Manhattan, Cosine, or Correlation distance metric for the k-d tree algorithm.

HNSW is optimized for high-dimensional data and provides faster searches on large datasets. However, it is not deterministic, so results will NOT be reproducible between runs, even with the same random seed.

Distance Metrics:

Euclidean: Measures the straight-line distance between two points in space.
Cosine: Assesses the angle between two vectors, focusing on their directional similarity.
Manhattan: Computes the sum of absolute differences along each dimension, often reflecting grid-like movement.
Correlation: Quantifies how much two points change together, highlighting linear relationships.
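
The four metrics can be sketched in plain Python as follows. These are the common textbook definitions, with Cosine and Correlation expressed as distances (1 minus the similarity). Whether OMIQ uses exactly these conventions is an assumption, and the function names are illustrative only.

```python
import math

def euclidean(a, b):
    # Straight-line distance between two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute differences along each dimension.
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 minus the cosine of the angle between the two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def correlation_distance(a, b):
    # Cosine distance after centering each vector on its own mean,
    # i.e. 1 minus the Pearson correlation.
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_distance([x - mean_a for x in a], [y - mean_b for y in b])

# Hypothetical marker intensities for two cells; v is exactly twice u.
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]

print(euclidean(u, v), manhattan(u, v))
print(cosine_distance(u, v), correlation_distance(u, v))
```

In this example the two cells have the same marker profile at different overall intensity. Euclidean and Manhattan report a clear separation, while Cosine and Correlation distances are approximately zero because only direction and co-variation matter to them.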

Clustering method: Choose the clustering method to be used in PhenoGraph.

Louvain: This is the original clustering method used in PhenoGraph. Louvain is a community detection algorithm that alternates between two steps: a node moving step and a graph aggregation step.
Leiden: Leiden is a community detection algorithm that adds a refinement step between the node moving step and the graph aggregation step.

To learn more about the Leiden algorithm, read the paper:

Traag, V.A., Waltman, L. & van Eck, N.J. From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep 9, 5233 (2019). https://doi.org/10.1038/s41598-019-41695-z
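
Community detection methods of this kind search for a partition that scores well on a quality function. For Louvain that function is modularity; which quality function OMIQ uses for Leiden is not stated here, so treat the following as a general illustration. It is a minimal sketch that computes modularity for an unweighted toy graph, and it does not implement the node moving, refinement, or aggregation steps themselves.

```python
def modularity(edges, community):
    """Newman modularity of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> community label.
    """
    m = len(edges)
    intra = {}    # number of edges with both ends in the same community
    degree = {}   # summed node degree per community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    # Fraction of edges inside each community minus the fraction expected by chance.
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())

# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_cluster = {node: "A" for node in range(6)}

print(modularity(edges, two_clusters))  # higher: the split matches the graph structure
print(modularity(edges, one_cluster))   # approximately zero: no structure captured
```

Splitting the graph at the bridge scores higher than lumping every node together, which is the kind of improvement the node moving step looks for.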

Seed (optional): A number used to initialize the PhenoGraph operation. You do not need to change it. To make a run reproducible, set a fixed Seed: with the same dataset, the same settings, and the same Seed value, the same result will be produced (unless HNSW is selected, which is not deterministic).
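
The effect of a fixed seed can be illustrated with Python's standard random module. This sketch assumes, as is typical for community detection, that the algorithm visits cells in a random order; the function name is hypothetical and the code is not part of OMIQ.

```python
import random

def shuffled_visit_order(n_cells, seed=None):
    """Return a random order in which to visit cells; repeatable when seed is given."""
    rng = random.Random(seed)   # independent generator, so global random state is untouched
    order = list(range(n_cells))
    rng.shuffle(order)
    return order

run_a = shuffled_visit_order(10, seed=42)
run_b = shuffled_visit_order(10, seed=42)

print(run_a == run_b)  # the same seed reproduces the same order
```

With no seed, the generator is initialized from a system source and the order generally changes from run to run, which is why cluster assignments can differ between otherwise identical runs.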

2.3 Run

Click Run PhenoGraph. This takes you to the Status tab, where you can watch the progress. You are free to return to your workflow or do other work while the task runs in the cloud. The status is also shown in the Workflow itself, and you can have an email sent to you when the task is completed.

3. Review your Results

Go to the Results tab to view the results.

To use the results in your workflow, you need to convert them to filters.

To learn more about creating categorical filters, please see our article Create Categorical Filters. See also our resource How to Use Clustering.