PhenoGraph clusters high-dimensional single-cell data by building a graph (network) that connects cells with similar phenotypes. It then detects communities within this graph.
Levine, J.H., Simonds, E.F., Bendall, S.C., et al. Data-Driven Phenotypic Dissection of AML Reveals Progenitor-like Cells that Correlate with Prognosis. Cell. 162(1):184-97 (2015). https://doi.org/10.1016/j.cell.2015.05.047
1. Add a PhenoGraph Task

Click Add new child task. Select PhenoGraph from the task selector.
2. Set Up PhenoGraph
2.1 Select Files and Features
Select the Files you want to include in the clustering. Select the Features you want to base the clustering method on.
2.2 Set PhenoGraph Settings
K Nearest Neighbors: The number of nearest neighbors the algorithm considers in the initial graph construction phase.
Nearest Neighbors Algorithm: Choose the algorithm to use in searching for nearest neighbors. There are currently two available algorithms.
- HNSW: HNSW (Hierarchical Navigable Small World) builds a multi-layer graph structure and uses it to approximate the nearest neighbors. Choose this for large, high-dimensional datasets. Available distance metrics for HNSW: Euclidean, Cosine, or Inner Product.
- k-d tree: A k-d tree (k-dimensional tree) is a binary tree that recursively partitions the data points in k-dimensional space. It is efficient for low-dimensional data (<20 dimensions). Available distance metrics for k-d tree: Euclidean, Manhattan, Cosine, or Correlation.
HNSW is optimized for high-dimensional data and provides faster search times on large datasets. However, it is not deterministic: results will NOT be reproducible between runs, even with the same random seed.
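The graph construction step can be illustrated with a small sketch. It assumes an exact brute-force neighbor search with Euclidean distance (standing in for the k-d tree or HNSW search) and weights each edge by the Jaccard overlap of the two cells' neighbor sets, as described in the PhenoGraph paper. The function names and toy data are hypothetical and do not reflect the platform's actual implementation.

```python
import math

def knn_indices(points, k):
    """Exact k nearest neighbors by brute force (Euclidean), excluding the point itself."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

def jaccard_graph(neighbors):
    """Weight each kNN edge by the Jaccard overlap of the two neighbor sets."""
    sets = [set(n) for n in neighbors]
    edges = {}
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            shared = len(sets[i] & sets[j])
            union = len(sets[i] | sets[j])
            edges[(min(i, j), max(i, j))] = shared / union
    return edges

# Two well-separated toy groups of three points each (hypothetical data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
neighbors = knn_indices(points, k=2)
graph = jaccard_graph(neighbors)

for (i, j), weight in sorted(graph.items()):
    print(f"cell {i} -- cell {j}: weight {weight:.3f}")
```

With K = 2, every edge stays inside its own group, which is the structure the community detection step later picks up. A larger K connects more distant cells and tends to produce fewer, larger clusters.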
Distance Metrics:
- Euclidean: Measures the straight-line distance between two points in space.
- Cosine: Assesses the angle between two vectors, focusing on their directional similarity.
- Inner Product: Measures the overall similarity based on the magnitude and direction of the vectors, often emphasizing scale and alignment.
- Manhattan: Computes the sum of absolute differences along each dimension, often reflecting grid-like movement.
- Correlation: Quantifies how much two points change together, highlighting linear relationships.
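The metrics above can be written out in a few lines of plain Python. This is a sketch using common textbook conventions, not the platform's code: cosine and correlation are expressed here as distances (1 minus the similarity), and the inner product is returned as a raw similarity, since the source does not say how it is converted to a distance. The vectors `a` and `b` are hypothetical.

```python
import math

def euclidean(a, b):
    # Straight-line distance between the two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute differences along each dimension.
    return sum(abs(x - y) for x, y in zip(a, b))

def inner_product(a, b):
    # Raw similarity: grows with both alignment and magnitude.
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cos(angle); ignores vector length, assumes non-zero vectors.
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)

def correlation_distance(a, b):
    # Cosine distance after centering each vector on its own mean
    # (1 - Pearson correlation); assumes neither vector is constant.
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_distance(centered_a, centered_b)

# b points in the same direction as a but is twice as long.
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]

print("euclidean:  ", euclidean(a, b))
print("manhattan:  ", manhattan(a, b))
print("inner prod: ", inner_product(a, b))
print("cosine:     ", cosine_distance(a, b))
print("correlation:", correlation_distance(a, b))
```

In this example the Euclidean and Manhattan distances are non-zero because the vectors differ in magnitude, while the cosine and correlation distances are essentially zero because the vectors share the same direction and pattern.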
Clustering method: Choose the clustering method to be used in PhenoGraph.
- Louvain: This is the original clustering method used in PhenoGraph.
- Leiden: This is an improved version of the Louvain algorithm, often offering better partition quality and faster convergence.
The Louvain algorithm may find badly connected communities. The improvements contained in the Leiden algorithm address this issue. To learn more, read the paper:
Traag, V.A., Waltman, L. & van Eck, N.J. From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep 9, 5233 (2019). https://doi.org/10.1038/s41598-019-41695-z
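To give a sense of what these methods optimize, the sketch below computes modularity, the quality function used by the original Louvain method, for a given partition of a small unweighted graph. It only scores a partition; it does not implement Louvain or Leiden. The toy graph and the function name are hypothetical.

```python
def modularity(edges, community):
    """Modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> community label.
    """
    m = len(edges)
    internal = {}  # number of edges with both ends inside each community
    degree = {}    # total degree of the nodes in each community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())

# Two triangles joined by a single bridge edge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

two_clusters = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_cluster = {node: "A" for node in range(6)}

print("two clusters:", modularity(edges, two_clusters))
print("one cluster: ", modularity(edges, one_cluster))
```

Splitting the graph at the bridge scores higher than lumping every node together, which is why a modularity-based method would separate the two triangles.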
Seed (optional): A number used to initialize the PhenoGraph run. You do not need to change it. To make a run reproducible, set a fixed Seed: with the same dataset and settings, the same Seed value gives the same result (except with HNSW, which is not deterministic, as noted above).
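The effect of a fixed seed can be shown with a generic example. Community detection methods such as Louvain typically visit nodes in a random order, and a seed fixes that order. The sketch below uses Python's standard `random` module purely as an illustration; the function name is hypothetical and this is not how the platform consumes the Seed value.

```python
import random

def visiting_order(n_nodes, seed):
    """Return a pseudo-random node visiting order determined entirely by the seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    order = list(range(n_nodes))
    rng.shuffle(order)
    return order

first_run = visiting_order(10, seed=42)
second_run = visiting_order(10, seed=42)

print("first run: ", first_run)
print("second run:", second_run)
print("identical: ", first_run == second_run)
```

Because every random choice derives from the seed, repeating the run with the same seed and the same input reproduces the same sequence of decisions.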
2.3 Run
Click Run PhenoGraph. This takes you to the Status tab, where you can watch the progress. The task runs in the cloud, so you are free to return to your workflow or work on something else in the meantime. The status is also shown in the Workflow itself, and you can choose to receive an email when the run is completed.
3. Review Your Results

Go to the Results tab to view the results.
To use the results in your workflow, you need to convert them to filters.
To learn more about creating categorical filters, see our article Create Categorical Filters. See also our resource How to Use Clustering.