Overlap Module

The overlap module provides functions for computing PLASTRO overlap scores that quantify the relationship between lineage and phenotypic distances.

Core Functions

PLASTRO_score(character_matrix, ad[, ...])

Compute the PLASTRO overlap plasticity score.

PLASTRO_overlaps(character_matrix, ad[, ...])

Compute overlaps between lineage and phenotypic neighborhoods.

Utility Functions

compute_lineage_distances(character_matrix)

Compute pairwise lineage distances from character matrix.

compute_phenotype_distances(ad, latent_space_key)

Compute pairwise phenotypic distances from latent space.

gini_index(arr)

Compute the Gini inequality index for a 1D array.

Function Details

Overlap analysis module for PLASTRO scores.

This module provides functions for computing overlap-based plasticity scores from character matrices and single-cell data. The PLASTRO score quantifies the relationship between lineage and phenotypic distances.

plastro.overlap.PLASTRO_score(character_matrix: DataFrame, ad: AnnData, threshold: float = 0.95, maximum_radius: int = 500, interval: int = 1, latent_space_key: str = 'X_dm', flavor: str = 'gini', parallel: bool = False, save_to: str | None = None, show_plots: bool = True) → DataFrame

Compute the PLASTRO overlap plasticity score.

The PLASTRO score quantifies cellular plasticity by measuring the overlap between lineage relationships (from character matrix) and phenotypic relationships (from latent space) at multiple spatial scales.

Parameters:
  • character_matrix (pd.DataFrame) – Character matrix with cells as rows and CRISPR mutation sites as columns. Values represent mutation states (0=unmutated, >0=mutated states).

  • ad (anndata.AnnData) – Annotated data object containing latent representation of phenotype.

  • threshold (float, optional) – Variance threshold, expressed as a proportion of the peak variance in overlap, by default 0.95. Only used when flavor=’variable_radii’. Radii with variance >= threshold * max_variance are used for computing the final score.

  • maximum_radius (int, optional) – Maximum radius for computing overlap, by default 500.

  • interval (int, optional) – Interval between radii for overlap computation, by default 1.

  • latent_space_key (str, optional) – Key in ad.obsm where latent space coordinates are stored, by default ‘X_dm’.

  • flavor (str, optional) –

    Method for computing PLASTRO score, by default ‘gini’.

    • ’gini’: Computes Gini inequality index for each cell’s overlap distribution across radii. Measures how concentrated the overlap is at specific spatial scales.

      • High Gini (→1): Overlap concentrated at few radii (unequal distribution)

      • Low Gini (→0): Overlap evenly distributed across radii (equal distribution)

      • Uses ALL radii from 1 to maximum_radius

    • ’variable_radii’: Computes area under overlap curve using variance-filtered radii. Measures strength of lineage-phenotype concordance at most informative scales.

      • High score: Strong lineage-phenotype concordance (low plasticity)

      • Low score: Weak lineage-phenotype concordance (high plasticity)

      • Uses SELECTED radii based on variance threshold

  • parallel (bool, optional) – Whether to use parallel processing, by default False.

  • save_to (str, optional) – Directory path to save results, by default None.

  • show_plots (bool, optional) – Whether to display variance analysis plots (only for flavor=’variable_radii’), by default True.

Returns:

PLASTRO plasticity scores for each cell. Column name depends on flavor:

  • flavor=’gini’: Column ‘Gini_Index’ with values in [0, 1]

  • flavor=’variable_radii’: Column ‘PLASTRO_score’ with positive values

Return type:

pd.DataFrame

Examples

>>> import plastro
>>>
>>> # Compute Gini-based plasticity scores
>>> gini_scores = plastro.PLASTRO_score(char_matrix, ad, flavor='gini')
>>> print(f"Mean Gini index: {gini_scores['Gini_Index'].mean():.3f}")
>>>
>>> # Compute variance-based plasticity scores
>>> var_scores = plastro.PLASTRO_score(char_matrix, ad, flavor='variable_radii', threshold=0.95)
>>> print(f"Mean PLASTRO score: {var_scores['PLASTRO_score'].mean():.3f}")

Notes

The PLASTRO analysis workflow:

  1. Overlap Computation: For each cell and radius r, compute overlap between:

    • Lineage neighbors: r cells most similar by CRISPR mutations

    • Phenotype neighbors: r cells most similar in latent space

  2. Score Computation: Two complementary approaches:

    Gini Approach (flavor=’gini’):

      • Computes Gini inequality coefficient for each cell’s overlap profile

      • Measures how overlap varies with spatial scale for individual cells

      • Interpretation: Distribution pattern of lineage-phenotype concordance

    Variable Radii Approach (flavor=’variable_radii’):

      • Identifies radii with high variance in overlap across cells

      • Computes area under overlap curve for informative radii only

      • Interpretation: Overall strength of lineage-phenotype concordance

When to use each flavor:

  • Use ‘gini’ to understand overlap distribution patterns per cell

  • Use ‘variable_radii’ to measure overall plasticity strength

  • Both provide complementary views of the same underlying relationships

plastro.overlap.PLASTRO_overlaps(character_matrix: DataFrame, ad: AnnData, maximum_radius: int = 500, interval: int = 1, latent_space_key: str = 'X_dm', parallel: bool = False, save_to: str | None = None) → DataFrame

Compute overlaps between lineage and phenotypic neighborhoods.

For each cell and radius, computes the overlap between cells that are lineage neighbors (similar character states) and phenotypic neighbors (close in latent space).

Parameters:
  • character_matrix (pd.DataFrame) – Character matrix with mutation data.

  • ad (anndata.AnnData) – Annotated data object with phenotypic information.

  • maximum_radius (int, optional) – Maximum neighborhood radius, by default 500.

  • interval (int, optional) – Radius increment, by default 1.

  • latent_space_key (str, optional) – Key for latent space coordinates, by default ‘X_dm’.

  • parallel (bool, optional) – Use parallel processing, by default False.

  • save_to (str, optional) – Save directory, by default None.

Returns:

Overlap values for each cell (rows) at each radius (columns).

Return type:

pd.DataFrame

plastro.overlap.compute_variable_radii_plasticity_score(overlaps: DataFrame, threshold: float = 0.95, plot_variance: bool = True, save_to: str | None = None) → Tuple[DataFrame, Series]

Compute PLASTRO scores using variable radii approach.

This method identifies optimal radius ranges based on variance in overlap across cells, then computes area under the overlap curve for selected radii. It focuses on the most informative spatial scales for plasticity analysis.

Parameters:
  • overlaps (pd.DataFrame) – Overlap matrix with cells as rows and radii as columns. Each value represents overlap between lineage and phenotype neighborhoods.

  • threshold (float, optional) – Variance threshold for radius selection (0-1), by default 0.95. Radii with variance >= threshold * max_variance are considered informative.

  • plot_variance (bool, optional) – Whether to plot variance analysis for radius selection, by default True.

  • save_to (str, optional) – Directory to save variance analysis plots, by default None.

Returns:

  • PLASTRO scores for each cell (column: ‘PLASTRO_score’)

  • Variance values across all radii

Return type:

Tuple[pd.DataFrame, pd.Series]

Notes

Algorithm:

  1. Compute variance in overlap across cells for each radius

  2. Identify radii with variance >= threshold * max_variance

  3. Compute mean overlap across selected radii for each cell

Interpretation:

  • High scores: Strong concordance between lineage and phenotype (low plasticity)

  • Low scores: Weak concordance between lineage and phenotype (high plasticity)

  • Selected radii: Spatial scales that best differentiate cells by plasticity

Comparison to Gini approach:

  • Variable radii: Measures overall strength using informative scales

  • Gini: Measures distribution pattern across all scales
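The three algorithm steps can be sketched with plain pandas operations. This is a minimal illustration under stated assumptions, not the library's implementation: the function name variable_radii_score_sketch and the toy data are hypothetical, the variance is the pandas sample variance, and the final step follows the "mean overlap" wording of the algorithm description.

```python
import numpy as np
import pandas as pd


def variable_radii_score_sketch(overlaps: pd.DataFrame, threshold: float = 0.95):
    """Hypothetical sketch of the variable-radii steps (not the library's code)."""
    # Step 1: variance of overlap across cells, one value per radius (column).
    variances = overlaps.var(axis=0)
    # Step 2: keep radii whose variance reaches threshold * max variance.
    cutoff = threshold * variances.max()
    selected = variances[variances >= cutoff].index
    # Step 3: mean overlap over the selected radii for each cell.
    scores = overlaps.loc[:, selected].mean(axis=1).to_frame("PLASTRO_score")
    return scores, variances


# Toy overlap matrix: 4 cells (rows) x 3 radii (columns).
overlaps = pd.DataFrame(
    [[0.1, 0.2, 0.5],
     [0.1, 0.8, 0.5],
     [0.1, 0.2, 0.5],
     [0.1, 0.8, 0.6]],
    index=["c1", "c2", "c3", "c4"],
    columns=[1, 2, 3],
)
scores, variances = variable_radii_score_sketch(overlaps)
print(scores)
```

In this toy matrix only radius 2 separates the cells strongly, so it is the only radius that passes the 0.95 threshold and each cell's score equals its overlap at that radius.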

plastro.overlap.gini_index(arr: ndarray) → float

Compute the Gini inequality index for a 1D array.

The Gini index measures inequality in the distribution of values, commonly used in economics but applicable to any distribution analysis. For PLASTRO, it measures how concentrated overlap values are across radii.

Parameters:

arr (np.ndarray) – Array of overlap values between 0 and 1. Each value represents overlap at a different radius.

Returns:

Gini index value between 0 and 1.

  • 0: Perfect equality (all values identical)

  • 1: Perfect inequality (one value has everything, others have nothing)

Return type:

float

Notes

Mathematical definition: Gini = 1 - Σ(p_i + p_{i-1}) * (x_i - x_{i-1})

Where p_i is cumulative proportion and x_i are sorted values.

For PLASTRO interpretation:

  • High Gini: Overlap concentrated at specific radii (scale-specific plasticity)

  • Low Gini: Overlap distributed across radii (scale-invariant plasticity)
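As an illustration of the trapezoid form above, the sketch below computes a Gini index from the Lorenz curve of the sorted values. It assumes one reading of the formula: p_i is taken as the cumulative share of the total overlap and x_i as the cumulative fraction of radii. The name gini_sketch and the zero-sum convention are hypothetical, and the library's own gini_index may differ in detail.

```python
import numpy as np


def gini_sketch(arr) -> float:
    """Hypothetical Lorenz-curve Gini; not the library's gini_index."""
    values = np.sort(np.asarray(arr, dtype=float))
    n = values.size
    total = values.sum()
    if n == 0 or total == 0:
        # Assumption: an empty or all-zero profile is treated as perfectly equal.
        return 0.0
    # Cumulative share of the total, with a leading 0 (p_0 .. p_n).
    lorenz = np.concatenate(([0.0], np.cumsum(values) / total))
    # Cumulative fraction of entries (x_0 .. x_n).
    frac = np.arange(n + 1) / n
    # Trapezoid rule: 1 - sum((p_i + p_{i-1}) * (x_i - x_{i-1})).
    return float(1.0 - np.sum((lorenz[1:] + lorenz[:-1]) * np.diff(frac)))


flat_profile = gini_sketch([0.5, 0.5, 0.5, 0.5])    # overlap equal at every radius
peaked_profile = gini_sketch([0.0, 0.0, 0.0, 1.0])  # overlap at a single radius
print(flat_profile, peaked_profile)
```

With this finite-sample form, a profile concentrated at one of n radii gives (n - 1) / n rather than exactly 1, so the upper bound is only approached as the number of radii grows.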

plastro.overlap.compute_gini_plasticity_score(overlaps: DataFrame) → DataFrame

Compute PLASTRO scores using Gini inequality index approach.

This method computes a Gini coefficient for each cell’s overlap distribution across all radii. The Gini index measures how concentrated or dispersed the lineage-phenotype concordance is across different spatial scales.

Parameters:

overlaps (pd.DataFrame) – Overlap matrix with cells as rows and radii as columns. This is the output from the PLASTRO_overlaps() function. Each value represents overlap between lineage and phenotype neighborhoods at a given radius.

Returns:

Gini plasticity scores for each cell with column ‘Gini_Index’. Values range from 0 to 1.

Return type:

pd.DataFrame

Notes

Algorithm:

  1. For each cell, compute the Gini coefficient of its overlap values across ALL radii

  2. The Gini coefficient measures inequality in the overlap distribution

Interpretation:

  • High Gini (≈1): Overlap concentrated at few specific radii

    • Suggests scale-specific plasticity patterns

    • Lineage-phenotype concordance varies dramatically with spatial scale

  • Low Gini (≈0): Overlap evenly distributed across radii

    • Suggests scale-invariant plasticity patterns

    • Consistent lineage-phenotype relationship across scales

Comparison to Variable Radii approach:

  • Gini: Characterizes the pattern of how concordance varies with scale

  • Variable radii: Measures the strength of concordance at optimal scales

Use cases:

  • Understanding how plasticity manifests across spatial scales

  • Identifying cells with scale-specific vs scale-invariant plasticity

  • Complementary analysis to variable radii approach

plastro.overlap.compute_lineage_distances(character_matrix: DataFrame) → DataFrame

Compute pairwise lineage distances from character matrix.

Uses modified Hamming distance that accounts for different mutation states.

Parameters:

character_matrix (pd.DataFrame) – Character matrix with mutation states.

Returns:

Pairwise lineage distance matrix.

Return type:

pd.DataFrame
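The exact modification to the Hamming distance is not described here, so the sketch below shows only the plain normalized Hamming distance as a baseline: the fraction of sites at which two cells carry different state labels. The function name hamming_distances_sketch and the toy matrix are hypothetical, and the library's handling of mutation states may differ from this.

```python
import numpy as np
import pandas as pd


def hamming_distances_sketch(character_matrix: pd.DataFrame) -> pd.DataFrame:
    """Plain normalized Hamming distance (baseline, not the library's variant)."""
    states = character_matrix.to_numpy()
    # Compare every pair of cells site by site: shape (cells, cells, sites).
    differs = states[:, None, :] != states[None, :, :]
    # Fraction of sites whose state labels differ.
    dist = differs.mean(axis=2)
    names = character_matrix.index
    return pd.DataFrame(dist, index=names, columns=names)


# Toy character matrix: 3 cells x 3 sites (0 = unmutated, >0 = mutated state).
char_matrix = pd.DataFrame(
    [[0, 1, 2],
     [0, 1, 0],
     [3, 0, 0]],
    index=["c1", "c2", "c3"],
    columns=["site1", "site2", "site3"],
)
lineage = hamming_distances_sketch(char_matrix)
print(lineage)
```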

plastro.overlap.compute_phenotype_distances(ad: AnnData, latent_space_key: str) → DataFrame

Compute pairwise phenotypic distances from latent space.

Parameters:
  • ad (anndata.AnnData) – Annotated data object.

  • latent_space_key (str) – Key for latent space coordinates in ad.obsm.

Returns:

Pairwise phenotypic distance matrix.

Return type:

pd.DataFrame
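A pairwise distance matrix of this shape could be built as follows. The sketch assumes a Euclidean metric, which the description above does not state, and works on a plain coordinate array so that it runs without an AnnData object. In practice the coordinates would presumably come from ad.obsm[latent_space_key] and the labels from ad.obs_names. The name euclidean_distances_sketch is hypothetical.

```python
import numpy as np
import pandas as pd


def euclidean_distances_sketch(coords, cell_names) -> pd.DataFrame:
    """Pairwise Euclidean distances (assumed metric; not the library's code)."""
    coords = np.asarray(coords, dtype=float)
    # Broadcast to all pairwise coordinate differences: (cells, cells, dims).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return pd.DataFrame(dist, index=cell_names, columns=cell_names)


# Toy latent space: 3 cells in 2 dimensions.
coords = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
pheno = euclidean_distances_sketch(coords, ["c1", "c2", "c3"])
print(pheno)
```

The broadcasted intermediate has shape (cells, cells, dims), so for large datasets a chunked or library-based distance routine would likely be preferable.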

plastro.overlap.compute_radius_overlaps(lineage_distances: DataFrame, phenotype_distances: DataFrame, radius: int) → Series

Compute overlaps for a specific radius.

Parameters:
  • lineage_distances (pd.DataFrame) – Pairwise lineage distances.

  • phenotype_distances (pd.DataFrame) – Pairwise phenotypic distances.

  • radius (int) – Neighborhood radius.

Returns:

Overlap values for each cell at the specified radius.

Return type:

pd.Series
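The per-radius computation can be illustrated with two small distance matrices. This is a sketch under assumptions: overlap is taken as the size of the intersection of the two neighbor sets divided by the radius, and the query cell is excluded from its own neighborhood. Neither choice is confirmed by the description above, and the name radius_overlaps_sketch is hypothetical.

```python
import pandas as pd


def radius_overlaps_sketch(lineage_distances, phenotype_distances, radius):
    """Hypothetical overlap at one radius: |lineage kNN & phenotype kNN| / radius."""
    overlaps = {}
    for cell in lineage_distances.index:
        # The `radius` closest other cells in each distance matrix.
        lin = lineage_distances.loc[cell].drop(labels=cell).nsmallest(radius).index
        phe = phenotype_distances.loc[cell].drop(labels=cell).nsmallest(radius).index
        overlaps[cell] = len(set(lin) & set(phe)) / radius
    return pd.Series(overlaps, name=radius)


cells = ["a", "b", "c", "d"]
lineage = pd.DataFrame(
    [[0.0, 1.0, 2.0, 3.0],
     [1.0, 0.0, 2.5, 3.5],
     [2.0, 2.5, 0.0, 1.0],
     [3.0, 3.5, 1.0, 0.0]],
    index=cells, columns=cells,
)
phenotype = pd.DataFrame(
    [[0.0, 1.0, 1.8, 2.0],
     [1.0, 0.0, 2.5, 1.5],
     [1.8, 2.5, 0.0, 0.5],
     [2.0, 1.5, 0.5, 0.0]],
    index=cells, columns=cells,
)
overlap_r2 = radius_overlaps_sketch(lineage, phenotype, radius=2)
print(overlap_r2)
```

Here cells a and c share both of their two nearest neighbors across the two spaces, while b and d share only one.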

plastro.overlap.get_k_nearest_neighbors(distance_matrix: DataFrame, cell: str, k: int) → set

Get k-nearest neighbors for a cell.

Parameters:
  • distance_matrix (pd.DataFrame) – Pairwise distance matrix.

  • cell (str) – Query cell name.

  • k (int) – Number of neighbors to return.

Returns:

Set of k-nearest neighbor cell names.

Return type:

set
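A neighbor lookup of this kind can be sketched with pandas alone. Whether the library excludes the query cell from its own neighbor set, and how it breaks ties, is not stated above. The sketch assumes the query cell is excluded and relies on the default tie handling of Series.nsmallest. The name k_nearest_sketch is hypothetical.

```python
import pandas as pd


def k_nearest_sketch(distance_matrix: pd.DataFrame, cell: str, k: int) -> set:
    """Hypothetical k-nearest-neighbor lookup on a pairwise distance matrix."""
    # Distances from the query cell to all other cells (self excluded by assumption).
    row = distance_matrix.loc[cell].drop(labels=cell)
    # Labels of the k smallest distances.
    return set(row.nsmallest(k).index)


names = ["a", "b", "c", "d"]
distances = pd.DataFrame(
    [[0.0, 1.0, 2.0, 3.0],
     [1.0, 0.0, 2.5, 3.5],
     [2.0, 2.5, 0.0, 1.0],
     [3.0, 3.5, 1.0, 0.0]],
    index=names, columns=names,
)
neighbors_of_c = k_nearest_sketch(distances, "c", k=2)
print(neighbors_of_c)
```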