Phenotype Simulation Module
The phenotype simulation module provides functions for generating synthetic single-cell datasets with realistic branching differentiation trajectories and phenotypic transitions.
Core Functions
| generate_ad | Generate a complete AnnData object with simulated single-cell data. |
| simulate_realistic_dataset | Generate a realistic single-cell dataset for plasticity testing. |
| create_random_binary_tree | Create a binary tree with random distribution of leaves. |
| subset_to_terminal_branches | Extract only terminal branches from simulated differentiation data. |
Utility Functions
| sample_branch | Sample cells along a differentiation branch with realistic noise structure. |
| rgba_to_hex | Convert RGBA color to hexadecimal format. |
Function Details
Phenotype simulation module for generating synthetic single-cell data.
This module provides functions for creating synthetic single-cell datasets with branching differentiation trajectories, simulating realistic cellular development patterns and phenotypic transitions for testing plasticity algorithms.
- plastro.phenotype_simulation.create_random_binary_tree(n_leaves: int, sample_res: int) → Tuple
Create a binary tree with random distribution of leaves.
Generates a random binary tree structure for simulating cellular differentiation hierarchies. Each node represents a cellular state with a certain number of cells, and the tree structure represents the developmental relationships.
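- Parameters:
n_leaves (int) – Number of terminal (leaf) branches in the tree.
sample_res (int) – Sampling resolution that sets the scale of the per-node cell counts.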
- Returns:
A tuple representing the binary tree structure:
- First element: number of samples at this node (int)
- Second element: list of child branches (empty for leaf nodes)
- Return type:
Tuple
Examples
>>> tree = create_random_binary_tree(n_leaves=4, sample_res=50)
>>> # Creates a tree with 4 terminal branches, each with ~50-500 cells
Notes
The tree structure is represented as nested tuples where:
- Leaf nodes: (sample_count, [])
- Internal nodes: (sample_count, [left_child, right_child])
This creates a realistic branching structure similar to cellular development where progenitor cells give rise to more specialized cell types.
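The nested-tuple layout described above can be walked with a short recursive helper. The following is a minimal sketch and not part of plastro: `count_leaves` and `total_cells` are hypothetical helper names, and the tree is written by hand in the documented `(sample_count, [children])` form rather than produced by `create_random_binary_tree`.

```python
from typing import List, Tuple

# Hand-written tree in the documented format: (sample_count, [children]).
# The counts are illustrative, not output from create_random_binary_tree.
tree = (120, [
    (80, [(60, []), (45, [])]),  # internal node with two leaf children
    (70, []),                    # leaf node
])


def count_leaves(node: Tuple[int, List]) -> int:
    """Return the number of leaf nodes (nodes with an empty child list)."""
    _, children = node
    if not children:
        return 1
    return sum(count_leaves(child) for child in children)


def total_cells(node: Tuple[int, List]) -> int:
    """Sum the sample counts over every node in the tree."""
    count, children = node
    return count + sum(total_cells(child) for child in children)


print(count_leaves(tree))  # 3
print(total_cells(tree))   # 375
```

The same recursion pattern should apply to a tree returned by `create_random_binary_tree`, provided it follows the layout given in the Notes.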
- plastro.phenotype_simulation.sample_branch(base: ndarray, velocity: ndarray, sample_structure: Tuple, curvature: float = 0.2, var_decay: float = 1.5, dens_decay: float = 0.9, n_dim: int = 15, branch_name: str = 'b') → Tuple[List[ndarray], List, List[int], List[ndarray]]
Sample cells along a differentiation branch with realistic noise structure.
Generates synthetic single-cell data along a branching trajectory that mimics cellular differentiation. Uses a physics-inspired model where cells follow curved paths through gene expression space with decreasing variance over time.
- Parameters:
base (np.ndarray) – Starting position in gene expression space (n_dim,).
velocity (np.ndarray) – Initial direction vector for trajectory (n_dim,).
sample_structure (Tuple) – Tree structure from create_random_binary_tree defining sampling.
curvature (float, optional) – Amount of random curvature in trajectory (0-1), by default 0.2. Higher values create more curved, realistic paths.
var_decay (float, optional) – Rate of variance decay along trajectory, by default 1.5. Higher values create more focused terminal populations.
dens_decay (float, optional) – Rate of density decay (cell loss), by default 0.9. Models cell death during differentiation.
n_dim (int, optional) – Number of dimensions (genes) in expression space, by default 15.
branch_name (str, optional) – Name identifier for this branch, by default 'b'.
- Returns:
samples: List of cell expression matrices for each sub-branch
distributions: List of multivariate normal distributions used
n_draws: List of cell counts for each sub-branch
names: List of branch name arrays for each sub-branch
- Return type:
Tuple[List[np.ndarray], List, List[int], List[np.ndarray]]
Examples
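>>> import numpy as np
>>> base = np.zeros(10)
>>> velocity = np.ones(10)
>>> structure = (100, [])  # Simple leaf with 100 cells
>>> # base and velocity have shape (n_dim,), so n_dim is set to match
>>> samples, dists, counts, names = sample_branch(base, velocity, structure, n_dim=10)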
Notes
The sampling model creates realistic gene expression patterns by:
- Adding curved random walk behavior via curvature parameter
- Implementing variance decay to model cellular commitment
- Using QR decomposition to create proper covariance structure
- Applying density decay to model cell loss during development
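The first three ingredients listed above can be illustrated with plain NumPy. This is a rough sketch under stated assumptions, not the code used by `sample_branch`: the exponential form of the variance decay, the `base_var` values, and the `covariance` helper are all hypothetical, and density decay is left out for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_dim = 5

# 1. Curved step: perturb the direction by a random vector scaled by
#    `curvature`, then renormalise to unit length.
curvature = 0.2
velocity = np.ones(n_dim) / np.sqrt(n_dim)
velocity = velocity + curvature * rng.standard_normal(n_dim)
velocity /= np.linalg.norm(velocity)

# 2. Orthonormal basis from the QR decomposition of a random matrix.
q, _ = np.linalg.qr(rng.standard_normal((n_dim, n_dim)))

# 3. Variance decay: per-axis variances shrink as pseudotime t grows.
#    The exponential form is an assumption made for illustration.
var_decay = 1.5
base_var = np.linspace(1.0, 0.2, n_dim)


def covariance(t: float) -> np.ndarray:
    """Build a symmetric positive-definite covariance at pseudotime t."""
    variances = base_var * np.exp(-var_decay * t)
    return q @ np.diag(variances) @ q.T


# Draw an early (broad) and a late (focused) population of 200 cells each.
early = rng.multivariate_normal(np.zeros(n_dim), covariance(0.0), size=200)
late = rng.multivariate_normal(velocity, covariance(1.0), size=200)
print(early.shape, late.shape)
```

Rotating a diagonal variance matrix by an orthonormal basis keeps the covariance valid for `multivariate_normal` while avoiding axis-aligned noise, which is one plausible reason for using a QR step here.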
- plastro.phenotype_simulation.generate_ad(sample_structure: Tuple, n_dim: int, show_plots: bool = False) → AnnData
Generate a complete AnnData object with simulated single-cell data.
Creates a comprehensive single-cell dataset with realistic gene expression patterns, UMAP embedding, clustering annotations, and proper metadata for studying cellular plasticity and differentiation.
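- Parameters:
sample_structure (Tuple) – Tree structure from create_random_binary_tree defining sampling.
n_dim (int) – Number of dimensions (genes) in expression space.
show_plots (bool, optional) – Whether to display plots during dataset generation, by default False.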
- Returns:
Complete annotated dataset containing:
- X: Gene expression matrix (n_cells × n_genes)
- obs: Cell metadata with ground truth, branch labels, colors
- obsm: Dimensionality reductions (UMAP, diffusion components)
- uns: Cluster colors and other metadata
- Return type:
AnnData
Examples
>>> structure = create_random_binary_tree(n_leaves=6, sample_res=100)
>>> adata = generate_ad(structure, n_dim=20)
>>> print(f"Generated {adata.n_obs} cells with {adata.n_vars} genes")
>>>
>>> # Visualize the simulated data
>>> import scanpy as sc
>>> sc.pl.umap(adata, color='branch')
>>>
>>> # Generate with plots enabled
>>> adata_with_plots = generate_ad(structure, n_dim=20, show_plots=True)
Notes
The generated dataset includes:
- Realistic branching trajectories in gene expression space
- Ground truth probability densities for each cell
- UMAP coordinates for visualization
- Leiden clustering annotations
- Color maps for consistent plotting
- Diffusion components for plasticity analysis
This provides a complete testing framework for plasticity algorithms with known ground truth cellular relationships.
- plastro.phenotype_simulation.subset_to_terminal_branches(ad: AnnData, show_plots: bool = False) → AnnData
Extract only terminal branches from simulated differentiation data.
Identifies and extracts cells from terminal (leaf) branches of the differentiation tree. These represent fully differentiated cell types and are commonly used for lineage tracing analysis.
- Parameters:
ad (AnnData) – Annotated data object with branch annotations in obs['branch'].
show_plots (bool, optional) – Whether to display the terminal branches visualization, by default False.
- Returns:
Subset containing only cells from terminal branches.
- Return type:
AnnData
Examples
>>> # Generate full differentiation tree
>>> structure = create_random_binary_tree(n_leaves=4, sample_res=100)
>>> full_data = generate_ad(structure, n_dim=15)
>>>
>>> # Extract only terminal cell types
>>> terminal_data = subset_to_terminal_branches(full_data)
>>> print(f"Reduced from {full_data.n_obs} to {terminal_data.n_obs} cells")
>>>
>>> # Show terminal branch visualization
>>> terminal_data = subset_to_terminal_branches(full_data, show_plots=True)
Notes
Terminal branches are identified as branch names that are not prefixes of any other branch name. For example, among the branches ['b', 'b-0', 'b-0-1'], only 'b-0-1' is terminal.
When show_plots is True, this function also displays a visualization with the terminal branches highlighted in red on the UMAP embedding.
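The prefix rule from the Notes can be sketched in a few lines of plain Python. This is an illustration only: `terminal_branches` is a hypothetical helper, and matching on the name plus a `-` separator (so that 'b-1' is not mistaken for a parent of 'b-10') is an assumption about how branch names are delimited.

```python
from typing import Iterable, List


def terminal_branches(branch_names: Iterable[str], sep: str = "-") -> List[str]:
    """Return the names that are not a parent prefix of any other name."""
    names = sorted(set(branch_names))
    return [
        name
        for name in names
        # Compare against name + sep so "b-1" is not a parent of "b-10".
        if not any(other.startswith(name + sep) for other in names)
    ]


print(terminal_branches(["b", "b-0", "b-0-1"]))         # ['b-0-1']
print(terminal_branches(["b", "b-0", "b-1", "b-1-0"]))  # ['b-0', 'b-1-0']
```

Given such a list, an AnnData object could then be subset with a boolean mask such as `ad[ad.obs['branch'].isin(names)]`, which is presumably close to what `subset_to_terminal_branches` returns.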
- plastro.phenotype_simulation.rgba_to_hex(rgba: Tuple) → str
Convert RGBA color to hexadecimal format.
Utility function for converting matplotlib color tuples to hex strings for consistent color handling across different plotting libraries.
- Parameters:
rgba (Tuple) – RGBA color tuple with values either as floats (0-1) or integers (0-255).
- Returns:
Hexadecimal color string in format '#RRGGBBAA'.
- Return type:
str
Examples
>>> rgba_to_hex((1.0, 0.5, 0.0, 1.0))  # Orange, full opacity
'#ff8000ff'
>>> rgba_to_hex((255, 128, 0, 255))  # Same color, integer format
'#ff8000ff'
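A conversion that accepts both input scales could look roughly like the sketch below. It is not the plastro implementation: `rgba_to_hex_sketch` is a hypothetical name, and the rule for telling the two scales apart (any float means 0-1, all integers means 0-255) is an assumption.

```python
from typing import Sequence, Union

Number = Union[int, float]


def rgba_to_hex_sketch(rgba: Sequence[Number]) -> str:
    """Convert an RGBA tuple to a lowercase '#rrggbbaa' string.

    Assumption: a tuple containing any float is on the 0-1 scale;
    an all-integer tuple is on the 0-255 scale.
    """
    if any(isinstance(v, float) for v in rgba):
        # Scale to 0-255, rounding half up.
        channels = [int(v * 255 + 0.5) for v in rgba]
    else:
        channels = [int(v) for v in rgba]
    # Clamp to the valid byte range before formatting.
    channels = [min(255, max(0, c)) for c in channels]
    return "#" + "".join(f"{c:02x}" for c in channels)


print(rgba_to_hex_sketch((1.0, 0.5, 0.0, 1.0)))  # #ff8000ff
print(rgba_to_hex_sketch((255, 128, 0, 255)))    # #ff8000ff
```

Note that an all-integer tuple such as (1, 1, 1, 1) is ambiguous under any such rule; this sketch reads it as nearly black on the 0-255 scale.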
- plastro.phenotype_simulation.simulate_realistic_dataset(n_cell_types: int = 6, cells_per_type: int = 100, n_genes: int = 20, noise_level: float = 0.2, seed: int | None = None, show_plots: bool = False) → AnnData
Generate a realistic single-cell dataset for plasticity testing.
Convenience function that combines tree generation and sampling to create a complete synthetic dataset with realistic parameters for testing plasticity simulation algorithms.
- Parameters:
n_cell_types (int, optional) – Number of terminal cell types to generate, by default 6.
cells_per_type (int, optional) – Approximate number of cells per cell type, by default 100.
n_genes (int, optional) – Number of genes (dimensions) in expression space, by default 20.
noise_level (float, optional) – Amount of noise in trajectories (0-1), by default 0.2.
seed (int, optional) – Random seed for reproducibility, by default None.
show_plots (bool, optional) – Whether to display plots during dataset generation, by default False.
- Returns:
Complete annotated dataset ready for plasticity analysis.
- Return type:
AnnData
Examples
>>> # Generate a standard test dataset
>>> adata = simulate_realistic_dataset(
...     n_cell_types=8,
...     cells_per_type=150,
...     n_genes=25,
...     seed=42,
...     show_plots=True  # Display terminal branch plots
... )
>>>
>>> # Visualize the dataset
>>> import scanpy as sc
>>> sc.pl.umap(adata, color=['branch', 'leiden'])
Notes
This function provides sensible defaults for most plasticity simulation experiments and ensures reproducible results when a seed is provided.
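To show why a seed makes the output reproducible, here is a toy stand-in that uses only the standard library. Everything in it is hypothetical: `toy_random_tree` and `toy_simulate` are not plastro functions, the random split of leaves is one possible way to build such a tree, and drawing each node's cell count as 1 to 10 times the resolution is an assumption based on the "~50-500 cells" comment in the `create_random_binary_tree` example.

```python
import random
from typing import List, Optional, Tuple

Tree = Tuple[int, List]


def toy_random_tree(n_leaves: int, sample_res: int, rng: random.Random) -> Tree:
    """Build a random binary tree in the (sample_count, [children]) layout."""
    n_cells = sample_res * rng.randint(1, 10)  # assumed count range
    if n_leaves == 1:
        return (n_cells, [])
    # Split the remaining leaves at random between the two children.
    n_left = rng.randint(1, n_leaves - 1)
    left = toy_random_tree(n_left, sample_res, rng)
    right = toy_random_tree(n_leaves - n_left, sample_res, rng)
    return (n_cells, [left, right])


def toy_simulate(n_cell_types: int = 6, cells_per_type: int = 100,
                 seed: Optional[int] = None) -> Tree:
    """Seed a private generator, then build the tree from it."""
    rng = random.Random(seed)  # seed=None gives a different tree each run
    return toy_random_tree(n_cell_types, cells_per_type, rng)


first = toy_simulate(n_cell_types=8, cells_per_type=150, seed=42)
second = toy_simulate(n_cell_types=8, cells_per_type=150, seed=42)
print(first == second)  # True
```

Using a private generator instead of the global random state keeps the result independent of other code that draws random numbers; whether `simulate_realistic_dataset` seeds a private or a global generator is not stated in this reference.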