plastro.simulate_realistic_dataset

plastro.simulate_realistic_dataset(n_cell_types: int = 6, cells_per_type: int = 100, n_genes: int = 20, noise_level: float = 0.2, seed: int | None = None, show_plots: bool = False) → AnnData

Generate a realistic single-cell dataset for plasticity testing.

Convenience function that combines tree generation and sampling to create a complete synthetic dataset with realistic parameters for testing plasticity simulation algorithms.

Parameters:
  • n_cell_types (int, optional) – Number of terminal cell types to generate, by default 6.

  • cells_per_type (int, optional) – Approximate number of cells per cell type, by default 100.

  • n_genes (int, optional) – Number of genes (dimensions) in expression space, by default 20.

  • noise_level (float, optional) – Amount of noise in trajectories (0-1), by default 0.2.

  • seed (int, optional) – Random seed for reproducibility, by default None.

  • show_plots (bool, optional) – Whether to display plots during dataset generation, by default False.

Returns:

Complete annotated dataset ready for plasticity analysis.

Return type:

AnnData
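
As a rough guide to the size of the returned object, the sketch below estimates its shape from the size parameters. It assumes the usual AnnData layout (cells as observations, genes as variables) and that the total cell count is about n_cell_types × cells_per_type. Because cells_per_type is documented as approximate, the actual number of cells may differ. The helper name expected_shape is hypothetical and not part of plastro.

```python
def expected_shape(n_cell_types: int = 6,
                   cells_per_type: int = 100,
                   n_genes: int = 20) -> tuple[int, int]:
    """Estimate (n_obs, n_vars) for the simulated dataset.

    Hypothetical helper, not part of plastro. Assumes cells are
    observations and genes are variables, and that every cell type
    contributes roughly cells_per_type cells.
    """
    if n_cell_types < 1 or cells_per_type < 1 or n_genes < 1:
        raise ValueError("all size parameters must be positive integers")
    return n_cell_types * cells_per_type, n_genes


# Defaults from the signature above
print(expected_shape())            # (600, 20)
# Values used in the Examples section
print(expected_shape(8, 150, 25))  # (1200, 25)
```

Comparing this estimate with adata.shape after a run is a quick sanity check, bearing in mind that the per-type count is only approximate.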

Examples

>>> # Generate a standard test dataset
>>> from plastro import simulate_realistic_dataset
>>> adata = simulate_realistic_dataset(
...     n_cell_types=8,
...     cells_per_type=150,
...     n_genes=25,
...     seed=42,
...     show_plots=True  # Display terminal branch plots
... )
>>>
>>> # Visualize the dataset
>>> import scanpy as sc
>>> sc.pl.umap(adata, color=['branch', 'leiden'])

Notes

This function provides sensible defaults for most plasticity simulation experiments and ensures reproducible results when a seed is provided.
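
The reproducibility claim can be checked by calling a generator twice with the same seed and comparing the outputs. The sketch below is a minimal illustration of that pattern, not plastro's implementation: toy_generator is a hypothetical stand-in built on the standard library so the example runs without plastro installed, and is_reproducible is likewise a hypothetical helper.

```python
import random


def toy_generator(n_cells=5, n_genes=3, noise_level=0.2, seed=None):
    """Hypothetical stand-in for a seeded simulator (not plastro code).

    Returns an n_cells x n_genes nested list of Gaussian noise values.
    """
    rng = random.Random(seed)  # private RNG; global random state is untouched
    return [[rng.gauss(0.0, noise_level) for _ in range(n_genes)]
            for _ in range(n_cells)]


def is_reproducible(generate, seed, **kwargs):
    """Call `generate` twice with the same seed and compare the results."""
    first = generate(seed=seed, **kwargs)
    second = generate(seed=seed, **kwargs)
    return first == second


print(is_reproducible(toy_generator, seed=42))  # True
```

With plastro itself, the same check would compare the two returned AnnData objects, for example with numpy.array_equal on their X matrices if X is a dense array (an assumption; the storage format is not stated here), since plain == is not a meaningful comparison for AnnData objects.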