Implementation of the paper "A conceptual framework for revealing rare bacterial species in the gut microbiome through guided data transformation: Beyond enterotypes" (accepted to Methods in Ecology and Evolution). The package provides functions and classes for synthetic data generation, clustering, residual estimation, and simulation experiments. The IBD data are available in this repository.
- Paper: A conceptual framework for revealing rare bacterial species in the gut microbiome through guided data transformation
- If you build on this work, please cite the paper above.
git clone https://github.com/PierreHouedry/BiomeSampler.git
cd BiomeSampler
pip install -r requirements.txt

from BiomeSampler import Simulator, estimate_n_clusters, estimate_residuals

Load your original data from a CSV file where rows represent samples and columns represent features.
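As a minimal sketch of that layout, the snippet below writes a small samples-by-features table to CSV with pandas and reads it back. The taxon names, abundance values, file name, and the choice to omit a sample-ID column are illustrative assumptions; check how `Simulator` reads the file before relying on this exact format.

```python
import os
import tempfile

import pandas as pd

# Hypothetical relative abundances: 3 samples (rows) x 4 taxa (columns).
abundances = pd.DataFrame(
    {
        "Bacteroides": [0.50, 0.20, 0.35],
        "Prevotella": [0.10, 0.60, 0.25],
        "Ruminococcus": [0.30, 0.15, 0.30],
        "Akkermansia": [0.10, 0.05, 0.10],
    }
)

# Assumption: no sample-ID column is written, so every column is a feature.
csv_path = os.path.join(tempfile.mkdtemp(), "example_data.csv")
abundances.to_csv(csv_path, index=False)

# Reading the file back should give one row per sample, one column per feature.
reloaded = pd.read_csv(csv_path)
print(reloaded.shape)
```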
original_data_path = 'path/to/your/data.csv'
output_path = 'path/to/save/results'
simulator = Simulator(original_data_path, output_path)

Generate synthetic datasets using the simulation method.
n_samples = 100 # Number of synthetic samples to generate
n_experiments = 10 # Number of simulation experiments to run
display = True # Display progress and metrics
experiments, metrics_list, metrics_mean = simulator.simulation(n_samples, n_experiments, display)

- experiments: List of DataFrames containing synthetic samples for each experiment.
- metrics_list: List of dictionaries with metrics for each experiment.
- metrics_mean: Dictionary with average metrics across all experiments.
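The relationship between `metrics_list` and `metrics_mean` can be pictured as a per-key average. The sketch below assumes every experiment reports the same numeric metric keys; `average_metrics`, the metric names, and the values are made up for illustration, and the package may aggregate differently.

```python
from statistics import mean


def average_metrics(metrics_list):
    """Average each metric over a list of per-experiment dictionaries."""
    if not metrics_list:
        return {}
    # Assumption: all dictionaries share the keys of the first one.
    keys = metrics_list[0].keys()
    return {key: mean(m[key] for m in metrics_list) for key in keys}


# Hypothetical metrics from two experiments.
example_metrics = [
    {"metric_a": 0.25, "metric_b": 2.0},
    {"metric_a": 0.75, "metric_b": 4.0},
]
averaged = average_metrics(example_metrics)
print(averaged)
```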
Determine the optimal number of clusters using silhouette scores.
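To make the idea concrete, here is a rough sketch of silhouette-based selection with scikit-learn. It is not the code behind `estimate_n_clusters`: KMeans, Euclidean distance, and the synthetic blobs are assumptions chosen for illustration, and `silhouette_by_k` is a hypothetical helper.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, n_min=2, n_max=10, random_state=0):
    """Return {k: mean silhouette score} for each k in [n_min, n_max]."""
    scores = {}
    for k in range(n_min, n_max + 1):
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


# Three well-separated synthetic groups stand in for real samples.
X, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)
scores = silhouette_by_k(X, n_min=2, n_max=6)

# The candidate with the highest mean silhouette score is the suggested k.
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

A higher score means samples sit closer to their own cluster than to the nearest other cluster, so the maximum over the tested range is a common, though not infallible, choice.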
data_path = 'path/to/your/data.csv'
n_min = 2 # Minimum number of clusters
n_max = 10 # Maximum number of clusters
silhouette_scores = estimate_n_clusters(data_path, n_min, n_max)

Estimate residuals based on Bray-Curtis distances and optional CLR transformation.
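For reference, both ingredients have standard definitions. The Bray-Curtis dissimilarity between two samples is the sum of absolute feature-wise differences divided by the sum of all their feature values, and the centered log-ratio (CLR) transform subtracts the mean log-abundance of a sample from each of its log-abundances. The sketch below implements both with NumPy. The pseudocount used to handle zeros is an assumption, and how `estimate_residuals` combines these quantities is not shown here.

```python
import numpy as np


def clr(X, pseudocount=1e-6):
    """Centered log-ratio transform applied to each row (sample)."""
    # Assumption: a small pseudocount avoids log(0) for absent taxa.
    log_x = np.log(np.asarray(X, dtype=float) + pseudocount)
    return log_x - log_x.mean(axis=1, keepdims=True)


def bray_curtis(X):
    """Pairwise Bray-Curtis dissimilarity matrix between rows of X.

    Assumes non-negative data with a positive total for every pair of rows.
    """
    X = np.asarray(X, dtype=float)
    diff = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    total = (X[:, None, :] + X[None, :, :]).sum(axis=2)
    return diff / total


# Hypothetical counts: 3 samples x 3 taxa (samples 0 and 2 are identical).
counts = np.array([[6.0, 7.0, 4.0], [10.0, 0.0, 6.0], [6.0, 7.0, 4.0]])
D = bray_curtis(counts)
Z = clr(counts)
print(D.round(3))
```

Each row of the CLR output sums to zero, which is a quick sanity check that the centering step worked.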
data_path = 'path/to/your/data.csv'
n_clusters = 3
result_path = 'path/to/save/results'
clr_transform = False # Set to True to apply CLR transformation
residuals = estimate_residuals(data_path, n_clusters, result_path, clr_transform)

This project is licensed under the MIT License.