Collections

A collection is the central unit in bamboost. The Collection object lets you create, access, and query its entries, which are referred to as simulations from here on.

Collections are created implicitly if they don’t exist yet. The first argument to Collection is a path. Let’s create our first collection at ../data/getting-started. This creates the directory and assigns a unique ID to the new collection.

# Creating a new Collection
from bamboost import Collection

coll = Collection("../data/getting-started")
display(coll)

BAMBOOST / 315628DE80

Database	/home/runner/work/bamboost-docs/bamboost-docs/content/docs/../data/getting-started
UID	315628DE80
Size	1

Once created, it is often easier and safer to use the uid to reference the collection.

Although you can create and access collections with their path, it is good practice to explicitly create them using the command line interface and then use their ID in your code.

bamboost-cli new ./data-foo
bamboost-cli list

coll = Collection(uid="315628DE80")
coll.df

	name	created_at	description	status	submitted	E	disk.center	disk.radius	nu
0	ba02cde0a7	2025-08-25 14:10:03.686174		initialized	False	20	[0.5, 0.5]	2	0.3

If you are working in an IPython session (e.g. a Jupyter notebook), use Collection.fromUID, which gives you autocompletion for all of your existing (cached) collections.

coll = Collection.fromUID["315628DE80 - ost-docs/content/docs/data-foo"]

Simulations

Now that you have a collection, you can create simulations inside it. A simulation not only stores the output of your experiment, but simultaneously acts as its input file (assuming you run numerical experiments).

This means the same entity is used in multiple steps of your workflow: experimental design, execution, and postprocessing/analysis.

Experimental design

Creation

Suppose you intend to run an experiment with a specific set of input parameters, input files, or other inputs. We create the simulation with all the instructions it needs. Bundling all of this in a single place ensures reproducibility. A simulation most likely includes:

A dictionary of parameters
A script that produces the result for this simulation
A set of instructions on how to run the script

To create a new simulation, use add:

sim = coll.add(
    name="my-simulation",
    parameters={
        "param1": 73,
        "bar": [2, 3, 4, 5],
    },
)
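When designing a series of experiments, the parameter dictionaries can be generated programmatically before each one is passed to add. The following is a minimal sketch using only the standard library; the grid values and the sweep-NNN naming scheme are hypothetical, and the call to coll.add is left as a comment so the snippet runs without a collection.

```python
from itertools import product

# Hypothetical parameter grid; the keys mirror the example above.
grid = {
    "param1": [73, 74],
    "bar": [[2, 3], [4, 5]],
}

# Build one parameter dictionary per combination of grid values.
keys = list(grid)
parameter_sets = [dict(zip(keys, values)) for values in product(*grid.values())]

for index, parameters in enumerate(parameter_sets):
    name = f"sweep-{index:03d}"  # hypothetical naming scheme
    # With a collection at hand, each set would be passed to add:
    # sim = coll.add(name=name, parameters=parameters)
    print(name, parameters)
```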

Relevant files

Then, copy relevant files (or entire directories) into the simulation directory using copy_files.

Note: add also accepts a files argument that copies a list of files or directories directly when the simulation is created.

sim.copy_files(["path/to/script.py", "img1.png", "path/to/some/directory"])

Run script

As a next step we can create a run script for the simulation. This is an auto-generated bash script with the purpose of providing a single access point to produce the results for this simulation.

create_run_script takes up to 3 arguments:

commands: an iterable of bash commands to run in sequence.
euler: a boolean flag. If set to True, a Slurm submission script is written instead of a plain bash script.
sbatch_kwargs: a dictionary of Slurm job arguments.

sim.create_run_script(
    commands=["source .venv/bin/activate", "python3 script.py"], euler=False
)

This creates the following file:

run.sh

#!/bin/bash

export SIMULATION_DIR=/absolute/path/to/collection/data/getting-started/my-simulation
export SIMULATION_ID=315628DE80:my-simulation

source .venv/bin/activate
python3 script.py

Notice that the script exports two variables: SIMULATION_DIR and SIMULATION_ID. Use these in your executable script script.py to determine which simulation the script should operate on.
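A minimal sketch of how script.py might read these variables, using only the standard library. The helper name read_simulation_env is hypothetical and not part of bamboost, and splitting SIMULATION_ID at the colon into a collection UID and a simulation name is an assumption based on the example value in run.sh.

```python
import os
from pathlib import Path


def read_simulation_env(environ=os.environ):
    """Return (simulation_dir, collection_uid, simulation_name).

    Hypothetical helper, not part of bamboost. Assumes SIMULATION_ID has the
    form "<collection-uid>:<simulation-name>", as in the run.sh shown above.
    """
    simulation_dir = Path(environ["SIMULATION_DIR"])
    # Split at the first colon only; the remainder is the simulation name.
    uid, _, name = environ["SIMULATION_ID"].partition(":")
    return simulation_dir, uid, name
```

Inside script.py, calling read_simulation_env() without arguments reads the values exported by run.sh. Passing a plain dictionary instead makes the helper easy to try outside of a run script.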

Execution

After creation, a simulation is started by running its run script. You can run the script manually, start it from within Python, or use the CLI (not implemented yet).
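Running the script manually amounts to executing run.sh with bash. A minimal sketch from Python follows; run_script is a hypothetical helper, not a bamboost function, and it assumes that run.sh sits directly inside the simulation directory, that the commands are meant to run from there, and that bash is available.

```python
import subprocess
from pathlib import Path


def run_script(simulation_dir, script_name="run.sh"):
    """Run the simulation's run script with bash and return its exit code.

    Hypothetical helper, not part of bamboost.
    """
    simulation_dir = Path(simulation_dir)
    result = subprocess.run(
        ["bash", script_name],
        cwd=simulation_dir,  # assumption: commands run from the simulation directory
        check=False,  # failures are reported through the return code
    )
    return result.returncode
```

An exit code of 0 indicates that every command in the script completed without bash reporting a failure for the last one; any other value should be inspected before postprocessing.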

If you use the bamboost methods to submit your job, and you have previously created a SLURM submission script instead of a pure bash script, your job is automatically submitted on the cluster.

You can also submit the simulation like this directly after you create it.

Submit simulation from python

sim.submit_simulation()

Submit simulation using CLI

bamboost-cli Collection-ID Simulation-Name

Postprocessing

After your job has finished, it’s time to mine the diamonds. Besides the workflow management described in this document, bamboost also provides a data model based on an HDF5 file (see the docs on how to write data using bamboost). Many of the postprocessing capabilities depend on the bamboost hdf5 module, but some are general.

bamboost is optimized for interactive use in Jupyter notebooks. It offers autocompletion for most of its objects, such as collections and simulations, as well as for the objects and data that are part of your simulations.

Collection

Initialize the collection as seen above, preferably using its ID. coll.df returns a pandas dataframe of the collection.

coll = Collection.fromUID["315628DE80"]
coll.df

	name	created_at	description	status	submitted	E	disk.center	disk.radius	nu	bar	param1
0	ba02cde0a7	2025-08-25 14:10:03.686174		initialized	False	20.0	[0.5, 0.5]	2.0	0.3	NaN	NaN
1	my-simulation	2025-08-25 14:10:07.087190		initialized	False	NaN	NaN	NaN	NaN	[2, 3, 4, 5]	73.0

For now, you can use the dataframe to query simulations based on their parameters.

The custom filtering interface is currently a work in progress.
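Filtering the dataframe with ordinary pandas expressions can be sketched as follows. The snippet builds a small stand-in dataframe with a few of the columns shown above so that it runs without a collection; with a real collection, the same expressions would be applied to coll.df.

```python
import pandas as pd

# Stand-in for coll.df, reduced to a few of the columns shown above.
df = pd.DataFrame(
    {
        "name": ["ba02cde0a7", "my-simulation"],
        "status": ["initialized", "initialized"],
        "E": [20.0, None],
        "param1": [None, 73.0],
    }
)

# Boolean mask: rows where param1 equals 73 (NaN compares as False).
selected = df[df["param1"] == 73]
names = selected["name"].tolist()

# Rows that define the parameter E at all.
with_E = df[df["E"].notna()]
```

Because simulations in one collection may have different parameter sets, missing values appear as NaN. Comparisons against NaN evaluate to False, so simulations lacking a parameter drop out of such a selection.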

Simulation

To get a Simulation object, use brackets (it will autocomplete in notebooks).

sim = coll["my-simulation"]
sim

my-simulation

initialized

Not submitted

time stamp: 2025-08-25 14:10:07.087190

Files

⭗ my-simulation
├── run.sh
└── data.h5

0 directories, 2 files

Parameters

Parameter	Value
bar	[2 3 4 5]
param1	73

A few things the simulation object offers are:

sim.files: A file picker that returns the absolute path of files stored in this simulation
sim.parameters: The parameters
sim.metadata: some metadata
sim.root: Access to the root group inside data.h5 (more on the h5py wrapper of bamboost will be written elsewhere).
sim.data: The main (time)-series of the simulation
and much more

To be continued…
