Collection management module for bamboost.

This module provides the Collection class and related utilities for managing collections of simulations in the bamboost framework. It includes functionality for creating, filtering, querying, and manipulating simulation collections, as well as integration with the underlying index and MPI communication.

Attributes

__all__=['Collection', 'NotACollectionError']
log=BAMBOOST_LOGGER.getChild('Collection')

Classes

NotACollectionError

NotACollectionError(self, path)

Raised when a path is not a valid collection.

Arguments:

path:pathlib.Path

_CollectionPicker

_CollectionPicker.__getitem__(self, key) -> Collection

Arguments:

key:str

_CollectionPicker._ipython_key_completions_(self)

_FilterKeys

_FilterKeys(self, collection)

Arguments:

collection:Collection

Attributes:

collection=collection

_FilterKeys.__getitem__(self, key) -> _Key

Arguments:

key:str

_FilterKeys._ipython_key_completions_(self)

Collection

Collection(self, path=None, *, uid=None, create_if_not_exist=True, comm=None, index_instance=None, sync_collection=True, filter=None)

Represents a collection of simulations in the bamboost framework.

The Collection class provides an interface for managing, querying, and manipulating a group of simulations stored in a directory, with support for filtering, indexing, and MPI communication.

Arguments:

path:typing.Optional[StrPath]=None
uid:typing.Optional[str]=None
create_if_not_exist:bool=True
comm:typing.Optional[Comm]=None
index_instance:typing.Optional[Index]=None
sync_collection:bool=True
filter:typing.Optional[Filter]=None

Attributes:

FROZEN:bool=False
If True, the collection does not look for new simulations after initialization.
uid:CollectionUID=CollectionUID(uid or self._index.resolve_uid(self.path))
Unique identifier for the collection.
path:pathlib.Path=Path(path or self._index.resolve_path(uid.upper())).absolute()
Filesystem path to the collection directory.
fromUID=_CollectionPicker()
_comm=Communicator()
_index=index_instance or bamboost.index.Index.default
_filter=filter
k:_FilterKeys=_FilterKeys(self)
Helper for key completion and filtering.
_orm:CollectionORM | bamboost.index.sqlmodel.FilteredCollection
Returns the ORM (Object Relational Mapping) object for the collection.

If a filter is applied to the collection, returns a FilteredCollection object that represents the filtered view. Otherwise, returns the base CollectionORM object for the collection.
df:pandas.DataFrame
DataFrame view of the collection and its parameter space.

Examples:

>>> db = Collection("path/to/collection")
>>> db.df  # DataFrame of the collection
>>> sim = db["simulation_name"]  # Access a simulation by name
>>> filtered = db.filter(db.k["param"] == 42)

Bases

ElligibleForPlugin

ElligibleForPlugin.__new__()

Collection.__getitem__(self, name_or_index) -> Simulation

Retrieve a Simulation from the collection by name or index.

Arguments:

name_or_index:str | int
The name of the simulation (str) or its index (int) in the collection dataframe.

Returns

Simulation
The corresponding Simulation object.

Examples:

>>> sim = collection["simulation_name"]
>>> sim = collection[0]

Collection.__len__(self) -> int

Collection.filter(self, *operators) -> Collection

Returns a new Collection filtered by the given operators.

This method applies the specified filter operators to the collection and returns a new Collection instance representing the filtered view. The original collection remains unchanged.

Arguments:

operators:Operator=()

Returns

Collection
A new Collection instance containing only the simulations that match the given operators.

Examples:

>>> filtered = collection.filter(collection.k["param"] == 42)
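The expression collection.k["param"] == 42 in the example above is passed to filter as an operator, so it cannot simply evaluate to True or False. A minimal sketch of how such an expression can be built with operator overloading follows. The Key and Equals classes, the apply_filter function, and the sample rows are hypothetical stand-ins for illustration; they are not bamboost's _Key, Operator, or Collection.filter.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Equals:
    """Hypothetical operator: matches rows where `name` equals `value`."""

    name: str
    value: Any

    def matches(self, row: Dict[str, Any]) -> bool:
        return row.get(self.name) == self.value


class Key:
    """Hypothetical stand-in for a filter key such as collection.k["param"]."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: Any) -> Equals:  # type: ignore[override]
        # Comparing a key with a value yields an operator object, not a bool.
        return Equals(self.name, other)


def apply_filter(rows: List[Dict[str, Any]], *operators: Equals) -> List[Dict[str, Any]]:
    """Return a new list of matching rows; the input list is left unchanged."""
    return [row for row in rows if all(op.matches(row) for op in operators)]


# Made-up rows standing in for simulations and their parameters.
rows = [
    {"name": "sim_a", "param": 42},
    {"name": "sim_b", "param": 7},
]
filtered = apply_filter(rows, Key("param") == 42)
print([row["name"] for row in filtered])
```

As with Collection.filter, the sketch returns a new object and leaves the original untouched.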

Collection.all_simulation_names(self) -> list[str]

Returns a list of all simulation names in the collection.

Returns

list[str]
A list containing the names of all simulations in the collection.

Collection.sync_cache(self, *, force_all=False) -> None

Synchronize the database for this collection.

This method updates the collection's cache by syncing the underlying index and filesystem. It ensures that the collection's metadata and simulation information are up to date. If force_all is True, a full rescan and update of all simulations in the collection will be performed, regardless of their current cache state.

Arguments:

force_all:bool=False
If True, force a full resync of all simulations in the collection. If False (default), only update simulations that are out of sync.

Collection.create_simulation(self, name=None, parameters=None, *, description=None, files=None, links=None, override=False) -> SimulationWriter

Create and initialize a new simulation in the collection, returning a SimulationWriter object.

This method is designed for parallel use, such as in batch scripts or parameter sweeps, where multiple simulations may be created concurrently. It handles creation of the simulation directory, duplicate checking, copying files, and setting up metadata and parameters.

Arguments:

name:typing.Optional[str]=None
The name/UID for the simulation. If not specified, a unique random ID will be generated.
parameters:typing.Optional[typing.Dict[str, typing.Any]]=None
Dictionary of simulation parameters. If provided, these parameters will be checked against existing simulations for duplication. If not provided, parameters can be set later via Simulation.parameters.
Note

Parameters are stored in the HDF5 file as attributes.

If a value is a dict, it is flattened using flatten_dict.

If a value is a list or array, it is stored as a dataset.
description:typing.Optional[str]=None
Optional description for the simulation.
files:typing.Optional[typing.Iterable[str]]=None
Optional iterable of file paths to copy into the simulation directory. Each file will be copied with its original name.
links:typing.Optional[typing.Dict[str, str]]=None
Optional dictionary of symbolic links to create in the simulation directory, mapping link names to target paths.
override:bool=False
If True, overwrite any existing simulation with the same name. If False (default), raises FileExistsError if a simulation with the same name exists.

Returns

SimulationWriter
An object for writing data and metadata to the new simulation.

Examples:

>>> db.create_simulation(parameters={"a": 1, "b": 2})

>>> db.create_simulation(name="my_sim", parameters={"a": 1, "b": 2})

Note

The files and links specified are copied or created in the simulation directory.
This method is safe for use in parallel (MPI) environments.
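The note on the parameters argument states that dict values are flattened using flatten_dict. The sketch below shows one common way to flatten a nested dictionary so that every leaf value gets a single key. It is not bamboost's implementation: the function name flatten, the "." separator, and the sample parameters are assumptions made for illustration.

```python
from typing import Any, Dict


def flatten(nested: Dict[str, Any], parent: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with `sep` (assumed)."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        full_key = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, carrying the key prefix along.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Made-up parameters with one nested group.
params = {"a": 1, "material": {"E": 210.0, "nu": 0.3}}
flat_params = flatten(params)
print(flat_params)
```

A flat key space of this kind lets each parameter be stored as a single attribute and compared column by column.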

Collection.find(self, parameter_selection) -> pd.DataFrame

Find simulations matching the given parameter selection.

The parameter_selection dictionary can specify exact values for parameters, or use callables (such as lambda functions) for more complex filtering, such as inequalities or custom logic.

Arguments:

parameter_selection:dict[str, typing.Any]
Dictionary mapping parameter names to values or callables. If a value is a callable, it will be used as a filter function applied to the corresponding parameter column.

Returns

pandas.DataFrame
DataFrame containing simulations that match the specified criteria.

Examples:

>>> db.find({"a": 1, "b": lambda x: x > 2})
>>> db.find({"a": 1, "b": 2})
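The matching rule described above (exact values, or callables used as filter functions) can be sketched without pandas as follows. This is an illustration under stated assumptions, not bamboost's code: match_rows and the sample rows are hypothetical, and the callable is applied to one value at a time here, whereas the documentation says find applies it to the parameter column.

```python
from typing import Any, Dict, List


def match_rows(rows: List[Dict[str, Any]], selection: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep rows that satisfy every entry of `selection`."""

    def matches(row: Dict[str, Any]) -> bool:
        for name, criterion in selection.items():
            if name not in row:
                return False
            value = row[name]
            if callable(criterion):
                # A callable acts as a predicate on the parameter value.
                if not criterion(value):
                    return False
            elif value != criterion:
                # Anything else is compared for equality.
                return False
        return True

    return [row for row in rows if matches(row)]


# Made-up rows standing in for the collection's parameter table.
rows = [
    {"name": "s1", "a": 1, "b": 2},
    {"name": "s2", "a": 1, "b": 5},
    {"name": "s3", "a": 3, "b": 5},
]
exact = match_rows(rows, {"a": 1, "b": 2})
with_callable = match_rows(rows, {"a": 1, "b": lambda x: x > 2})
print([row["name"] for row in exact], [row["name"] for row in with_callable])
```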

Collection._ipython_key_completions_(self)

Collection._repr_html_(self) -> str

HTML repr for ipython/notebooks, using jinja2 for templating.

Collection._delete_simulation(self, name) -> None

Use with caution. Deletes a simulation.

Arguments:

name:str
Name of the simulation to delete.

Collection._list_duplicates(self, parameters, *, df=None) -> list[str]

List the names (IDs) of simulations in the collection that have duplicate parameter values.

Arguments:

parameters:dict
Parameter dictionary to check for duplicates. Keys are parameter names, values are the values to match against existing simulations.
df:pandas.DataFrame=None
DataFrame to search in. If not provided, the DataFrame from the SQL database is used.

Returns

list[str]
List of simulation names (IDs) that have the same parameter values as provided.

Collection._check_duplicate(self, parameters, uid, duplicate_action='prompt') -> tuple

Check whether the given parameters dictionary already exists in the collection.

This method checks for duplicate simulations with the same parameter values. If duplicates are found, it prompts the user (or uses the specified action) to decide whether to replace, create a new simulation, or abort.

Arguments:

parameters:dict
Parameter dictionary to check for duplicates.
uid:str
The UID for the simulation to be created.
duplicate_action:str='prompt'

Returns

tuple(bool, str)
bool: Whether to continue with the operation.
str: The UID to use for the simulation.
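The decision described above (replace, create a new simulation, or abort) and the (bool, str) return value can be sketched as a small pure function. Everything here is hypothetical: resolve_duplicate, the action strings "replace", "new" and "abort", and the choice of reusing the first duplicate's name are assumptions for illustration, and the interactive 'prompt' action is left out. It does not reproduce bamboost's _check_duplicate.

```python
from typing import List, Tuple


def resolve_duplicate(duplicates: List[str], uid: str, action: str) -> Tuple[bool, str]:
    """Return (continue?, uid to use) for a non-interactive duplicate action."""
    if not duplicates:
        # No simulation with these parameters exists: proceed unchanged.
        return True, uid
    if action == "replace":
        # Assumed behaviour: reuse the first duplicate's name so it is replaced.
        return True, duplicates[0]
    if action == "new":
        # Keep the requested uid and create an additional simulation.
        return True, uid
    if action == "abort":
        return False, uid
    raise ValueError(f"unknown duplicate action: {action!r}")


no_duplicates = resolve_duplicate([], "new_uid", "abort")
replaced = resolve_duplicate(["old_uid"], "new_uid", "replace")
aborted = resolve_duplicate(["old_uid"], "new_uid", "abort")
print(no_duplicates, replaced, aborted)
```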
