bamboost.core.collection

Collection management module for bamboost.

This module provides the Collection class and related utilities for managing collections of simulations in the bamboost framework. It includes functionality for creating, filtering, querying, and manipulating simulation collections, as well as integration with the underlying index and MPI communication.

Attributes

  • __all__=['Collection', 'NotACollectionError']
  • log=BAMBOOST_LOGGER.getChild('Collection')

Classes

NotACollectionError

Raised when a path is not a valid collection.

Arguments:
  • path:pathlib.Path

_CollectionPicker

_CollectionPicker.__getitem__(self, key) -> Collection
Arguments:
  • key:str

_FilterKeys

_FilterKeys(self, collection)
Arguments:
  • collection
Attributes:
  • collection=collection
_FilterKeys.__getitem__(self, key) -> _Key
Arguments:
  • key:str

Collection

Collection(self, path=None, *, uid=None, create_if_not_exist=True, comm=None, index_instance=None, sync_collection=True, filter=None)

Represents a collection of simulations in the bamboost framework.

The Collection class provides an interface for managing, querying, and manipulating a group of simulations stored in a directory, with support for filtering, indexing, and MPI communication.

Arguments:
  • path:typing.Optional[StrPath]=None
  • uid:typing.Optional[str]=None
  • create_if_not_exist:bool=True
  • comm:typing.Optional[Comm]=None
  • index_instance:typing.Optional[Index]=None
  • sync_collection:bool=True
  • filter:typing.Optional[Filter]=None
Attributes:
  • FROZEN:bool=False

    If True, the collection does not look for new simulations after initialization.

  • uid:CollectionUID=CollectionUID(uid or self._index.resolve_uid(self.path))

    Unique identifier for the collection.

  • path:pathlib.Path=Path(path or self._index.resolve_path(uid.upper())).absolute()

    Filesystem path to the collection directory.

  • fromUID=_CollectionPicker()
  • _comm=Communicator()
  • _index=index_instance or bamboost.index.Index.default
  • _filter=filter
  • k:_FilterKeys=_FilterKeys(self)

    Helper for key completion and filtering.

  • _orm:CollectionORM | bamboost.index.sqlmodel.FilteredCollection

    Returns the ORM (Object Relational Mapping) object for the collection.

    If a filter is applied to the collection, returns a FilteredCollection object that represents the filtered view. Otherwise, returns the base CollectionORM object for the collection.

  • df:pandas.DataFrame

    DataFrame view of the collection and its parameter space.

Examples:
>>> db = Collection("path/to/collection")
>>> db.df  # DataFrame of the collection
>>> sim = db["simulation_name"]  # Access a simulation by name
>>> filtered = db.filter(db.k["param"] == 42)
Bases
ElligibleForPlugin
Collection.__getitem__(self, name_or_index) -> Simulation

Retrieve a Simulation from the collection by name or index.

Arguments:
  • name_or_index:str | int

    The name of the simulation (str) or its index (int) in the collection dataframe.

Returns
Simulation: The corresponding Simulation object.
Examples:
>>> sim = collection["simulation_name"]
>>> sim = collection[0]
Collection.__len__(self) -> int
Collection.filter(self, *operators) -> Collection

Returns a new Collection filtered by the given operators.

This method applies the specified filter operators to the collection and returns a new Collection instance representing the filtered view. The original collection remains unchanged.

Arguments:
  • operators

    Filter operators to apply to the collection.

Returns
Collection: A new Collection instance containing only the simulations that match the given operators.
Examples:
>>> filtered = collection.filter(collection.k["param"] == 42)
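An expression such as collection.k["param"] == 42 has to produce something that can be evaluated later against each simulation, rather than a plain bool. The following is a minimal, self-contained sketch of that idea using operator overloading. The names Key and apply_filter are hypothetical stand-ins for illustration only; they are not bamboost's _Key or Collection.filter, whose internals may work differently.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]


class Key:
    """Hypothetical stand-in for a filter key such as collection.k["param"]."""

    def __init__(self, name: str) -> None:
        self.name = name

    # Comparisons return a predicate instead of a bool, so the expression
    # can be evaluated later against every row.
    def __eq__(self, value: Any) -> Predicate:  # type: ignore[override]
        return lambda row: row.get(self.name) == value

    def __gt__(self, value: Any) -> Predicate:
        return lambda row: self.name in row and row[self.name] > value


def apply_filter(rows: List[Row], *operators: Predicate) -> List[Row]:
    """Return a new list with the rows that satisfy all predicates.

    The input list is left unchanged, mirroring the documented behaviour
    that filtering does not modify the original collection.
    """
    return [row for row in rows if all(op(row) for op in operators)]


# Illustrative data: each dict plays the role of one simulation's parameters.
rows = [{"name": "s1", "param": 42}, {"name": "s2", "param": 7}]
filtered = apply_filter(rows, Key("param") == 42)
```

Passing several operators combines them with a logical AND in this sketch; whether the library combines multiple operators the same way is an assumption.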

Returns a list of all simulation names in the collection.

Returns
list[str]: A list containing the names of all simulations in the collection.
Collection.sync_cache(self, *, force_all=False) -> None

Synchronize the database for this collection.

This method updates the collection's cache by syncing the underlying index and filesystem. It ensures that the collection's metadata and simulation information are up to date. If force_all is True, a full rescan and update of all simulations in the collection will be performed, regardless of their current cache state.

Arguments:
  • force_all:bool=False

    If True, force a full resync of all simulations in the collection. If False (default), only update simulations that are out of sync.

Collection.create_simulation(self, name=None, parameters=None, *, description=None, files=None, links=None, override=False) -> SimulationWriter

Create and initialize a new simulation in the collection, returning a SimulationWriter object.

This method is designed for parallel use, such as in batch scripts or parameter sweeps, where multiple simulations may be created concurrently. It handles creation of the simulation directory, duplicate checking, copying files, and setting up metadata and parameters.

Arguments:
  • name:typing.Optional[str]=None

    The name/UID for the simulation. If not specified, a unique random ID will be generated.

  • parameters:typing.Optional[typing.Dict[str, typing.Any]]=None

    Dictionary of simulation parameters. If provided, these parameters will be checked against existing simulations for duplication. If not provided, parameters can be set later via Simulation.parameters.

    Note

    • Parameters are stored in the HDF5 file as attributes.
    • If a value is a dict, it is flattened using flatten_dict.
    • If a value is a list or array, it is stored as a dataset.
  • description:typing.Optional[str]=None

    Optional description for the simulation.

  • files:typing.Optional[typing.Iterable[str]]=None

    Optional iterable of file paths to copy into the simulation directory. Each file will be copied with its original name.

  • links:typing.Optional[typing.Dict[str, str]]=None

    Optional dictionary of symbolic links to create in the simulation directory, mapping link names to target paths.

  • override:bool=False

    If True, overwrite any existing simulation with the same name. If False (default), raises FileExistsError if a simulation with the same name exists.

Returns
SimulationWriter: An object for writing data and metadata to the new simulation.
Examples:
>>> db.create_simulation(parameters={"a": 1, "b": 2})
>>> db.create_simulation(name="my_sim", parameters={"a": 1, "b": 2})

Note

  • The files and links specified are copied or created in the simulation directory.
  • This method is safe for use in parallel (MPI) environments.
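Since create_simulation is described as suited to parameter sweeps, a sweep could be driven by a small helper like the one sketched below. The helpers parameter_grid and create_sweep are hypothetical and not part of bamboost; the only library call used is create_simulation(parameters=...), as documented above. The db argument is assumed to be a Collection (any object with a compatible create_simulation method works).

```python
from itertools import product
from typing import Any, Dict, Iterable, List


def parameter_grid(axes: Dict[str, Iterable[Any]]) -> List[Dict[str, Any]]:
    """Expand {"a": [1, 2], "b": [10]} into one parameter dict per combination."""
    names = list(axes)
    return [
        dict(zip(names, values))
        for values in product(*(axes[name] for name in names))
    ]


def create_sweep(db: Any, axes: Dict[str, Iterable[Any]]) -> list:
    """Call db.create_simulation(parameters=...) once per grid point.

    No name is passed, so the collection is left to generate a unique
    random ID for each simulation, as described for the `name` argument.
    """
    return [db.create_simulation(parameters=p) for p in parameter_grid(axes)]
```

With a real collection this might be called as create_sweep(db, {"a": [1, 2], "b": [10, 20]}), which would request four simulations. Each returned object is whatever create_simulation returns, that is, a SimulationWriter.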
Collection.find(self, parameter_selection) -> pd.DataFrame

Find simulations matching the given parameter selection.

The parameter_selection dictionary can specify exact values for parameters, or use callables (such as lambda functions) for more complex filtering, such as inequalities or custom logic.

Arguments:
  • parameter_selection:dict[str, typing.Any]

    Dictionary mapping parameter names to values or callables. If a value is a callable, it will be used as a filter function applied to the corresponding parameter column.

Returns
pandas.DataFrame: DataFrame containing simulations that match the specified criteria.
Examples:
>>> db.find({"a": 1, "b": lambda x: x > 2})
>>> db.find({"a": 1, "b": 2})
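The selection semantics described above (exact values or callables per parameter) can be illustrated without the library. The sketch below works on plain dicts rather than a DataFrame, and the names matches and find_records are hypothetical. The documentation says callables are applied to the parameter column, so the real implementation may call them on a whole column rather than one value at a time. Treating a missing parameter as "no match" is also an assumption.

```python
from typing import Any, Dict, List


def matches(parameters: Dict[str, Any], selection: Dict[str, Any]) -> bool:
    """True if every selected parameter equals its value or passes its callable."""
    for key, criterion in selection.items():
        if key not in parameters:
            return False  # assumption: a missing parameter never matches
        value = parameters[key]
        ok = criterion(value) if callable(criterion) else value == criterion
        if not ok:
            return False
    return True


def find_records(
    records: List[Dict[str, Any]], selection: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Return the records whose parameters satisfy the whole selection."""
    return [record for record in records if matches(record, selection)]


# Illustrative data: each dict plays the role of one simulation's parameters.
records = [
    {"a": 1, "b": 2},
    {"a": 1, "b": 5},
    {"a": 3, "b": 9},
]
exact = find_records(records, {"a": 1, "b": 2})
mixed = find_records(records, {"a": 1, "b": lambda x: x > 2})
```

Here exact holds the single record with a == 1 and b == 2, and mixed holds the record with a == 1 and b == 5.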

HTML repr for ipython/notebooks, using jinja2 for templating.

Collection._delete_simulation(self, name) -> None

Use with caution. Deletes a simulation.

Arguments:
  • name:str

    Name of the simulation to delete.

Collection._list_duplicates(self, parameters, *, df=None) -> list[str]

List the names (IDs) of simulations in the collection that have duplicate parameter values.

Arguments:
  • parameters:dict

    Parameter dictionary to check for duplicates. Keys are parameter names, values are the values to match against existing simulations.

  • df:pandas.DataFrame=None

    DataFrame to search in. If not provided, the DataFrame from the SQL database is used.

Returns
list[str]: List of simulation names (IDs) that have the same parameter values as provided.
Collection._check_duplicate(self, parameters, uid, duplicate_action='prompt') -> tuple

Check whether the given parameters dictionary already exists in the collection.

This method checks for duplicate simulations with the same parameter values. If duplicates are found, it prompts the user (or uses the specified action) to decide whether to replace, create a new simulation, or abort.

Arguments:
  • parameters:dict

    Parameter dictionary to check for duplicates.

  • uid:str

    The UID for the simulation to be created.

  • duplicate_action:str='prompt'
Returns
tuple: (bool, str)
  • bool: Whether to continue with the operation.
  • str: The UID to use for the simulation.