Bamboost

bamboost.core.collection

Collection management module for bamboost.

This module provides the Collection class and related utilities for managing collections of simulations in the bamboost framework. It includes functionality for creating, filtering, querying, and manipulating simulation collections, as well as integration with the underlying index and MPI communication.

Attributes

  • __all__=['Collection']
  • log=BAMBOOST_LOGGER.getChild('Collection')

Classes

_CollectionPicker

_CollectionPicker.__getitem__(self, key) -> Collection
Arguments:
  • key:str

_FilterKeys

_FilterKeys(self, collection)
Arguments:
  • collection
Attributes:
  • collection=collection
_FilterKeys.__getitem__(self, key) -> _Key
Arguments:
  • key:str

CollectionMetadataStore

CollectionMetadataStore(self, uid, created_at=None, tags=list(), aliases=list(), author=None, description=None) -> None
Arguments:
  • uid:str
  • created_at:datetime.datetime | None=None
  • tags:list[str]=list()
  • aliases:list[str]=list()
  • author:str | dict | None=None
  • description:str | None=None
Attributes:
  • _collection:Collection | None=field(default=None, repr=False, compare=False, init=False)
  • _comm:Communicator=field(default_factory=Communicator, repr=False, compare=False, init=False)
Arguments:
  • key:str
CollectionMetadataStore.from_dict(cls, data, *, _collection) -> Self
Arguments:
  • cls
  • data:dict[str, typing.Any]
  • _collection:Collection
CollectionMetadataStore.to_dict(self) -> dict[str, Any]
CollectionMetadataStore.update(self, data) -> None

Update the instance with values from a dictionary, storing unknown fields in the extras dictionary.

Arguments:
  • data:dict[str, typing.Any]

    The input dictionary.
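The split between known fields and extras described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `MetadataSketch` is a hypothetical stand-in class, and the attribute name `extras` is assumed from the wording of the description.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List, Optional


@dataclass
class MetadataSketch:
    """Hypothetical stand-in for CollectionMetadataStore (not the real class)."""

    uid: str
    tags: List[str] = field(default_factory=list)
    description: Optional[str] = None
    # Assumed attribute name; the docstring only mentions "the extras dictionary".
    extras: Dict[str, Any] = field(default_factory=dict)

    def update(self, data: Dict[str, Any]) -> None:
        known = {f.name for f in fields(self)} - {"extras"}
        for key, value in data.items():
            if key in known:
                setattr(self, key, value)  # declared field: overwrite in place
            else:
                self.extras[key] = value  # unknown field: keep it in extras


meta = MetadataSketch(uid="ABC123")
meta.update({"tags": ["fem"], "project": "demo"})
```

Here `tags` is a declared field and is overwritten, while `project` is not declared and ends up in `meta.extras`.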

Collection

Collection(self, path=None, *, uid=None, create_if_not_exist=True, comm=None, index_instance=None, sync_collection=True, filter=None, sorter=None)

Represents a collection of simulations in the bamboost framework.

The Collection class provides an interface for managing, querying, and manipulating a group of simulations stored in a directory, with support for filtering, indexing, and MPI communication.

Arguments:
  • path:typing.Optional[StrPath]=None
  • uid:typing.Optional[str]=None
  • create_if_not_exist:bool=True
  • comm:typing.Optional[Comm]=None
  • index_instance:typing.Optional[Index]=None
  • sync_collection:bool=True
  • filter:typing.Optional[Filter]=None
  • sorter:typing.Optional[Sorter]=None
Attributes:
  • uid:CollectionUID=CollectionUID(uid or self._index.resolve_uid(self.path))

    Unique identifier of the collection.

  • path:pathlib.Path=Path(path or self._index.resolve_path(uid.upper())).absolute()

    Path to the collection directory.

  • fromUID=_CollectionPicker()

    Helper for selecting collections by UID.

  • _comm=Communicator()
  • _filter:typing.Optional[Filter]=filter
  • _index=index_instance or bamboost.index.Index.default
  • _sorter=sorter
  • k=_FilterKeys(self)
  • Returns the in-memory representation of the collection.

    If a filter is applied to the collection, returns a FilteredCollection object that represents the filtered view. Otherwise, returns the base CollectionRecord object for the collection.

  • Returns the metadata of the collection.

    The metadata can include information such as the collection's UID, creation date, tags, and aliases.

  • df:pandas.DataFrame

    Returns a pandas DataFrame representing the collection and its parameter space.

    The DataFrame contains all simulations in the collection, including their parameters and metadata. The table is sorted according to the user-specified key and order in the configuration, if available.

Examples:
>>> db = Collection("path/to/collection")
>>> db.df  # DataFrame of the collection
>>> sim = db["simulation_name"]  # Access a simulation by name
>>> filtered = db.filter(db.k["param"] == 42)
Bases
ElligibleForPlugin
Collection.__iter__(self) -> Generator[Simulation, None, None]

Iterate over all simulations in the collection.

Collection.__getitem__(self, name_or_index) -> Simulation

Retrieve a Simulation from the collection by name or index.

Arguments:
  • name_or_index:str | int

    The name of the simulation (str) or its index (int) in the collection dataframe.

Returns
Simulation: The corresponding Simulation object.
Examples:
>>> sim = collection["simulation_name"]
>>> sim = collection[0]
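Lookup by name or by index amounts to a dispatch on the argument type. The sketch below assumes a hypothetical `CollectionSketch` that holds simulation names in the row order of the dataframe and returns the name itself rather than a Simulation object; the real error behaviour for unknown names is not documented here, so the `KeyError` is an assumption.

```python
class CollectionSketch:
    """Hypothetical stand-in: names kept in the (assumed) dataframe row order."""

    def __init__(self, names):
        self._names = list(names)

    def __getitem__(self, name_or_index):
        if isinstance(name_or_index, int):
            # Integer: treat it as a position in the row order.
            return self._names[name_or_index]
        if name_or_index in self._names:
            # String: treat it as a simulation name.
            return name_or_index
        raise KeyError(name_or_index)  # assumed behaviour for unknown names


collection = CollectionSketch(["sim_a", "sim_b"])
first = collection[0]
by_name = collection["sim_b"]
```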
Collection.__len__(self) -> int
Collection.filter(self, *operators) -> Collection

Returns a new Collection filtered by the given operators.

This method applies the specified filter operators to the collection and returns a new Collection instance representing the filtered view. The original collection remains unchanged.

Arguments:
Returns
CollectionA new Collection instance containing only the simulations that
Examples:
>>> filtered = collection.filter(collection.k["param"] == 42)
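An expression such as `collection.k["param"] == 42` can produce a filter object instead of a boolean when the key class overloads comparison operators. The following is a sketch of that general technique under stated assumptions: `KeySketch` and `filter_rows` are hypothetical names, rows are plain dictionaries, and nothing here reflects how `_Key` or `Filter` are actually implemented.

```python
import operator


class KeySketch:
    """Hypothetical stand-in for the key objects returned by collection.k[...]."""

    def __init__(self, name):
        self.name = name

    def _compare(self, op, value):
        # Return a predicate instead of a bool, so evaluation is deferred.
        return lambda row: self.name in row and op(row[self.name], value)

    def __eq__(self, value):
        return self._compare(operator.eq, value)

    def __gt__(self, value):
        return self._compare(operator.gt, value)


def filter_rows(rows, *predicates):
    """Return a new list of matching rows; the input list is left unchanged."""
    return [row for row in rows if all(p(row) for p in predicates)]


rows = [{"name": "a", "param": 42}, {"name": "b", "param": 7}]
matches = filter_rows(rows, KeySketch("param") == 42)
```

Passing several predicates combines them with a logical AND, and `rows` itself is never modified, mirroring the statement that the original collection remains unchanged.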
Collection.sort(self, key, ascending=True) -> Collection

Returns a new Collection sorted by the given instructions.

This method applies the specified sort instructions to the collection and returns a new Collection instance representing the sorted view. The original collection remains unchanged.

Arguments:
  • key:_Key | str

    A SortInstruction object or a string representing the parameter or metadata key to sort by.

  • ascending:bool=True

    If True (default), sorts in ascending order. If False, sorts in descending order.

Returns
Collection: A new Collection instance with simulations sorted according to the given key and order.
Examples:
>>> sorted_collection = collection.sort(SortInstruction("param", ascending=False))

Returns a list of all simulation names in the collection.

Returns
list[str]: A list containing the names of all simulations in the collection.
Collection.sync_cache(self, *, force_all=False) -> None

Synchronize the database for this collection.

This method updates the collection's cache by syncing the underlying index and filesystem. It ensures that the collection's metadata and simulation information are up to date. If force_all is True, a full rescan and update of all simulations in the collection will be performed, regardless of their current cache state.

Arguments:
  • force_all:bool=False

    If True, force a full resync of all simulations in the collection. If False (default), only update simulations that are out of sync.
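The difference between an incremental sync and `force_all=True` can be sketched with two dictionaries. The staleness criterion used here (a per-simulation modification stamp) is an assumption chosen only for illustration, and `sync_cache_sketch` is a hypothetical function, not part of bamboost.

```python
def sync_cache_sketch(cache, on_disk, *, force_all=False):
    """Bring `cache` in line with `on_disk`; return the names that were refreshed.

    Both arguments map simulation name -> modification stamp (assumed criterion).
    """
    refreshed = []
    for name, stamp in on_disk.items():
        # Incremental mode skips entries whose cached stamp is current.
        if force_all or cache.get(name) != stamp:
            cache[name] = stamp
            refreshed.append(name)
    # Drop cache entries whose simulation no longer exists on disk.
    for name in set(cache) - set(on_disk):
        del cache[name]
    return refreshed


cache = {"sim_a": 1, "sim_b": 1, "gone": 1}
on_disk = {"sim_a": 1, "sim_b": 2, "sim_c": 1}
partial = sync_cache_sketch(cache, on_disk)
full = sync_cache_sketch(cache, on_disk, force_all=True)
```

The first call touches only the stale and the new entry; the second call refreshes every simulation regardless of its cache state.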

Collection.add(self, name=None, parameters=None, *, duplicate_action='raise', description=None, files=None, links=None, override=False) -> SimulationWriter

Create and initialize a new simulation in the collection, returning a SimulationWriter object.

This method is designed for parallel use, such as in batch scripts or parameter sweeps, where multiple simulations may be created concurrently. It handles creation of the simulation directory, duplicate checking, copying files, and setting up metadata and parameters.

Arguments:
  • name:typing.Optional[str]=None

    The name/UID for the simulation. If not specified, a unique random ID will be generated.

  • parameters:typing.Optional[typing.Mapping[str, typing.Any]]=None

    Dictionary of simulation parameters. If provided, these parameters will be checked against existing simulations for duplication. If not provided, parameters can be set later via Simulation.parameters.

    Note

    • Parameters are stored in the HDF5 file as attributes.
    • If a value is a dict, it is flattened using flatten_dict.
    • If a value is a list or array, it is stored as a dataset.
  • duplicate_action:typing.Literal['ignore', 'replace', 'skip', 'raise']='raise'

    Action to take if a simulation with the same parameters already exists. Options are: "ignore" (create anyway), "replace" (delete existing and create new), "skip" (return existing simulation), "raise" (default, raise DuplicateSimulationError).

  • description:typing.Optional[str]=None

    Optional description for the simulation.

  • files:typing.Optional[typing.Iterable[str]]=None

    Optional iterable of file paths to copy into the simulation directory. Each file will be copied with its original name.

  • links:typing.Optional[typing.Dict[str, str]]=None

    Optional dictionary of symbolic links to create in the simulation directory, mapping link names to target paths.

  • override:bool=False

    If True, overwrite any existing simulation with the same name. If False (default), raises FileExistsError if a simulation with the same name exists.

Returns
SimulationWriter: An object for writing data and metadata to the new simulation.
Examples:
>>> db.add(parameters={"a": 1, "b": 2})
>>> db.add(name="my_sim", parameters={"a": 1, "b": 2})
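The note on `parameters` states that dict values are flattened using flatten_dict. A minimal sketch of such flattening is shown below; the `"."` separator and the function signature are assumptions, since this reference does not show the real helper.

```python
def flatten_dict_sketch(mapping, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep` (assumed)."""
    flat = {}
    for key, value in mapping.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the key prefix along.
            flat.update(flatten_dict_sketch(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


flat = flatten_dict_sketch(
    {"a": 1, "material": {"E": 210.0, "law": {"type": "elastic"}}}
)
```

With this scheme, every nested parameter becomes a single top-level key, which is what allows it to be stored as one attribute.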

Note

  • This method is safe for use in parallel (MPI) environments.
  • Be cautious when using duplicate_action="replace": it deletes existing simulations with matching parameters without asking for confirmation.
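The four `duplicate_action` options described above can be sketched as a small dispatch. This is an illustration under stated assumptions: `store` is a plain dictionary mapping simulation names to parameters, `DuplicateErrorSketch` stands in for DuplicateSimulationError, and the function returns a name rather than a SimulationWriter.

```python
class DuplicateErrorSketch(Exception):
    """Stand-in for DuplicateSimulationError."""


def add_sketch(store, name, parameters, duplicate_action="raise"):
    """Add `parameters` under `name`; return the name the caller should use."""
    duplicates = [n for n, p in store.items() if p == parameters]
    if duplicates:
        if duplicate_action == "raise":
            raise DuplicateErrorSketch(duplicates)
        if duplicate_action == "skip":
            return duplicates[0]  # hand back the existing simulation
        if duplicate_action == "replace":
            for n in duplicates:
                del store[n]  # destructive: existing matches are removed
        elif duplicate_action != "ignore":
            raise ValueError(f"unknown duplicate_action: {duplicate_action!r}")
    store[name] = dict(parameters)
    return name


store = {"old": {"a": 1, "b": 2}}
skipped = add_sketch(store, "new", {"a": 1, "b": 2}, duplicate_action="skip")
```

In this sketch `skipped` is the name of the existing entry and the store is left untouched, while `"replace"` would remove `"old"` before creating the new entry.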
Collection.create_simulation(self, *args, **kwargs) -> SimulationWriter

Deprecated alias of add. See Collection.add method.

Arguments:
  • args=()
  • kwargs={}
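A deprecated alias is commonly written as a thin wrapper that forwards all arguments to the new method. The sketch below uses a hypothetical `CollectionSketch` class; whether the real `create_simulation` emits a `DeprecationWarning` is not stated in this reference and is assumed here.

```python
import warnings


class CollectionSketch:
    """Hypothetical stand-in used only to show the alias pattern."""

    def add(self, *args, **kwargs):
        return ("added", args, kwargs)

    def create_simulation(self, *args, **kwargs):
        # Assumed behaviour: warn, then forward everything unchanged to add().
        warnings.warn(
            "create_simulation is deprecated; use add instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.add(*args, **kwargs)


db = CollectionSketch()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = db.create_simulation("my_sim", parameters={"a": 1})
```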
Collection.delete(self, name) -> None

CAUTION. Deletes one or more simulations from the collection.

This method removes the specified simulation(s) from both the filesystem and the index/database. It is a destructive operation and should be used with caution.

Arguments:
  • name:str | typing.Iterable[str]

    Name of the simulation to delete, or an iterable of names.

Examples:
>>> db.delete("simulation_name")
>>> db.delete(["sim1", "sim2", "sim3"])
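Because a string is itself iterable, an argument typed `str | Iterable[str]` has to be normalized before looping, otherwise a single name would be split into characters. A minimal sketch of that normalization, using a hypothetical helper and a plain dictionary as the store:

```python
def normalize_names(name):
    """Return a list of names from either a single string or an iterable."""
    if isinstance(name, str):
        return [name]  # a lone name, not a sequence of characters
    return list(name)


def delete_sketch(store, name):
    """Remove the given name(s) from `store` (a dict standing in for the collection)."""
    for n in normalize_names(name):
        del store[n]


store = {"sim1": {}, "sim2": {}, "sim3": {}, "simulation_name": {}}
delete_sketch(store, "simulation_name")
delete_sketch(store, ["sim1", "sim2"])
```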
Collection.find(self, parameter_selection) -> pd.DataFrame

Find simulations matching the given parameter selection.

The parameter_selection dictionary can specify exact values for parameters, or use callables (such as lambda functions) for more complex filtering, such as inequalities or custom logic.

Arguments:
  • parameter_selection:typing.Mapping[str, typing.Any]

    Dictionary mapping parameter names to values or callables. If a value is a callable, it will be used as a filter function applied to the corresponding parameter column.

Returns
pandas.DataFrame: DataFrame containing simulations that match the specified parameter selection.
Examples:
>>> db.find({"a": 1, "b": lambda x: x > 2})
>>> db.find({"a": 1, "b": 2})
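The mix of exact values and callables in `parameter_selection` can be sketched as follows. This illustration works on a list of dictionaries and tests one row at a time, whereas the documented method returns a DataFrame and applies callables to the parameter column; `find_sketch` is a hypothetical function.

```python
def find_sketch(rows, parameter_selection):
    """Return the rows whose parameters satisfy every entry of the selection."""

    def matches(row):
        for key, wanted in parameter_selection.items():
            if key not in row:
                return False
            value = row[key]
            # Callables act as predicates; anything else is compared for equality.
            ok = wanted(value) if callable(wanted) else value == wanted
            if not ok:
                return False
        return True

    return [row for row in rows if matches(row)]


rows = [
    {"name": "s1", "a": 1, "b": 2},
    {"name": "s2", "a": 1, "b": 5},
    {"name": "s3", "a": 2, "b": 5},
]
with_lambda = find_sketch(rows, {"a": 1, "b": lambda x: x > 2})
exact = find_sketch(rows, {"a": 1, "b": 2})
```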

HTML repr for ipython/notebooks, using jinja2 for templating.

Collection._list_duplicates(self, parameters, *, df=None) -> list[str]

List the names (IDs) of simulations in the collection that have duplicate parameter values.

Arguments:
  • parameters:typing.Mapping

    Parameter dictionary to check for duplicates. Keys are parameter names, values are the values to match against existing simulations.

  • df:pandas.DataFrame | None=None

    DataFrame to search in. If not provided, the DataFrame from the SQL database is used.

Returns
list[str]: List of simulation names (IDs) that have the same parameter values as the given parameters.
Collection._check_duplicate(self, parameters) -> Literal[True]

Check whether the given parameters dictionary already exists in the collection. Returns True if no duplicates are found. Raises DuplicateSimulationError if duplicates are found.

Arguments:
  • parameters:dict

    Parameter dictionary to check for duplicates.