bamboost.core.collection
Collection management module for bamboost.
This module provides the Collection class and related utilities for managing collections of simulations in the bamboost framework. It includes functionality for creating, filtering, querying, and manipulating simulation collections, as well as integration with the underlying index and MPI communication.
Attributes
- __all__=
['Collection', 'NotACollectionError']
- log=
BAMBOOST_LOGGER.getChild('Collection')
Classes
NotACollectionError
(self, path)
Raised when a path is not a valid collection.
Arguments:
- path:
pathlib.Path
_CollectionPicker
(self, key) -> Collection
Arguments:
- key:
str
_FilterKeys
(self, collection)
Arguments:
- collection:
Collection
Attributes:
- collection=
bamboost.core.collection._FilterKeys(collection)
(self, key) -> _Key
Arguments:
- key:
str
Collection
(self, path=None, *, uid=None, create_if_not_exist=True, comm=None, index_instance=None, sync_collection=True, filter=None)
Represents a collection of simulations in the bamboost framework.
The Collection class provides an interface for managing, querying, and manipulating a group of simulations stored in a directory, with support for filtering, indexing, and MPI communication.
Arguments:
- uid:
typing.Optional[str]
=None
- create_if_not_exist:
bool
=True
- sync_collection:
bool
=True
Attributes:
- FROZEN:
bool
=False
If True, the collection does not look for new simulations after initialization.
- uid:
Unique identifier for the collection.
- path:
pathlib.Path
=Path(path or self._index.resolve_path(uid.upper())).absolute()
Filesystem path to the collection directory.
- fromUID=
_CollectionPicker()
- _comm=
Communicator()
- _index=
bamboost.core.collection.Collection(index_instance) or bamboost.index.Index.default
- _filter=
bamboost.core.collection.Collection(filter)
Helper for key completion and filtering.
- _orm:
CollectionORM | bamboost.index.sqlmodel.FilteredCollection
Returns the ORM (Object Relational Mapping) object for the collection.
If a filter is applied to the collection, returns a FilteredCollection object that represents the filtered view. Otherwise, returns the base CollectionORM object for the collection.
- df:
pandas.DataFrame
DataFrame view of the collection and its parameter space.
Examples:
>>> db = Collection("path/to/collection")
>>> db.df # DataFrame of the collection
>>> sim = db["simulation_name"] # Access a simulation by name
>>> filtered = db.filter(db.k["param"] == 42)
Bases
(self, name_or_index) -> Simulation
Retrieve a Simulation from the collection by name or index.
Arguments:
- name_or_index:
str | int
The name of the simulation (str) or its index (int) in the collection dataframe.
Returns
Simulation
The corresponding Simulation object.
Examples:
>>> sim = collection["simulation_name"]
>>> sim = collection[0]
(self) -> int
(self, *operators) -> Collection
Returns a new Collection filtered by the given operators.
This method applies the specified filter operators to the collection and returns a new Collection instance representing the filtered view. The original collection remains unchanged.
Arguments:
Returns
Collection
A new Collection instance containing only the simulations that match the given operators.
Examples:
>>> filtered = collection.filter(collection.k["param"] == 42)
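An expression such as `collection.k["param"] == 42` has to produce an operator object rather than a plain boolean so that it can be evaluated later against each simulation. The following is a minimal, standard-library-only sketch of that idea. `Key`, `Condition`, and `apply_filter` are hypothetical names used for illustration and are not part of bamboost; the library's own `_Key` and `filter` implementations may work differently.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Condition:
    """A deferred comparison against one parameter (hypothetical helper)."""

    name: str
    predicate: Callable[[Any], bool]

    def matches(self, record: dict) -> bool:
        # A record that lacks the parameter never matches.
        return self.name in record and self.predicate(record[self.name])


class Key:
    """Illustrative stand-in for the object returned by `collection.k["param"]`."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: Any) -> Condition:  # type: ignore[override]
        # Return a Condition instead of a bool so the comparison is deferred.
        return Condition(self.name, lambda value: value == other)

    def __gt__(self, other: Any) -> Condition:
        return Condition(self.name, lambda value: value > other)


def apply_filter(records: list[dict], *conditions: Condition) -> list[dict]:
    """Return a new list of matching records; the input list is not modified."""
    return [r for r in records if all(c.matches(r) for c in conditions)]


records = [
    {"name": "sim_a", "param": 42},
    {"name": "sim_b", "param": 7},
]
filtered = apply_filter(records, Key("param") == 42)
print([r["name"] for r in filtered])  # ['sim_a']
```

As with `filter` described above, the sketch returns a new object and leaves the original data untouched.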
(self) -> list[str]
Returns a list of all simulation names in the collection.
Returns
list[str]
A list containing the names of all simulations in the collection.
(self, *, force_all=False) -> None
Synchronize the database for this collection.
This method updates the collection's cache by syncing the underlying index and filesystem. It ensures that the collection's metadata and simulation information are up to date. If force_all is True, a full rescan and update of all simulations in the collection will be performed, regardless of their current cache state.
Arguments:
- force_all:
bool
=False
If True, force a full resync of all simulations in the collection. If False (default), only update simulations that are out of sync.
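The difference between an incremental sync and a forced full sync can be sketched as follows. This is an illustration only: the documentation does not say how bamboost decides that a simulation is "out of sync", so comparing last-modified timestamps is an assumption, and `sync_cache` is a hypothetical function rather than a bamboost API.

```python
def sync_cache(
    cache: dict[str, float],
    on_disk: dict[str, float],
    *,
    force_all: bool = False,
) -> list[str]:
    """Update `cache` in place and return the names that were refreshed.

    Both dicts map a simulation name to a last-modified timestamp
    (an assumed staleness criterion, chosen for illustration).
    """
    refreshed = []
    for name, mtime in on_disk.items():
        # Refresh everything when forced, otherwise only stale or new entries.
        if force_all or cache.get(name) != mtime:
            cache[name] = mtime
            refreshed.append(name)
    # Drop cached entries whose simulation no longer exists on disk.
    for name in list(cache):
        if name not in on_disk:
            del cache[name]
    return refreshed


cache = {"sim_a": 1.0, "sim_b": 2.0, "gone": 0.5}
on_disk = {"sim_a": 1.0, "sim_b": 3.0, "sim_c": 4.0}

incremental = sync_cache(dict(cache), on_disk)
full = sync_cache(dict(cache), on_disk, force_all=True)
print(incremental)  # ['sim_b', 'sim_c']
print(full)         # ['sim_a', 'sim_b', 'sim_c']
```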
(self, name=None, parameters=None, *, description=None, files=None, links=None, override=False) -> SimulationWriter
Create and initialize a new simulation in the collection, returning a SimulationWriter object.
This method is designed for parallel use, such as in batch scripts or parameter sweeps, where multiple simulations may be created concurrently. It handles creation of the simulation directory, duplicate checking, copying files, and setting up metadata and parameters.
Arguments:
- name:
typing.Optional[str]
=None
The name/UID for the simulation. If not specified, a unique random ID will be generated.
- parameters:
typing.Optional[typing.Dict[str, typing.Any]]
=None
Dictionary of simulation parameters. If provided, these parameters will be checked against existing simulations for duplication. If not provided, parameters can be set later via Simulation.parameters.
Note
- Parameters are stored in the HDF5 file as attributes.
- If a value is a dict, it is flattened using flatten_dict.
- If a value is a list or array, it is stored as a dataset.
- description:
typing.Optional[str]
=None
Optional description for the simulation.
- files:
typing.Optional[typing.Iterable[str]]
=None
Optional iterable of file paths to copy into the simulation directory. Each file will be copied with its original name.
- links:
typing.Optional[typing.Dict[str, str]]
=None
Optional dictionary of symbolic links to create in the simulation directory, mapping link names to target paths.
- override:
bool
=False
If True, overwrite any existing simulation with the same name. If False (default), raises FileExistsError if a simulation with the same name exists.
Returns
SimulationWriter
An object for writing data and metadata to the new simulation.
Examples:
>>> db.create_simulation(parameters={"a": 1, "b": 2})
>>> db.create_simulation(name="my_sim", parameters={"a": 1, "b": 2})
Note
- The files and links specified are copied or created in the simulation directory.
- This method is safe for use in parallel (MPI) environments.
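The note on the parameters argument says that nested dicts are flattened before being stored. A rough sketch of such a flattening step is shown below. `flatten_dict_sketch` is a hypothetical stand-in and not bamboost's `flatten_dict`; in particular, the `"."` separator is an assumption, since the key format used by the library is not stated here.

```python
from typing import Any


def flatten_dict_sketch(
    nested: dict[str, Any], parent_key: str = "", sep: str = "."
) -> dict[str, Any]:
    """Flatten nested dicts into a single level with joined keys.

    The separator is an assumption made for this illustration.
    """
    flat: dict[str, Any] = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the joined key prefix along.
            flat.update(flatten_dict_sketch(value, full_key, sep))
        else:
            # Scalars and lists are kept as they are.
            flat[full_key] = value
    return flat


parameters = {"a": 1, "material": {"E": 210.0, "nu": 0.3}, "mesh": [10, 20]}
flat = flatten_dict_sketch(parameters)
print(flat)
# {'a': 1, 'material.E': 210.0, 'material.nu': 0.3, 'mesh': [10, 20]}
```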
(self, parameter_selection) -> pd.DataFrame
Find simulations matching the given parameter selection.
The parameter_selection dictionary can specify exact values for parameters, or use callables (such as lambda functions) for more complex filtering, such as inequalities or custom logic.
Arguments:
- parameter_selection:
dict[str, typing.Any]
Dictionary mapping parameter names to values or callables. If a value is a callable, it will be used as a filter function applied to the corresponding parameter column.
Returns
pandas.DataFrame
DataFrame containing simulations that match the specified criteria.
Examples:
>>> db.find({"a": 1, "b": lambda x: x > 2})
>>> db.find({"a": 1, "b": 2})
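The matching rule described above (exact values or callables per parameter) can be illustrated without bamboost or pandas. The sketch below works on a plain list of dicts instead of a DataFrame, and `find_sketch` is a hypothetical helper; how the library treats simulations that lack a requested parameter is not documented here, so excluding them is an assumption.

```python
from typing import Any


def find_sketch(
    rows: list[dict[str, Any]], parameter_selection: dict[str, Any]
) -> list[dict[str, Any]]:
    """Return the rows that satisfy every entry of `parameter_selection`."""

    def matches(row: dict[str, Any]) -> bool:
        for name, wanted in parameter_selection.items():
            if name not in row:
                # Assumption: rows without the parameter are excluded.
                return False
            value = row[name]
            # Callables act as predicates; anything else is compared for equality.
            ok = wanted(value) if callable(wanted) else value == wanted
            if not ok:
                return False
        return True

    return [row for row in rows if matches(row)]


rows = [
    {"name": "sim_1", "a": 1, "b": 2},
    {"name": "sim_2", "a": 1, "b": 5},
    {"name": "sim_3", "a": 3, "b": 5},
]
by_callable = find_sketch(rows, {"a": 1, "b": lambda x: x > 2})
by_value = find_sketch(rows, {"a": 1, "b": 2})
print([r["name"] for r in by_callable])  # ['sim_2']
print([r["name"] for r in by_value])     # ['sim_1']
```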
(self) -> str
HTML representation for IPython/notebooks, rendered with a jinja2 template.
(self, name) -> None
CAUTION: Deletes a simulation.
Arguments:
- name:
str
Name of the simulation to delete.
(self, parameters, *, df=None) -> list[str]
List the names (IDs) of simulations in the collection that have duplicate parameter values.
Arguments:
- parameters:
dict
Parameter dictionary to check for duplicates. Keys are parameter names, values are the values to match against existing simulations.
- df:
pandas.DataFrame
=None
DataFrame to search in. If not provided, the DataFrame from the SQL database is used.
Returns
list[str]
List of simulation names (IDs) that have the same parameter values as provided.
(self, parameters, uid, duplicate_action='prompt') -> tuple
Check whether the given parameters dictionary already exists in the collection.
This method checks for duplicate simulations with the same parameter values. If duplicates are found, it prompts the user (or uses the specified action) to decide whether to replace, create a new simulation, or abort.
Arguments:
- parameters:
dict
Parameter dictionary to check for duplicates.
- uid:
str
The UID for the simulation to be created.
- duplicate_action:
str
='prompt'
Returns
tuple
(bool, str)
- bool: Whether to continue with the operation.
- str: The UID to use for the simulation.
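The two duplicate-handling methods above can be pictured as a lookup followed by a decision. The sketch below is a simplified illustration, not bamboost's implementation: `find_duplicates` and `resolve_duplicate` are hypothetical names, the action strings `"replace"`, `"new"`, and `"abort"` are assumptions (only `'prompt'` appears in the signature), the interactive prompt is left out, and reusing the first duplicate's name on replace is also an assumption.

```python
from typing import Any


def find_duplicates(rows: list[dict[str, Any]], parameters: dict[str, Any]) -> list[str]:
    """Names of rows whose values equal every entry in `parameters`."""
    return [
        row["name"]
        for row in rows
        if all(key in row and row[key] == value for key, value in parameters.items())
    ]


def resolve_duplicate(
    rows: list[dict[str, Any]],
    parameters: dict[str, Any],
    uid: str,
    duplicate_action: str = "abort",
) -> tuple[bool, str]:
    """Return (continue?, uid to use). Action names are assumed, not bamboost's."""
    duplicates = find_duplicates(rows, parameters)
    if not duplicates:
        return True, uid
    if duplicate_action == "replace":
        # Assumption: reuse the name of the first duplicate that was found.
        return True, duplicates[0]
    if duplicate_action == "new":
        return True, uid
    if duplicate_action == "abort":
        return False, uid
    raise ValueError(f"unknown duplicate_action: {duplicate_action!r}")


rows = [
    {"name": "sim_1", "a": 1, "b": 2},
    {"name": "sim_2", "a": 1, "b": 5},
]
print(find_duplicates(rows, {"a": 1, "b": 2}))                          # ['sim_1']
print(resolve_duplicate(rows, {"a": 1, "b": 2}, "sim_new", "replace"))  # (True, 'sim_1')
print(resolve_duplicate(rows, {"a": 1, "b": 2}, "sim_new", "abort"))    # (False, 'sim_new')
print(resolve_duplicate(rows, {"a": 9}, "sim_new"))                     # (True, 'sim_new')
```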