bamboost.core.collection
Collection management module for bamboost.
This module provides the Collection class and related utilities for managing collections of simulations in the bamboost framework. It includes functionality for creating, filtering, querying, and manipulating simulation collections, as well as integration with the underlying index and MPI communication.
Attributes
- __all__ = ['Collection']
- log = BAMBOOST_LOGGER.getChild('Collection')
Classes
_CollectionPicker
(self, key) -> Collection
Arguments:
- key: str
_FilterKeys
(self, collection)
Arguments:
- collection: Collection
Attributes:
- collection = collection
  The collection passed to the constructor.
(self, key) -> _Key
Arguments:
- key: str
CollectionMetadataStore
(self, uid, created_at=None, tags=list(), aliases=list(), author=None, description=None) -> None
Arguments:
- uid: str
- created_at: datetime.datetime | None = None
- tags: list[str] = list()
- aliases: list[str] = list()
- author: str | dict | None = None
- description: str | None = None
(self, key) -> Any
Arguments:
- key: str
(cls, data, *, _collection) -> Self
Arguments:
- cls
- data: dict[str, typing.Any]
- _collection: Collection
(self) -> dict[str, Any]
(self) -> None
(self, data) -> None
Update the instance with values from a dictionary, storing unknown fields in the extras dictionary.
Arguments:
- data: dict[str, typing.Any]
  The input dictionary.
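The update behaviour described above (known fields are overwritten, anything else lands in an extras dictionary) can be illustrated with a small dataclass. This is a hypothetical sketch, not bamboost's code: the class name MetadataSketch is invented, its fields mirror the documented constructor arguments, and the attribute name extras is taken from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields
from datetime import datetime
from typing import Any


# Hypothetical stand-in for CollectionMetadataStore (not bamboost's code).
@dataclass
class MetadataSketch:
    uid: str
    created_at: datetime | None = None
    tags: list[str] = field(default_factory=list)
    aliases: list[str] = field(default_factory=list)
    author: str | dict | None = None
    description: str | None = None
    extras: dict[str, Any] = field(default_factory=dict)

    def update(self, data: dict[str, Any]) -> None:
        """Copy known fields onto the instance; collect the rest in extras."""
        known = {f.name for f in fields(self)} - {"extras"}
        for key, value in data.items():
            if key in known:
                setattr(self, key, value)  # declared field: overwrite it
            else:
                self.extras[key] = value  # unknown field: keep it in extras


meta = MetadataSketch(uid="ABC123")
meta.update({"tags": ["convergence"], "project": "demo"})
print(meta.tags, meta.extras)  # ['convergence'] {'project': 'demo'}
```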
Collection
(self, path=None, *, uid=None, create_if_not_exist=True, comm=None, index_instance=None, sync_collection=True, filter=None, sorter=None)
Represents a collection of simulations in the bamboost framework.
The Collection class provides an interface for managing, querying, and manipulating a group of simulations stored in a directory, with support for filtering, indexing, and MPI communication.
Arguments:
- uid: typing.Optional[str] = None
- create_if_not_exist: bool = True
- sync_collection: bool = True
Attributes:
Unique identifier of the collection.
- path: pathlib.Path = Path(path or self._index.resolve_path(uid.upper())).absolute()
  Path to the collection directory.
- fromUID = _CollectionPicker()
  Helper for selecting collections by UID.
- _comm = Communicator()
- _index = index_instance or bamboost.index.Index.default
- _sorter = sorter
- k = _FilterKeys(self)
- _record: CollectionRecord
  Returns the in-memory representation of the collection.
  If a filter is applied to the collection, returns a FilteredCollection object that represents the filtered view. Otherwise, returns the base CollectionRecord object for the collection.
- metadata: CollectionMetadataStore
  Returns the metadata of the collection.
  The metadata can include information such as the collection's UID, creation date, tags, and aliases.
- df: pandas.DataFrame
  Returns a pandas DataFrame representing the collection and its parameter space.
  The DataFrame contains all simulations in the collection, including their parameters and metadata. The table is sorted according to the user-specified key and order in the configuration, if available.
Examples:
>>> db = Collection("path/to/collection")
>>> db.df # DataFrame of the collection
>>> sim = db["simulation_name"] # Access a simulation by name
>>> filtered = db.filter(db.k["param"] == 42)
(self) -> Generator[Simulation, None, None]
Iterate over all simulations in the collection.
(self, name_or_index) -> Simulation
Retrieve a Simulation from the collection by name or index.
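Dispatching on the argument type is enough to support both lookup styles. The sketch below assumes, for illustration only, that the collection dataframe is represented by an ordered list of simulation names; the function resolve_simulation_name is hypothetical.

```python
from __future__ import annotations


def resolve_simulation_name(names: list[str], name_or_index: str | int) -> str:
    """Return the simulation name for a name or a positional index (hypothetical helper)."""
    if isinstance(name_or_index, int):
        # Positional lookup; negative indices count from the end, as with lists.
        return names[name_or_index]
    if name_or_index not in names:
        raise KeyError(f"no simulation named {name_or_index!r}")
    return name_or_index


names = ["sim_a", "sim_b", "sim_c"]
print(resolve_simulation_name(names, 0))        # sim_a
print(resolve_simulation_name(names, "sim_b"))  # sim_b
```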
Arguments:
- name_or_index: str | int
  The name of the simulation (str) or its index (int) in the collection dataframe.
Returns
Simulation: The corresponding Simulation object.
Examples:
>>> sim = collection["simulation_name"]
>>> sim = collection[0]
(self) -> int
(self, *operators) -> Collection
Returns a new Collection filtered by the given operators.
This method applies the specified filter operators to the collection and returns a new Collection instance representing the filtered view. The original collection remains unchanged.
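The collection.k["param"] == 42 idiom relies on operator overloading: indexing the key helper returns a key object whose comparison operators build predicates instead of booleans, and filtering returns a new view instead of mutating the old one. The sketch below reproduces that pattern over plain dictionaries; the classes Predicate, Key, Keys and View are hypothetical and are not bamboost's internals.

```python
from __future__ import annotations

from typing import Any, Callable

Row = dict  # one simulation's parameters, e.g. {"param": 42}


class Predicate:
    """Wraps a row -> bool test so that it can be passed to View.filter."""

    def __init__(self, test: Callable[[Row], bool]) -> None:
        self.test = test

    def __call__(self, row: Row) -> bool:
        return self.test(row)


class Key:
    """Comparisons on a Key return Predicates instead of booleans."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: Any) -> Predicate:  # type: ignore[override]
        return Predicate(lambda row: self.name in row and row[self.name] == other)

    def __gt__(self, other: Any) -> Predicate:
        return Predicate(lambda row: self.name in row and row[self.name] > other)


class Keys:
    def __getitem__(self, name: str) -> Key:
        return Key(name)


class View:
    """Read-only view over rows; filter() always returns a new View."""

    def __init__(self, rows: list[Row], predicates: tuple[Predicate, ...] = ()) -> None:
        self._rows = rows
        self._predicates = predicates
        self.k = Keys()

    def filter(self, *operators: Predicate) -> View:
        # The original view keeps its own predicates untouched.
        return View(self._rows, self._predicates + operators)

    def rows(self) -> list[Row]:
        return [r for r in self._rows if all(p(r) for p in self._predicates)]


view = View([{"param": 42, "x": 1}, {"param": 7, "x": 3}])
filtered = view.filter(view.k["param"] == 42)
print(filtered.rows())   # [{'param': 42, 'x': 1}]
print(len(view.rows()))  # 2
```

Chaining filter calls accumulates predicates, so each call narrows the previous view while leaving it usable.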
Arguments:
- *operators
  Filter operators to apply, for example collection.k["param"] == 42.
Returns
Collection: A new Collection instance containing only the simulations that match the given operators.
Examples:
>>> filtered = collection.filter(collection.k["param"] == 42)
(self, key, ascending=True) -> Collection
Returns a new Collection sorted by the given key.
This method applies the specified sort instructions to the collection and returns a new Collection instance representing the sorted view. The original collection remains unchanged.
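The same non-mutating behaviour applies to sorting. Here is a minimal sketch over a list of parameter dictionaries. It assumes, which the text above does not document, that simulations lacking the sort key are placed last; sort_rows is a hypothetical name.

```python
from __future__ import annotations

from typing import Any


def sort_rows(rows: list[dict[str, Any]], key: str, ascending: bool = True) -> list[dict[str, Any]]:
    """Return a new, sorted list; the input list is left unchanged."""
    present = [r for r in rows if key in r]
    missing = [r for r in rows if key not in r]  # assumption: rows without the key go last
    return sorted(present, key=lambda r: r[key], reverse=not ascending) + missing


rows = [{"param": 2}, {"param": 1}, {"other": 0}]
print(sort_rows(rows, "param"))                   # [{'param': 1}, {'param': 2}, {'other': 0}]
print(sort_rows(rows, "param", ascending=False))  # [{'param': 2}, {'param': 1}, {'other': 0}]
```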
Arguments:
- key: _Key | str
  A _Key object (for example collection.k["param"]) or a string naming the parameter or metadata key to sort by.
- ascending: bool = True
  If True (default), sorts in ascending order. If False, sorts in descending order.
Returns
Collection: A new Collection instance with simulations sorted according to the given key and order.
Examples:
>>> sorted_collection = collection.sort("param", ascending=False)
(self) -> list[str]
Returns a list of all simulation names in the collection.
Returns
list[str]: A list containing the names of all simulations in the collection.
(self, *, force_all=False) -> None
Synchronize the database for this collection.
This method updates the collection's cache by syncing the underlying index and
filesystem. It ensures that the collection's metadata and simulation information
are up to date. If force_all is True, a full rescan and update of all
simulations in the collection will be performed, regardless of their current cache
state.
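The difference between an incremental sync and force_all=True can be sketched with a cache keyed on modification times. This is an illustration under assumptions: the cache layout (simulation name mapped to directory mtime) and the helper sync_cache are hypothetical, and bamboost's actual index tracks more state than this.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def sync_cache(root: Path, cache: dict[str, float], *, force_all: bool = False) -> list[str]:
    """Refresh cache from the subdirectories of root and return the updated names."""
    on_disk = {p.name: p.stat().st_mtime for p in root.iterdir() if p.is_dir()}
    updated = []
    for name, mtime in on_disk.items():
        # Re-index only entries whose mtime changed, unless a full rescan is forced.
        if force_all or cache.get(name) != mtime:
            cache[name] = mtime
            updated.append(name)
    for name in list(cache):
        if name not in on_disk:
            del cache[name]  # the simulation directory no longer exists
    return sorted(updated)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sim_a").mkdir()
    (root / "sim_b").mkdir()
    cache: dict[str, float] = {}
    print(sync_cache(root, cache))                  # ['sim_a', 'sim_b']
    print(sync_cache(root, cache))                  # []
    print(sync_cache(root, cache, force_all=True))  # ['sim_a', 'sim_b']
```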
Arguments:
- force_all: bool = False
  If True, force a full resync of all simulations in the collection. If False (default), only update simulations that are out of sync.
(self, name=None, parameters=None, *, duplicate_action='raise', description=None, files=None, links=None, override=False) -> SimulationWriter
Create and initialize a new simulation in the collection, returning a SimulationWriter object.
This method is designed for parallel use, such as in batch scripts or parameter sweeps, where multiple simulations may be created concurrently. It handles creation of the simulation directory, duplicate checking, copying files, and setting up metadata and parameters.
Arguments:
- name: typing.Optional[str] = None
  The name/UID for the simulation. If not specified, a unique random ID will be generated.
- parameters: typing.Optional[typing.Mapping[str, typing.Any]] = None
  Dictionary of simulation parameters. If provided, these parameters will be checked against existing simulations for duplication. If not provided, parameters can be set later via Simulation.parameters.
  Note
  - Parameters are stored in the HDF5 file as attributes.
  - If a value is a dict, it is flattened using flatten_dict.
  - If a value is a list or array, it is stored as a dataset.
- duplicate_action: typing.Literal['ignore', 'replace', 'skip', 'raise'] = 'raise'
  Action to take if a simulation with the same parameters already exists. Options are: "ignore" (create anyway), "replace" (delete existing and create new), "skip" (return existing simulation), "raise" (default, raise DuplicateSimulationError).
- description: typing.Optional[str] = None
  Optional description for the simulation.
- files: typing.Optional[typing.Iterable[str]] = None
  Optional iterable of file paths to copy into the simulation directory. Each file is copied with its original name.
- links: typing.Optional[typing.Dict[str, str]] = None
  Optional dictionary of symbolic links to create in the simulation directory, mapping link names to target paths.
- override: bool = False
  If True, overwrite any existing simulation with the same name. If False (default), raise FileExistsError if a simulation with the same name exists.
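The note on dictionary-valued parameters can be illustrated with a recursive flattening helper. This is a hypothetical sketch, named flatten_params so that it is not mistaken for bamboost's flatten_dict; the "." separator is an assumption, and the library may join keys differently.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def flatten_params(d: Mapping[str, Any], parent: str = "", sep: str = ".") -> dict[str, Any]:
    """Flatten nested mappings into a single level with joined keys."""
    flat: dict[str, Any] = {}
    for key, value in d.items():
        full_key = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, Mapping):
            flat.update(flatten_params(value, full_key, sep))  # recurse into sub-dicts
        else:
            flat[full_key] = value
    return flat


params = {"mesh_size": 0.1, "solver": {"tol": 1e-6, "options": {"max_iter": 50}}}
print(flatten_params(params))
# {'mesh_size': 0.1, 'solver.tol': 1e-06, 'solver.options.max_iter': 50}
```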
Returns
SimulationWriter: An object for writing data and metadata to the new simulation.
Examples:
>>> db.add(parameters={"a": 1, "b": 2})
>>> db.add(name="my_sim", parameters={"a": 1, "b": 2})
Note
- This method is safe for use in parallel (MPI) environments.
- Be cautious when using duplicate_action="replace", as it deletes existing simulations with matching parameters without asking again.
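The four duplicate_action values map onto a small decision table. The sketch below is hypothetical: resolve_duplicate_action and the local DuplicateSimulationError class are stand-ins, the choice of the first match for "skip" is an assumption, and the function only decides what to do without creating or deleting anything.

```python
from __future__ import annotations

from typing import Literal

DuplicateAction = Literal["ignore", "replace", "skip", "raise"]


class DuplicateSimulationError(Exception):
    """Local stand-in for the exception named in the documentation."""


def resolve_duplicate_action(duplicates: list[str], action: DuplicateAction = "raise") -> tuple[str, list[str]]:
    """Return ("create", names_to_delete_first) or ("reuse", [existing_name])."""
    if not duplicates or action == "ignore":
        return "create", []  # no duplicates, or create anyway
    if action == "replace":
        return "create", list(duplicates)  # delete matching simulations first
    if action == "skip":
        return "reuse", [duplicates[0]]  # assumption: hand back the first match
    if action == "raise":
        raise DuplicateSimulationError(f"simulations with the same parameters: {duplicates}")
    raise ValueError(f"unknown duplicate_action: {action!r}")


print(resolve_duplicate_action(["sim_a"], "replace"))  # ('create', ['sim_a'])
print(resolve_duplicate_action(["sim_a"], "skip"))     # ('reuse', ['sim_a'])
```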
(self, *args, **kwargs) -> SimulationWriter
Deprecated alias of add. See Collection.add.
Arguments:
- *args
- **kwargs
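A deprecated alias is typically a thin wrapper that forwards every argument to its replacement. Here is a minimal sketch of that pattern. It assumes the alias emits a DeprecationWarning, which the documentation above does not state; deprecated_alias, Demo and legacy_add are hypothetical names.

```python
from __future__ import annotations

import functools
import warnings
from typing import Any, Callable


def deprecated_alias(target: Callable[..., Any], old_name: str) -> Callable[..., Any]:
    """Wrap target so that calling it under old_name emits a DeprecationWarning."""

    @functools.wraps(target)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        warnings.warn(
            f"{old_name} is deprecated; use {target.__name__} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return target(*args, **kwargs)  # forward all arguments unchanged

    return wrapper


class Demo:
    def add(self, name: str | None = None) -> str:
        return f"added {name}"

    legacy_add = deprecated_alias(add, "legacy_add")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(Demo().legacy_add(name="my_sim"))  # added my_sim
    print(caught[0].category.__name__)       # DeprecationWarning
```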
(self, name) -> None
CAUTION: Deletes one or more simulations from the collection.
This method removes the specified simulation(s) from both the filesystem and the index/database. It is a destructive operation and should be used with caution.
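Accepting either a single name or an iterable of names has one subtlety: a str is itself iterable, so it must be checked first. Here is a hypothetical sketch under stated assumptions: the index is modelled as a plain dictionary and delete_simulations is an invented name. It removes real directories, so the usage below runs inside a temporary folder.

```python
from __future__ import annotations

import shutil
import tempfile
from collections.abc import Iterable
from pathlib import Path


def delete_simulations(root: Path, index: dict[str, Path], name: str | Iterable[str]) -> None:
    """Remove simulation directories under root and drop them from index."""
    # Check for str first: iterating over "sim1" would yield single characters.
    names = [name] if isinstance(name, str) else list(name)
    for n in names:
        shutil.rmtree(root / n, ignore_errors=True)  # delete files on disk
        index.pop(n, None)  # delete the index entry, if any


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    index = {}
    for n in ("sim1", "sim2", "sim3"):
        (root / n).mkdir()
        index[n] = root / n
    delete_simulations(root, index, "sim1")
    delete_simulations(root, index, ["sim2", "sim3"])
    print(sorted(index), sorted(p.name for p in root.iterdir()))  # [] []
```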
Arguments:
- name: str | typing.Iterable[str]
  Name of the simulation to delete, or an iterable of names.
Examples:
>>> db.delete("simulation_name")
>>> db.delete(["sim1", "sim2", "sim3"])
(self, parameter_selection) -> pd.DataFrame
Find simulations matching the given parameter selection.
The parameter_selection dictionary can specify exact values for parameters, or use callables (such as lambda functions) for more complex filtering, such as inequalities or custom logic.
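The matching rule can be sketched without pandas: plain values must compare equal, and callables act as predicates. The hypothetical find_rows below works on a list of parameter dictionaries and applies each callable to one value at a time, whereas the documented method applies it to a pandas column. Treat it as an illustration of the rule, not of the implementation.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def find_rows(rows: list[Mapping[str, Any]], selection: Mapping[str, Any]) -> list[Mapping[str, Any]]:
    """Return rows whose parameters satisfy every entry of selection."""

    def matches(row: Mapping[str, Any]) -> bool:
        for key, wanted in selection.items():
            if key not in row:
                return False  # a missing parameter never matches
            value = row[key]
            ok = wanted(value) if callable(wanted) else value == wanted
            if not ok:
                return False
        return True

    return [row for row in rows if matches(row)]


rows = [{"a": 1, "b": 2}, {"a": 1, "b": 5}, {"a": 2, "b": 9}]
print(find_rows(rows, {"a": 1, "b": lambda x: x > 2}))  # [{'a': 1, 'b': 5}]
print(find_rows(rows, {"a": 1, "b": 2}))                # [{'a': 1, 'b': 2}]
```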
Arguments:
- parameter_selection: typing.Mapping[str, typing.Any]
  Dictionary mapping parameter names to values or callables. If a value is a callable, it is used as a filter function applied to the corresponding parameter column.
Returns
pandas.DataFrame: DataFrame containing simulations that match the specified parameter selection.
Examples:
>>> db.find({"a": 1, "b": lambda x: x > 2})
>>> db.find({"a": 1, "b": 2})
(self) -> str
HTML repr for IPython notebooks, using jinja2 for templating.
(self, parameters, *, df=None) -> list[str]
List the names (IDs) of simulations in the collection that have duplicate parameter values.
Arguments:
- parameters: typing.Mapping
  Parameter dictionary to check for duplicates. Keys are parameter names, values are the values to match against existing simulations.
- df: pandas.DataFrame | None = None
  DataFrame to search in. If not provided, the DataFrame from the SQL database is used.
Returns
list[str]: List of simulation names (IDs) that have the same parameter values.
(self, parameters) -> Literal[True]
Check whether the given parameters dictionary already exists in the collection.
Returns True if no duplicates are found. Raises DuplicateSimulationError if
duplicates are found.
Arguments:
- parameters: dict
  Parameter dictionary to check for duplicates.
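The two duplicate helpers fit together as a query and a guard. Here is a hypothetical sketch over a name-to-parameters mapping instead of a DataFrame. It assumes that a simulation counts as a duplicate when it has every given parameter with an equal value; list_duplicates, check_duplicates and the local exception class are stand-ins.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any, Literal


class DuplicateSimulationError(Exception):
    """Local stand-in for the exception named in the documentation."""


def list_duplicates(parameters: Mapping[str, Any], table: Mapping[str, Mapping[str, Any]]) -> list[str]:
    """Names of simulations that have every given parameter with an equal value."""
    return [
        name
        for name, params in table.items()
        if all(key in params and params[key] == value for key, value in parameters.items())
    ]


def check_duplicates(parameters: Mapping[str, Any], table: Mapping[str, Mapping[str, Any]]) -> Literal[True]:
    """Return True if there are no duplicates, otherwise raise."""
    duplicates = list_duplicates(parameters, table)
    if duplicates:
        raise DuplicateSimulationError(f"same parameters as: {duplicates}")
    return True


table = {"sim_a": {"a": 1, "b": 2}, "sim_b": {"a": 1, "b": 3}}
print(list_duplicates({"a": 1, "b": 2}, table))   # ['sim_a']
print(check_duplicates({"a": 2, "b": 2}, table))  # True
```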