Indexing of bamboost collections and their simulations/parameters. SQLAlchemy is used to interact with the SQLite database.

The index is generated on the fly, or it can be created explicitly by scanning the search_paths for collections. It is stored as a SQLite database that records the path of each collection (identified by a unique UID) together with the metadata and parameters of all simulations.

The Index class provides the public API for interacting with the index. It works in parallel execution, but every database operation runs on the root process only. Methods that return a value use bcast to broadcast the result to all processes.
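The root-only pattern described above can be sketched as a decorator. This is a minimal illustration, not the library's implementation: `SingleProcessComm`, `on_root_then_bcast`, and `TinyIndex` are hypothetical names, and the stand-in communicator only mimics the `rank` attribute and `bcast` method of an MPI communicator so that the example runs without MPI.

```python
from functools import wraps
from typing import Any, Callable


class SingleProcessComm:
    """Hypothetical stand-in for an MPI communicator with one process."""

    rank = 0

    def bcast(self, obj: Any, root: int = 0) -> Any:
        # With a single process, broadcasting just returns the object.
        return obj


def on_root_then_bcast(method: Callable[..., Any]) -> Callable[..., Any]:
    """Run `method` on rank 0 only and broadcast its return value."""

    @wraps(method)
    def wrapper(self, *args: Any, **kwargs: Any) -> Any:
        result = None
        if self._comm.rank == 0:
            # Only the root process touches the database.
            result = method(self, *args, **kwargs)
        # Every process receives the value computed on the root.
        return self._comm.bcast(result, root=0)

    return wrapper


class TinyIndex:
    def __init__(self, comm=None) -> None:
        self._comm = comm or SingleProcessComm()
        self._paths = {"abc123": "/data/collection"}  # stands in for the database

    @on_root_then_bcast
    def resolve_path(self, uid: str) -> str:
        return self._paths[uid]


resolved = TinyIndex().resolve_path("abc123")
```

With a real communicator, non-root ranks would skip the lookup and still receive the same return value through the broadcast.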

Database schema:

collections: Contains information about the collections, namely uids and corresponding paths.
simulations: Contains information about the simulations, including names, statuses, and links to the corresponding parameters.
parameters: Contains the parameters associated with the simulations.
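To make the relationship between the three tables concrete, here is a rough sketch using the standard-library `sqlite3` module. Only the table names come from the description above; every column name (`collection_uid`, `status`, `param_key`, `param_value`, and so on) is an assumption chosen for illustration, and the actual schema is defined through SQLAlchemy and may differ.

```python
import sqlite3

# Assumed columns; only the three table names are taken from the documentation.
SCHEMA = """
CREATE TABLE collections (
    uid  TEXT PRIMARY KEY,
    path TEXT NOT NULL
);
CREATE TABLE simulations (
    id             INTEGER PRIMARY KEY,
    collection_uid TEXT NOT NULL REFERENCES collections(uid),
    name           TEXT NOT NULL,
    status         TEXT,
    UNIQUE (collection_uid, name)
);
CREATE TABLE parameters (
    id            INTEGER PRIMARY KEY,
    simulation_id INTEGER NOT NULL REFERENCES simulations(id),
    param_key     TEXT NOT NULL,
    param_value   TEXT,
    UNIQUE (simulation_id, param_key)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO collections VALUES ('abc123', '/data/collection')")
conn.execute(
    "INSERT INTO simulations (collection_uid, name, status) "
    "VALUES ('abc123', 'run_1', 'finished')"
)
conn.execute(
    "INSERT INTO parameters (simulation_id, param_key, param_value) "
    "VALUES (1, 'mesh_size', '0.1')"
)

# Follow the links from a parameter back to its simulation and collection.
rows = conn.execute(
    """
    SELECT c.path, s.name, p.param_key, p.param_value
    FROM parameters p
    JOIN simulations s ON s.id = p.simulation_id
    JOIN collections c ON c.uid = s.collection_uid
    """
).fetchall()
```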

Attributes

log=BAMBOOST_LOGGER.getChild('Database')
IDENTIFIER_PREFIX='.bamboost-collection'
IDENTIFIER_SEPARATOR='-'

Functions

simulation_metadata_from_h5(file) -> Tuple[SimulationMetadataT, SimulationParameterT]

Extract metadata and parameters from a BAMBOOST simulation HDF5 file.

Reads the metadata and parameters from the HDF5 file and returns them as a tuple.

Arguments:

file:pathlib.Path
Path to the HDF5 file.

create_identifier_file(path, uid) -> None

Create an identifier file in the collection directory.

Arguments:

path:StrPath
Path to the collection directory
uid:str
UID of the collection

get_identifier_filename(uid) -> str

Arguments:

uid:str
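As an illustration of how the identifier file might be named, the sketch below joins the two module constants with the UID. The `<prefix><separator><uid>` layout and the empty marker file are assumptions inferred from IDENTIFIER_PREFIX and IDENTIFIER_SEPARATOR, not confirmed behavior.

```python
import tempfile
from pathlib import Path

IDENTIFIER_PREFIX = ".bamboost-collection"
IDENTIFIER_SEPARATOR = "-"


def get_identifier_filename(uid: str) -> str:
    # Assumed layout: <prefix><separator><uid>
    return f"{IDENTIFIER_PREFIX}{IDENTIFIER_SEPARATOR}{uid}"


def create_identifier_file(path, uid: str) -> None:
    # Create an empty marker file; its name alone carries the UID.
    Path(path).joinpath(get_identifier_filename(uid)).touch()


with tempfile.TemporaryDirectory() as tmp:
    create_identifier_file(tmp, "ABC123")
    created = sorted(p.name for p in Path(tmp).iterdir())
```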

_sql_transaction(func) -> Callable[Concatenate[Index, _P], _T]

Decorator to add a session to the function signature.

Arguments:

func:typing.Callable[typing_extensions.Concatenate[Index, bamboost._typing._P], bamboost._typing._T]
The function to decorate.
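One plausible shape for such a decorator is to run the wrapped method inside the instance's transaction context manager. The sketch below is an assumption about the mechanism rather than the library's code; `TinyIndex` is hypothetical and records events in a list instead of opening a real SQLAlchemy session.

```python
from contextlib import contextmanager
from functools import wraps


def _sql_transaction(func):
    """Run the wrapped method inside the instance's transaction."""

    @wraps(func)
    def wrapper(self, *args, **kwargs):
        with self.sql_transaction():
            return func(self, *args, **kwargs)

    return wrapper


class TinyIndex:
    def __init__(self) -> None:
        self.events = []

    @contextmanager
    def sql_transaction(self):
        # Stand-in for opening a session and committing or rolling back.
        self.events.append("begin")
        try:
            yield
            self.events.append("commit")
        except Exception:
            self.events.append("rollback")
            raise

    @_sql_transaction
    def add(self, a: int, b: int) -> int:
        self.events.append("work")
        return a + b


index = TinyIndex()
total = index.add(1, 2)
```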

_validate_path(path, uid) -> bool

Arguments:

path:pathlib.Path
uid:str

_find_uid_from_path(path) -> Optional[str]

Arguments:

path:pathlib.Path

_find_collection(uid, root_dir) -> tuple[Path, ...]

Find the collection with UID under given root_dir.

Arguments:

uid:str
UID to search for
root_dir:pathlib.Path
root directory for search

_find_files(pattern, root_dir, exclude=None) -> Tuple[Path, ...]

Locate every file matching pattern under root_dir while pruning directory names listed in exclude (exact-match on the final path part).

Returns an immutable tuple of absolute paths (str) just like the POSIX helper.

Arguments:

pattern:str
root_dir:str | os.PathLike
exclude:typing.Iterable[str] | None=None
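The pruning behavior can be sketched with `os.walk`, which skips any directory removed from its `dirnames` list in place. This is an illustrative re-implementation under that assumption; the helper name `find_files` and the use of `fnmatch` for pattern matching are choices made for the example.

```python
import fnmatch
import os
import tempfile
from pathlib import Path
from typing import Iterable, Optional, Tuple


def find_files(
    pattern: str, root_dir, exclude: Optional[Iterable[str]] = None
) -> Tuple[Path, ...]:
    excluded = set(exclude or ())
    found = []
    for current, dirnames, filenames in os.walk(root_dir):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in excluded]
        for name in fnmatch.filter(filenames, pattern):
            found.append(Path(current, name).resolve())
    return tuple(found)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "keep").mkdir()
    (root / "skip").mkdir()
    (root / "keep" / "a.h5").touch()
    (root / "skip" / "b.h5").touch()
    names = sorted(p.name for p in find_files("*.h5", root, exclude=["skip"]))
```

Because pruning happens before descent, an excluded directory is never read, which matters when the excluded trees are large.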

_scan_directory_for_collections(root_dir) -> tuple[tuple[str, Path], ...]

Scan the directory for collections.

Arguments:

root_dir:pathlib.Path
Directory to scan for collections

Returns

tuple[tuple[str, pathlib.Path], ...]
Tuple of tuples with the UID and path of the collection
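A scan of this kind could look like the sketch below, which searches recursively for identifier files and reads the UID from each file name. It assumes identifier files are named `<IDENTIFIER_PREFIX><IDENTIFIER_SEPARATOR><uid>` and that the collection path is the directory containing the file; both are assumptions for illustration.

```python
import tempfile
from pathlib import Path

IDENTIFIER_PREFIX = ".bamboost-collection"
IDENTIFIER_SEPARATOR = "-"


def scan_directory_for_collections(root_dir):
    marker = IDENTIFIER_PREFIX + IDENTIFIER_SEPARATOR
    hits = []
    for file in sorted(Path(root_dir).rglob(marker + "*")):
        uid = file.name[len(marker):]    # the part after the separator
        hits.append((uid, file.parent))  # the collection is the parent folder
    return tuple(hits)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a").mkdir()
    (root / "b" / "c").mkdir(parents=True)
    (root / "a" / ".bamboost-collection-UID1").touch()
    (root / "b" / "c" / ".bamboost-collection-UID2").touch()
    found = [
        (uid, path.relative_to(root).as_posix())
        for uid, path in scan_directory_for_collections(root)
    ]
```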

Classes

CollectionUID

UID of a collection. If no UID is provided, a new one is generated.

CollectionUID.__new__(cls, uid=None, length=10)

Arguments:

cls
uid:typing.Optional[str]=None
length:int=10

CollectionUID.generate_uid(length) -> str

Arguments:

length:int
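The `__new__` signature suggests an immutable value type. The following sketch assumes a `str` subclass that falls back to a generated value when no UID is given; the uppercase hexadecimal scheme in `generate_uid` is an assumption, and the real generator may use a different alphabet.

```python
import uuid
from typing import Optional


class CollectionUID(str):
    def __new__(cls, uid: Optional[str] = None, length: int = 10):
        # Use the given UID, or generate a fresh one of the requested length.
        return super().__new__(cls, uid or cls.generate_uid(length))

    @staticmethod
    def generate_uid(length: int) -> str:
        # Assumption: a random hex string; the real scheme may differ.
        return uuid.uuid4().hex[:length].upper()


given = CollectionUID("abc")
generated = CollectionUID()
short = CollectionUID(length=4)
```

Subclassing `str` means the UID can be passed anywhere a plain string is expected, for example as a dictionary key or an SQL parameter.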

LazyDefaultIndex

LazyDefaultIndex(self) -> None

Attributes:

_instance=None

LazyDefaultIndex.__delete__(self, instance) -> None

Arguments:

instance:None

LazyDefaultIndex.__set__(self, instance, value) -> None

Arguments:

instance:None
value:Index

LazyDefaultIndex.__get__(self, instance, owner) -> Index

Arguments:

instance:None
owner:typing.Type[Index]
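The three methods above form the descriptor protocol. A minimal sketch of a lazily created default instance follows; the class names are hypothetical and the construction via `owner()` is an assumption. Note that in plain Python `__set__` and `__delete__` are only triggered through an instance, so the example exercises `__get__` on the class only.

```python
class LazyDefault:
    def __init__(self) -> None:
        self._instance = None

    def __get__(self, instance, owner):
        # Build the default object on first access, then reuse it.
        if self._instance is None:
            self._instance = owner()
        return self._instance

    def __set__(self, instance, value) -> None:
        self._instance = value

    def __delete__(self, instance) -> None:
        # Forget the cached object so the next access rebuilds it.
        self._instance = None


class Index:
    created = 0
    default = LazyDefault()

    def __init__(self) -> None:
        type(self).created += 1


first = Index.default
second = Index.default
```

Nothing is constructed at import time: the first access to `Index.default` creates the instance, and later accesses return the same object.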

Index

Index(self, sql_file=None, comm=None, *, search_paths=None) -> None

API for indexing BAMBOOST collections and simulations.

Arguments:

sql_file:typing.Optional[StrPath]=None
comm:typing.Optional[Comm]=None
search_paths:typing.Optional[typing.Iterable[str | pathlib.Path]]=None

Attributes:

_comm=Communicator()
_engine:sqlalchemy.Engine
_sm:typing.Callable[..., sqlalchemy.orm.Session]
_s:sqlalchemy.orm.Session
search_paths:PathSet=PathSet(search_paths or config.index.searchPaths)
Paths to scan for collections.
default:LazyDefaultIndex=LazyDefaultIndex()
A default index instance. Uses the default SQLite database file and search paths from the configuration.
_file=sql_file or config.index.databaseFile
The path to the SQLite database file.
_isolated=config.index.isolated
Whether project based indexing is used.
_url=f'sqlite:///{self._file}'
The URL to the SQLite database file.
all_collections:typing.Sequence[CollectionORM]
Return all collections in the index. Eagerly loads the simulations and its parameters.
all_simulations:typing.Sequence[SimulationORM]
Return all simulations in the index. Eagerly loads the parameters.
all_parameters:typing.Sequence[ParameterORM]
Return all parameters in the index.

Usage

Create an instance of the Index class and use its methods to interact with the index:

>>> from bamboost.index import Index
>>> index = Index()

Scan for collections in known paths:

>>> index.scan_for_collections()

Resolve the path of a collection from its UID:

>>> index.resolve_path(uid)

Get a simulation from its collection UID and simulation name:

>>> index.simulation(collection_uid, name)

Index.sql_transaction(self) -> Generator[Session, None, None]

Context manager for a SQL transaction.

If no transaction is active, a new transaction is started. If a transaction is active, the current session is used.

Usage

>>> with index.sql_transaction() as s:
...     s.execute(...)
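The reuse of an active transaction can be sketched with a depth counter. This example substitutes a standard-library `sqlite3` connection for the SQLAlchemy session and is only an illustration of the idea; `TinyIndex` and `_depth` are hypothetical names.

```python
import sqlite3
from contextlib import contextmanager


class TinyIndex:
    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._depth = 0  # number of nested sql_transaction() blocks

    @contextmanager
    def sql_transaction(self):
        self._depth += 1
        try:
            yield self._conn
            if self._depth == 1:
                self._conn.commit()  # only the outermost block commits
        except Exception:
            if self._depth == 1:
                self._conn.rollback()  # only the outermost block rolls back
            raise
        finally:
            self._depth -= 1


index = TinyIndex()
with index.sql_transaction() as s:
    s.execute("CREATE TABLE t (x INTEGER)")
    with index.sql_transaction() as inner:  # reuses the active connection
        inner.execute("INSERT INTO t VALUES (1)")
        same = inner is s

try:
    with index.sql_transaction() as s:
        s.execute("INSERT INTO t VALUES (2)")
        raise RuntimeError("boom")
except RuntimeError:
    pass

values = [row[0] for row in index._conn.execute("SELECT x FROM t")]
```

The first insert is committed when the outer block exits, while the second is rolled back because the block raised.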

Index.scan_for_collections(self, *, search_paths=None) -> list[tuple[str, Path]]

Scan known paths for collections and update the index.

Iterates through the search paths and looks for collection identifier files. Each collection found is added to the cache.

Arguments:

search_paths:List[pathlib.Path]=None
Paths to scan for collections. Defaults to config.index.searchPaths.

Index.check_integrity(self) -> None

Check the integrity of the cache.

This method checks if the paths stored in the cache are valid. If a path is not valid, it is removed from the cache.

Index.resolve_path(self, uid, *, search_paths=None) -> Path

Resolve and return the path of a collection from its UID. Raises a FileNotFoundError if the collection is not found in the search paths.

Arguments:

uid:str
UID of the collection
search_paths:typing.Optional[typing.Set[StrPath]]=None
Paths to search for the collection

Index.resolve_uid(self, path) -> CollectionUID

Resolve the UID of a collection from a path.

Returns the UID of the collection or a new UID if it can't be determined.

Arguments:

path:StrPath
Path of the collection

Index.sync_collection(self, uid, path=None, *, force_all=False) -> None

Sync the table with the file system.

Iterates through the simulations in the collection and updates the metadata and parameters if the HDF5 file has been modified.

Arguments:

uid:str
UID of the collection
path:typing.Optional=None
Path of the collection
force_all:bool=False
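How "has been modified" is detected is not stated here. One common approach, shown below purely as an assumption, compares each file's modification time with the value stored at the last sync. The directory layout (one folder per simulation containing `data.h5`) and the name `stale_simulations` are hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, List


def stale_simulations(
    collection: Path, indexed_mtimes: Dict[str, float], force_all: bool = False
) -> List[str]:
    stale = []
    for sim_dir in sorted(p for p in collection.iterdir() if p.is_dir()):
        h5_file = sim_dir / "data.h5"  # hypothetical file name
        if not h5_file.is_file():
            continue
        known = indexed_mtimes.get(sim_dir.name)
        # Re-read when forced, never indexed, or changed since the last sync.
        if force_all or known is None or h5_file.stat().st_mtime > known:
            stale.append(sim_dir.name)
    return stale


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name, mtime in (("a", 100.0), ("b", 200.0)):
        (root / name).mkdir()
        (root / name / "data.h5").touch()
        os.utime(root / name / "data.h5", (mtime, mtime))
    indexed = {"a": 100.0, "b": 150.0}
    changed = stale_simulations(root, indexed)
    everything = stale_simulations(root, indexed, force_all=True)
```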

Index.collection(self, uid) -> CollectionORM | None

Return a collection from the index.

Arguments:

uid:str
UID of the collection

Index.simulation(self, collection_uid, name) -> SimulationORM | None

Return a simulation from the index.

Arguments:

collection_uid:str
UID of the collection
name:str
Name of the simulation

Index.upsert_collection(self, uid, path) -> None

Cache a collection in the index.

Arguments:

uid:str
UID of the collection
path:pathlib.Path
Path of the collection
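An upsert inserts a row if the key is new and updates it otherwise. The sketch below shows the idea with SQLite's `ON CONFLICT ... DO UPDATE` clause (available since SQLite 3.24) through the standard-library `sqlite3` module; the table layout is an assumption, and the library itself issues its statements through SQLAlchemy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE collections (uid TEXT PRIMARY KEY, path TEXT NOT NULL)")


def upsert_collection(uid: str, path: str) -> None:
    # Insert a new row, or overwrite the path if the UID already exists.
    conn.execute(
        "INSERT INTO collections (uid, path) VALUES (?, ?) "
        "ON CONFLICT(uid) DO UPDATE SET path = excluded.path",
        (uid, path),
    )
    conn.commit()


upsert_collection("ABC123", "/old/location")
upsert_collection("ABC123", "/new/location")  # same UID: the path is updated
rows = conn.execute("SELECT uid, path FROM collections").fetchall()
```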

Index.upsert_simulation(self, collection_uid, simulation_name, parameters=None, metadata=None, *, collection_path=None) -> None

Cache a simulation from a collection.

Arguments:

collection_uid:str
UID of the collection
simulation_name:str
Name of the simulation
parameters:typing.Optional[typing.Mapping[typing.Any, typing.Any]]=None
metadata:typing.Optional[typing.Mapping[typing.Any, typing.Any]]=None
collection_path:typing.Optional=None
Path of the collection

Index.update_simulation_metadata(self, collection_uid, simulation_name, data) -> None

Update the metadata of a simulation by passing it as a dict.

Arguments:

collection_uid:str
simulation_name:str
data:typing.Mapping
Dictionary with new data

Index.update_simulation_parameters(self, collection_uid, simulation_name, parameters) -> None

Update the parameters of a simulation by passing it as a dict.

Arguments:

collection_uid:str
simulation_name:str
parameters:SimulationParameterT
Dictionary with new parameters

Index._initialize_root_process(self, url) -> None

Arguments:

url:str

Index._drop_collection(self, uid) -> None

Drop a collection from the cache.

Arguments:

uid:str
UID of the collection

Index._drop_simulation(self, collection_uid, simulation_name) -> None

Drop a simulation from the cache.

Arguments:

collection_uid:str
UID of the collection
simulation_name:str
Name of the simulation

Index._get_collection_path(self, uid) -> Optional[Path]

Arguments:

uid:str

Index._get_collections(self) -> Sequence[CollectionORM]

Index._get_simulation(self, collection_uid, simulation_name) -> SimulationORM | None

Arguments:

collection_uid:CollectionUID | str
simulation_name:str
