Bamboost

bamboost.core.hdf5.ref

This module provides a high-level abstraction for working with HDF5 (h5py) groups and datasets. It is built on the concept of describing an object in the file by a deterministic reference: a file instance plus a path inside the HDF5 file.

The reference handles opening and closing the file, and provides a simple interface to data, attributes and subgroups. In essence, this model exposes h5py objects as if they were ordinary in-memory data structures.
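
The reference idea described above can be illustrated with a toy model. This is a minimal sketch and not bamboost's implementation: `ToyFile` and `ToyReference` are hypothetical names, and a plain dict stands in for the HDF5 file. It only shows that a (file, path) pair can be created without touching the file and resolved on access.

```python
from typing import Any, Dict


class ToyFile:
    """Hypothetical stand-in for an HDF5 file: a flat dict keyed by absolute path."""

    def __init__(self, content: Dict[str, Any]) -> None:
        self._content = content
        self.open_count = 0  # how many times the "file" was accessed

    def read(self, path: str) -> Any:
        # The file object manages access; the caller never handles a raw handle.
        self.open_count += 1
        return self._content[path]


class ToyReference:
    """A (file, path) pair that resolves to the stored object only on access."""

    def __init__(self, path: str, file: ToyFile) -> None:
        self.path = path
        self.file = file

    def resolve(self) -> Any:
        return self.file.read(self.path)


toy_file = ToyFile({"/mesh/coordinates": [0.0, 0.5, 1.0]})
ref = ToyReference("/mesh/coordinates", toy_file)  # nothing is read yet
accesses_before = toy_file.open_count
values = ref.resolve()  # the file is accessed only here
```

Because the reference stores only a path and a file instance, it stays cheap to create and pass around; the cost of touching the file is paid when the data is actually requested.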

Attributes

  • log=BAMBOOST_LOGGER.getChild('hdf5')
  • _RT_group=TypeVar('_RT_group', bound=(Union['Group', 'Dataset']))
  • _g=bamboost.core.hdf5.ref.Group
  • _d=bamboost.core.hdf5.ref.Dataset

Classes

InvalidReferenceError

InvalidReferenceError(self, path, filename)
Arguments:
  • path
  • filename

RefStatus

Attributes:
  • INVALID=0
  • VALID=1
  • NOT_CHECKED=2
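
The values listed above make it possible to build a status directly from a boolean validity check, which is what the Group attribute `_status=RefStatus(_valid)` appears to rely on. A minimal sketch, assuming RefStatus is a plain `Enum` (its base class is not stated on this page):

```python
from enum import Enum


class RefStatus(Enum):
    # Values copied from the attribute list of RefStatus.
    INVALID = 0
    VALID = 1
    NOT_CHECKED = 2


# Enum lookup by value accepts a bool, because True == 1 and False == 0.
status_if_found = RefStatus(True)
status_if_missing = RefStatus(False)
```

Here `status_if_found` is `RefStatus.VALID` and `status_if_missing` is `RefStatus.INVALID`; `NOT_CHECKED` can only be reached explicitly.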

H5Reference

H5Reference(self, path, file)
Arguments:
  • path:str
  • file:HDF5File[bamboost._typing._MT]
Attributes:
  • _status:RefStatus=bamboost.core.hdf5.ref.RefStatus.NOT_CHECKED
  • _path=HDF5Path(path)
  • _obj:typing.Union[h5py.Group, h5py.Dataset, h5py.Datatype]

    Returns the h5py object bound at the path of this reference.

  • attrs:AttrsDict[bamboost._typing._MT]
  • parent:Group[bamboost._typing._MT]
Arguments:
  • key
H5Reference.new(cls, path, file, _type=None) -> _RT_group

Returns a new pointer object.

Arguments:
  • cls
  • path:str
  • file:HDF5File[bamboost._typing._MT]
  • _type:typing.Optional[typing.Type[_RT_group]]=None

Group

Group(self, path, file)
Arguments:
  • path:str
  • file:HDF5File[bamboost._typing._MT]
Attributes:
  • _group_map=FilteredFileMap(file.file_map, path)
  • _status=RefStatus(_valid)
  • _obj:h5py.Group
Group.__contains__(self, key) -> bool
Arguments:
  • key:str
Group.__delitem__(self, key) -> None

Deletes an item.

Arguments:
  • key
Group.__setitem__(self, key, newvalue)

Sets an attribute. The value is written as an attribute of the group.

Arguments:
  • key
  • newvalue
Group.items(self, *, filter=None) -> Generator[Tuple[str, Union[Group[_MT], Dataset[_MT]]], None, None]
Arguments:
  • filter:typing.Optional[typing.Literal['groups', 'datasets']]=None
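
The effect of the `filter` argument can be sketched with a toy generator. This is an illustration only: `toy_items` and `CHILDREN` are hypothetical, and the real method yields `Group` and `Dataset` objects rather than strings.

```python
from typing import Dict, Iterator, Literal, Optional, Tuple

# Hypothetical stand-in for a group's children: name -> kind of object.
CHILDREN: Dict[str, str] = {
    "mesh": "group",
    "fields": "group",
    "time": "dataset",
}


def toy_items(
    children: Dict[str, str],
    *,
    filter: Optional[Literal["groups", "datasets"]] = None,
) -> Iterator[Tuple[str, str]]:
    """Yield (name, kind) pairs, optionally restricted to one kind."""
    # Map the plural filter keyword to the kind it selects; None selects all.
    wanted = {"groups": "group", "datasets": "dataset"}.get(filter)
    for name, kind in children.items():
        if wanted is None or kind == wanted:
            yield name, kind


all_names = [name for name, _ in toy_items(CHILDREN)]
group_names = [name for name, _ in toy_items(CHILDREN, filter="groups")]
dataset_names = [name for name, _ in toy_items(CHILDREN, filter="datasets")]
```
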
Group.require_self(self) -> None

Create the group if it doesn't exist yet.

Note

This is a parallel collective operation. It MUST be executed on all ranks to avoid deadlocks, unless the instance is wrapped under a comm_self context manager, which allows safe execution from a single rank only.

Group.require_group(self, name, *, return_type=None)

Create a group if it doesn't exist yet.

Arguments:
  • name
  • return_type=None

Note

This is a parallel collective operation. It MUST be executed on all ranks to avoid deadlocks, unless the instance is wrapped under a comm_self context manager, which allows safe execution from a single rank only.

Group.require_dataset(self, name, shape, dtype, exact=False, **kwargs) -> h5py.Dataset

Ensure a dataset exists under the given name with the specified shape/dtype.

Arguments:
  • name:str
  • shape:tuple[int, ...]
  • dtype
  • exact:bool=False
  • kwargs={}

Note

This is a parallel collective operation under Parallel HDF5. It must be executed collectively across all processes, unless wrapped under a comm_self context manager, which allows safe execution from a single rank only.

Group.write_distributed_array(self, name, vector, indices=None, attrs=None, dtype=None, *, file_map=True) -> None

Add or overwrite a dataset.

Arguments:
  • name:str

    Name for the dataset

  • vector:ArrayLike

    Data array to write

  • indices:numpy.typing.NDArray[numpy.int_] | None=None

    Optional. 1D array of global row indices where local data should be written. If None, data is assumed to be contiguous and will be written as such.

  • attrs:typing.Optional[typing.Dict[str, typing.Any]]=None

    Optional. Attributes of dataset.

  • dtype:typing.Optional[str]=None

    Optional. dtype of dataset.

  • file_map:bool=True

Note

This is a parallel collective operation. It MUST be executed on all ranks to avoid deadlocks, unless the instance is wrapped under a comm_self context manager, which allows safe execution from a single rank only.

Group.write_distributed_contiguous_array(self, name, vector, attrs=None, dtype=None, *, file_map=True) -> None

Add a dataset to the group. An error is raised when attempting to overwrite an existing dataset with data of a different shape. If the shape is the same, the data is overwritten (this behavior is inherited from h5py's require_dataset).

Arguments:
  • name:str

    Name for the dataset

  • vector:ArrayLike

    Data to write (at most 2D)

  • attrs:typing.Optional[typing.Dict[str, typing.Any]]=None

    Optional. Attributes of dataset.

  • dtype:typing.Optional[str]=None

    Optional. dtype of dataset. If not specified, uses dtype of input array

  • file_map:bool=True

    Optional. If True, the dataset is added to the file map. Default is True.

Note

This is a parallel collective operation. Ranks coordinate via allgather and write non-overlapping slices concurrently. It MUST be executed on all ranks to avoid deadlocks, unless wrapped under a comm_self context manager, which allows safe execution from a single rank only.
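
How ranks can arrive at non-overlapping slices after an allgather is sketched below. This is a serial simulation under stated assumptions, not bamboost's code: the helper `contiguous_slices` is hypothetical, the list `local_data` plays the role of the per-rank arrays, and an exclusive prefix sum over the gathered lengths is one common way to compute the offsets.

```python
from itertools import accumulate
from typing import List, Tuple


def contiguous_slices(local_lengths: List[int]) -> List[Tuple[int, int]]:
    """Return the [start, stop) row range of each rank in the global dataset.

    `local_lengths[r]` is the number of rows held by rank r, i.e. what every
    rank would know after an allgather of its local length.
    """
    stops = list(accumulate(local_lengths))  # inclusive prefix sum
    starts = [0] + stops[:-1]                # exclusive prefix sum
    return list(zip(starts, stops))


# Simulate three ranks holding 4, 2 and 3 rows respectively.
local_data = [[10, 11, 12, 13], [20, 21], [30, 31, 32]]
slices = contiguous_slices([len(rows) for rows in local_data])

# Each "rank" writes its rows into its own slice of the global array.
global_rows = [None] * slices[-1][1]
for rows, (start, stop) in zip(local_data, slices):
    global_rows[start:stop] = rows
```

Since every rank derives the same offsets from the same gathered lengths, no two ranks target the same rows, which is what allows the writes to proceed concurrently.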

Group.add_numerical_dataset(self, *args, **kwargs)
Arguments:
  • args=()
  • kwargs={}
Group.write_distributed_scattered_array(self, name, indices, vector, attrs=None, dtype=None, *, file_map=True) -> None

Add or overwrite a dataset using non-contiguous global indices (DOF map).

Arguments:
  • name:str

    Name for the dataset

  • indices:ArrayLike

    1D array of global row indices where local data should be written

  • vector:ArrayLike

    Data array to write

  • attrs:typing.Optional[typing.Dict[str, typing.Any]]=None

    Optional. Attributes of dataset.

  • dtype:typing.Optional[str]=None

    Optional. dtype of dataset.

  • file_map:bool=True

    Optional. If True, the dataset is added to the file map.

Note

This is a parallel collective operation. It MUST be executed on all ranks to avoid deadlocks, unless the instance is wrapped under a comm_self context manager, which allows safe execution from a single rank only.
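
The role of the global row indices can be illustrated with a serial simulation. This is a sketch under assumptions, not bamboost's code: the two-rank decomposition is made up, and taking the global size as one past the largest index is an assumption, since this page does not say how the dataset size is determined.

```python
from typing import Dict, List

# Hypothetical decomposition: each rank owns some rows of a 6-row global array.
rank_indices: Dict[int, List[int]] = {0: [0, 3, 4], 1: [1, 2, 5]}
rank_vectors: Dict[int, List[float]] = {0: [0.0, 0.3, 0.4], 1: [0.1, 0.2, 0.5]}

# Assumption: the global size is one past the largest index on any rank.
global_size = max(max(idx) for idx in rank_indices.values()) + 1
global_rows: List[float] = [float("nan")] * global_size

for rank, indices in rank_indices.items():
    vector = rank_vectors[rank]
    assert len(indices) == len(vector), "one global index per local row"
    # Local row i of this rank lands in global row indices[i].
    for global_row, value in zip(indices, vector):
        global_rows[global_row] = value
```

After both simulated ranks have written, the interleaved local rows appear in global order even though neither rank held a contiguous block.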

Group.add_dataset(self, name, data, attrs=None, dtype=None) -> None

Add a dataset to the group. An error is raised when attempting to overwrite an existing dataset.

Arguments:
  • name:str

    Name for the dataset

  • data:typing.Any

    Data to write

  • attrs:typing.Optional[typing.Dict[str, typing.Any]]=None

    Optional. Attributes of dataset.

  • dtype:typing.Optional[str]=None

    Optional. dtype of dataset. If not specified, uses dtype of input data

Note

This method is non-collective and does not require communication. The data is written only by the root process (rank 0), and other processes do not need to call this method.
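
The root-only, no-overwrite behavior described in this note can be sketched with a toy function. Everything here is hypothetical: `toy_add_dataset` uses a dict in place of the file, takes the rank as an explicit argument, and raises `ValueError`, whereas this page only states that an error is thrown.

```python
from typing import Any, Dict


def toy_add_dataset(store: Dict[str, Any], name: str, data: Any, rank: int) -> bool:
    """Write `data` under `name` from rank 0 only; return True if this rank wrote."""
    if rank != 0:
        return False  # non-root ranks do nothing and need no communication
    if name in store:
        # Exception type is an assumption; the source only says an error is thrown.
        raise ValueError(f"dataset {name!r} already exists")
    store[name] = data
    return True


store: Dict[str, Any] = {}
wrote_on_root = toy_add_dataset(store, "time", [0.0, 0.1], rank=0)
wrote_on_other = toy_add_dataset(store, "time", [0.0, 0.1], rank=1)

try:
    toy_add_dataset(store, "time", [9.9], rank=0)
    overwrite_rejected = False
except ValueError:
    overwrite_rejected = True
```
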

Repr showing the content of the group.
