elektronn3.data.sources module

Code related to data sources (HDF5 etc.)

class elektronn3.data.sources.DataSource

Bases: object

class elektronn3.data.sources.HDF5DataSource(fname, key, in_memory=False)

Bases: DataSource

An h5py.Dataset wrapper for safe multiprocessing. It opens the file and the dataset on each read or property access and closes them again immediately afterwards.

This is a workaround for the following issue and related data corruption: https://github.com/pytorch/pytorch/issues/11929.

Because no file handles are open when worker processes are forked, concurrency issues with HDF5’s global state do not arise.
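The open-on-access pattern described above can be sketched as follows. This is a minimal illustration and not the HDF5DataSource implementation: the class name OpenOnAccessDataset, the demo file and the dataset key "raw" are hypothetical, and the in_memory option is not covered.

```python
import os
import tempfile

import h5py
import numpy as np


class OpenOnAccessDataset:
    """Hypothetical stand-in that reopens the HDF5 file on every access."""

    def __init__(self, fname, key):
        # Store only plain strings, never an open h5py handle, so that
        # nothing HDF5-related exists when worker processes are forked.
        self.fname = fname
        self.key = key

    @property
    def shape(self):
        with h5py.File(self.fname, "r") as f:
            return f[self.key].shape

    @property
    def dtype(self):
        with h5py.File(self.fname, "r") as f:
            return f[self.key].dtype

    def __getitem__(self, idx):
        with h5py.File(self.fname, "r") as f:
            # Indexing an h5py.Dataset copies the selection into a NumPy
            # array, so the result stays valid after the file is closed.
            return f[self.key][idx]


# Write a small demo file (hypothetical path and key).
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("raw", data=np.arange(24, dtype=np.uint8).reshape(2, 3, 4))

src = OpenOnAccessDataset(path, "raw")
print(src.shape)
print(src[0, 1:3, :2])
```

Reopening the file on every read costs some time per access, but the object holds no HDF5 state of its own and can be handed to forked workers as-is.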

elektronn3.data.sources.slice_3d(src, coords_lo, coords_hi, dtype=numpy.float32, prepend_empty_axis=False, check_bounds=True)

Slice a patch of 3D image data out of a data source.

Parameters:
  • src (DataSource) – Source data set from which to read data. The expected data shapes are (C, D, H, W) or (D, H, W).

  • coords_lo (Sequence[int]) – Lower bound of the coordinates where data should be read from in src.

  • coords_hi (Sequence[int]) – Upper bound of the coordinates where data should be read from in src.

  • dtype (type) – NumPy dtype that the sliced array will be cast to if it doesn’t already have this dtype.

  • prepend_empty_axis (bool) – Prepends a new empty (1-sized) axis to the sliced array before returning it.

  • check_bounds (bool) – If True (default), only indices that lie within the bounds of src are allowed. Negative indices and slices that extend beyond the shape of src are not permitted, although they would otherwise pass without an error.

Return type:

ndarray

Returns:

Sliced image array.
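The documented behaviour can be illustrated with a small NumPy-only sketch. The function slice_patch below is a hypothetical re-creation based on the parameter descriptions, not elektronn3's code. In particular, treating coords_hi as exclusive and raising ValueError on a bounds violation are assumptions.

```python
import numpy as np


def slice_patch(src, coords_lo, coords_hi, dtype=np.float32,
                prepend_empty_axis=False, check_bounds=True):
    """Hypothetical sketch of the documented behaviour of slice_3d."""
    spatial_shape = src.shape[-3:]  # (D, H, W), with or without a channel axis
    if check_bounds:
        for lo, hi, size in zip(coords_lo, coords_hi, spatial_shape):
            if lo < 0 or hi > size:
                # Assumption: the real function may raise a different exception.
                raise ValueError(f"Slice [{lo}:{hi}] exceeds axis of size {size}")
    spatial = tuple(slice(lo, hi) for lo, hi in zip(coords_lo, coords_hi))
    # Keep all channels if src has a leading channel axis (C, D, H, W).
    index = (slice(None),) + spatial if len(src.shape) == 4 else spatial
    # Cast only if the data does not already have the requested dtype.
    patch = np.asarray(src[index]).astype(dtype, copy=False)
    if prepend_empty_axis:
        patch = patch[None]  # new leading axis of size 1
    return patch


volume = np.arange(2 * 4 * 5 * 6, dtype=np.uint8).reshape(2, 4, 5, 6)
patch = slice_patch(volume, (1, 0, 2), (3, 4, 5))
print(patch.shape, patch.dtype)  # (2, 2, 4, 3) float32
```

Without the bounds check, plain NumPy slicing silently clips an upper bound that exceeds the array shape, so the returned patch can be smaller than requested. The check turns that silent truncation into an explicit error.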