Getting Started

Installation

You can install the Bio-Volumentations library from PyPI using:

pip install bio-volumentations

For the list of required packages and more details, see the project’s PyPI page.

Importing

You can import Bio-Volumentations into your project using:

import bio_volumentations as biovol

How to Use Bio-Volumentations?

The Bio-Volumentations library processes 3D, 4D, and 5D images. Each image must be represented as a numpy.ndarray and must conform to the following conventions:

The order of dimensions is [C, Z, Y, X, T], where C is channel dimension, Z, Y, and X are spatial dimensions, and T is time dimension.
The three spatial (Z, Y, X) dimensions must always be present. To transform a 2D image, please create a dummy Z dimension first. (Or consider using a more suitable library, such as Albumentations.)
The channel (C) dimension is optional for the input image. However, the output image will always be at least 4-dimensional. If the C dimension is not present in the input, the library will automatically create a dummy dimension in its place, so the output image shape will be [1, Z, Y, X].
The time (T) dimension is optional and can only be present if the channel (C) dimension is also present in the input data. To process single-channel time-lapse images, please create a dummy C dimension.

Thus, an input image is interpreted in the following ways based on its dimensionality:

3D: a single-channel volumetric image [Z, Y, X];
4D: a multi-channel volumetric image [C, Z, Y, X];
5D: a single- or multi-channel volumetric image sequence [C, Z, Y, X, T].

The shape of the output image will be either [C, Z, Y, X] (for 3D and 4D inputs) or [C, Z, Y, X, T] (for 5D inputs).
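As a minimal sketch of these conventions, the snippet below uses only NumPy to add the dummy dimensions mentioned above. The array sizes and variable names are made up for illustration.

```python
import numpy as np

# A 2D image [Y, X]: add a dummy Z dimension to obtain a 3D volume [Z, Y, X].
img_2d = np.random.rand(64, 64)
img_3d = img_2d[np.newaxis, ...]   # shape (1, 64, 64)

# A single-channel time-lapse volume [Z, Y, X, T]: add a dummy C dimension,
# because T may only be present together with C.
seq = np.random.rand(16, 64, 64, 5)
seq_5d = seq[np.newaxis, ...]      # shape (1, 16, 64, 64, 5)
```

Without the dummy C dimension, the 4D time-lapse array would be interpreted as a multi-channel volume [C, Z, Y, X].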

The images are cast to a floating-point datatype before being transformed, irrespective of their actual datatype.

For the specification of image annotation conventions, please see the respective section below.

All transformations are implemented as callable classes inheriting from an abstract Transform class. Upon instantiating a transformation object, one has to specify the parameters of the transformation.

The transformations work in a fully 3D fashion. Individual channels and time points of a data volume are usually transformed separately and in the same manner; however, certain transformations can also work along these dimensions. For instance, GaussianBlur can perform the blurring along the temporal dimension and with different strength in individual channels.

The data can be transformed by a call to the transformation object. However, it is strongly recommended to use Compose to create and use transformation pipelines. An instantiated Compose object encapsulates the full transformation pipeline and provides additional support: it automatically checks and adjusts image format and datatype, outputs the image as a contiguous array, and can optionally convert the transformed image to a desired format. If you call transformations outside of Compose, we cannot guarantee that all assumptions are checked and enforced, so you might encounter unexpected behaviour.

Below, there are several examples of how to use the Bio-Volumentations library. You are also welcome to check the API reference to learn more about the individual transforms.

Example: Transforming a Single Image

To create a transformation pipeline, you just need to instantiate all desired transformations (with the desired parameter values) and then feed a list of these transformations into a new Compose object.

Optionally, you can specify a datatype conversion transformation that will be applied after the last transformation in the list, for example from the default numpy.ndarray to a PyTorch torch.Tensor. You can also specify the probability of applying the whole pipeline as a number between 0 and 1. The default probability is 1 (i.e., the pipeline is applied in each call). See the Compose docs for more details.

Note: You can also toggle the probability of applying the individual transforms. To do so, you can use the parameters p and always_apply when instantiating the transformation objects. If always_apply==True, the transformation is applied every time the pipeline is called, irrespective of the value of p; if always_apply==False, the transformation is applied with probability p, which must be a number between 0 and 1.
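The interplay of p and always_apply described in the note can be sketched in plain Python. The helper should_apply below is hypothetical and only mirrors the behaviour stated above; it is not the library's implementation.

```python
import random

def should_apply(p: float, always_apply: bool, rng: random.Random) -> bool:
    """Decide whether a transform fires in one pipeline call (illustrative only)."""
    if always_apply:
        return True           # p is ignored
    return rng.random() < p   # otherwise applied with probability p

rng = random.Random(0)
fired = sum(should_apply(0.8, False, rng) for _ in range(10_000))
fraction = fired / 10_000     # close to 0.8 for a large number of calls
```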

The Compose object is callable. The data is passed as keyword arguments, and the call returns a dictionary with the same keywords and corresponding transformed data. This might look like overkill for a single image, but it will come in handy when transforming images with additional targets. The default keyword for the image data is, unsurprisingly, 'image'.

import numpy as np
from bio_volumentations import Compose, RandomGamma, RandomRotate90, GaussianBlur

# Create the transformation using Compose from a list of transformations
aug = Compose([
        RandomGamma(gamma_limit = (0.8, 1.2), p = 0.8),
        RandomRotate90(axes = [1, 2, 3], p = 1),
        GaussianBlur(sigma = 1.2, p = 0.8)
      ])

# Generate an image - shape [C, Z, Y, X]
img = np.random.rand(1, 128, 256, 256)

# Transform the image
# Please note that the image must be passed as a keyword argument to the transformation pipeline
# and extracted from the outputted dictionary.
data = {'image': img}
aug_data = aug(**data)
transformed_img = aug_data['image']

Example: Transforming Images with Annotations

Sometimes, it is necessary to transform an image with some associated additional targets. To that end, Bio-Volumentations defines several target types:

image for the image data (numpy.ndarray with floating-point datatype);
mask for integer-valued label images (numpy.ndarray with integer datatype);
float_mask for real-valued label images (numpy.ndarray with floating-point datatype);
keypoints for a list of keypoints;
bboxes for a list of bounding boxes; and
value for any non-transformed data.

The mask and float_mask targets are expected to have the same shape as the image target, except for the channel (C) dimension, which must not be included. For example, a mask and/or float_mask of shape [150, 300, 300] can correspond to images of shape [150, 300, 300], [1, 150, 300, 300], as well as [4, 150, 300, 300]. If you want to use a multi-channel mask or float_mask, you have to split it into a set of single-channel mask or float_mask targets, respectively, and input them as stand-alone targets (see the section below on transforming multiple targets of the same type).
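Splitting a multi-channel mask might look roughly like the following NumPy-only sketch. The array size and the keyword names ('mask_0', 'mask_1') are assumptions chosen for illustration; the resulting keywords would then be passed to the mask_keywords parameter of Compose.

```python
import numpy as np

# A hypothetical two-channel label image of shape [C, Z, Y, X].
multi_mask = np.random.randint(0, 3, size=(2, 16, 32, 32), dtype=np.uint8)

# Split it into single-channel masks of shape [Z, Y, X], one keyword per channel.
# The keyword names are illustrative; any unique dictionary keys would do.
masks = {f'mask_{c}': multi_mask[c] for c in range(multi_mask.shape[0])}

# These keywords would be listed when creating the pipeline.
mask_keywords = tuple(masks.keys())
```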

The keypoints target is represented as a list of tuples. Each tuple represents the absolute coordinates of a keypoint in the volume, so it must contain either 3 or 4 numbers (for volumetric and time-lapse volumetric data, respectively).

The bboxes target is represented as a list of bounding boxes. Each bounding box is represented by a tuple of 3 or 4 values:

Triplet = Tuple[float, float, float]

Bbox = Tuple[Triplet, Triplet, float, Optional[str]]

where:

the two Triplets are representations of the bounding box in the specified format (see below),
the float value represents a time point,
the optional fourth value in the tuple, Optional[str], is a class label of the bounding box.

Bounding boxes are accepted in voc, coco, albumentations and yolo formats. The input format of your data can be specified in the Compose constructor with the bbox_format parameter. For the normalized formats, image domain size is inferred from the image target and is correctly updated as transformations are being applied. The time point is a compulsory parameter when defining a bounding box even for static data; in this case, set it to an arbitrary value (we recommend 0).

The value target can hold any other data whose value does not change during the transformation process. This can be for example image-level information such as a classification label for the whole image.

The associated targets (which form a single data sample) are fed to the transformation pipeline as keyword arguments of a call to the Compose object. Consequently, they can be extracted from the outputted dictionary using the same keys. The default key values are 'image', 'mask', 'float_mask', 'keypoints', 'bboxes', and 'value'.

Prior to applying any user-defined transformation, the mask and float_mask targets are cast to integer and floating-point datatypes, respectively.

Importantly, there must always be an image-type target in the sample. It is also expected that all image, mask, and float_mask targets are of the same shape (except for the number of channels) and that all keypoints and bounding boxes are fully contained in the image domain.
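If you want to verify these expectations before calling a pipeline, a small check along the following lines may help. The function check_sample is a hypothetical helper, not part of the library. It assumes that keypoint coordinates follow the order of the image dimensions without the channel and that a coordinate c lies inside the domain when 0 <= c < size.

```python
import numpy as np

def check_sample(image, mask=None, keypoints=()):
    """Raise ValueError if the sample violates the conventions described above."""
    # Drop the channel dimension (if any) to get the shape shared with masks.
    domain = image.shape if image.ndim == 3 else image.shape[1:]
    if mask is not None and mask.shape != domain:
        raise ValueError(f'mask shape {mask.shape} does not match {domain}')
    for kp in keypoints:
        if len(kp) not in (3, 4) or len(kp) > len(domain):
            raise ValueError(f'keypoint {kp} has a wrong number of coordinates')
        # Assumption: a coordinate c is inside the domain if 0 <= c < size.
        if any(not (0 <= c < s) for c, s in zip(kp, domain)):
            raise ValueError(f'keypoint {kp} lies outside the image domain')

img = np.random.rand(1, 128, 256, 256)
lbl = np.zeros((128, 256, 256), dtype=np.uint8)
check_sample(img, mask=lbl, keypoints=[(10, 20, 30)])  # passes silently
```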

You cannot define your own target types; that would require re-implementing all existing transforms.

import numpy as np
from bio_volumentations import Compose, RandomGamma, RandomRotate90, GaussianBlur

# Create the transformation using Compose from a list of transformations
aug = Compose([
        RandomGamma(gamma_limit = (0.8, 1.2), p = 0.8),
        RandomRotate90(axes = [1, 2, 3], p = 1),
        GaussianBlur(sigma = 1.2, p = 0.8)
      ])

# Generate image and a corresponding labeled image
img = np.random.rand(1, 128, 256, 256)
lbl = np.random.randint(0, 2, size=(128, 256, 256), dtype=np.uint8)  # the upper bound is exclusive

# Transform the images
# Please note that the images must be passed as keyword arguments to the transformation pipeline
# and extracted from the outputted dictionary.
data = {'image': img, 'mask': lbl}
aug_data = aug(**data)
transformed_img, transformed_lbl = aug_data['image'], aug_data['mask']

If a Random... transformation receives multiple targets as input in a single call, the same transformation parameters are used to transform all of these targets. For example, RandomAffineTransform applies the same geometric transformation to all target types in a single call.
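The idea of sharing one set of random parameters across targets can be illustrated with plain NumPy. This is only a sketch of the principle with made-up data, not the library's code: the random parameter is sampled once and then reused for the image and the mask, so the two stay aligned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample the random parameter once per call...
k = int(rng.integers(0, 4))  # number of 90-degree rotations in the Y-X plane

img = rng.random((1, 4, 8, 8))              # [C, Z, Y, X]
lbl = (img[0] > 0.5).astype(np.uint8)       # [Z, Y, X], aligned with img

# ...and reuse it for every target, so the targets stay aligned.
rot_img = np.rot90(img, k, axes=(2, 3))     # Y and X are axes 2 and 3 of the image
rot_lbl = np.rot90(lbl, k, axes=(1, 2))     # Y and X are axes 1 and 2 of the mask
```

Note that the axis indices differ by one between the image and the mask, because the mask has no channel dimension.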

Some transformations, such as RandomGaussianNoise or RandomGamma, are only defined for the image target and leave the other target types unchanged. Please consult the documentation of the individual transforms for more details.

import numpy as np
from bio_volumentations import Compose, RandomGamma, RandomRotate90, GaussianBlur, RandomScale

# Define a helper function to convert a numpy ndarray to a tuple
def np_to_tuple(arr: np.ndarray):
    return tuple(arr.tolist())

# Create the transformation using Compose from a list of transformations
aug = Compose([
    RandomGamma(gamma_limit = (0.8, 1.2), p = 0.8),
    RandomRotate90(axes = [1, 2, 3], p = 1),
    GaussianBlur(sigma = 1.2, p = 0.8),
    RandomScale((0.8, 1.1))
])

# Generate image and a corresponding labeled image
img = np.random.rand(1, 128, 256, 256)
lbl = np.random.randint(0, 2, size=(128, 256, 256), dtype=np.uint8)  # the upper bound is exclusive

# Generate keypoints
keypts = [np_to_tuple(np.random.randint(0, 127, 3)) for _ in range(20)]

# Generate random bboxes (min corner, max corner, time point) that lie inside the image domain
bboxes = [(np_to_tuple(np.random.randint(0, 64, 3)),
           np_to_tuple(np.random.randint(64, 128, 3)),
           0) for _ in range(20)]

# Transform the images
# Please note that the images and annotations must be passed as keyword arguments to the transformation pipeline
# and extracted from the outputted dictionary.
data = {'image': img, 'mask': lbl, 'keypoints': keypts, 'bboxes': bboxes}

aug_data = aug(**data)
transformed_img = aug_data['image']
transformed_lbl = aug_data['mask']
transformed_keypts = aug_data['keypoints']
transformed_bbox = aug_data['bboxes']

In this case, keypoints and bboxes are only transformed by RandomRotate90 and RandomScale, while image and mask are transformed by all four transformations.

Another example of transforming an annotated image is available at the project’s GitLab, where a runnable Python script and a test data sample are provided. See the readme at GitLab for more details.

Example: Transforming Multiple Targets of the Same Type

You can pass an arbitrary number of targets to any transformation. To achieve this, you have to define keywords for the individual targets when creating the Compose object. The specified keywords will then be used to input the data to the transformation call as well as to extract the transformed data from the outputted dictionary.

Specifically, you can define image-type target keywords using the img_keywords parameter - its value must be a tuple of strings, each string representing a single keyword. Similarly, there are mask_keywords, fmask_keywords, keypoints_keywords, bboxes_keywords, and value_keywords parameters for the respective target types. The keywords can be any valid dictionary keys, and they must be unique.

You do not need to use all specified keywords in a transformation call. However, at least the target with the 'image' keyword must be present in each transformation call. In our example below, there are seven target keywords defined: four keywords defined explicitly (two for image, one for mask, and one for float_mask) and three defined implicitly (for keypoints, bboxes, and value), but we only transform three targets.

import numpy as np
from bio_volumentations import Compose, RandomGamma, RandomRotate90, GaussianBlur

# Create the transformation using Compose from a list of transformations and define targets
aug = Compose([
        RandomGamma(gamma_limit = (0.8, 1.2), p = 0.8),
        RandomRotate90(axes = [1, 2, 3], p = 1),
        GaussianBlur(sigma = 1.2, p = 0.8)
    ],
    img_keywords=('image', 'abc'), mask_keywords=('mask',), fmask_keywords=('nothing',))

# Generate the image data
img = np.random.rand(1, 128, 256, 256)
img1 = np.random.rand(1, 128, 256, 256)
lbl = np.random.randint(0, 2, size=(128, 256, 256), dtype=np.uint8)  # the upper bound is exclusive

# Transform the images
# Please note that the images must be passed as keyword arguments to the transformation pipeline
# and extracted from the outputted dictionary.
data = {'image': img, 'abc': img1, 'mask': lbl}
aug_data = aug(**data)
transformed_img = aug_data['image']
transformed_img1 = aug_data['abc']
transformed_lbl = aug_data['mask']

Example: Adding a Custom Transformation

Each transformation inherits from the Transform class. You can thus easily implement your own transformations and use them with this library. You can check our implementations to see how this can be done; for example, Flip can be implemented as follows:

import numpy as np
from typing import List, Optional
from bio_volumentations import DualTransform
import my_package  # a package with backend functionality (such as keypoint/bbox flipping)

class Flip(DualTransform):
    # Initialize the transform
    def __init__(self, axes: Optional[List[int]] = None, always_apply=False, p=1):
        super().__init__(always_apply, p)
        self.axes = axes

    # Transform the image
    def apply(self, img, **params):
        return np.flip(img, params["axes"])

    # Transform the int-valued mask
    def apply_to_mask(self, mask, **params):
        return np.flip(mask, axis=[item - 1 for item in params["axes"]])  # Mask has no channels

    # Transform the float-valued mask - no need to implement. By default, apply_to_float_mask() uses
    # the implementation of apply_to_mask(), unless it is overridden (see the implementation of DualTransform).

    # Transform the keypoints
    def apply_to_keypoints(self, keypoints, **params):
        return my_package.flip_keypoints(keypoints, axes=params['axes'], img_shape=params['img_shape'])

    # Transform the bounding boxes
    def apply_to_bboxes(self, bboxes, **params):
        return my_package.flip_bboxes(bboxes, axes=params['axes'], img_shape=params['img_shape'])

    # Set transformation parameters. This is useful especially for RandomXXX transforms
    # to ensure consistent transformation of samples with multiple targets.
    def get_params(self, **data):
        axes = [1, 2, 3] if self.axes is None else self.axes
        img_shape = np.array(data['image'].shape[1:4])
        return {"axes": axes, "img_shape": img_shape}
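The my_package backend in the example above is a placeholder. As a rough idea of what such a backend function could contain, here is a hypothetical flip_keypoints written in plain Python. It assumes that keypoints are pixel indices ordered as (Z, Y, X), that axes uses the image indexing (1 = Z, 2 = Y, 3 = X), and that img_shape holds the spatial sizes, as returned by get_params() above.

```python
from typing import List, Sequence, Tuple

def flip_keypoints(keypoints: List[Tuple[float, ...]],
                   axes: Sequence[int],
                   img_shape: Sequence[int]) -> List[Tuple[float, ...]]:
    """Mirror keypoints along the given image axes (1 = Z, 2 = Y, 3 = X)."""
    flipped = []
    for kp in keypoints:
        coords = list(kp)
        for axis in axes:
            dim = axis - 1  # keypoints have no channel dimension
            # Assumption: coordinates are pixel indices, so index i maps to size - 1 - i.
            coords[dim] = img_shape[dim] - 1 - coords[dim]
        flipped.append(tuple(coords))
    return flipped

flipped = flip_keypoints([(0, 10, 20)], axes=[1, 3], img_shape=[128, 256, 256])
```

Under the stated assumptions, flipping twice along the same axes returns the original keypoints, which is a convenient sanity check for such helpers.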

Example: Using Bio-Volumentations with automatic augmentation frameworks

Bio-Volumentations can also be used with existing automatic augmentation frameworks. We prepared examples of using it with AutoAugment and RandAugment - they are available in the respective file at GitLab.

Bounding box examples in different formats

This section showcases how a single bounding box is represented in different formats. Suppose we have a time-lapse image sequence of shape [1, 10, 10, 10, 20] (i.e., with a single channel and 20 time points) and a bounding box stretching from pixel [2, 3, 5] to pixel [6, 5, 8] in time point 14.

voc: absolute coordinates, minimal and maximal corners of the bounding box

bbox = (2, 3, 5), (6, 5, 8), 14

coco: absolute coordinates, minimal corner and depth, height, width of the bounding box

bbox = (2, 3, 5), (4, 2, 3), 14

albumentations: normalized coordinates, minimal and maximal corners of the bounding box

bbox = (0.2, 0.3, 0.5), (0.6, 0.5, 0.8), 14

yolo: normalized coordinates, central point and depth, height, width of the bounding box

bbox = (0.4, 0.4, 0.65), (0.4, 0.2, 0.3), 14

Optionally, we can specify a class label as the fourth parameter of the bounding box:

bbox = (2, 3, 5), (6, 5, 8), 14, 'class_label'

If you are wondering which bbox format we are using internally… it is voc. The list of tuples representing a list of bounding boxes you input to the transformation pipeline (the instantiated Compose object) is, in fact, first converted to our internal representation that is based on the voc format right before the user-defined transformations are applied. The result is then converted back to a list of tuples in the original bbox format right before returning the new list of bboxes from the Compose object to the user.
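The relation between the four formats can be sketched in a few lines of plain Python. The function voc_to_other_formats is a hypothetical helper written for this illustration; it follows the format descriptions in this section and is not the library's internal conversion code. The time point and the optional class label are left out, because they are the same in all formats.

```python
from typing import Tuple

Triplet = Tuple[float, float, float]

def voc_to_other_formats(min_corner: Triplet, max_corner: Triplet, size: Triplet):
    """Express a voc-style box (min and max corners) in all four formats."""
    extent = tuple(b - a for a, b in zip(min_corner, max_corner))
    center = tuple((a + b) / 2 for a, b in zip(min_corner, max_corner))
    return {
        'voc': (min_corner, max_corner),
        'coco': (min_corner, extent),
        'albumentations': (tuple(a / s for a, s in zip(min_corner, size)),
                           tuple(b / s for b, s in zip(max_corner, size))),
        'yolo': (tuple(c / s for c, s in zip(center, size)),
                 tuple(e / s for e, s in zip(extent, size))),
    }

# The bounding box from this section in an image with spatial shape (10, 10, 10).
boxes = voc_to_other_formats((2, 3, 5), (6, 5, 8), size=(10, 10, 10))
```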