How to Build an Image Augmentation Pipeline with Albumentations and PyTorch

Transforming images allows you to artificially increase the size of your dataset to the point where you can use relatively small datasets to train a computer vision model.
By Boris Delovski • Updated on Feb 23, 2023

In my previous articles in this series, I covered how to apply different types of transformations to images using the Albumentations library. Transforming images with pixel-level and spatial-level transformations lets you artificially increase the size of your dataset, to the point where you can train a computer vision model on a relatively small dataset. In this article, I'm going to demonstrate how to create an image augmentation pipeline for image classification with Albumentations and PyTorch.

 

Classification is a relatively simple problem. Your goal is to train a model so that, when you feed an image to it, it will predict the label that is associated with that image. There are different types of classification problems (binary, multiclass, etc.), but the type of classification problem you're working on doesn't actually matter that much when you're looking at how to construct an image augmentation pipeline. 

 

A typical pipeline for classification therefore comes down to picking the transformations you plan on using and applying them to the images in your training set. The label connected to the artificially created image is the same as the label connected to the original image. In other words, the label of the image persists through the transformations.

Before all else, let's import everything needed to build the pipeline:

# Import the libraries we will use

import albumentations
from torch.utils.data import Dataset
from PIL import Image
import numpy as np

With the imports in place, we can move on to creating a pipeline. In practice, you rarely apply just one transformation to an image; an augmentation pipeline usually applies several. So, to start, let's create a simple function that applies multiple transformations to an image, simulating the result of running an image through a pipeline. Later, I'll demonstrate how to create a real pipeline with PyTorch.

 

How to Apply Multiple Transformations to Images

To apply multiple transformations to an image you use the Compose class. By using the Compose class you get a transform function. That transform function is what will be used for image augmentation. 

 

A typical example would be something like this:

# Create a pipeline of transformations

augmentation_pipeline = albumentations.Compose([
        albumentations.HorizontalFlip(p=0.5),
        albumentations.VerticalFlip(p=0.3),
        albumentations.RandomRain(p=1.0)
])

The pipeline above consists of three transformations: the HorizontalFlip transformation, the VerticalFlip transformation, and the RandomRain transformation. The way it works is pretty straightforward: when an image gets to it, it has a 50% chance of being horizontally flipped (because p=0.5 for HorizontalFlip), 30% chance to be vertically flipped (because p=0.3 for VerticalFlip) and a random rain effect will always be applied to it (because p=1.0 for RandomRain). Using such a pipeline is very easy. 
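To make the probabilities above concrete, here is a small sketch of the arithmetic. It assumes that each transformation decides independently whether it is applied, which is my understanding of how Compose behaves; the variable names are only illustrative.

```python
from itertools import product

# Probabilities taken from the pipeline above. RandomRain has p=1.0,
# so it is applied in every outcome and does not change the arithmetic.
p_hflip = 0.5
p_vflip = 0.3

# Enumerate the four possible flip outcomes, assuming independent decisions.
outcomes = {}
for hflip, vflip in product([False, True], repeat=2):
    prob_h = p_hflip if hflip else 1 - p_hflip
    prob_v = p_vflip if vflip else 1 - p_vflip
    outcomes[(hflip, vflip)] = prob_h * prob_v

for (hflip, vflip), prob in outcomes.items():
    print(f"horizontal flip={hflip!s:5} vertical flip={vflip!s:5} probability={prob:.2f}")
```

Under that assumption, roughly 35% of images come out with rain only, 35% with rain and a horizontal flip, 15% with rain and a vertical flip, and 15% with rain and both flips.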

Let's create a function that will run an image through your pipeline: 

# Create a function for transforming images using multiple transformations

def multiple_augment(aug_pipeline, image):
    image_array = np.array(image)
    augmented_img = aug_pipeline(image=image_array)['image']
    return Image.fromarray(augmented_img)

To demonstrate the results of applying multiple transformations to an image I'll use the image of the golf ball I used in the previous articles of this series. 

First, I'll load that image:

# Load in the image of the golf ball and display it

golf_ball_image = Image.open('golf_ball_image.png') 
golf_ball_image

The original golf ball image looks like this:

The original image of a golf ball for the example of transformations in image augmentations
Image Source: Costa Del Sol Golf Club, https://www.cdsgolfclub.com/when-two-golf-balls-collide/

Now let's see the result of applying multiple transformations to that image, using the function I defined earlier.

To do that, I'll run the following code:

# Apply multiple transformations to the image

augmented_image = multiple_augment(augmentation_pipeline, golf_ball_image)
augmented_image	

The result of applying these transformations to my image will look like this:

Image of a golf ball with multiple transformations applied to the image
Image Source: Costa Del Sol Golf Club, https://www.cdsgolfclub.com/when-two-golf-balls-collide/

Keep in mind that, because not every transformation has a 100% chance of being applied, you might end up with an image that looks a bit different. If you want a result closer to the one I got here, run the code a few times or set every p value in the augmentation pipeline to 1.0. Even then, the exact placement of the rain streaks is random, so the images will not match pixel for pixel.

After demonstrating how to apply multiple transformations to an image I can move on to explaining how to create a pipeline using PyTorch.

 

Why You Should Use PyTorch to Create Image Augmentation Pipelines 

The PyTorch ecosystem already has a package dedicated to performing image augmentation: the transforms module of torchvision. However, Albumentations is generally reported to be faster when it comes to performing image augmentation. Albumentations is also more powerful in terms of the sheer number of different transformations that it allows the user to apply to an image.

Before diving deep into how to create an image augmentation pipeline by combining PyTorch with Albumentations, I'll first go over how you feed data to PyTorch models. This is important because it is prerequisite knowledge for building an image augmentation pipeline. For those that don't know how you feed data to PyTorch models, this will be extremely useful, and for those that do already know how it's done, it can serve as a refresher.


 

What Are the Basics of Creating Datasets for PyTorch Models

In practice, it is a good idea to keep the code that creates a dataset separate from the code that creates the neural network itself. PyTorch offers two classes for preparing data for a model, Dataset and DataLoader, both of which can be imported from the torch.utils.data package. The two work together rather than being alternatives: the Dataset class stores your training samples with their corresponding labels, while the DataLoader class wraps an iterable around a Dataset to give you easy access to those samples, typically in batches.
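To show how the two classes relate, here is a small sketch that wraps a toy dataset in a DataLoader. ToyDataset and its fake in-memory "images" are hypothetical and exist only to keep the example runnable without any image files.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    """Hypothetical dataset holding four fake 3x8x8 images and their labels."""

    def __init__(self):
        self.images = [torch.full((3, 8, 8), float(i)) for i in range(4)]
        self.labels = [0, 1, 0, 1]

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]


# The DataLoader asks the dataset for samples and stacks them into batches.
toy_loader = DataLoader(ToyDataset(), batch_size=2, shuffle=False)

batches = [(tuple(images.shape), labels.tolist()) for images, labels in toy_loader]
print(batches)
```

The Dataset only knows how to return one sample at a time; batching, shuffling, and iteration are handled by the DataLoader.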

Even though the process of using these classes to prepare data for a model might at first seem daunting, it is relatively simple if you understand how these classes work. In most cases, you use the Dataset class to prepare your data. I'll demonstrate how you can use the Dataset class to create your custom class that will allow you to not only load data but also apply transformations from Albumentations on the images it processes.

There are three functions a custom Dataset needs to implement to work properly:

 

  • __init__
  • __len__
  • __getitem__

Let's break down what each of these does, and at the same time build your custom Dataset class.

 

The __init__ Function

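This function runs once when you instantiate the Dataset object. It stores everything the dataset will need later: where the images are located, the labels that belong to them, and the transformations you will apply to your images. In the class built below, that means a list of file paths, a list of labels, and an optional transform.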

 

The __len__ Function

This function returns the number of samples in your dataset.

 

The __getitem__ Function

This function returns the sample at a given index. Using that index, it finds where the image is stored on disk, loads it, and retrieves its corresponding label. If you supplied transformations, it also applies them to the image. Finally, the function returns a tuple that consists of two members: the image itself and its corresponding label.

After defining how the aforementioned functions work, it is time to start building your custom class.

 

How to Build a Custom Dataset Class

Let's start building a custom Dataset class. I'll call it CustomDataset. To create it, I'm going to inherit from the Dataset class available in PyTorch. The __init__ and __len__ functions follow the usual pattern for custom PyTorch datasets: store what the dataset is given, and report how many samples it contains.

def __init__(self, file_paths, labels, transform=None):
    self.file_paths = file_paths
    self.labels = labels
    self.transform = transform

def __len__(self):
    return len(self.file_paths)

The main difference between this custom class and a typical one lies in how the __getitem__ function is implemented. The first part of that function, which looks up the label and the file path, is standard. The second part is different because I don't want to use the built-in PyTorch transformations but the ones offered in Albumentations, which take a NumPy array as input and return a dictionary.

def __getitem__(self, idx):
    # Identical to what we have in the standard Dataset class
    # Everything else is new
    label = self.labels[idx]
    file_path = self.file_paths[idx]

    # Loads in the image using PIL
    image = Image.open(file_path)
   
    if self.transform:
        # Converts the image into a NumPy array
        image_np = np.array(image)

        # Applies transformations on the image
        augmented = self.transform(image=image_np)

        # Converts the NumPy array back to a PIL Image
        image = Image.fromarray(augmented['image'])

    return image, label

Now that the three functions are ready, I can combine them to create the CustomDataset class.

# Create a custom dataset class 

class CustomDataset(Dataset):
    def __init__(self, file_paths, labels, transform=None):
        self.file_paths = file_paths
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.file_paths)


    def __getitem__(self, idx):
        # Identical to what we have in the standard Dataset class
        # Everything else is new
        label = self.labels[idx]
        file_path = self.file_paths[idx]

        # Loads in the image using PIL
        image = Image.open(file_path)
   
        if self.transform:
            # Converts the image into a NumPy array
            image_np = np.array(image)

            # Applies transformations on the image
            augmented = self.transform(image=image_np)

            # Converts the NumPy array back to a PIL Image
            image = Image.fromarray(augmented['image'])

        return image, label

Now that the class is ready, I can use it to augment images before passing them into my model. To demonstrate, I’m going to run only one image through this pipeline, the image of the golf ball, but the same concept applies to running multiple images. I’ll also assign a label to the golf ball image for the purposes of this demonstration: let's say that the label associated with the golf ball is 0.

To run my images and apply the augmentation pipeline consisting of the three transformations I defined earlier in the article, I can use the following code:

# Create a custom dataset

custom_dataset = CustomDataset(
    file_paths=['golf_ball_image.png'],
    labels=[0],
    transform=augmentation_pipeline
)

The code above will create an object of the CustomDataset class. If you check the first member it produces, it is going to end up being a tuple that consists of the transformed image of our golf ball together with its label. 

So if I run the following code:

# Display tuple

custom_dataset[0]

I will end up with the following tuple:

(<PIL.Image.Image image mode=RGB size=624x416>, 0)

To get the image, I’ll just extract the first member of that tuple:

# Display image

custom_dataset[0][0]

 

The result I will get by running the code above is an image that looks like this:

Image of a golf ball with the custom_dataset applied
Image Source: Costa Del Sol Golf Club, https://www.cdsgolfclub.com/when-two-golf-balls-collide/

Again, there might be some slight differences in the final image because this image augmentation pipeline created with the Compose class doesn't apply all transformations to the image 100% of the time. The transformations run every time a sample is requested, inside __getitem__, so you don't need to recreate the dataset to get a different final result: indexing it again is enough. Rerunning the full code and taking a look at the resultant image works as well:

# Create a custom dataset

custom_dataset = CustomDataset(
    file_paths=['golf_ball_image.png'],
    labels=[0],
    transform=augmentation_pipeline
)

# Display image

custom_dataset[0][0]
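One practical detail when moving from a single image to real training: a DataLoader with default settings cannot stack PIL images into a batch, so the dataset usually returns tensors instead. The sketch below shows one way to do that conversion by hand. TensorImageDataset, stand_in_transform, and the in-memory fake images are all hypothetical names used to keep the example runnable without image files; stand_in_transform only mimics the calling convention of an Albumentations pipeline. Albumentations also ships a ToTensorV2 transform in albumentations.pytorch that can handle the array-to-tensor step inside the pipeline.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


def stand_in_transform(image):
    """Hypothetical stand-in for an Albumentations pipeline: a horizontal flip."""
    return {"image": image[:, ::-1]}


class TensorImageDataset(Dataset):
    """Sketch of a dataset that returns tensors instead of PIL images."""

    def __init__(self, images, labels, transform=None):
        self.images = images  # in-memory HWC uint8 arrays instead of file paths
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image_np = self.images[idx]
        if self.transform:
            image_np = self.transform(image=image_np)["image"]

        # Flips can produce negative strides, which torch.from_numpy rejects.
        image_np = np.ascontiguousarray(image_np)

        # HWC uint8 -> CHW float32 scaled to [0, 1].
        image_tensor = torch.from_numpy(image_np).permute(2, 0, 1).float() / 255.0
        return image_tensor, self.labels[idx]


fake_images = [np.full((4, 6, 3), 255, dtype=np.uint8) for _ in range(2)]
tensor_dataset = TensorImageDataset(fake_images, [0, 1], transform=stand_in_transform)

image_batch, label_batch = next(iter(DataLoader(tensor_dataset, batch_size=2)))
print(image_batch.shape, label_batch)
```

Returning PIL images, as CustomDataset does above, is convenient for inspecting the augmented results; returning tensors is what you would typically want once the dataset feeds a model.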

With this article, I’ll finish this series of articles on using Albumentations for image augmentation. In it, I described how to combine multiple transformations into a pipeline, and I later covered how to integrate that pipeline with PyTorch, one of the two most popular frameworks for creating and training deep learning models in Python.

I explained how data is usually loaded in PyTorch models, and how you can modify the classes used for loading data so that you can use transformations from Albumentations instead of those built into PyTorch. The task I tackled in the article was image classification. Pipelines for object detection and more advanced tasks are a bit more complicated, but generally follow the same principles that were mentioned in this article.

Boris Delovski

Data Science Trainer

Boris is a data science trainer and consultant who is passionate about sharing his knowledge with others.

Before Edlitera, Boris applied his skills in several industries, including neuroimaging and metallurgy, using data science and deep learning to analyze images.