Introduction to Image Augmentation

"Garbage in, garbage out" is a famous saying in the machine learning community. It means that, whenever you train any model, whether it be a deep learning model or just a statistical model, you always need to make sure that you are feeding it good data. Bad results are inevitable if you use the wrong data, even if you do pick the right model for the job. Many different data preprocessing techniques have been developed to make sure that even if we don't have access to ideal data, we can still get the most from what we have at our disposal.
By Boris Delovski • Updated on Mar 8, 2022

"Garbage in, garbage out" is a famous saying in the machine learning community. It means that, whenever you train any model, whether it be a deep learning model or just a statistical model, you always need to make sure that you are feeding it good data. Bad results are inevitable if you use the wrong data, even if you do pick the right model for the job. Many different data preprocessing techniques have been developed to make sure that even if we don't have access to ideal data, we can still get the most from what we have at our disposal. These data preprocessing techniques vary from task-to-task and are as important as the model we plan on using.

You will often hear people refer to this data preprocessing step as "image augmentation". Image augmentation is what we do before we feed our image data into a computer vision model. In this article, we will talk about what image augmentation is, which image augmentation techniques are commonly used, and how to implement them in Python.

This is the first article in a series of articles that will prepare you to integrate an image augmentation pipeline into your current pipeline to improve the results you get from your models.


Why Use Image Augmentation

Data preprocessing includes various steps that modify our data before we use it for training our models. While image augmentation is not data preprocessing in the strict sense, it serves the same purpose: modifying, in some way, the data we plan to train our model on. In the case of image augmentation, that means adding new, artificially created images to the dataset of images we plan to train our model on.

As everyone working in the field of machine learning knows, deep learning models are very "data hungry".  Researchers are constantly working on creating models that can be trained on small quantities of data, but even those small quantities of data are usually measured in the thousands. This often leads to one very simple problem: even if we have a quality model at our disposal, we don't have enough quality data to train it on. Many fields use computer vision and greatly suffer from the lack of data.

One good example of a field where a lack of data limits the quality of models is medicine.

Training models to solve some typical medical problems, such as segmenting tumors on a CT image, is very hard. For each image, the patient needs to give consent because each image is considered private data. Finding enough patients willing to let others look at their confidential information is problematic, and usually leads to researchers working with datasets that are lacking in terms of data quantity. Of course, this isn't a problem that solely plagues the field of medicine. Many other fields often find it hard to gather as much data as they need to create a dataset of high quality.

This lack of data can, to some degree, be remedied using image augmentation. Getting more real data is still preferred, and will always be the best thing you can do when a dataset isn't big enough, but in those cases where we can't do that in a reasonable time frame, we can use image augmentation. Image augmentation is so effective that people use it even when they do have high-quality datasets: the same artificially created images that increase accuracy when we train on small quantities of data help us further increase accuracy when we train on larger quantities of data.

Nowadays, most research papers that cover topics on deep learning in computer vision introduce at least basic augmentation methods when training the model that the paper is presenting. This trend can be easily followed by looking at the most prominent computer vision deep learning models through history. AlexNet, Inception, ResNet, and many more have all included image augmentation techniques when training their models. The importance of image augmentation is so great that Google even created an algorithm called AutoAugment in 2018. AutoAugment’s sole purpose is to pick the best possible set of augmentations to use for a particular set of data. 


How Image Augmentation Works

Image augmentation is the process of creating artificial images from existing ones that can be used as part of our training dataset. In other words, it means taking an original image from our dataset and changing it in some way. There are various changes we can introduce, but all of them yield the same result: an image that is good enough for our model to train on, yet different enough that it can't be considered a duplicate of the original image.

Though useful, the situation isn't quite as simple as it sounds. Creating artificial images and using them for training doesn't necessarily need to lead to better results. In fact, when used improperly, image augmentation can even decrease the accuracy of a network. However, there are guidelines that, if followed, will increase the odds of good results.

In most articles, you will find that image augmentation techniques are either not separated into categories at all, or are only separated into position and color augmentation techniques. Separating augmentation techniques this way is somewhat of an oversimplification. If we want to be precise, it is better to look at the process of creating the new image. Depending on how a transformation changes the original image to create a new image, we can separate the different transformations we use into:

• Pixel-level transformations

• Spatial-level transformations

In this article, we will cover the simpler of the two types of transformations, pixel-level transformations. In a future article, we will cover spatial-level transformations, and how to build image augmentation pipelines.


The Albumentations Library

There are numerous ways of including an image augmentation pipeline in your machine learning project. While the easiest option is to use one of the general-purpose image libraries (e.g., PIL), our library of choice will be Albumentations. Albumentations not only allows us to transform images but also makes it very easy to create image augmentation pipelines (a topic we will cover more in-depth in the following article in this series). Other libraries, such as Torchvision, are also good choices, but are more limited in their integration options: Torchvision integrates with PyTorch, while Albumentations can integrate with both Keras and PyTorch.

Albumentations is a Python library specifically designed to make image augmentation as easy as possible. Its simple interface allows users to create pipelines that effortlessly integrate into any existing machine learning pipeline. Overall, Albumentations is also better optimized for augmentation than more general computer vision libraries.

Before we cover different transformations, let's install Albumentations. The easiest way to install Albumentations is using Anaconda or pip. If you want to install Albumentations using Anaconda, the code you need to run is:

conda install -c conda-forge albumentations

If you want to install Albumentations using pip, the code you need to run is:

pip install albumentations

If you plan on installing Albumentations from a Jupyter notebook, don't forget to add the exclamation sign:

!pip install albumentations

Pixel-Level Transformations

There are a plethora of different pixel-level transformations offered by Albumentations. Forty-five, to be exact. Of course, some of them are used more often and some are used less often. In this article, we will cover the most commonly used ones. If you are interested in transformations that are not mentioned here, I recommend taking a look at the Albumentations documentation.

The pixel-level transformations used most often are:

• Blur and sharpen

• Histogram equalization and normalization

• Noise

• Color manipulation


Blur and Sharpen

An important concept in image analysis and object identification is the concept of edges: places where we have rapid changes in pixel intensity. Blurring an image is the process of "smoothing out the edges": we average out those rapid transitions between neighboring pixels. The process functions as if we were passing the image through a low-pass filter, which is usually used to remove noise from an image.

With Albumentations, we can both sharpen and blur our images. Mathematically speaking, what we are doing is selecting a kernel (often called a convolution matrix or a mask) and passing it over an image. This process is called convolution. Depending on which kernels we pass over our images, we get different results. Sharpening gets us the exact opposite effect, but it works much the same. We just pass a different kernel over our image.

Sharpening images is done using the Sharpen operation. Using this transformation, we highlight the edges and fine details present in an image by passing a kernel over it. Then, we overlay the result with the original image.  The enhanced image is the original image combined with the scaled version of the line structures and edges in that image.
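To make the kernel idea concrete, here is a minimal NumPy sketch of sharpening by convolution. The 3x3 kernel below is the classic textbook sharpening kernel, not necessarily the exact kernel Albumentations uses, and `convolve2d` is a naive helper written purely for illustration:

```python
import numpy as np

def convolve2d(image, kernel):
    """Naive 'same'-size 2D convolution with edge padding."""
    kh, kw = kernel.shape
    pad_h, pad_w = kh // 2, kw // 2
    padded = np.pad(image, ((pad_h, pad_h), (pad_w, pad_w)), mode="edge")
    out = np.zeros_like(image, dtype=np.float64)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            region = padded[i:i + kh, j:j + kw]
            out[i, j] = np.sum(region * kernel)
    return out

# A classic 3x3 sharpening kernel: the center pixel is boosted,
# and the four direct neighbors are subtracted
sharpen_kernel = np.array([[ 0, -1,  0],
                           [-1,  5, -1],
                           [ 0, -1,  0]], dtype=np.float64)

# A tiny grayscale patch with a bright square in the middle
gray = np.array([[10, 10, 10, 10],
                 [10, 50, 50, 10],
                 [10, 50, 50, 10],
                 [10, 10, 10, 10]], dtype=np.float64)

sharpened = np.clip(convolve2d(gray, sharpen_kernel), 0, 255)
```

Pixels that sit on an edge get pushed further away from their neighbors, while pixels in flat regions are left unchanged, which is exactly the "highlighting edges and fine details" effect described above.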

Blurring images, on the other hand, is done using one of the following operations:


• Blur

• AdvancedBlur

• GaussianBlur 

• MedianBlur 

It is worth mentioning that the blur you will probably use most often is GaussianBlur. The Blur transformation uses a random kernel for the operation, so the results you get might not be that great.

The GaussianBlur transformation works great because most of the time the noise that exists in an image will be similar to Gaussian noise. On the other hand, if salt-and-pepper noise appears in the image, the MedianBlur transformation is a better tool.
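To see why the median works so well on salt-and-pepper noise, here is a naive NumPy sketch of a median filter (written for illustration; this is not how MedianBlur is implemented internally):

```python
import numpy as np

def median_filter(image, size=3):
    """Naive median filter: each pixel becomes the median of its neighborhood."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty_like(image)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.median(padded[i:i + size, j:j + size])
    return out

# A flat gray patch with one "salt" pixel (255) and one "pepper" pixel (0)
patch = np.full((5, 5), 100, dtype=np.uint8)
patch[1, 1] = 255  # salt
patch[3, 3] = 0    # pepper

cleaned = median_filter(patch)
```

Because the extreme outlier values never win the median vote, both the salt and the pepper pixels are replaced by the surrounding gray value, something an averaging (Gaussian) filter cannot do without smearing the outliers into their neighbors.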

The AdvancedBlur operation is theoretically the best possible solution if you have enough time to fully customize your blur transformation. It also uses a Gaussian filter, but allows us to customize it in detail so that it best suits our needs. However, in most cases it is better to stick with the standard GaussianBlur or MedianBlur transformation, depending on the situation, because time spent optimizing your blurring operation is probably better spent optimizing your model instead.

Let's demonstrate the results we get by using the Sharpen, GaussianBlur, and MedianBlur operations on the following image of the Matsumoto Castle in Japan.

Image Source: Matsumoto Castle,

Before we apply any transformations, we need to import Albumentations and a few other standard image processing libraries:

import albumentations
import cv2
from PIL import Image
import numpy as np

To demonstrate the results of different transformations in a Jupyter Notebook, let's create a small function:

# Create function for transforming images

def augment_img(aug, image):
    image_array = np.array(image)
    augmented_img = aug(image=image_array)["image"]
    return Image.fromarray(augmented_img)

This function will return a transformed image. Take note that we must transform our image into an array before we apply the transformation to it. Once this is prepared, let's load our image, store it into a variable, and display it:

# Load in the castle image and display it

castle_image = Image.open("matsumoto_castle.jpg")
castle_image

The displayed image is that of the castle shown earlier in this article. Now that everything is ready, we can apply the transformations to our image and take a look at the results:

Image Source: Matsumoto Castle,

Let's first sharpen our image: 

# Sharpen the image

sharpen_transformation = albumentations.Sharpen(p=1)
augment_img(sharpen_transformation, castle_image)

As you can see, we left all the parameters of the transformation at their default values except for one. The argument p defines the chance that the transformation will be applied to the image. A p value of 1 means that when we run the code there is a 100% chance that the transformation will be applied. 

While this might seem counterintuitive, it makes perfect sense once you see how pipelines work. You can define multiple transformations, define the chance each one will be applied, and then get random combinations of transformations to augment your images. This is something we will expand upon in future articles. The resulting image is:

Image Source: Matsumoto Castle,

If we want to apply the GaussianBlur transformation, we need to run the following code:

# Blur image: Gaussian

gauss_blur_transformation = albumentations.GaussianBlur(p=1)
augment_img(gauss_blur_transformation, castle_image)

The resulting image is going to look like this: 

Image Source: Matsumoto Castle,

And finally, if we want to apply the MedianBlur transformation, we need to run the following code:

# Blur image: Median

median_blur_transformation = albumentations.MedianBlur(p=1)
augment_img(median_blur_transformation, castle_image)

By applying this transformation, we will get the following result: 

Image Source: Matsumoto Castle,


Histogram Equalization and Normalization

Histogram equalization is a contrast adjustment method created to equalize the pixel intensity values in an image. The pixel intensity values usually range from 0 to 255. A grayscale image has a single histogram, while a color image has three histograms, one for each color channel: red, green, and blue.

On the histogram, the x-axis represents intensity values, and the y-axis represents the frequency of pixels at each intensity. We improve the contrast of an image by stretching out its pixel intensity range, which usually increases the global contrast of the image. This allows areas of lower contrast to gain higher contrast.

An advanced version of this method exists called Adaptive Histogram Equalization. This is a modified version of the original method in which we create histograms for each part of an image. This allows us to enhance contrast optimally in each specific region of an image. 
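The classic global equalization can be sketched in a few lines of NumPy. This is the textbook CDF-based remapping, written for illustration rather than taken from the Albumentations source:

```python
import numpy as np

np.random.seed(0)

def equalize_hist(image):
    """Classic global histogram equalization via the cumulative distribution function."""
    hist = np.bincount(image.ravel(), minlength=256)   # per-intensity pixel counts
    cdf = np.cumsum(hist).astype(np.float64)
    cdf_min = cdf[cdf > 0][0]                          # first nonzero CDF value
    # Build a lookup table that remaps intensities so the CDF becomes roughly linear
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255), 0, 255)
    return lut.astype(np.uint8)[image]

# A low-contrast image: all intensities squeezed into the 100-120 range
low_contrast = np.random.randint(100, 121, size=(32, 32)).astype(np.uint8)
equalized = equalize_hist(low_contrast)
```

After remapping, the narrow 100-120 band is stretched across the full 0-255 range, which is exactly the contrast stretching described above.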

Albumentations offers a few options to perform histogram equalization:


• HistogramMatching

• Equalize

• CLAHE (Contrast Limited Adaptive Histogram Equalization)

Of the three mentioned, you probably won't use HistogramMatching too often. It is usually used as a form of normalization because it takes an input image and tries to match its histogram to that of some reference image. It is used in very specific situations such as when you have two images of the same environment, just at two different times of the day. On the other hand, the Equalize transformation and the CLAHE transformation are used more frequently. 

The Equalize transformation is just a basic histogram equalization transformation. It is often overshadowed by CLAHE. CLAHE is a special type of adaptive histogram equalization. As a method, it better enhances contrast, but it does cause some noise to appear in the image. Nonetheless, the benefits often outweigh the detriments of using CLAHE, so it is very popular.

To better demonstrate how these methods work, we are going to convert our image into a grayscale image. We can do that using Albumentations, since it offers a transformation called ToGray:

# Grayscale image

grayscale_transformation = albumentations.ToGray(p=1)
grayscale_castle_image = augment_img(grayscale_transformation, castle_image)

The resulting image will look like this:

Image Source: Matsumoto Castle,

Once that's done, we can apply the two transformations. First, we will apply the standard histogram equalization method:

# Standard histogram equalization

histogram_equalization = albumentations.Equalize(p=1)
augment_img(histogram_equalization, grayscale_castle_image)

This is what the result of equalizing the histogram looks like: 

Image Source: Matsumoto Castle,

As you can see, the differences between the darker and lighter hues have been enhanced, which can be noticed especially on the roof of the castle. 

Now let's apply CLAHE:

# CLAHE

CLAHE_equalization = albumentations.CLAHE(p=1)
augment_img(CLAHE_equalization, grayscale_castle_image)

The resulting changes when we apply CLAHE: 

Image Source: Matsumoto Castle,

CLAHE enhances contrast much better locally. Look at the reflection of the entrance to the castle: it is much more pronounced. This would help a model we are training learn more easily and quickly.

Normalization also modifies pixel intensity values and is also used in cases where images have poor contrast due to various reasons. You might be familiar with the term “dynamic range expansion,” which is what normalization is called in the field of digital signal processing.

Put in layman's terms, normalization makes sure that the pixel values in our images fall into a certain range. It is particularly useful when we need to make sure that all images in a particular dataset have pixels that follow a common statistical distribution. This is very important for deep learning models. When working with neural networks, we want all the values we input into a network to fall into a certain range, which is why we normalize data before feeding it to the network. We won't go into detail right now, since normalization is best demonstrated when we explain image augmentation pipelines.
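As a quick illustration of the idea, here is a minimal min-max normalization sketch in NumPy. Note that this is just one common form of "dynamic range expansion"; the Normalize transformation in Albumentations works with a mean and standard deviation instead:

```python
import numpy as np

def min_max_normalize(image, new_min=0.0, new_max=1.0):
    """Linearly rescale pixel intensities into the [new_min, new_max] range."""
    image = image.astype(np.float64)
    old_min, old_max = image.min(), image.max()
    scaled = (image - old_min) / (old_max - old_min)  # now in [0, 1]
    return scaled * (new_max - new_min) + new_min

pixels = np.array([[50, 100],
                   [150, 200]], dtype=np.uint8)
normalized = min_max_normalize(pixels)
```

Whatever range the input pixels span, the output always spans exactly [0, 1], which is the kind of common, bounded distribution neural networks prefer.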



Noise

Noise is something that is, to a certain degree, always present in an image. It is a byproduct of the degradation that occurs when we take an image. When an image is taken, the digital signal gets "polluted" along the way, causing random variations in image brightness, and sometimes even in color information.

Even though it might seem counterproductive, sometimes we want to augment our images by adding noise to them on purpose. After all, our model will seldom get images that were taken in perfect conditions or that were previously cleaned. Therefore, teaching a model to recognize something in an image even if that image contains noise is productive, and something we should aim to do.

Albumentations allows us to easily implement:


• GaussNoise

• ISONoise

• MultiplicativeNoise

We mostly use Gaussian noise, a statistical noise with a probability density function equal to that of the normal distribution. It is the noise that occurs in images during image acquisition or image signal transmission. In most situations, it accurately mimics what happens to images in real-life scenarios. To implement GaussNoise, you need to use the following code:

# Gaussian noise

gaussian_noise = albumentations.GaussNoise(var_limit=(350.0, 460.0), p=1)
augment_img(gaussian_noise, castle_image)

As a side note, I used large values for the var_limit argument to make the noise easier to see on the image. Default values create noise that a machine learning model easily recognizes, but that is not visible to the naked human eye.

The image we get by applying this transformation is: 

Image Source: Matsumoto Castle,
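Conceptually, what GaussNoise does can be sketched in a few lines of NumPy: draw a variance from the var_limit range, then add zero-mean Gaussian noise with that variance to the image. This is an illustrative approximation of the transformation, not its actual implementation:

```python
import numpy as np

rng = np.random.default_rng(42)

def add_gaussian_noise(image, var_limit=(350.0, 460.0)):
    """Add zero-mean Gaussian noise whose variance is drawn from var_limit."""
    variance = rng.uniform(*var_limit)       # pick a variance from the given range
    sigma = np.sqrt(variance)                # standard deviation of the noise
    noise = rng.normal(0.0, sigma, size=image.shape)
    noisy = image.astype(np.float64) + noise
    return np.clip(noisy, 0, 255).astype(np.uint8)  # keep values in the valid pixel range

# A flat gray image makes the added noise easy to measure
clean = np.full((64, 64), 128, dtype=np.uint8)
noisy = add_gaussian_noise(clean)
```

With a variance in the 350-460 range, the noise has a standard deviation of roughly 19-21 intensity levels, which is why it is clearly visible on the image above, while the default values produce much subtler perturbations.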


Color Manipulation

There are different ways of manipulating colors in an image. We already demonstrated one way earlier in this article, when we converted our original image to a grayscale image. That is a very common procedure that a lot of images go through before being fed into a model. If the color itself is not in any way connected to the problem the model is trying to solve, it is common practice to convert all images to grayscale. That’s because building networks that work with grayscale images (single channel images) is much easier than building networks that work with colored images (multi-channel images).
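For reference, collapsing an RGB image to grayscale is usually done with a weighted sum of the three channels. The sketch below uses the standard ITU-R BT.601 luma weights; the exact weights ToGray uses internally may differ:

```python
import numpy as np

def to_grayscale(rgb_image):
    """Collapse an H x W x 3 RGB image to a single channel using luma weights."""
    weights = np.array([0.299, 0.587, 0.114])   # ITU-R BT.601 luma coefficients
    gray = rgb_image.astype(np.float64) @ weights
    return np.clip(np.round(gray), 0, 255).astype(np.uint8)

# A 1x2 image: one pure red pixel and one pure white pixel
rgb = np.array([[[255, 0, 0], [255, 255, 255]]], dtype=np.uint8)
gray = to_grayscale(rgb)
```

The green channel gets the largest weight because human vision is most sensitive to green, which is why a pure red pixel ends up fairly dark in grayscale while white stays at full intensity.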

When we do work with images with color, we generally manipulate the hue, saturation, and brightness of a particular image. To perform color transformations in Albumentations we can use:


• ToGray

• ToSepia

• RandomBrightnessContrast

• HueSaturationValue

• ColorJitter

• FancyPCA

The ToGray and ToSepia transformations are self-explanatory: ToGray transforms the image into a grayscale image, and ToSepia applies a sepia filter to the RGB input image.

RandomBrightnessContrast is used very often. It is one of the most commonly used transformations, and not only amongst pixel-level transformations. It does exactly what the name says, randomly changing the contrast and brightness of the input image. Applying it to an image is done using the following code:

# Brightness and contrast

rand_brightness_contrast = albumentations.RandomBrightnessContrast(p=1)
augment_img(rand_brightness_contrast, castle_image)

The resulting image will look like this: 

Image Source: Matsumoto Castle,

Since RandomBrightnessContrast randomly selects values from a range, if you run the code above, your results might be a bit different. Even if the differences are not easy to recognize with the naked eye, models will still pick up on them. 

The HueSaturationValue transformation will randomly select values for hue, saturation, and value from a particular range of values. If we want to transform our images using this transformation, we can just run the following code:

# Random hue, saturation, and value

rand_hue_sat_val = albumentations.HueSaturationValue(hue_shift_limit=[50, 60], p=1)
augment_img(rand_hue_sat_val, castle_image)

In this case, I selected extreme values for hue on purpose to demonstrate the changes this transformation can make to the original image. The resulting image will look like this:

Image Source: Matsumoto Castle,

As you can see, the hue has been completely changed to the point where colors that weren't originally present in the image suddenly replace some previously existing colors.

The ColorJitter transformation will randomly change the values of the brightness, contrast, and saturation of our input image. To apply ColorJitter to an image we can use the following code:

# Random brightness, saturation, and contrast: ColorJitter

color_jit = albumentations.ColorJitter(p=1)
augment_img(color_jit, castle_image)

This code resulted in the following image:

Image Source: Matsumoto Castle,

Don't forget that, since the values are picked at random, if you run the same code, you might get different results. However, whatever you get will be easily distinguishable from the original image with the naked eye.

Finally, let’s go ahead and explain how the FancyPCA transformation works. The original name of this technique is PCA Color Augmentation. However, the name FancyPCA was adopted and even the libraries use that name.

FancyPCA is a technique that alters the intensities of RGB channels of an image. Essentially, it performs Principal Component Analysis on the different color channels of some input image. This ends up shifting red, green, and blue pixel values based on which values are most often present in the image. FancyPCA can be applied using the following code:

# PCA Color Augmentation

fancy_PCA = albumentations.FancyPCA(p=1)
augment_img(fancy_PCA, castle_image)

FancyPCA won't cause changes that humans can notice, but machine learning models will. 

For example, look at the image:

Image Source: Matsumoto Castle,

As you can see, the result is indistinguishable from the original image.
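For the curious, the AlexNet-style PCA color augmentation that inspired FancyPCA can be sketched in NumPy as follows. This is a rough illustration of the idea; the scaling details in the actual Albumentations implementation differ:

```python
import numpy as np

rng = np.random.default_rng(0)

def fancy_pca(image, alpha_std=0.1):
    """AlexNet-style PCA color augmentation (a rough sketch)."""
    pixels = image.reshape(-1, 3).astype(np.float64) / 255.0
    mean = pixels.mean(axis=0)
    cov = np.cov(pixels - mean, rowvar=False)     # 3x3 covariance of the RGB channels
    eigvals, eigvecs = np.linalg.eigh(cov)        # principal components of the colors
    alphas = rng.normal(0.0, alpha_std, size=3)   # random strength for each component
    shift = eigvecs @ (alphas * eigvals)          # shift along the principal components
    augmented = pixels + shift                    # same shift applied to every pixel
    return (np.clip(augmented, 0, 1) * 255).reshape(image.shape).astype(np.uint8)

image = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
augmented = fancy_pca(image)
```

Because the shift is proportional to the eigenvalues of the channel covariance, the augmentation nudges colors only along directions in which the image's colors already vary, which is why the result stays visually indistinguishable from the original.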


Conclusion

In this article, we covered the basics of image augmentation. We talked about what image augmentation is, why we use it, and we mentioned the two different types of image augmentation techniques that are often used to remedy the lack of data we often run into when working with images.

We also covered in-depth the first of the two types of image augmentation techniques, pixel-level transformations. Pixel-level transformations don't interact with the positions of elements in an image or other spatial characteristics. Instead, they manipulate the values that represent each pixel: reducing the differences between neighboring pixels, increasing those differences, adding noise, or changing color values.

Pixel-level transformations are simpler than spatial-level transformations. They are therefore less likely to negatively affect our model's results, even if we mess something up. Spatial-level transformations are far riskier and, if implemented incorrectly, can greatly decrease the accuracy of our models.

In the following and last article in this series, we will cover spatial-level transformations. We will also demonstrate how easy it is to create a pipeline of transformations and include it in an already existing machine learning pipeline.

Boris Delovski

Data Science Trainer

Boris is a data science trainer and consultant who is passionate about sharing his knowledge with others.

Before Edlitera, Boris applied his skills in several industries, including neuroimaging and metallurgy, using data science and deep learning to analyze images.