July 12, 2024

Understand what image data augmentation is and how to use it with Keras in your deep learning projects

Photo by Chris Lawton on Unsplash

If you have ever tried performing image recognition using deep learning, you know the importance of a good dataset for training. However, finding sufficient images for training is not always easy, and the accuracy of your model depends directly on the quality of the training data.

Fortunately, there are techniques you can use to supplement the image dataset that you use for training. One of these techniques is called image data augmentation. In this article, I will discuss what image data augmentation is, how it works, why it is useful in deep learning, and finally how to perform image data augmentation using the Keras library.

Image data augmentation is a technique that creates new images from existing ones. To do that, you make some small changes to them, such as adjusting the brightness of the image, or rotating the image, or shifting the subject in the image horizontally or vertically.

Image augmentation techniques allow you to artificially increase the size of your training set, thereby providing much more data to your model for training. This can improve the accuracy of your model by helping it recognize new variants of your training data.

Types of Image Data Augmentation

Image augmentation comes in many forms. Here are some of the common ones: vertical shift, horizontal shift, vertical flip, horizontal flip, rotation, brightness adjustment, and zoom in/out.

I will first demonstrate the various image augmentation techniques using Python and Keras. If you want to follow along, make sure you have Anaconda and TensorFlow installed, then create a new Jupyter Notebook.

Vertical Shift

The first image augmentation technique that I want to show is the vertical shift. The vertical shift randomly shifts the image vertically up or down. For this example, I am going to use an image named 747.jpg, located in the same folder as my Jupyter Notebook.

Image source: https://commons.wikimedia.org/wiki/File:Qantas_Boeing_747-438ER_VH-OEI_at_LAX.jpg. This file is licensed under the Creative Commons Attribution-Share Alike 2.0 Generic license.

The following code snippet uses the ImageDataGenerator class in Keras to vertically shift the image.

The ImageDataGenerator class from Keras generates batches of image data with real-time data augmentation.

#---import the modules---
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import ImageDataGenerator
#---load the image---
image_filename = "747.jpg"
img = load_img(image_filename)
#---convert the image to 3D array---
image_data = img_to_array(img)
#---convert into a 4-D array of 1 element of 3D array representing
# the image---
images_data = np.expand_dims(image_data, axis=0)
#---create image data augmentation generator---
datagen = ImageDataGenerator(width_shift_range=0.2)
#---prepare the iterator; flow() takes in a 4D array and returns
# an iterator containing a batch of images---
train_generator = datagen.flow(images_data, batch_size=1)
rows = 5
columns = 4
#---plot 5 rows and 4 columns---
fig, axes = plt.subplots(rows,columns)
for r in range(rows):
    for c in range(columns):
        #---get the next image in the batch (one image since batch
        # size is 1)---
        image_batch = train_generator.next()

        #---convert to unsigned integers for viewing---
        image = image_batch[0].astype('uint8')

        #---show the image---
        axes[r,c].imshow(image)
#---set the size of the figure---
fig.set_size_inches(15,10)

The above code snippet produces the following output:

As you can see from the output above, each time you call the next() method of the train_generator object, you get an image that is slightly altered. In the above code snippet, every call to next() returns a new image that is randomly shifted by up to 20% of the original image height:

datagen = ImageDataGenerator(width_shift_range=0.2)

Interestingly, for the version of the ImageDataGenerator class (tensorflow.keras.preprocessing.image) that I used, specifying the width_shift_range parameter shifts the image vertically instead of horizontally (which is the behaviour of the older ImageDataGenerator from the keras.preprocessing.image module). Likewise, if you want the image to be shifted horizontally, you need to use the height_shift_range parameter (see next section). This behaviour may differ between versions, so check the output on your own installation.

Note that the next() method will return an augmented image for as many times as you want. In the code snippet above, we called it 20 times (5 rows times 4 columns).
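To make the array shapes concrete, here is a minimal sketch that uses only NumPy. The toy_flow function is a hypothetical stand-in for datagen.flow(), not the Keras implementation: it shifts rows with np.roll, so pixels wrap around instead of being filled in. It is only meant to show why the image is expanded to a 4D array and how an iterator can keep producing new variants for as long as you ask.

```python
import numpy as np

def toy_flow(images, max_shift_fraction=0.2, seed=0):
    """Yield batches forever, each a randomly row-shifted copy of `images`.

    Hypothetical stand-in for ImageDataGenerator.flow(); np.roll wraps
    pixels around, which is not how Keras fills the shifted area.
    """
    rng = np.random.default_rng(seed)
    height = images.shape[1]
    max_shift = int(max_shift_fraction * height)
    while True:
        shift = int(rng.integers(-max_shift, max_shift + 1))
        yield np.roll(images, shift, axis=1)

# Toy 10 x 6 RGB "image" standing in for the array from img_to_array()
image_data = np.arange(10 * 6 * 3, dtype="float32").reshape(10, 6, 3)

# Add a leading batch axis: (height, width, channels) -> (1, height, width, channels)
images_data = np.expand_dims(image_data, axis=0)

generator = toy_flow(images_data)
batches = [next(generator) for _ in range(20)]   # 5 rows x 4 columns

# Each batch holds one image; convert it to uint8 for display
first_image = batches[0][0].astype("uint8")
print(images_data.shape, first_image.shape)
```

The built-in next(generator) used here should also work on the Keras iterator, and may be worth trying if train_generator.next() raises an AttributeError on a newer TensorFlow release.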

Horizontal Shift

You can now try to shift the image horizontally using the height_shift_range parameter:

datagen = ImageDataGenerator(height_shift_range=0.2)
train_generator = datagen.flow(images_data, batch_size=1)
rows = 5
columns = 4
fig, axes = plt.subplots(rows,columns)
for r in range(rows):
    for c in range(columns):
        image_batch = train_generator.next()
        image = image_batch[0].astype('uint8')
        axes[r,c].imshow(image)
fig.set_size_inches(15,10)

The above code snippet produces the following output:

Horizontal Flip

Sometimes it makes sense to flip the image horizontally. In the case of an airplane, the front of the plane may be facing the left, or the right:

datagen = ImageDataGenerator(horizontal_flip=True)
train_generator = datagen.flow(images_data, batch_size=1)
rows = 2
columns = 2
fig, axes = plt.subplots(rows,columns)
for r in range(rows):
    for c in range(columns):
        image_batch = train_generator.next()
        image = image_batch[0].astype('uint8')
        axes[r,c].imshow(image)
fig.set_size_inches(15,10)

For the code snippet above, generating four images is good enough as the front of the airplane may either be facing left or right:

Remember that the flipping is random (sometimes you get four original images and sometimes you get images that are flipped horizontally). It is possible that the four images above are all the same. If that happens, just run that block of code again.
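As a rough illustration of what a flip does to the pixel array, the sketch below reverses the width or height axis of a toy NumPy image. It also estimates how often four random draws all look the same, under the assumption that each image is flipped independently with probability 0.5.

```python
import numpy as np

# Toy 2 x 3 "image" with one channel; each value encodes row * 10 + column
image = np.array([[[0], [1], [2]],
                  [[10], [11], [12]]])

horizontal = image[:, ::-1, :]   # reverse the width axis (left <-> right)
vertical = image[::-1, :, :]     # reverse the height axis (top <-> bottom)

print(horizontal[0, :, 0])   # [2 1 0]
print(vertical[0, :, 0])     # [10 11 12]

# Assumption: each image is flipped independently with probability 0.5
p_flip = 0.5
draws = 4
p_all_identical = p_flip ** draws + (1 - p_flip) ** draws
print(p_all_identical)       # 0.125
```

Under that assumption, roughly one run in eight produces four identical images, which is why rerunning the block usually gives a mix.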

Vertical Flip

Just like horizontal flipping, you may also perform vertical flipping:

datagen = ImageDataGenerator(vertical_flip=True)
train_generator = datagen.flow(images_data, batch_size=1)
rows = 2
columns = 2
fig, axes = plt.subplots(rows,columns)
for r in range(rows):
    for c in range(columns):
        image_batch = train_generator.next()
        image = image_batch[0].astype('uint8')
        axes[r,c].imshow(image)
fig.set_size_inches(15,10)

In the case of airplanes, it might not make a lot of sense to flip the airplane upside down! If you are trying to perform image recognition, chances are your images of planes are upright, so there is little value in training your model to recognize upside-down planes. For other cases, vertical flipping makes a lot of sense.

Rotation

Rotation, as the name implies, rotates your image. This would be very useful for our airplane image. The following code snippet randomly rotates the image by up to 50 degrees:

datagen = ImageDataGenerator(rotation_range=50)
train_generator = datagen.flow(images_data)
rows = 5
columns = 4
fig, axes = plt.subplots(rows,columns)
for r in range(rows):
    for c in range(columns):
        image_batch = train_generator.next()
        image = image_batch[0].astype('uint8')
        axes[r,c].imshow(image)
fig.set_size_inches(15,10)

With rotation, the output shows the airplane in various orientations, simulating its takeoff and landing positions:
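As a rough illustration of what rotation_range=50 means, the sketch below draws 20 random angles and builds the rotation matrix for an angle. It assumes the angle is sampled uniformly between -50 and +50 degrees. The rotation_matrix helper is hypothetical and works on a single point in standard mathematical axes rather than on a full image.

```python
import numpy as np

rotation_range = 50
rng = np.random.default_rng(42)

# Assumption: the angle is drawn uniformly from [-rotation_range, +rotation_range]
angles_deg = rng.uniform(-rotation_range, rotation_range, size=20)

def rotation_matrix(theta_deg):
    """Return the 2x2 matrix that rotates a point counter-clockwise."""
    theta = np.deg2rad(theta_deg)
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

# Sanity check: a 90-degree rotation moves (1, 0) to approximately (0, 1)
point = np.array([1.0, 0.0])
rotated = rotation_matrix(90) @ point

print(angles_deg.min(), angles_deg.max())
print(np.round(rotated, 6))
```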

Brightness

Another augmentation technique is adjusting the brightness of the image. The following code snippet sets a range of brightness shift values:

datagen = ImageDataGenerator(brightness_range=[0.15,2.0])
train_generator = datagen.flow(images_data, batch_size=1)
rows = 5
columns = 4
fig, axes = plt.subplots(rows,columns)
for r in range(rows):
    for c in range(columns):
        image_batch = train_generator.next()
        image = image_batch[0].astype('uint8')
        axes[r,c].imshow(image)
fig.set_size_inches(15,10)

The output contains a series of images of varying brightness:
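To give a feel for the two ends of the brightness range, here is a simplified NumPy sketch. The adjust_brightness helper is hypothetical and only approximates the effect: it multiplies the pixel values by a factor and clips them to the valid range, which may not match exactly what Keras does internally.

```python
import numpy as np

def adjust_brightness(image, factor):
    """Scale pixel values by `factor` and clip to the valid 0-255 range."""
    scaled = image.astype("float32") * factor
    return np.clip(scaled, 0, 255).astype("uint8")

# Toy 2 x 2 light-grey RGB image
image = np.full((2, 2, 3), 200, dtype="uint8")

darker = adjust_brightness(image, 0.15)    # lower bound of the range
brighter = adjust_brightness(image, 2.0)   # upper bound; values saturate at 255

# One random factor per augmented image, drawn from the same range
rng = np.random.default_rng(0)
random_factor = rng.uniform(0.15, 2.0)
randomly_adjusted = adjust_brightness(image, random_factor)

print(darker[0, 0], brighter[0, 0], round(random_factor, 2))
```

A factor below 1 darkens the image and a factor above 1 brightens it, so a range of 0.15 to 2.0 covers both very dark and overexposed variants.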

Zooming

You can also zoom the images in or out:

datagen = ImageDataGenerator(zoom_range=[0.5,5])
train_generator = datagen.flow(images_data, batch_size=1)
rows = 5
columns = 4
fig, axes = plt.subplots(rows,columns)
for r in range(rows):
    for c in range(columns):
        image_batch = train_generator.next()
        image = image_batch[0].astype('uint8')
        axes[r,c].imshow(image)
fig.set_size_inches(15,10)

The output shows the image in the various zoom ratios:

Note that zooming the images will alter the aspect ratios of the images.
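One way to see why the aspect ratio changes: assuming the zoom factors for the two image axes are sampled independently from the given range, they will almost never be equal. The sketch below uses NumPy only and does not call Keras.

```python
import numpy as np

zoom_range = (0.5, 5.0)   # (lower, upper)
rng = np.random.default_rng(1)

# Assumption: one factor per image axis, each drawn independently
zx, zy = rng.uniform(zoom_range[0], zoom_range[1], size=2)

# A ratio different from 1 means the image is stretched along one axis
stretch = zx / zy
aspect_ratio_changed = not np.isclose(zx, zy)

print(round(zx, 3), round(zy, 3), round(stretch, 3), aspect_ratio_changed)
```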

Combining all the augmentations

Of course, all the various augmentation techniques that I have discussed so far can be combined:

datagen = ImageDataGenerator(width_shift_range=0.2,
                             height_shift_range=0.2,
                             horizontal_flip=True,
                             rotation_range=50,
                             brightness_range=[0.15,2.0],
                             zoom_range=[0.5,5]
                            )
train_generator = datagen.flow(images_data, batch_size=1)
rows = 8
columns = 8
fig, axes = plt.subplots(rows,columns)
for r in range(rows):
    for c in range(columns):
        image_batch = train_generator.next()
        image = image_batch[0].astype('uint8')
        axes[r,c].imshow(image)
fig.set_size_inches(15,10)

Note that I have left out vertical flip as it does not make sense for our example.

The output now shows the image with the various augmentations applied:

The previous sections showed the basics of image data augmentation and how it can be applied to a single image. In deep learning, we often deal with a set of images. So let’s now see how image augmentation can be applied to a set of images. For illustration, I am going to assume that the folder containing your Jupyter Notebook has a Fruits folder with the following subfolders:

Fruits
|__banana
   |__banana1.jpg
   |__banana2.jpg
   |__banana3.jpg
   |__ ...
|__durian
   |__durian1.jpg
   |__durian2.jpg
   |__durian3.jpg
   |__ ...
|__orange
   |__orange1.jpg
   |__orange2.jpg
   |__orange3.jpg
   |__ ...
|__strawberry
   |__strawberry1.jpg
   |__strawberry2.jpg
   |__strawberry3.jpg
   |__ ...

Each subfolder contains a set of images. For example, the banana folder contains a number of images named banana1.jpg, banana2.jpg, and so on. The names of the subfolders serve as the labels for the images. This means that all the files under the banana folder contain images of bananas, and so on.

To load a series of images from disk, you now call the flow_from_directory() method of the ImageDataGenerator instance instead of the flow() method (for loading images from memory):

train_datagen = ImageDataGenerator(
    horizontal_flip=True,
    vertical_flip=True,
    rotation_range=50,
)
batch_size = 8
train_generator = train_datagen.flow_from_directory(
    './Fruits',
    target_size=(224,224),
    color_mode="rgb",
    batch_size=batch_size,
    class_mode="categorical",
    shuffle=True)

Observe that I now set the batch_size to 8. You will see the use of the batch size shortly.

Using the iterator returned, I can find the labels for the various fruits (banana, durian, orange, and strawberry):

class_dictionary = train_generator.class_indices
#---create a dictionary of labels---
class_dictionary = { value:key for key,value in
                     class_dictionary.items()}
#---convert the dictionary to a list---
class_list = [value for _,value in class_dictionary.items()]
print(class_list)

You will see the following output:

Found 54 images belonging to 4 classes.
['banana', 'durian', 'orange', 'strawberry']

In all, there are a total of 54 images in 4 folders. Also, the class_list variable contains the list of fruits.
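The label lookup can be tried without loading any images. The sketch below starts from an assumed value of train_generator.class_indices for the Fruits folder (a plain dictionary written out by hand) and inverts it. Sorting by index is an extra precaution that is not in the snippet above; it keeps the list position aligned with the class index regardless of dictionary order.

```python
# Assumed value of train_generator.class_indices for the Fruits folder
class_indices = {"banana": 0, "durian": 1, "orange": 2, "strawberry": 3}

# Invert the mapping so that an index looks up its label
index_to_label = {index: label for label, index in class_indices.items()}

# Sort by index so the list position always matches the class index
class_list = [index_to_label[i] for i in sorted(index_to_label)]

print(class_list)      # ['banana', 'durian', 'orange', 'strawberry']
print(class_list[2])   # orange
```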

I shall now print out the set of augmented images that are created by the ImageDataGenerator class. I will arbitrarily set the rows to 10, and for each row, I want to print out the batch of images returned (which is 8 in this example):

rows = 10
fig, axes = plt.subplots(rows,batch_size)
for r in range(rows):
    #---get the batch of augmented images---
    image_batch = train_generator.next()
    #---get the number of images returned---
    images_count = image_batch[0].shape[0]

    for c in range(images_count):
        #---convert to unsigned integers for viewing---
        image = image_batch[0][c].astype('uint8')

        #---display the image---
        axes[r,c].imshow(image)

        #---display the label of the image---
        axes[r,c].title.set_text(
            class_list[np.argmax(image_batch[1][c])])
        #---hides the x and y-ticks---
        axes[r,c].set_xticks([])
        axes[r,c].set_yticks([])
fig.set_size_inches(15,18)

Since the batch_size is now set to 8 (and no longer 1), the train_generator.next() method will return you a batch of eight augmented images every time you call it. The number of images returned is based on batch_size that you set earlier in the flow_from_directory() method:

train_generator = train_datagen.flow_from_directory(
    './Fruits',
    target_size=(224,224),
    color_mode="rgb",
    batch_size=batch_size, # batch_size = 8
    class_mode="categorical",
    shuffle=True)

The value of the image_batch variable (returned by the next() method) is a tuple of two elements:

  • The first element (image_batch[0]) is an array of batch_size images (4D array)
  • The second element (image_batch[1]) contains the labels for the images

The above code snippet produces the following output:

Notice that on the seventh row, there are two empty charts with no images. Recall that there are a total of 54 images in the images set, and since each batch returns 8 images (per row), the first seven rows will display a total of 54 images (8×6 + 6). The following figure makes it clear:

Note that you can set rows to any number and the ImageDataGenerator class will keep on generating new augmented images for you.
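The batch structure and the two empty cells on the seventh row can be reproduced with a small NumPy sketch. The toy_batches function is a hypothetical stand-in for one pass over the 54 images: it yields tuples of placeholder images and random one-hot labels, not real augmented fruit images.

```python
import numpy as np

total_images, batch_size, num_classes = 54, 8, 4
class_list = ["banana", "durian", "orange", "strawberry"]
rng = np.random.default_rng(0)

def toy_batches(total, size):
    """Yield (images, one_hot_labels) tuples covering `total` images once."""
    for start in range(0, total, size):
        count = min(size, total - start)
        # Tiny placeholder images instead of real 224 x 224 pictures
        images = np.zeros((count, 4, 4, 3), dtype="float32")
        # Random class per image, encoded as a one-hot row
        labels = np.eye(num_classes)[rng.integers(0, num_classes, size=count)]
        yield images, labels

batches = list(toy_batches(total_images, batch_size))
batch_sizes = [images.shape[0] for images, _ in batches]
print(batch_sizes)   # [8, 8, 8, 8, 8, 8, 6]

# Decode the one-hot labels of the last batch, as the plotting loop does
last_images, last_labels = batches[-1]
decoded = [class_list[np.argmax(row)] for row in last_labels]

# Subplots left unused on the row that shows the last batch
empty_cells = batch_size - last_images.shape[0]
print(empty_cells)   # 2
```

The seventh batch holds only the remaining 6 images, which is why that row has two unused subplots before the next pass over the data begins.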

Building a model using transfer learning

You now know how to use the ImageDataGenerator to load sets of images from disk for augmentation. But how do you use it for training? The following code snippet shows how to build a deep learning model using transfer learning.

Transfer learning is a machine learning method where a model developed for a task is reused as the starting point for a model on a second task. Transfer learning reduces the amount of time that you need to spend on training.

from tensorflow.keras.models import Model
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
#---number of fruits---
NO_CLASSES = max(train_generator.class_indices.values()) + 1
#---load the VGG16 model as the base model for training---
base_model = VGG16(include_top=False, input_shape=(224, 224, 3))
#---add our own layers---
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024,activation='relu')(x) # add dense layers so
                                     # that the model can
                                     # learn more complex
                                     # functions and
                                     # classify for better
                                     # results.
x = Dense(1024,activation='relu')(x) # dense layer 2
x = Dense(512,activation='relu')(x)  # dense layer 3
preds = Dense(NO_CLASSES,
              activation='softmax')(x) # final layer with
                                       # softmax activation
#---create a new model with the base model's original
# input and the new model's output---
model = Model(inputs = base_model.input, outputs = preds)
#---don't train the first 19 layers - 0..18---
for layer in model.layers[:19]:
    layer.trainable=False
#---train the rest of the layers - 19 onwards---
for layer in model.layers[19:]:
    layer.trainable=True

#---compile the model---
model.compile(optimizer="Adam",
              loss="categorical_crossentropy",
              metrics=['accuracy'])

Explaining how transfer learning works is beyond the scope of this article. I will leave it for another article.

Using the generated images for training

To use the augmented images for training, pass the train_generator into the fit() method of the model:

#---train the model---
step_size_train = train_generator.n // train_generator.batch_size
model.fit(train_generator,
          steps_per_epoch=step_size_train,
          epochs=15)

The steps_per_epoch parameter specifies how many steps (batches) make up an epoch. It depends on the number of images you have and the batch size defined earlier. If you set it to a higher number, you are doing repetitive training. You should set it based on this formula:

number of images / batch size

In our example, we have 54 images in total. In each epoch, the ImageDataGenerator class transforms the images drawn for training, so the model gets different variations of the images in every epoch. Note that the floor division used above gives 54 // 8 = 6 steps, so 48 images are drawn per epoch; rounding up to 7 steps would cover all 54. If you train for 15 epochs, up to 15×54 variations of the images will be generated and fed into the model.
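The arithmetic behind the formula can be checked in a few lines of plain Python. The numbers are the ones from this example; rounding up with math.ceil is shown as an alternative to floor division and is not what the training snippet above uses.

```python
import math

num_images = 54
batch_size = 8
epochs = 15

# Floor division: 6 steps of 8 images each
steps_floor = num_images // batch_size
# Rounding up: 7 steps, the last one holding the remaining 6 images
steps_ceil = math.ceil(num_images / batch_size)

images_per_epoch_floor = steps_floor * batch_size
print(steps_floor, steps_ceil, images_per_epoch_floor)   # 6 7 48

# Upper bound on the number of variations seen over all epochs
print(epochs * num_images)   # 810
```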

The ImageDataGenerator class allows your model to receive new variations of the images at each epoch. But do remember that it only returns the transformed images and does not add them to the set of images that you have.

I hope this article has given you a good idea of what image data augmentation is all about and why you need it when training your deep learning models. In particular, I have demonstrated it using the Keras module in the TensorFlow library.

Image Data Augmentation for Deep Learning Republished from Source https://towardsdatascience.com/image-data-augmentation-for-deep-learning-77a87fabd2bf?source=rss—-7f60cf5620c9—4 via https://towardsdatascience.com/feed
