Adversarial Attack with FGSM (Fast Gradient Sign Method)

An adversarial attack is a method for fooling a neural network, causing a classification model to misclassify its input. FGSM is a white-box attack. In short, we need to know about th

Create Adversarial Examples against ResNet

Reference: PyTorch Docs

We recommend an environment suited to running machine learning models, such as Google Colaboratory or Jupyter Notebook.

1. Import Modules

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, models, transforms
import numpy as np
from PIL import Image

2. Load ResNet Model

We load ResNet50 pretrained on ImageNet. Other variants such as ResNet18 or ResNet34 work as well.

# On torchvision >= 0.13, weights=models.ResNet50_Weights.IMAGENET1K_V1 is the non-deprecated equivalent
model = models.resnet50(pretrained=True)
model.eval()

torch.manual_seed(42)
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Device: ", device)

3. Load/Preprocess Image

We use an image of a fluffy Samoyed dog.

wget https://github.com/pytorch/hub/raw/master/images/dog.jpg

Then we need to preprocess it.

# Define the transform pipeline that preprocesses the original image
preprocess = transforms.Compose([
  transforms.Resize(256),
  transforms.CenterCrop(224),
  transforms.ToTensor(),
  transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

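# Load the downloaded image with Pillow and apply the preprocessing pipeline
orig_img = Image.open("dog.jpg").convert("RGB")
orig_img_tensor = preprocess(orig_img)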

# Prepend one dimension to the tensor for inference
orig_img_batch = orig_img_tensor.unsqueeze(0)

# Move the image and the model to the selected device
orig_img_batch = orig_img_batch.to(device)
model = model.to(device)

4. Load ImageNet Classes

We use the ImageNet classes. The labels are used to check how the model classifies the original image and the adversarial images.

wget https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt

Then read this text file and assign its lines to labels.

with open("imagenet_classes.txt", "r") as f:
  labels = [s.strip() for s in f.readlines()]

5. Initial Prediction

Before creating adversarial examples, we check which classes and probabilities the ResNet model assigns to the original image.

pred = model(orig_img_batch)
probs = F.softmax(pred[0], dim=0)
probs_top5, idx_top5 = torch.topk(probs, 5)
print("Top 5 labels by probability:")
for i in range(probs_top5.size(0)):
  print(f"{labels[idx_top5[i]]}: {probs_top5[i].item()*100:.2f}%")

# Extract the top probability and index (target) for use in the next sections
target_prob = probs_top5[0]
target_idx = idx_top5[0]

The top-5 labels and probabilities should look similar to the following.

Top 5 labels by probability:
Samoyed: 87.33%
Pomeranian: 3.03%
white wolf: 1.97%
keeshond: 1.11%
Eskimo dog: 0.92%

As expected, the ResNet model classified the original image as Samoyed with a probability of 87.33%.
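
The softmax and top-k steps used for the prediction above can be illustrated without PyTorch. The following is only a minimal sketch in plain Python; the helper names `softmax` and `topk` and the sample logits are hypothetical and are not part of the tutorial code.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def topk(values, k):
    """Return the k largest values and their indices, largest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in order], order

logits = [2.0, 0.5, -1.0, 3.0]  # hypothetical scores for four classes
probs = softmax(logits)
top_probs, top_idx = topk(probs, 2)
for p, i in zip(top_probs, top_idx):
    print(f"class {i}: {p * 100:.2f}%")
```

In the tutorial code, `F.softmax` and `torch.topk` play the same roles on the 1000 ImageNet logits.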

6. Define Function to Denormalize

Create a function to denormalize an input image. The perturbation is applied in the original [0, 1] pixel range, so the normalized image must be denormalized before the FGSM step.

def denorm(batch, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
  if isinstance(mean, list):
    mean = torch.tensor(mean).to(device)
  if isinstance(std, list):
    std = torch.tensor(std).to(device)
  return batch * std.view(1, -1, 1, 1) + mean.view(1, -1, 1, 1)
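
To see that denormalizing inverts the normalization, here is a minimal sketch on a single RGB pixel in plain Python. The mean and std values are the ones used in this article; the helper names and the sample pixel are hypothetical.

```python
# ImageNet channel statistics, as passed to transforms.Normalize in this article
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Map a pixel from [0, 1] to normalized space: (x - mean) / std."""
    return [(x - m) / s for x, m, s in zip(rgb, MEAN, STD)]

def denormalize_pixel(rgb_norm):
    """Invert the mapping per channel: x * std + mean."""
    return [x * s + m for x, m, s in zip(rgb_norm, MEAN, STD)]

pixel = [0.9, 0.5, 0.1]  # hypothetical RGB value in [0, 1]
restored = denormalize_pixel(normalize_pixel(pixel))
print(restored)  # equal to `pixel` up to floating-point rounding
```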

7. Calculate Perturbations

This step is the core of the adversarial attack. It computes the sign of the gradient of the loss with respect to the input image, obtained by backpropagation. In the next section the sign is used to adjust the input so that the loss increases.

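def calc_perturbations(image, target):
  image.requires_grad = True

  # Predict the original image (the model returns raw logits)
  pred = model(image)

  # cross_entropy applies log-softmax itself; nll_loss would expect log-probabilities
  loss = F.cross_entropy(pred, target)
  model.zero_grad()
  loss.backward()

  # Keep only the sign of the gradient w.r.t. the input image
  gradient = image.grad.data
  signed_grad = gradient.sign()
  return signed_grad

# target_idx is a 0-dim tensor already on the device; add a batch dimension
perturbations = calc_perturbations(orig_img_batch, target_idx.unsqueeze(0))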
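
The idea of stepping along the sign of the input gradient can be sketched on a much smaller model. The example below is an illustration only, not the ResNet attack: it assumes a logistic regression classifier with hypothetical weights, where the gradient with respect to the input has the closed form (p - y) * w, and applies x_adv = clamp(x + eps * sign(grad), 0, 1).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grad(x, y, w, b):
    """Binary cross-entropy loss and its gradient with respect to the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad_x = [(p - y) * wi for wi in w]  # closed form for logistic regression
    return loss, grad_x

def sign(v):
    return (v > 0) - (v < 0)

def fgsm_step(x, grad_x, eps):
    """Move each input value by eps in the direction that increases the loss."""
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad_x)]

w, b = [2.0, -1.0, 0.5], 0.1  # hypothetical model parameters
x, y = [0.6, 0.2, 0.7], 1     # hypothetical input with true label 1

loss_before, grad = loss_and_grad(x, y, w, b)
x_adv = fgsm_step(x, grad, eps=0.1)
loss_after, _ = loss_and_grad(x_adv, y, w, b)
print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.4f}")
```

Every input value changes by at most eps, yet the loss for the true label goes up, which is the property FGSM exploits.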

8. Start Creating Adversarial Examples

Now generate adversarial examples for each epsilon. The adversarial image is obtained by adding epsilon times the perturbations to the original image data. Generally, the larger the epsilon, the less accurate the model's prediction.

epsilons = [0, .01, .05, .1, .2]

adv_examples = []

for eps in epsilons:
  orig_img_batch_denorm = denorm(orig_img_batch)
  adv_img = orig_img_batch_denorm + eps * perturbations
  adv_img = torch.clamp(adv_img, 0, 1)

  # Normalize the adversarial image
  adv_img_norm = transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))(adv_img)

  # Predict the adversarial example
  adv_pred = model(adv_img_norm)
  adv_probs = F.softmax(adv_pred[0], dim=0)
  adv_probs_top5, adv_idx_top5 = torch.topk(adv_probs, 5)
  print("-"*28 + f"Eps {eps}" + "-"*28)
  for i in range(adv_probs_top5.size(0)):
    print(f"{labels[adv_idx_top5[i]]}: {adv_probs_top5[i]*100:.2f}%")
  print()

  # Convert the adversarial example to a NumPy array for plotting and saving
  adv_ex = adv_img.squeeze().detach().cpu().numpy()

  adv_examples.append((labels[adv_idx_top5[0]], adv_probs_top5[0], adv_ex))

The output should look similar to the following.

----------------------------Eps 0----------------------------
Samoyed: 87.33%
Pomeranian: 3.03%
white wolf: 1.97%
keeshond: 1.11%
Eskimo dog: 0.92%

----------------------------Eps 0.01----------------------------
West Highland white terrier: 43.36%
Scotch terrier: 8.47%
wallaby: 7.29%
cairn: 4.53%
Angora: 1.87%

----------------------------Eps 0.05----------------------------
West Highland white terrier: 92.15%
cairn: 1.28%
Angora: 1.16%
Scotch terrier: 1.06%
Maltese dog: 0.66%

----------------------------Eps 0.1----------------------------
West Highland white terrier: 97.47%
Scotch terrier: 0.57%
cairn: 0.31%
Angora: 0.17%
Maltese dog: 0.15%

----------------------------Eps 0.2----------------------------
West Highland white terrier: 50.01%
white wolf: 12.23%
ice bear: 8.72%
Arctic fox: 3.96%
Samoyed: 2.19%

Notice that from epsilon 0.01 onward the adversarial images are no longer classified as Samoyed but as other labels such as West Highland white terrier.

In short, we succeeded in fooling the model's predictions by modifying the original image.
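
The results can also be summarized programmatically. The sketch below is only an illustration: it hard-codes the top-1 label and probability per epsilon from the output listed above, and the helper `smallest_fooling_epsilon` is a hypothetical name, not part of the tutorial code.

```python
# (epsilon, top-1 label, top-1 probability in %) copied from the output above
results = [
    (0, "Samoyed", 87.33),
    (0.01, "West Highland white terrier", 43.36),
    (0.05, "West Highland white terrier", 92.15),
    (0.1, "West Highland white terrier", 97.47),
    (0.2, "West Highland white terrier", 50.01),
]

def smallest_fooling_epsilon(results, true_label):
    """Return the smallest epsilon whose top-1 label differs from true_label, or None."""
    fooled = [eps for eps, label, _ in results if label != true_label]
    return min(fooled) if fooled else None

print(smallest_fooling_epsilon(results, "Samoyed"))  # 0.01
```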

9. Plot the Result

Although this section is optional, we can plot the result above.

import matplotlib.pyplot as plt

plt.figure(figsize=(28, 10))

for i, eps in enumerate(epsilons):
  plt.subplot(1, len(adv_examples), i + 1)
  plt.xticks([])
  plt.yticks([])
  label, prob, img = adv_examples[i]
  plt.title(f"Eps {eps}\nClass: {label}\nProbability: {prob*100:.2f}%", fontsize=14)
  # Reorder (C, H, W) to (H, W, C) for imshow; img.T would also swap height and width
  plt.imshow(img.transpose(1, 2, 0))
plt.show()

We should see that the noise becomes more visible as epsilon increases. To the human eye, however, every one of these images is still clearly a Samoyed.

10. Save the Adversarial Examples

Finally, we save the generated adversarial images. Create a new folder to store them so that they can be downloaded.

mkdir fake_dogs

Now save the images. We can use them to fool ResNet models.

# Save adversarial images
from torchvision.utils import save_image

for i, eps in enumerate(epsilons):
  label, prob, ex = adv_examples[i]
  ex_tensor = torch.from_numpy(ex).clone()
  save_image(ex_tensor, f"fake_dogs/fake_dog_eps{eps}.png")

Create Adversarial Examples against MobileNetV2

Reference: TensorFlow Docs

1. Load Pretrained Model (MobileNetV2)

import tensorflow as tf

pretrained_model = tf.keras.applications.MobileNetV2(include_top=True, weights='imagenet')
pretrained_model.trainable = False

# ImageNet labels
decode_predictions = tf.keras.applications.mobilenet_v2.decode_predictions

2. Prepare Original Image

First, we create helper functions to preprocess the image and to get its label.

# Helper function to preprocess the image so that it can be fed to MobileNetV2
def preprocess(image):
  image = tf.cast(image, tf.float32)
  image = tf.image.resize(image, (224, 224))
  image = tf.keras.applications.mobilenet_v2.preprocess_input(image)
  image = image[None, ...]
  return image

# Helper function to extract labels from probability vector
def get_imagenet_label(probs):
  return decode_predictions(probs, top=1)[0][0]
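
For MobileNetV2, `preprocess_input` rescales pixel values from [0, 255] to [-1, 1]. The following minimal sketch shows that arithmetic and its inverse for display purposes in plain Python; the two helper names are hypothetical.

```python
def scale_to_model_range(pixel):
    """Map a pixel value from [0, 255] to [-1, 1]."""
    return pixel / 127.5 - 1.0

def scale_to_display_range(value):
    """Map a model-range value from [-1, 1] to [0, 1], e.g. for plotting."""
    return value * 0.5 + 0.5

for pixel in (0, 127.5, 255):
    scaled = scale_to_model_range(pixel)
    print(pixel, "->", scaled, "->", scale_to_display_range(scaled))
```

This range is why adversarial images for this model are clipped to [-1, 1] rather than [0, 1].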

Then load the original image and preprocess it.

orig_image_path = tf.keras.utils.get_file('YellowLabradorLooking_new.jpg', 'https://storage.googleapis.com/download.tensorflow.org/example_images/YellowLabradorLooking_new.jpg')
orig_image_raw = tf.io.read_file(orig_image_path)
orig_image = tf.image.decode_image(orig_image_raw)

orig_image = preprocess(orig_image)
orig_image_probs = pretrained_model.predict(orig_image)

To get the label of the image that the model predicted, execute the following code.

_, orig_image_class, orig_class_confidence = get_imagenet_label(orig_image_probs)

print(f"class: {orig_image_class}")
print(f"confidence: {orig_class_confidence}")

# The output
# class: Labrador_retriever
# confidence: 0.418184757232666

3. Create Adversarial Image with FGSM

Next, we create the adversarial image to fool the MobileNetV2 model. The following code creates the perturbations used to modify the original image.

# Instantiate a function that computes the crossentropy loss between labels and predictions.
loss_obj = tf.keras.losses.CategoricalCrossentropy()

def create_adversarial_pattern(input_image, input_label):
  # The gradient tape records the operations which are executed inside it.
  with tf.GradientTape() as tape:
    tape.watch(input_image)
    prediction = pretrained_model(input_image)
    loss = loss_obj(input_label, prediction)

  # Get the gradients of the loss with respect to the input image.
  gradient = tape.gradient(loss, input_image)
  # Get the sign of the gradients to create the perturbation.
  signed_grad = tf.sign(gradient)
  return signed_grad

# The index of the label for labrador retriever
target_label_idx = 208
orig_label = tf.one_hot(target_label_idx, orig_image_probs.shape[-1])
orig_label = tf.reshape(orig_label, (1, orig_image_probs.shape[-1]))

perturbations = create_adversarial_pattern(orig_image, orig_label)
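
With a one-hot label, categorical cross-entropy reduces to the negative log of the probability assigned to the true class. The sketch below illustrates this in plain Python with hypothetical helper names and a hypothetical three-class prediction; it is not Keras' implementation, which handles clipping and batching on its own terms.

```python
import math

def one_hot(index, num_classes):
    """Vector with 1.0 at the true class index and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """-sum(t * log(p)); eps guards against log(0) in this sketch."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

probs = [0.7, 0.2, 0.1]  # hypothetical predicted probabilities
label = one_hot(0, 3)    # the true class is index 0
loss = categorical_crossentropy(label, probs)
print(f"loss: {loss:.4f}")  # same value as -log(0.7)
```

The lower the probability of the true class, the larger this loss, so following the gradient sign pushes the prediction away from the original label.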

Now create adversarial examples and predict their labels with the classification model while increasing epsilon.

# Epsilons control the perturbation magnitude (small numbers)
epsilons = [0, 0.01, 0.1, 0.15]

for eps in epsilons:
  adv_image = orig_image + eps*perturbations
  # Keep pixel values inside the [-1, 1] range expected by MobileNetV2
  adv_image = tf.clip_by_value(adv_image, -1, 1)
  # Predict the label and the confidence for the adversarial image
  _, label, confidence = get_imagenet_label(pretrained_model.predict(adv_image))
  print(f"predicted label: {label}")
  print(f"confidence: {confidence*100:.2f}%")
  print("-"*128)

The output looks something like the following.

1/1 [==============================] - 0s 25ms/step
predicted label: Labrador_retriever
confidence: 41.82%
--------------------------------------------------------------------------------------------------------------------------------
1/1 [==============================] - 0s 27ms/step
predicted label: Saluki
confidence: 13.08%
--------------------------------------------------------------------------------------------------------------------------------
1/1 [==============================] - 0s 24ms/step
predicted label: Weimaraner
confidence: 15.13%
--------------------------------------------------------------------------------------------------------------------------------
1/1 [==============================] - 0s 26ms/step
predicted label: Weimaraner
confidence: 16.58%
--------------------------------------------------------------------------------------------------------------------------------

As shown above, the adversarial examples were predicted as labels different from the one predicted for the original image (Labrador retriever). To display the final adversarial image, execute the following code.

import matplotlib.pyplot as plt

# Map pixel values from [-1, 1] back to [0, 1] for display
plt.imshow(adv_image[0] * 0.5 + 0.5)

4. Save/Load the Adversarial Image

We can save the generated adversarial image as below.

tf.keras.utils.save_img("fake.png", adv_image[0])

To load this image, use Pillow.

from PIL import Image

fake_img = Image.open("fake.png")
fake_img
