Adversarial Attack with FGSM (Fast Gradient Sign Method)
An adversarial attack is a method of fooling a neural network, which leads a classification model to misclassify its input. The FGSM attack is known as a white-box attack; in short, we need to know about the target model itself, such as its architecture and weights, because the attack uses the model's gradients with respect to the input.
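Concretely, FGSM builds an adversarial example by taking a single step in the direction of the sign of the loss gradient with respect to the input:

adv_x = x + \epsilon \cdot \operatorname{sign}\left(\nabla_x J(\theta, x, y)\right)

where x is the original input, y is its label, \theta are the model parameters, J is the loss function, and \epsilon controls the strength of the perturbation.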
# Define the preprocessing pipeline for the original image
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
orig_img_tensor = preprocess(orig_img)

# Prepend a batch dimension to the tensor for inference
orig_img_batch = orig_img_tensor.unsqueeze(0)

# Move the image and the model to the device
orig_img_batch = orig_img_batch.to(device)
model = model.to(device)
4. Load ImageNet Classes
We use the ImageNet class labels. They will be used to check which labels the model assigns to the original image and to the adversarial images.
with open("imagenet_classes.txt", "r") as f:
    labels = [s.strip() for s in f.readlines()]
5. Initial Prediction
Before creating adversarial examples, we need to know which classes and probabilities the ResNet model predicts for the original image.
pred = model(orig_img_batch)
probs = F.softmax(pred[0], dim=0)
probs_top5, idx_top5 = torch.topk(probs, 5)

print("The top 5 labels with the highest probabilities:")
for i in range(probs_top5.size(0)):
    print(f"{labels[idx_top5[i]]}: {probs_top5[i].item()*100:.2f}%")

# Extract the top probability and index (target) for use in the next sections
target_prob = probs_top5[0]
target_idx = idx_top5[0]
The top 5 labels and probabilities should look like the following:
The top 5 labels with the highest probabilities:
Samoyed: 87.33%
Pomeranian: 3.03%
white wolf: 1.97%
keeshond: 1.11%
Eskimo dog: 0.92%
As expected, the ResNet model predicted the original image as Samoyed with 87.33% confidence.
6. Define Function to Denormalize
Create a function to denormalize an input image. The original image must be denormalized before the FGSM perturbation is applied, so this function handles that step.
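The exact implementation may vary, but a minimal sketch of such a denorm function, assuming the same ImageNet mean and standard deviation used in the preprocessing step above, is:

def denorm(batch, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
    # Invert transforms.Normalize: x_denorm = x * std + mean
    # (assumes the ImageNet statistics used in the preprocessing pipeline above)
    mean = torch.tensor(mean, device=batch.device).view(1, 3, 1, 1)
    std = torch.tensor(std, device=batch.device).view(1, 3, 1, 1)
    return batch * std + mean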
7. Calculate Perturbations
This step is the core of the adversarial attack. It computes the sign of the gradients backpropagated to the input image; in the next section, this signed gradient is used to adjust the input data so that the loss is maximized.
def calc_perturbations(image, target):
    image.requires_grad = True

    # Predict the original image
    pred = model(image)
    loss = F.nll_loss(pred, target)
    model.zero_grad()
    loss.backward()

    # Take the sign of the gradients backpropagated to the input image
    gradient = image.grad.data
    signed_grad = gradient.sign()
    return signed_grad

perturbations = calc_perturbations(orig_img_batch, torch.tensor([target_idx]))
8. Start Creating Adversarial Examples
Now generate an adversarial example for each epsilon.
The adversarial image is generated by adding the product of epsilon and the perturbations to the original image data, i.e. adv_img = orig_img + eps * sign(gradient).
Generally, the higher the value of epsilon, the lower the model's prediction accuracy (and the more visible the perturbation).
epsilons = [0, .01, .05, .1, .2]
adv_examples = []

for eps in epsilons:
    # Denormalize the original image before adding the perturbation
    orig_img_batch_denorm = denorm(orig_img_batch)

    adv_img = orig_img_batch_denorm + eps * perturbations
    adv_img = torch.clamp(adv_img, 0, 1)

    # Normalize the adversarial image again for inference
    adv_img_norm = transforms.Normalize((0.485, 0.456, 0.406),
                                        (0.229, 0.224, 0.225))(adv_img)

    # Predict the adversarial example
    adv_pred = model(adv_img_norm)
    adv_probs = F.softmax(adv_pred[0], dim=0)
    adv_probs_top5, adv_idx_top5 = torch.topk(adv_probs, 5)

    print("-"*28 + f"Eps {eps}" + "-"*28)
    for i in range(adv_probs_top5.size(0)):
        print(f"{labels[adv_idx_top5[i]]}: {adv_probs_top5[i]*100:.2f}%")
    print()

    # Convert the adversarial example to a NumPy image so that it can be saved
    adv_ex = adv_img.squeeze().detach().cpu().numpy()
    adv_examples.append((labels[adv_idx_top5[0]], adv_probs_top5[0], adv_ex))
The output should look like the following:
----------------------------Eps 0----------------------------
Samoyed: 87.33%
Pomeranian: 3.03%
white wolf: 1.97%
keeshond: 1.11%
Eskimo dog: 0.92%
----------------------------Eps 0.01----------------------------
West Highland white terrier: 43.36%
Scotch terrier: 8.47%
wallaby: 7.29%
cairn: 4.53%
Angora: 1.87%
----------------------------Eps 0.05----------------------------
West Highland white terrier: 92.15%
cairn: 1.28%
Angora: 1.16%
Scotch terrier: 1.06%
Maltese dog: 0.66%
----------------------------Eps 0.1----------------------------
West Highland white terrier: 97.47%
Scotch terrier: 0.57%
cairn: 0.31%
Angora: 0.17%
Maltese dog: 0.15%
----------------------------Eps 0.2----------------------------
West Highland white terrier: 50.01%
white wolf: 12.23%
ice bear: 8.72%
Arctic fox: 3.96%
Samoyed: 2.19%
Notice that, from epsilon 0.01 onward, the adversarial images were no longer classified as Samoyed but as other labels such as West Highland white terrier.
In short, we succeeded in fooling the model's predictions by modifying the original image.
9. Plot the Result
This section is optional, but we can plot the results obtained above.
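A minimal plotting sketch, assuming matplotlib and the epsilons and adv_examples lists built in the previous section, could look like this:

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 4))
for i, (label, prob, ex) in enumerate(adv_examples):
    plt.subplot(1, len(adv_examples), i + 1)
    plt.xticks([], [])
    plt.yticks([], [])
    # The stored arrays are CHW, so transpose them to HWC for imshow
    plt.imshow(ex.transpose(1, 2, 0))
    plt.title(f"Eps {epsilons[i]}\n{label}: {prob.item()*100:.1f}%")
plt.tight_layout()
plt.show()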
The same attack can also be implemented with TensorFlow against the MobileNetV2 model. First, we create helper functions to preprocess the image and to extract the predicted label.
# Helper function to preprocess the image so that it can be fed into MobileNetV2
def preprocess(image):
    image = tf.cast(image, tf.float32)
    image = tf.image.resize(image, (224, 224))
    image = tf.keras.applications.mobilenet_v2.preprocess_input(image)
    image = image[None, ...]
    return image

# Helper function to extract the top label from the probability vector
def get_imagenet_label(probs):
    return decode_predictions(probs, top=1)[0][0]
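The code in the next step refers to orig_image and orig_image_probs, the preprocessed original image and the model's probability vector for it. If those earlier steps are not shown here, they would typically be produced along these lines (image_raw is a hypothetical placeholder for the decoded original image):

# Hedged sketch: produce orig_image and orig_image_probs used below.
# image_raw is a hypothetical placeholder for the decoded original image
# (e.g. obtained with tf.image.decode_image on the downloaded file).
orig_image = preprocess(image_raw)
orig_image_probs = pretrained_model.predict(orig_image)

_, orig_class, orig_confidence = get_imagenet_label(orig_image_probs)
print(f"original label: {orig_class} ({orig_confidence*100:.2f}%)")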
With these helpers in place, we create the adversarial image to fool the MobileNetV2 model. The following code creates the perturbations used to modify the original image.
# Instantiate a loss object that computes the cross-entropy between labels and predictions
loss_obj = tf.keras.losses.CategoricalCrossentropy()

def create_adversarial_pattern(input_image, input_label):
    # The gradient tape records the operations which are executed inside it
    with tf.GradientTape() as tape:
        tape.watch(input_image)
        prediction = pretrained_model(input_image)
        loss = loss_obj(input_label, prediction)

    # Get the gradients of the loss w.r.t. (with respect to) the input image
    gradient = tape.gradient(loss, input_image)
    # Get the sign of the gradients to create the perturbation
    signed_grad = tf.sign(gradient)
    return signed_grad

# The index of the ImageNet label for Labrador retriever
target_label_idx = 208
orig_label = tf.one_hot(target_label_idx, orig_image_probs.shape[-1])
orig_label = tf.reshape(orig_label, (1, orig_image_probs.shape[-1]))

perturbations = create_adversarial_pattern(orig_image, orig_label)
Now create adversarial examples for increasing values of epsilon and let the classification model predict their labels.
# Epsilons are error terms (very small numbers)
epsilons = [0, 0.01, 0.1, 0.15]

for i, eps in enumerate(epsilons):
    adv_image = orig_image + eps * perturbations
    # MobileNetV2 expects inputs in the range [-1, 1]
    adv_image = tf.clip_by_value(adv_image, -1, 1)

    # Predict the label and the confidence for the adversarial image
    _, label, confidence = get_imagenet_label(pretrained_model.predict(adv_image))
    print(f"predicted label: {label}")
    print(f"confidence: {confidence*100:.2f}%")
    print("-"*128)
As shown above, the adversarial examples were predicted as labels different from the one predicted for the original image (Labrador retriever).
To display the final adversarial image, execute the following code.
import matplotlib.pyplot as plt

# The model input is in the range [-1, 1], so shift it back to [0, 1] for display
plt.imshow(adv_image[0] * 0.5 + 0.5)
4. Save/Load the Adversarial Image
We can save the generated adversarial image as shown below.
tf.keras.utils.save_img("fake.png", adv_image[0])
To load this image, use Pillow.
from PIL import Image

fake_img = Image.open("fake.png")
fake_img
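As a quick sanity check (a hedged sketch that reuses the preprocess and get_imagenet_label helpers and the pretrained_model defined earlier), we can feed the reloaded image back into the model to see how it is classified:

import numpy as np

# Run the reloaded adversarial image through the same preprocessing and model
fake_tensor = preprocess(np.array(fake_img))
_, label, confidence = get_imagenet_label(pretrained_model.predict(fake_tensor))
print(f"predicted label: {label}")
print(f"confidence: {confidence*100:.2f}%")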