TensorFlow 2.0 : Advanced Tutorials : Generative :- Adversarial example using FGSM (translation/commentary)

Translation : ClassCat Co., Ltd. Sales Information
Created : 11/23/2019

* This page is a translation, with supplementary explanations where appropriate, of the following page from TF 2.0 – Advanced Tutorials – Generative on the TensorFlow org site:

Adversarial example using FGSM

* The sample code has been verified to run, but it has been extended or modified where necessary.

Generative :- Adversarial example using FGSM

This tutorial creates an adversarial example using the Fast Gradient Sign Method (FGSM) attack, as described in Explaining and Harnessing Adversarial Examples by Goodfellow et al. This was one of the first and most popular attacks for fooling a neural network.

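What is an adversarial example?

An adversarial example is a specialised input created with the purpose of confusing a neural network, resulting in the misclassification of a given input. These notorious inputs are indistinguishable to the human eye, but cause the network to fail to identify the contents of the image. There are several types of such attacks; the focus here is on the fast gradient sign method attack, which is a white box attack whose goal is to ensure misclassification. In a white box attack, the attacker has complete access to the model being attacked. One of the most famous examples of an adversarial image is taken from the aforementioned paper.

In that example, starting with the image of a panda, the attacker adds small perturbations (distortions) to the original image, which results in the model labelling this image as a gibbon with high confidence. The process of adding these perturbations is explained below.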

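Fast gradient sign method

The fast gradient sign method works by using the gradients of the neural network to create an adversarial example. For an input image, the method uses the gradients of the loss with respect to the input image to create a new image that maximises the loss. This new image is called the adversarial image. This can be summarised using the following expression: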

$adv\_x = x + \epsilon*\text{sign}(\nabla_xJ(\theta, x, y))$

where

adv_x : Adversarial image.
x : Original input image.
y : Original input label.
$\epsilon$ : Multiplier to ensure the perturbations are small.
$\theta$ : Model parameters.
$J$ : Loss.
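
The expression above can be traced by hand on a few numbers. The following is a minimal sketch in plain Python, not part of the original tutorial: the pixel values, the gradient values, and the helper names `sign` and `fgsm_step` are hypothetical and chosen only for illustration.

```python
def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def fgsm_step(x, gradient, epsilon):
    """Apply adv_x = x + epsilon * sign(gradient) element by element."""
    return [xi + epsilon * sign(gi) for xi, gi in zip(x, gradient)]


# Hypothetical pixel values and loss gradients, chosen only for illustration.
x = [0.20, 0.50, 0.90]
gradient = [0.03, -1.70, 0.0]
epsilon = 0.1

adv_x = fgsm_step(x, gradient, epsilon)

# Each pixel moves by exactly epsilon in the direction of its gradient sign,
# or stays put when the gradient is zero.
shifts = [a - b for a, b in zip(adv_x, x)]
print(adv_x)
print(shifts)
```

Only the sign of each gradient entry is used, so the tiny gradient 0.03 and the large gradient -1.70 both produce a shift of the same size. This is why $\epsilon$ alone bounds how far any single pixel can move.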

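An intriguing property here is that the gradients are taken with respect to the input image. This is done because the objective is to create an image that maximises the loss. One way to accomplish this is to find how much each pixel in the image contributes to the loss value and add a perturbation accordingly. This works quite fast, because it is easy to find how each input pixel contributes to the loss by using the chain rule and computing the required gradients. Hence, the gradients are taken with respect to the image. In addition, since the model is no longer being trained (so the gradient is not taken with respect to the trainable variables, i.e. the model parameters), the model parameters remain constant. The only goal is to fool an already trained model.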
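
To make the chain-rule argument concrete, here is a minimal sketch that swaps the neural network for a toy logistic-regression model, which is an assumption made purely for illustration. The weights `w`, bias `b`, input `x`, and all function names are hypothetical. For this model the gradient of the cross-entropy loss with respect to the input has the closed form $(p - y)\,w$, and the sketch compares it with a finite-difference estimate while the parameters stay fixed.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a toy logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss(w, b, x, y):
    """Binary cross-entropy J(theta, x, y) for a single example."""
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def input_gradient(w, b, x, y):
    """Chain rule: dJ/dx_i = (p - y) * w_i. The parameters w, b are constants."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]


def numeric_input_gradient(w, b, x, y, h=1e-6):
    """Central finite differences over each input component, for comparison."""
    grads = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grads.append((loss(w, b, x_plus, y) - loss(w, b, x_minus, y)) / (2 * h))
    return grads


# Hypothetical fixed parameters and input, chosen only for illustration.
w, b = [2.0, -1.0, 0.5], 0.1
x, y = [0.2, 0.5, 0.9], 1

analytic = input_gradient(w, b, x, y)
numeric = numeric_input_gradient(w, b, x, y)
print(analytic)
print(numeric)
```

The two gradients should agree closely. Neither function updates `w` or `b`; only the input is varied, which mirrors the attack setting described above.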

So let's try to fool a pretrained model. In this tutorial, the model is MobileNetV2, pretrained on ImageNet.

from __future__ import absolute_import, division, print_function, unicode_literals

import tensorflow as tf
import matplotlib as mpl
import matplotlib.pyplot as plt

mpl.rcParams['figure.figsize'] = (8, 8)
mpl.rcParams['axes.grid'] = False

Let's load the pretrained MobileNetV2 model and the ImageNet class names.

pretrained_model = tf.keras.applications.MobileNetV2(include_top=True,
                                                     weights='imagenet')
pretrained_model.trainable = False

# ImageNet labels
decode_predictions = tf.keras.applications.mobilenet_v2.decode_predictions

Downloading data from https://github.com/JonathanCMitchell/mobilenet_v2_keras/releases/download/v1.1/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224.h5
14540800/14536120 [==============================] - 1s 0us/step

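# Helper function to preprocess the image so that it can be fed to MobileNetV2
# Note: this scales pixels to [0, 1]. The Keras helper
# tf.keras.applications.mobilenet_v2.preprocess_input scales them to [-1, 1] instead.
def preprocess(image):
  image = tf.cast(image, tf.float32)
  image = image/255
  image = tf.image.resize(image, (224, 224))
  image = image[None, ...]  # add a leading batch dimension
  return image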

# Helper function to extract labels from probability vector
def get_imagenet_label(probs):
  return decode_predictions(probs, top=1)[0][0]

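Original image

Let's use a sample image of a Labrador Retriever by Mirko CC-BY-SA 3.0 from Wikimedia Commons and create adversarial examples from it. The first step is to preprocess it so that it can be fed as an input to the MobileNetV2 model.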

image_path = tf.keras.utils.get_file('YellowLabradorLooking_new.jpg', 'https://storage.googleapis.com/download.tensorflow.org/example_images/YellowLabradorLooking_new.jpg')
image_raw = tf.io.read_file(image_path)
image = tf.image.decode_image(image_raw)

image = preprocess(image)
image_probs = pretrained_model.predict(image)

Downloading data from https://storage.googleapis.com/download.tensorflow.org/example_images/YellowLabradorLooking_new.jpg
90112/83281 [================================] - 0s 0us/step

Let's have a look at the image.

plt.figure()
plt.imshow(image[0])
_, image_class, class_confidence = get_imagenet_label(image_probs)
plt.title('{} : {:.2f}% Confidence'.format(image_class, class_confidence*100))
plt.show()

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/imagenet_class_index.json
40960/35363 [==================================] - 0s 0us/step

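Create the adversarial image

Implementing fast gradient sign method

The first step is to create the perturbations, which will be used to distort the original image and produce an adversarial image. As mentioned, for this task the gradients are taken with respect to the image.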

loss_object = tf.keras.losses.CategoricalCrossentropy()

def create_adversarial_pattern(input_image, input_label):
  with tf.GradientTape() as tape:
    tape.watch(input_image)
    prediction = pretrained_model(input_image)
    loss = loss_object(input_label, prediction)

  # Get the gradients of the loss w.r.t. the input image.
  gradient = tape.gradient(loss, input_image)
  # Get the sign of the gradients to create the perturbation
  signed_grad = tf.sign(gradient)
  return signed_grad

The resulting perturbations can also be visualised.

# Get the input label of the image.
labrador_retriever_index = 208
label = tf.one_hot(labrador_retriever_index, image_probs.shape[-1])
# Add a batch dimension so the label matches the (1, num_classes) prediction shape.
label = tf.reshape(label, (1, image_probs.shape[-1]))

perturbations = create_adversarial_pattern(image, label)
# The signed gradient takes values in {-1, 0, 1}; rescale to [0, 1] for display.
plt.imshow(perturbations[0] * 0.5 + 0.5)

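Let's try this out for different values of epsilon and observe the resulting image. You will notice that as the value of epsilon increases, it becomes easier to fool the network. However, this comes with a trade-off: the perturbations become more identifiable.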

def display_images(image, description):
  _, label, confidence = get_imagenet_label(pretrained_model.predict(image))
  plt.figure()
  plt.imshow(image[0])
  plt.title('{} \n {} : {:.2f}% Confidence'.format(description,
                                                   label, confidence*100))
  plt.show()

epsilons = [0, 0.01, 0.1, 0.15]
descriptions = [('Epsilon = {:0.3f}'.format(eps) if eps else 'Input')
                for eps in epsilons]

for i, eps in enumerate(epsilons):
  adv_x = image + eps*perturbations
  adv_x = tf.clip_by_value(adv_x, 0, 1)
  display_images(adv_x, descriptions[i])
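
The same loop can be followed numerically on a toy model. The sketch below is an illustration under stated assumptions, not the tutorial's MobileNetV2 experiment: it uses a hypothetical logistic-regression model with made-up weights `w`, bias `b`, and input `x`, computes the signed gradient once at the original input, and then sweeps the same epsilon values with clipping to [0, 1].

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Confidence in class 1 under a toy logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(value):
    return (value > 0) - (value < 0)


def clip(value, low, high):
    return max(low, min(high, value))


# Hypothetical fixed parameters and input with true label y = 1.
w, b = [2.0, -1.0, 0.5], 0.1
x, y = [0.2, 0.5, 0.9], 1

# Signed input gradient of the cross-entropy loss: sign((p - y) * w_i).
p = predict(w, b, x)
perturbation = [sign((p - y) * wi) for wi in w]

epsilons = [0, 0.01, 0.1, 0.15]
confidences = []
max_shifts = []
for eps in epsilons:
    adv_x = [clip(xi + eps * di, 0.0, 1.0) for xi, di in zip(x, perturbation)]
    confidences.append(predict(w, b, adv_x))
    max_shifts.append(max(abs(a - xi) for a, xi in zip(adv_x, x)))

for eps, conf, shift in zip(epsilons, confidences, max_shifts):
    print('eps={:.2f}  confidence in true class={:.3f}  max pixel shift={:.3f}'
          .format(eps, conf, shift))
```

In this toy setting the confidence in the true class falls as epsilon grows and drops below 0.5 at the largest value, while the largest per-pixel change never exceeds epsilon. That is the trade-off described above: a larger epsilon fools the model more easily but changes the input more visibly.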

Next steps

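Now that you know about adversarial attacks, try this out on different datasets and different architectures. You may also create and train your own model, and then attempt to fool it using the same method. You can also try to see how the confidence in the predictions varies as you change epsilon.

Though powerful, the attack shown in this tutorial was just the start of research into adversarial attacks, and there have been multiple papers creating more powerful attacks since then. In addition to adversarial attacks, research has also led to the creation of defenses, which aim to produce robust machine learning models. You may review this survey paper for a comprehensive list of adversarial attacks and defenses.

For many more implementations of adversarial attacks and defenses, you may want to see the adversarial example library CleverHans.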

End
