Keras 2 : examples : Consistency training with supervision (translation / commentary)

Translation : ClassCat Co., Ltd. Sales Information
Created : 11/14/2021 (keras 2.6.0)

* This page is a translation of the following Keras document, with supplementary explanations added where appropriate:

Code examples : Computer Vision : Consistency training with supervision (Author: Sayak Paul)

* The sample code has been checked for correct operation, with additions and modifications where necessary.
* Feel free to link to this page; we would appreciate a note to sales-info@classcat.com if you do.

Keras 2 : examples : Consistency training with supervision

Description: Training with consistency regularization for robustness against data distribution shifts.

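Deep learning models excel at many image recognition tasks when the data is independent and identically distributed (i.i.d.). However, they can suffer from performance degradation caused by subtle distribution shifts in the input data (such as random noise, contrast change, and blurring). This naturally raises the question of why. As discussed in A Fourier Perspective on Model Robustness in Computer Vision, there is no reason for deep learning models to be robust against such shifts. Standard model training procedures (such as standard image classification training workflows) do not enable a model to learn beyond what is fed to it in the form of training data.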

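In this example, we will train an image classification model that enforces a sense of consistency inside it by doing the following:

* Train a standard image classification model.
* Train an equal or larger model on a noisy version of the dataset (augmented using RandAugment).
* To do this, we will first obtain predictions of the previous model on the clean images of the dataset.
* We will then use these predictions and train the second model to match them on the noisy variant of the same images. This is identical to the workflow of Knowledge Distillation, but since the student model is equal or larger in size, this process is also referred to as Self-Training.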

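This overall training workflow finds its roots in works like FixMatch, Unsupervised Data Augmentation for Consistency Training, and Noisy Student Training. Since this training process encourages a model to yield consistent predictions for clean as well as noisy images, it is often referred to as consistency training or training with consistency regularization. Although the example focuses on using consistency training to enhance the robustness of models to common corruptions, it can also serve as a template for performing weakly supervised learning.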

This example requires TensorFlow 2.4 or higher, as well as TensorFlow Addons and TensorFlow Models, which can be installed using the following command:

!pip install -q tf-models-official tensorflow-addons

Imports and setup

from official.vision.image_classification.augment import RandAugment
from tensorflow.keras import layers

import tensorflow as tf
import tensorflow_addons as tfa
import matplotlib.pyplot as plt

tf.random.set_seed(42)

Define hyperparameters

AUTO = tf.data.AUTOTUNE
BATCH_SIZE = 128
EPOCHS = 5

CROP_TO = 72
RESIZE_TO = 96

Load the CIFAR-10 dataset

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

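# Hold out the last 500 training images for validation. Both slices meet at
# the same index so that no image lands in both sets.
val_samples = 49500
new_train_x, new_y_train = x_train[:val_samples], y_train[:val_samples]
val_x, val_y = x_train[val_samples:], y_train[val_samples:]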
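
Whether the training and validation slices share a sample depends only on where the training slice ends relative to `val_samples`. The following framework-free sketch uses plain index ranges as stand-ins for the arrays (assuming the 50,000 training images that CIFAR-10 ships with) and counts the shared indices. The helper name `count_shared_indices` is made up for this illustration.

```python
import math

NUM_TRAIN_IMAGES = 50_000  # size of the CIFAR-10 training split
BATCH_SIZE = 128
val_samples = 49_500  # index where the validation hold-out starts


def count_shared_indices(train_end, val_start, total=NUM_TRAIN_IMAGES):
    """Count indices present in both [0, train_end) and [val_start, total)."""
    train_idx = range(0, train_end)
    val_idx = range(val_start, total)
    return len(set(train_idx) & set(val_idx))


# Slices that meet at the same boundary are disjoint.
disjoint_overlap = count_shared_indices(val_samples, val_samples)
# Ending the training slice one index later shares one sample with validation.
off_by_one_overlap = count_shared_indices(val_samples + 1, val_samples)

train_batches = math.ceil(val_samples / BATCH_SIZE)
val_batches = math.ceil((NUM_TRAIN_IMAGES - val_samples) / BATCH_SIZE)

print(disjoint_overlap, off_by_one_overlap)  # 0 1
print(train_batches, val_batches)  # 387 4
```

The 387 training batches agree with the step count shown in the training logs further down.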

Create TensorFlow Dataset objects

# Initialize `RandAugment` object with 2 layers of
# augmentation transforms and strength of 9.
augmenter = RandAugment(num_layers=2, magnitude=9)
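
To give a feel for what `num_layers` and `magnitude` control, here is a conceptual sketch of the RandAugment idea: pick `num_layers` operations at random and apply each at the same `magnitude`. It is not the tf-models implementation; the three toy operations, the `5 * magnitude` scaling, and the name `toy_rand_augment` are all assumptions made for this illustration, and they act on a short list of pixel intensities rather than on image tensors.

```python
import random


# Toy "transforms" acting on a list of pixel intensities in [0, 255].
# They are placeholders for this sketch, not the ops used by tf-models.
def brighten(pixels, magnitude):
    return [min(255, p + 5 * magnitude) for p in pixels]


def darken(pixels, magnitude):
    return [max(0, p - 5 * magnitude) for p in pixels]


def invert(pixels, magnitude):
    return [255 - p for p in pixels]  # magnitude is ignored by this op


TOY_OPS = [brighten, darken, invert]


def toy_rand_augment(pixels, num_layers, magnitude, rng):
    """Apply `num_layers` randomly chosen ops, all at the same `magnitude`."""
    applied = []
    for _ in range(num_layers):
        op = rng.choice(TOY_OPS)
        pixels = op(pixels, magnitude)
        applied.append(op.__name__)
    return pixels, applied


rng = random.Random(0)
augmented, applied_ops = toy_rand_augment(
    [0, 100, 200], num_layers=2, magnitude=9, rng=rng
)
print(applied_ops, augmented)
```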

For training the teacher model, we will only be using two geometric augmentation transforms: random horizontal flip and random crop.

def preprocess_train(image, label, noisy=True):
    image = tf.image.random_flip_left_right(image)
    # We first resize the original image to a larger dimension
    # and then we take random crops from it.
    image = tf.image.resize(image, [RESIZE_TO, RESIZE_TO])
    image = tf.image.random_crop(image, [CROP_TO, CROP_TO, 3])
    if noisy:
        image = augmenter.distort(image)
    return image, label


def preprocess_test(image, label):
    image = tf.image.resize(image, [CROP_TO, CROP_TO])
    return image, label


train_ds = tf.data.Dataset.from_tensor_slices((new_train_x, new_y_train))
validation_ds = tf.data.Dataset.from_tensor_slices((val_x, val_y))
test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test))

We make sure train_clean_ds and train_noisy_ds are shuffled using the same seed so that their orders are exactly the same. This will be helpful when training the student model.

# This dataset will be used to train the first model.
train_clean_ds = (
    train_ds.shuffle(BATCH_SIZE * 10, seed=42)
    .map(lambda x, y: (preprocess_train(x, y, noisy=False)), num_parallel_calls=AUTO)
    .batch(BATCH_SIZE)
    .prefetch(AUTO)
)

# This prepares the `Dataset` object to use RandAugment.
train_noisy_ds = (
    train_ds.shuffle(BATCH_SIZE * 10, seed=42)
    .map(preprocess_train, num_parallel_calls=AUTO)
    .batch(BATCH_SIZE)
    .prefetch(AUTO)
)

validation_ds = (
    validation_ds.map(preprocess_test, num_parallel_calls=AUTO)
    .batch(BATCH_SIZE)
    .prefetch(AUTO)
)

test_ds = (
    test_ds.map(preprocess_test, num_parallel_calls=AUTO)
    .batch(BATCH_SIZE)
    .prefetch(AUTO)
)

# This dataset will be used to train the second model.
consistency_training_ds = tf.data.Dataset.zip((train_clean_ds, train_noisy_ds))
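
Why the shared seed matters for the zipped dataset can be illustrated without TensorFlow. The sketch below is a simplified buffer-based shuffle written for this page (it is not how `tf.data` implements `shuffle`, and `buffered_shuffle` is a made-up name). Two passes with the same seed produce the same order, so zipping them pairs each clean sample with its own noisy counterpart.

```python
import random


def buffered_shuffle(items, buffer_size, seed):
    """Yield items in a pseudo-random order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) > buffer_size:
            # Emit a random buffered element once the buffer is over capacity.
            yield buffer.pop(rng.randrange(len(buffer)))
    # Drain what is left once the input is exhausted.
    while buffer:
        yield buffer.pop(rng.randrange(len(buffer)))


indices = list(range(20))
clean_order = list(buffered_shuffle(indices, buffer_size=5, seed=42))
noisy_order = list(buffered_shuffle(indices, buffer_size=5, seed=42))

# Zipping the two streams pairs every index with itself.
pairs = list(zip(clean_order, noisy_order))

print(clean_order == noisy_order)  # True
print(sorted(clean_order) == indices)  # True
```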

Visualize the datasets

sample_images, sample_labels = next(iter(train_clean_ds))
plt.figure(figsize=(10, 10))
for i, image in enumerate(sample_images[:9]):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(image.numpy().astype("int"))
    plt.axis("off")

sample_images, sample_labels = next(iter(train_noisy_ds))
plt.figure(figsize=(10, 10))
for i, image in enumerate(sample_images[:9]):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(image.numpy().astype("int"))
    plt.axis("off")

Define a model building utility function

Now, we define our model building utility. Our model is based on the ResNet50V2 architecture.

def get_training_model(num_classes=10):
    resnet50_v2 = tf.keras.applications.ResNet50V2(
        weights=None, include_top=False, input_shape=(CROP_TO, CROP_TO, 3),
    )
    model = tf.keras.Sequential(
        [
            layers.Input((CROP_TO, CROP_TO, 3)),
            layers.Rescaling(scale=1.0 / 127.5, offset=-1),
            resnet50_v2,
            layers.GlobalAveragePooling2D(),
            layers.Dense(num_classes),
        ]
    )
    return model
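
The `Rescaling` layer in the model computes `inputs * scale + offset`, so with `scale=1.0 / 127.5` and `offset=-1` the 8-bit pixel range [0, 255] is mapped to roughly [-1, 1]. A small arithmetic check, independent of Keras (the function name `rescale` is only for this sketch):

```python
def rescale(pixel, scale=1.0 / 127.5, offset=-1.0):
    """Mirror the arithmetic of a rescaling step: pixel * scale + offset."""
    return pixel * scale + offset


low, mid, high = rescale(0), rescale(127.5), rescale(255)
print(low, mid, high)  # approximately -1.0, 0.0 and 1.0
```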

For reproducibility, we serialize the initial random weights of the teacher network.

initial_teacher_model = get_training_model()
initial_teacher_model.save_weights("initial_teacher_model.h5")

Train the teacher model

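As noted in Noisy Student Training, if the teacher model is trained with geometric ensembling and the student model is forced to mimic it, this leads to better performance. The original work uses Stochastic Depth and Dropout to bring in the ensembling part, but for this example we will use Stochastic Weight Averaging (SWA), which also resembles geometric ensembling.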
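
The core idea of SWA is to average the weights visited along the optimization trajectory. The sketch below shows only that averaging step on made-up numbers. It is not how `tfa.optimizers.SWA` is implemented (the real optimizer also controls when averaging starts and how often snapshots are taken), and `update_running_average` is a hypothetical helper.

```python
def update_running_average(average, weights, num_averaged):
    """Fold one more weight snapshot into an equal-weight running average."""
    return [
        (avg * num_averaged + w) / (num_averaged + 1)
        for avg, w in zip(average, weights)
    ]


# Made-up weight vectors standing in for snapshots taken along training.
snapshots = [[1.0, 4.0], [2.0, 2.0], [3.0, 0.0]]

swa_weights = list(snapshots[0])
for count, snapshot in enumerate(snapshots[1:], start=1):
    swa_weights = update_running_average(swa_weights, snapshot, count)

print(swa_weights)  # [2.0, 2.0]
```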

# Define the callbacks.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(patience=3)
early_stopping = tf.keras.callbacks.EarlyStopping(
    patience=10, restore_best_weights=True
)

# Initialize SWA from TensorFlow Addons.
SWA = tfa.optimizers.SWA

# Compile and train the teacher model.
teacher_model = get_training_model()
teacher_model.load_weights("initial_teacher_model.h5")
teacher_model.compile(
    # Notice that we are wrapping our optimizer within SWA
    optimizer=SWA(tf.keras.optimizers.Adam()),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
history = teacher_model.fit(
    train_clean_ds,
    epochs=EPOCHS,
    validation_data=validation_ds,
    callbacks=[reduce_lr, early_stopping],
)

# Evaluate the teacher model on the test set.
_, acc = teacher_model.evaluate(test_ds, verbose=0)
print(f"Test accuracy: {acc*100}%")

Epoch 1/5
387/387 [==============================] - 73s 78ms/step - loss: 1.7785 - accuracy: 0.3582 - val_loss: 2.0589 - val_accuracy: 0.3920
Epoch 2/5
387/387 [==============================] - 28s 71ms/step - loss: 1.2493 - accuracy: 0.5542 - val_loss: 1.4228 - val_accuracy: 0.5380
Epoch 3/5
387/387 [==============================] - 28s 73ms/step - loss: 1.0294 - accuracy: 0.6350 - val_loss: 1.4422 - val_accuracy: 0.5900
Epoch 4/5
387/387 [==============================] - 28s 73ms/step - loss: 0.8954 - accuracy: 0.6864 - val_loss: 1.2189 - val_accuracy: 0.6520
Epoch 5/5
387/387 [==============================] - 28s 73ms/step - loss: 0.7879 - accuracy: 0.7231 - val_loss: 0.9790 - val_accuracy: 0.6500
Test accuracy: 65.83999991416931%

(Translator's note: results from our own run)

Epoch 1/5
387/387 [==============================] - 84s 122ms/step - loss: 1.5417 - accuracy: 0.4389 - val_loss: 1.4408 - val_accuracy: 0.5320 - lr: 0.0010
Epoch 2/5
387/387 [==============================] - 45s 116ms/step - loss: 1.2075 - accuracy: 0.5702 - val_loss: 1.5884 - val_accuracy: 0.5060 - lr: 0.0010
Epoch 3/5
387/387 [==============================] - 45s 117ms/step - loss: 1.0166 - accuracy: 0.6401 - val_loss: 1.8957 - val_accuracy: 0.4640 - lr: 0.0010
Epoch 4/5
387/387 [==============================] - 45s 117ms/step - loss: 0.8922 - accuracy: 0.6888 - val_loss: 1.0377 - val_accuracy: 0.6700 - lr: 0.0010
Epoch 5/5
387/387 [==============================] - 45s 116ms/step - loss: 0.7903 - accuracy: 0.7231 - val_loss: 1.1142 - val_accuracy: 0.6440 - lr: 0.0010
Test accuracy: 64.31999802589417%

Define a self-training utility

For this part, we will borrow the Distiller class from this Keras example.

# Majority of the code is taken from:
# https://keras.io/examples/vision/knowledge_distillation/
class SelfTrainer(tf.keras.Model):
    def __init__(self, student, teacher):
        super(SelfTrainer, self).__init__()
        self.student = student
        self.teacher = teacher

    def compile(
        self, optimizer, metrics, student_loss_fn, distillation_loss_fn, temperature=3,
    ):
        super(SelfTrainer, self).compile(optimizer=optimizer, metrics=metrics)
        self.student_loss_fn = student_loss_fn
        self.distillation_loss_fn = distillation_loss_fn
        self.temperature = temperature

    def train_step(self, data):
        # Since our dataset is a zip of two independent datasets,
        # after initially parsing them, we segregate the
        # respective images and labels next.
        clean_ds, noisy_ds = data
        clean_images, _ = clean_ds
        noisy_images, y = noisy_ds

        # Forward pass of teacher
        teacher_predictions = self.teacher(clean_images, training=False)

        with tf.GradientTape() as tape:
            # Forward pass of student
            student_predictions = self.student(noisy_images, training=True)

            # Compute losses
            student_loss = self.student_loss_fn(y, student_predictions)
            distillation_loss = self.distillation_loss_fn(
                tf.nn.softmax(teacher_predictions / self.temperature, axis=1),
                tf.nn.softmax(student_predictions / self.temperature, axis=1),
            )
            total_loss = (student_loss + distillation_loss) / 2

        # Compute gradients
        trainable_vars = self.student.trainable_variables
        gradients = tape.gradient(total_loss, trainable_vars)

        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))

        # Update the metrics configured in `compile()`
        self.compiled_metrics.update_state(
            y, tf.nn.softmax(student_predictions, axis=1)
        )

        # Return a dict of performance
        results = {m.name: m.result() for m in self.metrics}
        results.update({"total_loss": total_loss})
        return results

    def test_step(self, data):
        # During inference, we only pass a dataset consisting of images and labels.
        x, y = data

        # Compute predictions
        y_prediction = self.student(x, training=False)

        # Update the metrics
        self.compiled_metrics.update_state(y, tf.nn.softmax(y_prediction, axis=1))

        # Return a dict of performance
        results = {m.name: m.result() for m in self.metrics}
        return results

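The only difference in this implementation is the way the loss is calculated. Instead of weighting the distillation loss and the student loss differently, we take their average, following Noisy Student Training.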
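
To make the loss arithmetic concrete, here is a plain-Python sketch for a single example with made-up logits: a cross-entropy term against the label, a KL-divergence term between temperature-softened teacher and student distributions, and their plain average. It only mirrors the arithmetic of `train_step`; the Keras loss classes additionally clip values and average over the batch, and the helper names below are assumptions of this sketch.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-7):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy_from_logits(label, logits):
    """Sparse categorical cross-entropy for a single example."""
    return -math.log(softmax(logits)[label])


# Made-up logits for one 3-class example.
teacher_logits = [4.0, 1.0, 0.5]  # teacher output on the clean image
student_logits = [2.0, 1.5, 0.2]  # student output on the noisy image
label, temperature = 0, 10.0

student_loss = cross_entropy_from_logits(label, student_logits)
distillation_loss = kl_divergence(
    softmax(teacher_logits, temperature),
    softmax(student_logits, temperature),
)
total_loss = (student_loss + distillation_loss) / 2
print(student_loss, distillation_loss, total_loss)
```

A higher temperature flattens both distributions, so the distillation term compares the relative ranking of classes rather than only the top prediction.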

Train the student model

# Define the callbacks.
# We are using a larger decay factor to stabilize the training.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    patience=3, factor=0.5, monitor="val_accuracy"
)
early_stopping = tf.keras.callbacks.EarlyStopping(
    patience=10, restore_best_weights=True, monitor="val_accuracy"
)

# Compile and train the student model.
self_trainer = SelfTrainer(student=get_training_model(), teacher=teacher_model)
self_trainer.compile(
    # Notice we are *not* using SWA here.
    optimizer="adam",
    metrics=["accuracy"],
    student_loss_fn=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    distillation_loss_fn=tf.keras.losses.KLDivergence(),
    temperature=10,
)
history = self_trainer.fit(
    consistency_training_ds,
    epochs=EPOCHS,
    validation_data=validation_ds,
    callbacks=[reduce_lr, early_stopping],
)

# Evaluate the student model.
acc = self_trainer.evaluate(test_ds, verbose=0)
print(f"Test accuracy from student model: {acc*100}%")

Epoch 1/5
387/387 [==============================] - 39s 84ms/step - accuracy: 0.2112 - total_loss: 1.0629 - val_accuracy: 0.4180
Epoch 2/5
387/387 [==============================] - 32s 82ms/step - accuracy: 0.3341 - total_loss: 0.9554 - val_accuracy: 0.3900
Epoch 3/5
387/387 [==============================] - 31s 81ms/step - accuracy: 0.3873 - total_loss: 0.8852 - val_accuracy: 0.4580
Epoch 4/5
387/387 [==============================] - 31s 81ms/step - accuracy: 0.4294 - total_loss: 0.8423 - val_accuracy: 0.5660
Epoch 5/5
387/387 [==============================] - 31s 81ms/step - accuracy: 0.4547 - total_loss: 0.8093 - val_accuracy: 0.5880
Test accuracy from student model: 58.490002155303955%

(Translator's note: results from our own run)

Epoch 1/5
387/387 [==============================] - 77s 183ms/step - accuracy: 0.2624 - total_loss: 1.0559 - val_accuracy: 0.3360 - lr: 0.0010
Epoch 2/5
387/387 [==============================] - 70s 181ms/step - accuracy: 0.3614 - total_loss: 0.9276 - val_accuracy: 0.3840 - lr: 0.0010
Epoch 3/5
387/387 [==============================] - 70s 180ms/step - accuracy: 0.4023 - total_loss: 0.8725 - val_accuracy: 0.5240 - lr: 0.0010
Epoch 4/5
387/387 [==============================] - 70s 180ms/step - accuracy: 0.4387 - total_loss: 0.8328 - val_accuracy: 0.5600 - lr: 0.0010
Epoch 5/5
387/387 [==============================] - 70s 180ms/step - accuracy: 0.4705 - total_loss: 0.7903 - val_accuracy: 0.5080 - lr: 0.0010
Test accuracy from student model: 52.240002155303955%

Assess the robustness of the models

A standard benchmark for assessing the robustness of vision models is to record their performance on corrupted datasets like ImageNet-C and CIFAR-10-C, both of which were proposed in Benchmarking Neural Network Robustness to Common Corruptions and Perturbations. For this example, we will be using the CIFAR-10-C dataset, which has 19 different corruptions at 5 different severity levels. To assess the robustness of the models on this dataset, we will do the following:

* Run the pre-trained models on the highest severity level and obtain the top-1 accuracies.
* Compute the mean top-1 accuracy.
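
The two steps above amount to computing one top-1 accuracy per corruption and then averaging those numbers. A minimal sketch on toy data follows; the labels, predictions, and corruption names are invented purely for illustration and are not results from any model or from CIFAR-10-C.

```python
def top1_accuracy(predicted_labels, true_labels):
    """Fraction of examples whose predicted class matches the label."""
    correct = sum(int(p == t) for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


# Toy labels and predictions, invented purely for illustration.
true_labels = [0, 1, 2, 3]
predictions_per_corruption = {
    "corruption_a": [0, 1, 2, 0],  # 3 of 4 correct
    "corruption_b": [0, 1, 0, 0],  # 2 of 4 correct
    "corruption_c": [0, 1, 2, 3],  # 4 of 4 correct
}

# Step 1: top-1 accuracy for each corruption.
top1_per_corruption = {
    name: top1_accuracy(preds, true_labels)
    for name, preds in predictions_per_corruption.items()
}
# Step 2: mean top-1 accuracy across corruptions.
mean_top1 = sum(top1_per_corruption.values()) / len(top1_per_corruption)

print(top1_per_corruption)
print(mean_top1)  # 0.75
```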

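For the purpose of this example, we won't be going through these steps, which is why we trained the models for only 5 epochs. You can check out this repository, which demonstrates the full-scale training experiments and the aforementioned assessment. The figure in the original example presents an executive summary of that assessment: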

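Mean Top-1 refers to results on the CIFAR-10-C dataset, and Test Top-1 refers to results on the CIFAR-10 test set. That summary indicates that consistency training helps not only with model robustness but also with standard test performance.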

That's all.
