Keras 3 : examples : 最新の MLPモデルによる画像分類 (翻訳/解説)

翻訳 : クラスキャットセールスインフォメーション
作成日時 : 12/12/2023

* 本ページは、以下のドキュメントを翻訳した上で適宜、補足説明したものです：

Code examples : Computer Vision : Image classification with modern MLP models (Author: Khalid Salama, Last modified: 2023/08/03)

* サンプルコードの動作確認はしておりますが、必要な場合には適宜、追加改変しています。
* ご自由にリンクを張って頂いてかまいませんが、sales-info@classcat.com までご一報いただけると嬉しいです。

クラスキャット人工知能研究開発支援サービス

◆ クラスキャットは人工知能・テレワークに関する各種サービスを提供しています。お気軽にご相談ください :

人工知能研究開発支援
1. 人工知能研修サービス(経営者層向けオンサイト研修)
2. テクニカルコンサルティングサービス
3. 実証実験(プロトタイプ構築)
4. アプリケーションへの実装
人工知能研修サービス
PoC(概念実証)を失敗させないための支援

◆ 人工知能とビジネスをテーマに WEB セミナーを定期的に開催しています。スケジュール。

お住まいの地域に関係なく Web ブラウザからご参加頂けます。事前登録 が必要ですのでご注意ください。

◆ お問合せ : 本件に関するお問い合わせ先は下記までお願いいたします。

クラスキャット セールス・マーケティング本部セールス・インフォメーション
sales-info@classcat.com ; Website: www.classcat.com ; ClassCatJP

Keras 3 : examples :最新の MLPモデルによる画像分類

説明: CIFAR-100 画像分類用に MLP-Mixer, FNet と gMLP モデルを実装する。

イントロダクション

このサンプルは画像分類のための 3 つの最新のアテンション-free な多層パーセプトロン (MLP) ベースのモデルを実装します、CIFAR-100 データセット上で実演されます :

Ilya Tolstikhin et al. による MLP-Mixer モデル、2 つのタイプの MLP に基づいています。
James Lee-Thorp et al. による FNet モデル、unparameterized フーリエ変換に基づいています。
Hanxiao Liu et al. による gMLP モデル、ゲーティングを持つ MLP に基づいています。

このサンプルの目的はこれらのモデルを比較することではありません、これらのモデルは上手く調整されたハイパーパラメータにより異なるデータセット上では異なって遂行される可能性があるからです。むしろ、それらの主要なビルディングブロックの単純な実装を示すことにあります。

セットアップ

import numpy as np
import keras
from keras import layers

データの準備

num_classes = 100
input_shape = (32, 32, 3)

(x_train, y_train), (x_test, y_test) = keras.datasets.cifar100.load_data()

print(f"x_train shape: {x_train.shape} - y_train shape: {y_train.shape}")
print(f"x_test shape: {x_test.shape} - y_test shape: {y_test.shape}")

x_train shape: (50000, 32, 32, 3) - y_train shape: (50000, 1)
x_test shape: (10000, 32, 32, 3) - y_test shape: (10000, 1)

ハイパーパラメータの設定構成

weight_decay = 0.0001
batch_size = 128
num_epochs = 1  # Recommended num_epochs = 50
dropout_rate = 0.2
image_size = 64  # We'll resize input images to this size.
patch_size = 8  # Size of the patches to be extracted from the input images.
num_patches = (image_size // patch_size) ** 2  # Size of the data array.
embedding_dim = 256  # Number of hidden units.
num_blocks = 4  # Number of blocks.

print(f"Image size: {image_size} X {image_size} = {image_size ** 2}")
print(f"Patch size: {patch_size} X {patch_size} = {patch_size ** 2} ")
print(f"Patches per image: {num_patches}")
print(f"Elements per patch (3 channels): {(patch_size ** 2) * 3}")

Image size: 64 X 64 = 4096
Patch size: 8 X 8 = 64 
Patches per image: 64
Elements per patch (3 channels): 192

分類モデルの構築

処理ブロックが与えられた場合に分類器を構築する手法を実装します。

def build_classifier(blocks, positional_encoding=False):
    inputs = layers.Input(shape=input_shape)
    # Augment data.
    augmented = data_augmentation(inputs)
    # Create patches.
    patches = Patches(patch_size)(augmented)
    # Encode patches to generate a [batch_size, num_patches, embedding_dim] tensor.
    x = layers.Dense(units=embedding_dim)(patches)
    if positional_encoding:
        x = x + PositionEmbedding(sequence_length=num_patches)(x)
    # Process x using the module blocks.
    x = blocks(x)
    # Apply global average pooling to generate a [batch_size, embedding_dim] representation tensor.
    representation = layers.GlobalAveragePooling1D()(x)
    # Apply dropout.
    representation = layers.Dropout(rate=dropout_rate)(representation)
    # Compute logits outputs.
    logits = layers.Dense(num_classes)(representation)
    # Create the Keras model.
    return keras.Model(inputs=inputs, outputs=logits)

実験の定義

与えられたモデルをコンパイル、訓練そして評価するユティリティ関数を実装します。

def run_experiment(model):
    # Create Adam optimizer with weight decay.
    optimizer = keras.optimizers.AdamW(
        learning_rate=learning_rate,
        weight_decay=weight_decay,
    )
    # Compile the model.
    model.compile(
        optimizer=optimizer,
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[
            keras.metrics.SparseCategoricalAccuracy(name="acc"),
            keras.metrics.SparseTopKCategoricalAccuracy(5, name="top5-acc"),
        ],
    )
    # Create a learning rate scheduler callback.
    reduce_lr = keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss", factor=0.5, patience=5
    )
    # Create an early stopping callback.
    early_stopping = keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=10, restore_best_weights=True
    )
    # Fit the model.
    history = model.fit(
        x=x_train,
        y=y_train,
        batch_size=batch_size,
        epochs=num_epochs,
        validation_split=0.1,
        callbacks=[early_stopping, reduce_lr],
        verbose=0,
    )

    _, accuracy, top_5_accuracy = model.evaluate(x_test, y_test)
    print(f"Test accuracy: {round(accuracy * 100, 2)}%")
    print(f"Test top 5 accuracy: {round(top_5_accuracy * 100, 2)}%")

    # Return history to plot learning curves.
    return history

データ増強の使用

data_augmentation = keras.Sequential(
    [
        layers.Normalization(),
        layers.Resizing(image_size, image_size),
        layers.RandomFlip("horizontal"),
        layers.RandomZoom(height_factor=0.2, width_factor=0.2),
    ],
    name="data_augmentation",
)
# Compute the mean and the variance of the training data for normalization.
data_augmentation.layers[0].adapt(x_train)

パッチ抽出を層として実装する

class Patches(layers.Layer):
    def __init__(self, patch_size, **kwargs):
        super().__init__(**kwargs)
        self.patch_size = patch_size

    def call(self, x):
        patches = keras.ops.image.extract_patches(x, self.patch_size)
        batch_size = keras.ops.shape(patches)[0]
        num_patches = keras.ops.shape(patches)[1] * keras.ops.shape(patches)[2]
        patch_dim = keras.ops.shape(patches)[3]
        out = keras.ops.reshape(patches, (batch_size, num_patches, patch_dim))
        return out

位置埋め込みを層として実装する

class PositionEmbedding(keras.layers.Layer):
    def __init__(
        self,
        sequence_length,
        initializer="glorot_uniform",
        **kwargs,
    ):
        super().__init__(**kwargs)
        if sequence_length is None:
            raise ValueError("`sequence_length` must be an Integer, received `None`.")
        self.sequence_length = int(sequence_length)
        self.initializer = keras.initializers.get(initializer)

    def get_config(self):
        config = super().get_config()
        config.update(
            {
                "sequence_length": self.sequence_length,
                "initializer": keras.initializers.serialize(self.initializer),
            }
        )
        return config

    def build(self, input_shape):
        feature_size = input_shape[-1]
        self.position_embeddings = self.add_weight(
            name="embeddings",
            shape=[self.sequence_length, feature_size],
            initializer=self.initializer,
            trainable=True,
        )

        super().build(input_shape)

    def call(self, inputs, start_index=0):
        shape = keras.ops.shape(inputs)
        feature_length = shape[-1]
        sequence_length = shape[-2]
        # trim to match the length of the input sequence, which might be less
        # than the sequence_length of the layer.
        position_embeddings = keras.ops.convert_to_tensor(self.position_embeddings)
        position_embeddings = keras.ops.slice(
            position_embeddings,
            (start_index, 0),
            (sequence_length, feature_length),
        )
        return keras.ops.broadcast_to(position_embeddings, shape)

    def compute_output_shape(self, input_shape):
        return input_shape

MLP-Mixer モデル

MLP-Mixer は多層パーセプトロン (MLP) だけに基づいたアーキテクチャで、2 つのタイプの MLP 層を含みます :

一つは画像パッチに個別に適用されます、これは位置ごとの特徴をミックスします。
他方は (チャネルに沿って) パッチに渡り適用され、これは空間情報をミックスします。

これは Xception モデルのような depthwise に分離可能な畳み込みベースのモデルに似ていますが、2 つの連鎖された dense 変換を持ち、最大プーリングはなく、そしてバッチ正規化の代わりに層正規化があります。

MLP-Mixer モジュールの実装

class MLPMixerLayer(layers.Layer):
    def __init__(self, num_patches, hidden_units, dropout_rate, *args, **kwargs):
        super().__init__(*args, **kwargs)

        self.mlp1 = keras.Sequential(
            [
                layers.Dense(units=num_patches, activation="gelu"),
                layers.Dense(units=num_patches),
                layers.Dropout(rate=dropout_rate),
            ]
        )
        self.mlp2 = keras.Sequential(
            [
                layers.Dense(units=num_patches, activation="gelu"),
                layers.Dense(units=hidden_units),
                layers.Dropout(rate=dropout_rate),
            ]
        )
        self.normalize = layers.LayerNormalization(epsilon=1e-6)

    def build(self, input_shape):
        return super().build(input_shape)

    def call(self, inputs):
        # Apply layer normalization.
        x = self.normalize(inputs)
        # Transpose inputs from [num_batches, num_patches, hidden_units] to [num_batches, hidden_units, num_patches].
        x_channels = keras.ops.transpose(x, axes=(0, 2, 1))
        # Apply mlp1 on each channel independently.
        mlp1_outputs = self.mlp1(x_channels)
        # Transpose mlp1_outputs from [num_batches, hidden_dim, num_patches] to [num_batches, num_patches, hidden_units].
        mlp1_outputs = keras.ops.transpose(mlp1_outputs, axes=(0, 2, 1))
        # Add skip connection.
        x = mlp1_outputs + inputs
        # Apply layer normalization.
        x_patches = self.normalize(x)
        # Apply mlp2 on each patch independtenly.
        mlp2_outputs = self.mlp2(x_patches)
        # Add skip connection.
        x = x + mlp2_outputs
        return x

MLP-Mixer モデルを構築、訓練そして評価する

現在の設定でのモデルの訓練は V100 GPU 上でエポック毎におよそ 8 秒かかることに注意してください。

mlpmixer_blocks = keras.Sequential(
    [MLPMixerLayer(num_patches, embedding_dim, dropout_rate) for _ in range(num_blocks)]
)
learning_rate = 0.005
mlpmixer_classifier = build_classifier(mlpmixer_blocks)
history = run_experiment(mlpmixer_classifier)

Test accuracy: 9.76%
Test top 5 accuracy: 30.8%

MLP-Mixer モデルは畳み込みと transformer ベースモデルに比べて遥かに少ない数のパラメータを持つ傾向にあり、これは少ない訓練と計算コストに役立つことに繋がります。

MLP-Mixer 論文で述べられているように、大規模データセット上で事前訓練された場合や、最新の正則化スキームを使用する場合、MLP-Mixer は最先端モデルに匹敵するスコアを達成します。埋め込み次元を増やし、mixer ブロックの数を増やし、そしてモデルをより長く訓練することでより良い結果を得られます。入力画像のサイズを大きくしたり異なるパッチサイズを使用することを試しても良いでしょう。

FNet モデル

FNet は Transformer ブロックに類似のブロックを使用します。けれども、FNet は Transformer ブロックの自己アテンション層をパラメータフリーな 2D フーリエ変換層と置き換えます :

一つの 1D フーリエ変換はパッチに沿って適用されます。
一つの 1D フーリエ変換はチャネルに沿って適用されます。

FNet モジュールの実装

class FNetLayer(layers.Layer):
    def __init__(self, embedding_dim, dropout_rate, *args, **kwargs):
        super().__init__(*args, **kwargs)

        self.ffn = keras.Sequential(
            [
                layers.Dense(units=embedding_dim, activation="gelu"),
                layers.Dropout(rate=dropout_rate),
                layers.Dense(units=embedding_dim),
            ]
        )

        self.normalize1 = layers.LayerNormalization(epsilon=1e-6)
        self.normalize2 = layers.LayerNormalization(epsilon=1e-6)

    def call(self, inputs):
        # Apply fourier transformations.
        real_part = inputs
        im_part = keras.ops.zeros_like(inputs)
        x = keras.ops.fft2((real_part, im_part))[0]
        # Add skip connection.
        x = x + inputs
        # Apply layer normalization.
        x = self.normalize1(x)
        # Apply Feedfowrad network.
        x_ffn = self.ffn(x)
        # Add skip connection.
        x = x + x_ffn
        # Apply layer normalization.
        return self.normalize2(x)

FNet モデルの構築、訓練と評価

現在の設定でのモデルの訓練は V100 GPU 上でエポック毎におよそ 8 秒かかることに注意してください。


Test accuracy: 13.82%
Test top 5 accuracy: 36.15%

FNet 論文で述べられているように、埋め込み次元を増やし、FNet ブロックの数を増やし、そしてモデルをより長く訓練することでより良い結果を得られます。入力画像のサイズを大きくしたり異なるパッチサイズを使用することを試しても良いでしょう。FNet は長い入力に非常に効率的にスケールし、アテンション・ベースの Transformer モデルよりも遥かに高速に実行され、そして競争力のある結果を生成します。
 
gMLP モデル
gMLP は空間ゲートユニット (SGU, Spatial Gating Unit) にフィーチャーした MLP アーキテクチャです。SGU は以下により、空間 (チャネル) 次元に渡る交差パッチの相互作用を可能にします :

(チャネルに沿って) パッチに渡る線形射影を適用することにより入力を空間的に変換します。
入力とその空間変換の要素ごとの乗算を適用する。

 
gMLP モジュールの実装
class gMLPLayer(layers.Layer):
    def __init__(self, num_patches, embedding_dim, dropout_rate, *args, **kwargs):
        super().__init__(*args, **kwargs)

        self.channel_projection1 = keras.Sequential(
            [
                layers.Dense(units=embedding_dim * 2, activation="gelu"),
                layers.Dropout(rate=dropout_rate),
            ]
        )

        self.channel_projection2 = layers.Dense(units=embedding_dim)

        self.spatial_projection = layers.Dense(
            units=num_patches, bias_initializer="Ones"
        )

        self.normalize1 = layers.LayerNormalization(epsilon=1e-6)
        self.normalize2 = layers.LayerNormalization(epsilon=1e-6)

    def spatial_gating_unit(self, x):
        # Split x along the channel dimensions.
        # Tensors u and v will in the shape of [batch_size, num_patchs, embedding_dim].
        u, v = keras.ops.split(x, indices_or_sections=2, axis=2)
        # Apply layer normalization.
        v = self.normalize2(v)
        # Apply spatial projection.
        v_channels = keras.ops.transpose(v, axes=(0, 2, 1))
        v_projected = self.spatial_projection(v_channels)
        v_projected = keras.ops.transpose(v_projected, axes=(0, 2, 1))
        # Apply element-wise multiplication.
        return u * v_projected

    def call(self, inputs):
        # Apply layer normalization.
        x = self.normalize1(inputs)
        # Apply the first channel projection. x_projected shape: [batch_size, num_patches, embedding_dim * 2].
        x_projected = self.channel_projection1(x)
        # Apply the spatial gating unit. x_spatial shape: [batch_size, num_patches, embedding_dim].
        x_spatial = self.spatial_gating_unit(x_projected)
        # Apply the second channel projection. x_projected shape: [batch_size, num_patches, embedding_dim].
        x_projected = self.channel_projection2(x_spatial)
        # Add skip connection.
        return x + x_projected

 
gMLP モデルの構築、訓練と評価
現在の設定でのモデルの訓練は V100 GPU 上でエポック毎におよそ 9 秒かかることに注意してください。
gmlp_blocks = keras.Sequential(
    [gMLPLayer(num_patches, embedding_dim, dropout_rate) for _ in range(num_blocks)]
)
learning_rate = 0.003
gmlp_classifier = build_classifier(gmlp_blocks)
history = run_experiment(gmlp_classifier)

Test accuracy: 17.05%
Test top 5 accuracy: 42.57%

gMLP 論文で述べられているように、埋め込み次元を増やし、gMLP ブロックの数を増やし、そしてモデルをより長く訓練することでより良い結果を得られます。入力画像のサイズを大きくしたり異なるパッチサイズを使用することを試しても良いでしょう。論文は MixUp と CutMix、そして AutoAugment のような高度な正則化ストラテジーを使用したことに注意してください。

 
以上

月	火	水	木	金	土	日
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31