Keras 2 : examples : Self-supervised contrastive learning with NNCLR
Description: Implementation of NNCLR, a self-supervised learning method for computer vision.
Introduction
Self-supervised learning
Self-supervised representation learning aims to obtain robust representations of samples from raw data without expensive labels or annotations. Early methods in this field focused on defining pretraining tasks that involved a surrogate task on a domain with ample weak supervision labels. Encoders trained to solve such tasks are expected to learn general features that might be useful for other downstream tasks requiring expensive annotations, such as image classification.
Contrastive learning
A broad category of self-supervised techniques are those that use contrastive losses, which have been used in a wide range of computer vision applications such as image similarity, dimensionality reduction (DrLIM) and face verification/identification. These methods learn a latent space that clusters positive samples together while pushing apart negative samples.
NNCLR
In this example, we implement NNCLR as proposed in the paper With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations, by Google Research and DeepMind.
NNCLR learns self-supervised representations that go beyond single-instance positives, which allows for learning better features that are invariant to different viewpoints, deformations, and even intra-class variations. Clustering-based methods offer a great approach to go beyond single-instance positives, but assuming an entire cluster to be positives could hurt performance due to early over-generalization. Instead, NNCLR uses nearest neighbors in the learned representation space as positives. In addition, NNCLR increases the performance of existing contrastive learning methods like SimCLR (Keras example) and reduces the reliance of self-supervised methods on data augmentation strategies.
Here is a great visualization by the paper authors showing how NNCLR builds on ideas from SimCLR:
We can see that SimCLR uses two views of the same image as the positive pair. These two views, generated using random data augmentations, are fed through an encoder to obtain the positive embedding pair, so we end up using two augmentations. NNCLR instead keeps a support set of embeddings representing the full data distribution, and forms positive pairs using nearest neighbors. The support set is used as memory during training, similar to the queue (i.e. first-in-first-out) in MoCo.
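To make the nearest-neighbour idea concrete, here is a minimal standalone sketch (our own illustration, not the paper's code; the array sizes and names are assumptions) of looking up, for each query embedding, its closest entry in a support set via cosine similarity:

import tensorflow as tf

# Toy support set of 8 embeddings and a batch of 2 query embeddings,
# both L2-normalized so that dot products equal cosine similarities.
support_set = tf.math.l2_normalize(tf.random.normal((8, 4)), axis=1)
queries = tf.math.l2_normalize(tf.random.normal((2, 4)), axis=1)

# Cosine similarity of every query against every support embedding.
similarities = tf.matmul(queries, support_set, transpose_b=True)  # shape (2, 8)

# The nearest neighbour of each query is used as its positive.
nn_indices = tf.argmax(similarities, axis=1)
positives = tf.gather(support_set, nn_indices, axis=0)
print(positives.shape)  # (2, 4)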
This example requires TensorFlow 2.6 or higher, as well as tensorflow_datasets, which can be installed with this command:
!pip install tensorflow-datasets
Setup
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow import keras
from tensorflow.keras import layers
Hyperparameters
A greater queue_size most likely means better performance, as shown in the original paper, but it introduces significant computational overhead. The authors show that the best results of NNCLR are achieved with a queue size of 98,304 (the largest queue_size they experimented with). We here use 10,000 to show a working example.
AUTOTUNE = tf.data.AUTOTUNE
shuffle_buffer = 5000
# The below two values are taken from https://www.tensorflow.org/datasets/catalog/stl10
labelled_train_images = 5000
unlabelled_images = 100000
temperature = 0.1
queue_size = 10000
contrastive_augmenter = {
"brightness": 0.5,
"name": "contrastive_augmenter",
"scale": (0.2, 1.0),
}
classification_augmenter = {
"brightness": 0.2,
"name": "classification_augmenter",
"scale": (0.5, 1.0),
}
input_shape = (96, 96, 3)
width = 128
num_epochs = 25
steps_per_epoch = 200
Load the Dataset
We load the STL-10 dataset from TensorFlow Datasets, an image recognition dataset for developing unsupervised feature learning, deep learning, self-taught learning algorithms. It is inspired by the CIFAR-10 dataset, with some modifications.
dataset_name = "stl10"
def prepare_dataset():
unlabeled_batch_size = unlabelled_images // steps_per_epoch
labeled_batch_size = labelled_train_images // steps_per_epoch
batch_size = unlabeled_batch_size + labeled_batch_size
unlabeled_train_dataset = (
tfds.load(
dataset_name, split="unlabelled", as_supervised=True, shuffle_files=True
)
.shuffle(buffer_size=shuffle_buffer)
.batch(unlabeled_batch_size, drop_remainder=True)
)
labeled_train_dataset = (
tfds.load(dataset_name, split="train", as_supervised=True, shuffle_files=True)
.shuffle(buffer_size=shuffle_buffer)
.batch(labeled_batch_size, drop_remainder=True)
)
test_dataset = (
tfds.load(dataset_name, split="test", as_supervised=True)
.batch(batch_size)
.prefetch(buffer_size=AUTOTUNE)
)
train_dataset = tf.data.Dataset.zip(
(unlabeled_train_dataset, labeled_train_dataset)
).prefetch(buffer_size=AUTOTUNE)
return batch_size, train_dataset, labeled_train_dataset, test_dataset
batch_size, train_dataset, labeled_train_dataset, test_dataset = prepare_dataset()
Downloading and preparing dataset 2.46 GiB (download: 2.46 GiB, generated: 1.86 GiB, total: 4.32 GiB) to /home/jupyter/tensorflow_datasets/stl10/1.0.0...
Dataset stl10 downloaded and prepared to /home/jupyter/tensorflow_datasets/stl10/1.0.0. Subsequent calls will reuse this data.
Augmentations
Other self-supervised techniques like SimCLR, BYOL, SwAV etc. rely heavily on a well-designed data augmentation pipeline to get the best performance. However, NNCLR is less dependent on complex augmentations as nearest neighbors already provide richness in sample variations. A few common techniques often included in augmentation pipelines are:
- Random resized crops
- Multiple color distortions
- Gaussian blur
Since NNCLR is less dependent on complex augmentations, we will only use random crops and random brightness for augmenting the input images.
Random Resized Crops
class RandomResizedCrop(layers.Layer):
def __init__(self, scale, ratio):
super(RandomResizedCrop, self).__init__()
self.scale = scale
self.log_ratio = (tf.math.log(ratio[0]), tf.math.log(ratio[1]))
def call(self, images):
batch_size = tf.shape(images)[0]
height = tf.shape(images)[1]
width = tf.shape(images)[2]
random_scales = tf.random.uniform((batch_size,), self.scale[0], self.scale[1])
random_ratios = tf.exp(
tf.random.uniform((batch_size,), self.log_ratio[0], self.log_ratio[1])
)
new_heights = tf.clip_by_value(tf.sqrt(random_scales / random_ratios), 0, 1)
new_widths = tf.clip_by_value(tf.sqrt(random_scales * random_ratios), 0, 1)
height_offsets = tf.random.uniform((batch_size,), 0, 1 - new_heights)
width_offsets = tf.random.uniform((batch_size,), 0, 1 - new_widths)
bounding_boxes = tf.stack(
[
height_offsets,
width_offsets,
height_offsets + new_heights,
width_offsets + new_widths,
],
axis=1,
)
images = tf.image.crop_and_resize(
images, bounding_boxes, tf.range(batch_size), (height, width)
)
return images
Random Brightness
class RandomBrightness(layers.Layer):
def __init__(self, brightness):
super(RandomBrightness, self).__init__()
self.brightness = brightness
def blend(self, images_1, images_2, ratios):
return tf.clip_by_value(ratios * images_1 + (1.0 - ratios) * images_2, 0, 1)
def random_brightness(self, images):
# random interpolation/extrapolation between the image and darkness
return self.blend(
images,
0,
tf.random.uniform(
(tf.shape(images)[0], 1, 1, 1), 1 - self.brightness, 1 + self.brightness
),
)
def call(self, images):
images = self.random_brightness(images)
return images
Prepare augmentation module
def augmenter(brightness, name, scale):
return keras.Sequential(
[
layers.Input(shape=input_shape),
layers.Rescaling(1 / 255),
layers.RandomFlip("horizontal"),
RandomResizedCrop(scale=scale, ratio=(3 / 4, 4 / 3)),
RandomBrightness(brightness=brightness),
],
name=name,
)
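As a quick sanity check (our addition, not part of the original example; the dummy batch is an illustrative assumption), the two augmenters can be built from the hyperparameter dicts above and applied to a batch of random images:

# Build the two augmentation pipelines defined by the dicts above.
contrastive_aug = augmenter(**contrastive_augmenter)
classification_aug = augmenter(**classification_augmenter)

# Apply them to a dummy batch of images with pixel values in [0, 255].
dummy_images = tf.random.uniform((4, 96, 96, 3), minval=0, maxval=255)
print(contrastive_aug(dummy_images).shape)  # (4, 96, 96, 3)
print(classification_aug(dummy_images).shape)  # (4, 96, 96, 3)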
Encoder architecture
Using a ResNet-50 as the encoder architecture is standard in the literature. In the original paper, the authors use ResNet-50 as the encoder architecture and spatially average the outputs of ResNet-50. However, keep in mind that more powerful models will not only increase training time but will also require more memory and will limit the maximal batch size you can use. For the purpose of this example, we just use four convolutional layers.
def encoder():
return keras.Sequential(
[
layers.Input(shape=input_shape),
layers.Conv2D(width, kernel_size=3, strides=2, activation="relu"),
layers.Conv2D(width, kernel_size=3, strides=2, activation="relu"),
layers.Conv2D(width, kernel_size=3, strides=2, activation="relu"),
layers.Conv2D(width, kernel_size=3, strides=2, activation="relu"),
layers.Flatten(),
layers.Dense(width, activation="relu"),
],
name="encoder",
)
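A quick shape check (our addition, purely illustrative): with the hyperparameters above, the encoder maps each 96x96x3 image to a width-dimensional feature vector.

enc = encoder()
print(enc.output_shape)  # (None, 128), i.e. one `width`-dim vector per image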
The NNCLR model for contrastive pre-training
We train an encoder on unlabeled images with a contrastive loss. A nonlinear projection head is attached to the top of the encoder, as it improves the quality of representations of the encoder.
class NNCLR(keras.Model):
def __init__(
self, temperature, queue_size,
):
super(NNCLR, self).__init__()
self.probe_accuracy = keras.metrics.SparseCategoricalAccuracy()
self.correlation_accuracy = keras.metrics.SparseCategoricalAccuracy()
self.contrastive_accuracy = keras.metrics.SparseCategoricalAccuracy()
self.probe_loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
self.contrastive_augmenter = augmenter(**contrastive_augmenter)
self.classification_augmenter = augmenter(**classification_augmenter)
self.encoder = encoder()
self.projection_head = keras.Sequential(
[
layers.Input(shape=(width,)),
layers.Dense(width, activation="relu"),
layers.Dense(width),
],
name="projection_head",
)
self.linear_probe = keras.Sequential(
[layers.Input(shape=(width,)), layers.Dense(10)], name="linear_probe"
)
self.temperature = temperature
feature_dimensions = self.encoder.output_shape[1]
self.feature_queue = tf.Variable(
tf.math.l2_normalize(
tf.random.normal(shape=(queue_size, feature_dimensions)), axis=1
),
trainable=False,
)
def compile(self, contrastive_optimizer, probe_optimizer, **kwargs):
super(NNCLR, self).compile(**kwargs)
self.contrastive_optimizer = contrastive_optimizer
self.probe_optimizer = probe_optimizer
def nearest_neighbour(self, projections):
support_similarities = tf.matmul(
projections, self.feature_queue, transpose_b=True
)
nn_projections = tf.gather(
self.feature_queue, tf.argmax(support_similarities, axis=1), axis=0
)
return projections + tf.stop_gradient(nn_projections - projections)
def update_contrastive_accuracy(self, features_1, features_2):
features_1 = tf.math.l2_normalize(features_1, axis=1)
features_2 = tf.math.l2_normalize(features_2, axis=1)
similarities = tf.matmul(features_1, features_2, transpose_b=True)
batch_size = tf.shape(features_1)[0]
contrastive_labels = tf.range(batch_size)
self.contrastive_accuracy.update_state(
tf.concat([contrastive_labels, contrastive_labels], axis=0),
tf.concat([similarities, tf.transpose(similarities)], axis=0),
)
def update_correlation_accuracy(self, features_1, features_2):
features_1 = (
features_1 - tf.reduce_mean(features_1, axis=0)
) / tf.math.reduce_std(features_1, axis=0)
features_2 = (
features_2 - tf.reduce_mean(features_2, axis=0)
) / tf.math.reduce_std(features_2, axis=0)
batch_size = tf.shape(features_1, out_type=tf.float32)[0]
cross_correlation = (
tf.matmul(features_1, features_2, transpose_a=True) / batch_size
)
feature_dim = tf.shape(features_1)[1]
correlation_labels = tf.range(feature_dim)
self.correlation_accuracy.update_state(
tf.concat([correlation_labels, correlation_labels], axis=0),
tf.concat([cross_correlation, tf.transpose(cross_correlation)], axis=0),
)
def contrastive_loss(self, projections_1, projections_2):
projections_1 = tf.math.l2_normalize(projections_1, axis=1)
projections_2 = tf.math.l2_normalize(projections_2, axis=1)
similarities_1_2_1 = (
tf.matmul(
self.nearest_neighbour(projections_1), projections_2, transpose_b=True
)
/ self.temperature
)
similarities_1_2_2 = (
tf.matmul(
projections_2, self.nearest_neighbour(projections_1), transpose_b=True
)
/ self.temperature
)
similarities_2_1_1 = (
tf.matmul(
self.nearest_neighbour(projections_2), projections_1, transpose_b=True
)
/ self.temperature
)
similarities_2_1_2 = (
tf.matmul(
projections_1, self.nearest_neighbour(projections_2), transpose_b=True
)
/ self.temperature
)
batch_size = tf.shape(projections_1)[0]
contrastive_labels = tf.range(batch_size)
loss = keras.losses.sparse_categorical_crossentropy(
tf.concat(
[
contrastive_labels,
contrastive_labels,
contrastive_labels,
contrastive_labels,
],
axis=0,
),
tf.concat(
[
similarities_1_2_1,
similarities_1_2_2,
similarities_2_1_1,
similarities_2_1_2,
],
axis=0,
),
from_logits=True,
)
self.feature_queue.assign(
tf.concat([projections_1, self.feature_queue[:-batch_size]], axis=0)
)
return loss
def train_step(self, data):
(unlabeled_images, _), (labeled_images, labels) = data
images = tf.concat((unlabeled_images, labeled_images), axis=0)
augmented_images_1 = self.contrastive_augmenter(images)
augmented_images_2 = self.contrastive_augmenter(images)
with tf.GradientTape() as tape:
features_1 = self.encoder(augmented_images_1)
features_2 = self.encoder(augmented_images_2)
projections_1 = self.projection_head(features_1)
projections_2 = self.projection_head(features_2)
contrastive_loss = self.contrastive_loss(projections_1, projections_2)
gradients = tape.gradient(
contrastive_loss,
self.encoder.trainable_weights + self.projection_head.trainable_weights,
)
self.contrastive_optimizer.apply_gradients(
zip(
gradients,
self.encoder.trainable_weights + self.projection_head.trainable_weights,
)
)
self.update_contrastive_accuracy(features_1, features_2)
self.update_correlation_accuracy(features_1, features_2)
preprocessed_images = self.classification_augmenter(labeled_images)
with tf.GradientTape() as tape:
features = self.encoder(preprocessed_images)
class_logits = self.linear_probe(features)
probe_loss = self.probe_loss(labels, class_logits)
gradients = tape.gradient(probe_loss, self.linear_probe.trainable_weights)
self.probe_optimizer.apply_gradients(
zip(gradients, self.linear_probe.trainable_weights)
)
self.probe_accuracy.update_state(labels, class_logits)
return {
"c_loss": contrastive_loss,
"c_acc": self.contrastive_accuracy.result(),
"r_acc": self.correlation_accuracy.result(),
"p_loss": probe_loss,
"p_acc": self.probe_accuracy.result(),
}
def test_step(self, data):
labeled_images, labels = data
preprocessed_images = self.classification_augmenter(
labeled_images, training=False
)
features = self.encoder(preprocessed_images, training=False)
class_logits = self.linear_probe(features, training=False)
probe_loss = self.probe_loss(labels, class_logits)
self.probe_accuracy.update_state(labels, class_logits)
return {"p_loss": probe_loss, "p_acc": self.probe_accuracy.result()}
Pre-train NNCLR
We train the network using a temperature of 0.1 as suggested in the paper and a queue_size of 10,000 as explained earlier. We use Adam as our contrastive and probe optimizer. For this example we train the model for only 25 epochs, but it should be trained for more epochs for better performance.
The following two metrics can be used for monitoring the pretraining performance, which we also log (taken from this Keras example):
- Contrastive accuracy: self-supervised metric, the ratio of cases in which the representation of an image is more similar to its differently augmented version's one than to the representation of any other image in the current batch. Self-supervised metrics can be used for hyperparameter tuning even in the case when there are no labeled examples.
- Linear probing accuracy: linear probing is a popular metric to evaluate self-supervised classifiers. It is computed as the accuracy of a logistic regression classifier trained on top of the encoder's features. In our case, this is done by training a single dense layer on top of the frozen encoder. Note that contrary to the traditional approach where the classifier is trained after the pretraining phase, in this example we train it during pretraining. This might slightly decrease its accuracy, but that way we can monitor its value during training, which helps with experimentation and debugging.
model = NNCLR(temperature=temperature, queue_size=queue_size)
model.compile(
contrastive_optimizer=keras.optimizers.Adam(),
probe_optimizer=keras.optimizers.Adam(),
)
pretrain_history = model.fit(
train_dataset, epochs=num_epochs, validation_data=test_dataset
)
Epoch 1/25
200/200 [==============================] - 46s 125ms/step - c_loss: 3.2890 - c_acc: 0.4006 - r_acc: 0.4409 - p_loss: 2.2239 - p_acc: 0.1201 - val_p_loss: 2.1178 - val_p_acc: 0.2426
...
Epoch 25/25
200/200 [==============================] - 27s 123ms/step - c_loss: 1.2987 - c_acc: 0.9282 - r_acc: 0.4599 - p_loss: 1.5115 - p_acc: 0.4404 - val_p_loss: 1.6434 - val_p_acc: 0.4123
Evaluate our model
A popular way to evaluate an SSL method in computer vision, or any other pre-training method, is to learn a linear classifier on the frozen features of the trained backbone model and evaluate the classifier on unseen images. Other methods often include fine-tuning on the source dataset, or even a target dataset with 5% or 10% of the labels present. You can use the backbone we just trained for any downstream task such as image classification (as we do here), segmentation or detection, where backbone models are usually pre-trained with supervised learning.
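The code below fine-tunes the whole network on the labeled data. If you instead want the frozen-features linear evaluation described above, one way to sketch it (our own variation on the example, not part of the original) is to freeze the encoder before compiling:

# Linear evaluation variant (illustrative): freeze the pre-trained encoder
# so that only the final Dense classifier learns from its frozen features.
model.encoder.trainable = False
linear_eval_model = keras.Sequential(
    [
        layers.Input(shape=input_shape),
        augmenter(**classification_augmenter),
        model.encoder,
        layers.Dense(10),
    ],
    name="linear_eval_model",
)
linear_eval_model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[keras.metrics.SparseCategoricalAccuracy(name="acc")],
)
# Set model.encoder.trainable = True again before running the fine-tuning below.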
finetuning_model = keras.Sequential(
[
layers.Input(shape=input_shape),
augmenter(**classification_augmenter),
model.encoder,
layers.Dense(10),
],
name="finetuning_model",
)
finetuning_model.compile(
optimizer=keras.optimizers.Adam(),
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=[keras.metrics.SparseCategoricalAccuracy(name="acc")],
)
finetuning_history = finetuning_model.fit(
labeled_train_dataset, epochs=num_epochs, validation_data=test_dataset
)
Epoch 1/25
200/200 [==============================] - 4s 14ms/step - loss: 1.9094 - acc: 0.2770 - val_loss: 1.6228 - val_acc: 0.3735
...
Epoch 25/25
200/200 [==============================] - 4s 13ms/step - loss: 0.6698 - acc: 0.7614 - val_loss: 1.2016 - val_acc: 0.6033
Self-supervised learning is particularly helpful when you only have access to very limited labeled training data but you can manage to build a large corpus of unlabeled data, as shown by previous methods like SEER, SimCLR, SwAV and more.
You should also take a look at the blog posts for these papers, which neatly demonstrate that it is possible to achieve good results with few class labels by first pretraining on a large unlabeled dataset and then fine-tuning on a smaller labeled dataset.
We also recommend checking out the original paper.
Many thanks to Debidatta Dwibedi (Google Research), primary author of the NNCLR paper for his super-insightful reviews for this example. This example also takes inspiration from the SimCLR Keras Example.