Keras 2 : examples : Handwriting recognition (variable-length sequences) (translation/commentary)
Translation: ClassCat Co., Ltd. Sales Information
Created: 11/20/2021 (keras 2.7.0)
* This page is a translation of the following Keras documentation, with supplementary explanations added where appropriate:
- Code examples : Computer Vision : Handwriting recognition (Authors: A_K_Nain, Sayak Paul)
* The sample code has been verified to run, but has been supplemented and modified where necessary.
Description: Training a handwriting recognition model with variable-length sequences.
Introduction
This example shows how the Captcha OCR example can be extended to the IAM Dataset, which has variable-length ground-truth targets. Each sample in the dataset is an image of some handwritten text, and its corresponding target is the string present in the image. The IAM Dataset is widely used across many OCR benchmarks, so we hope this example can serve as a good starting point for building OCR systems.
Data collection
!wget -q https://git.io/J0fjL -O IAM_Words.zip
!unzip -qq IAM_Words.zip
!
!mkdir data
!mkdir data/words
!tar -xf IAM_Words/words.tgz -C data/words
!mv IAM_Words/words.txt data
Preview how the dataset is structured. Lines that begin with "#" are just metadata.
!head -20 data/words.txt
#--- words.txt ---------------------------------------------------------------#
#
# iam database word information
#
# format: a01-000u-00-00 ok 154 1 408 768 27 51 AT A
#
#     a01-000u-00-00  -> word id for line 00 in form a01-000u
#     ok              -> result of word segmentation
#                          ok: word was correctly
#                          er: segmentation of word can be bad
#
#     154             -> graylevel to binarize the line containing this word
#     1               -> number of components for this word
#     408 768 27 51   -> bounding box around this word in x,y,w,h format
#     AT              -> the grammatical tag for this word, see the
#                        file tagset.txt for an explanation
#     A               -> the transcription for this word
#
a01-000u-00-00 ok 154 408 768 27 51 AT A
a01-000u-00-01 ok 154 507 766 213 48 NN MOVE
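(Translator's note) For orientation, this is how the pieces of a words.txt entry that the rest of this example actually relies on can be pulled out; a minimal sketch using the second data line shown above, with variable names of our own choosing:
# Illustration only: the fields of a words.txt entry used later in this example.
sample = "a01-000u-00-01 ok 154 507 766 213 48 NN MOVE"
parts = sample.strip().split(" ")
word_id = parts[0]         # "a01-000u-00-01" -> image words/a01/a01-000u/a01-000u-00-01.png
segmentation = parts[1]    # "ok" / "er": result of the word segmentation
transcription = parts[-1]  # "MOVE": the ground-truth text for this image
print(word_id, segmentation, transcription)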
Imports
from tensorflow.keras.layers.experimental.preprocessing import StringLookup
from tensorflow import keras
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
import os
np.random.seed(42)
tf.random.set_seed(42)
Dataset splitting
base_path = "data"
words_list = []
words = open(f"{base_path}/words.txt", "r").readlines()
for line in words:
if line[0] == "#":
continue
if line.split(" ")[1] != "err": # We don't need to deal with errored entries.
words_list.append(line)
len(words_list)
np.random.shuffle(words_list)
We will split the dataset into three subsets with a 90:5:5 ratio (train:validation:test).
split_idx = int(0.9 * len(words_list))
train_samples = words_list[:split_idx]
test_samples = words_list[split_idx:]
val_split_idx = int(0.5 * len(test_samples))
validation_samples = test_samples[:val_split_idx]
test_samples = test_samples[val_split_idx:]
assert len(words_list) == len(train_samples) + len(validation_samples) + len(
test_samples
)
print(f"Total training samples: {len(train_samples)}")
print(f"Total validation samples: {len(validation_samples)}")
print(f"Total test samples: {len(test_samples)}")
Total training samples: 86810
Total validation samples: 4823
Total test samples: 4823
Data input pipeline
We start building our data input pipeline by first preparing the image paths.
base_image_path = os.path.join(base_path, "words")
def get_image_paths_and_labels(samples):
paths = []
corrected_samples = []
for (i, file_line) in enumerate(samples):
line_split = file_line.strip()
line_split = line_split.split(" ")
# Each line split will have this format for the corresponding image:
# part1/part1-part2/part1-part2-part3.png
image_name = line_split[0]
partI = image_name.split("-")[0]
partII = image_name.split("-")[1]
img_path = os.path.join(
base_image_path, partI, partI + "-" + partII, image_name + ".png"
)
if os.path.getsize(img_path):
paths.append(img_path)
corrected_samples.append(file_line.split("\n")[0])
return paths, corrected_samples
train_img_paths, train_labels = get_image_paths_and_labels(train_samples)
validation_img_paths, validation_labels = get_image_paths_and_labels(validation_samples)
test_img_paths, test_labels = get_image_paths_and_labels(test_samples)
Then we prepare the ground-truth labels.
# Find maximum length and the size of the vocabulary in the training data.
train_labels_cleaned = []
characters = set()
max_len = 0
for label in train_labels:
label = label.split(" ")[-1].strip()
for char in label:
characters.add(char)
max_len = max(max_len, len(label))
train_labels_cleaned.append(label)
print("Maximum length: ", max_len)
print("Vocab size: ", len(characters))
# Check some label samples.
train_labels_cleaned[:10]
Maximum length: 21
Vocab size: 78
['sure', 'he', 'during', 'of', 'booty', 'gastronomy', 'boy', 'The', 'and', 'in']
Next, we clean up the validation and test labels as well.
def clean_labels(labels):
cleaned_labels = []
for label in labels:
label = label.split(" ")[-1].strip()
cleaned_labels.append(label)
return cleaned_labels
validation_labels_cleaned = clean_labels(validation_labels)
test_labels_cleaned = clean_labels(test_labels)
Building the character vocabulary
Keras provides different preprocessing layers to deal with different modalities of data. This guide provides a comprehensive introduction. Our example involves preprocessing labels at the character level. This means that if there are two labels, e.g. "cat" and "dog", then our character vocabulary should be {a, c, d, g, o, t} (without any special tokens). We use the StringLookup layer for this purpose.
AUTOTUNE = tf.data.AUTOTUNE
# Mapping characters to integers.
char_to_num = StringLookup(vocabulary=list(characters), mask_token=None)
# Mapping integers back to original characters.
num_to_char = StringLookup(
vocabulary=char_to_num.get_vocabulary(), mask_token=None, invert=True
)
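(Translator's note) As a quick sanity check of the two mappings, here is a minimal round trip; the exact integer ids depend on the (shuffled) vocabulary, so the printed values will vary:
# Characters -> integer ids -> characters.
ids = char_to_num(tf.strings.unicode_split("cat", input_encoding="UTF-8"))
back = tf.strings.reduce_join(num_to_char(ids)).numpy().decode("utf-8")
print(ids.numpy(), back)  # three vocabulary ids, followed by "cat"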
Resizing images without distortion
Instead of square images, many OCR models work with rectangular images. This becomes clear when we visualize a few samples from the dataset. While aspect-ratio-agnostic resizing of square images does not introduce a significant amount of distortion, this is not the case for rectangular images. Resizing images to a uniform size is nevertheless a requirement for mini-batching, so we need to perform our resizing such that the following criteria are met:
- The aspect ratio is preserved.
- The content of the image is not affected.
def distortion_free_resize(image, img_size):
w, h = img_size
image = tf.image.resize(image, size=(h, w), preserve_aspect_ratio=True)
    # Check the amount of padding needed.
pad_height = h - tf.shape(image)[0]
pad_width = w - tf.shape(image)[1]
    # Only necessary if you want to do the same amount of padding on both sides.
if pad_height % 2 != 0:
height = pad_height // 2
pad_height_top = height + 1
pad_height_bottom = height
else:
pad_height_top = pad_height_bottom = pad_height // 2
if pad_width % 2 != 0:
width = pad_width // 2
pad_width_left = width + 1
pad_width_right = width
else:
pad_width_left = pad_width_right = pad_width // 2
image = tf.pad(
image,
paddings=[
[pad_height_top, pad_height_bottom],
[pad_width_left, pad_width_right],
[0, 0],
],
)
image = tf.transpose(image, perm=[1, 0, 2])
image = tf.image.flip_left_right(image)
return image
If we just went with a plain resize, the images would end up looking stretched. Notice how such a resize introduces unnecessary stretching of the handwriting.
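(Translator's note) For reference, the plain resize used for that comparison is essentially a direct tf.image.resize to the target size, with no aspect-ratio preservation and no padding; a minimal sketch with a helper name of our own:
# Naive resize that ignores the aspect ratio; this is what stretches the handwriting.
def plain_resize(image, img_size):
    w, h = img_size
    return tf.image.resize(image, size=(h, w))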
Putting the utilities together
batch_size = 64
padding_token = 99
image_width = 128
image_height = 32
def preprocess_image(image_path, img_size=(image_width, image_height)):
image = tf.io.read_file(image_path)
image = tf.image.decode_png(image, 1)
image = distortion_free_resize(image, img_size)
image = tf.cast(image, tf.float32) / 255.0
return image
def vectorize_label(label):
label = char_to_num(tf.strings.unicode_split(label, input_encoding="UTF-8"))
length = tf.shape(label)[0]
pad_amount = max_len - length
label = tf.pad(label, paddings=[[0, pad_amount]], constant_values=padding_token)
return label
def process_images_labels(image_path, label):
image = preprocess_image(image_path)
label = vectorize_label(label)
return {"image": image, "label": label}
def prepare_dataset(image_paths, labels):
dataset = tf.data.Dataset.from_tensor_slices((image_paths, labels)).map(
process_images_labels, num_parallel_calls=AUTOTUNE
)
return dataset.batch(batch_size).cache().prefetch(AUTOTUNE)
Preparing tf.data.Dataset objects
train_ds = prepare_dataset(train_img_paths, train_labels_cleaned)
validation_ds = prepare_dataset(validation_img_paths, validation_labels_cleaned)
test_ds = prepare_dataset(test_img_paths, test_labels_cleaned)
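(Translator's note) A quick shape check on a single batch; the expected shapes below follow from the settings above (batch_size=64, a 128x32 target size with the transpose applied, max_len=21):
# Images come out as (batch, width, height, 1) after distortion_free_resize,
# labels are padded to max_len with padding_token.
for batch in train_ds.take(1):
    print(batch["image"].shape)  # expected: (64, 128, 32, 1)
    print(batch["label"].shape)  # expected: (64, 21)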
Visualizing a few samples
for data in train_ds.take(1):
images, labels = data["image"], data["label"]
_, ax = plt.subplots(4, 4, figsize=(15, 8))
for i in range(16):
img = images[i]
img = tf.image.flip_left_right(img)
img = tf.transpose(img, perm=[1, 0, 2])
img = (img * 255.0).numpy().clip(0, 255).astype(np.uint8)
img = img[:, :, 0]
        # Gather indices where label != padding_token.
label = labels[i]
indices = tf.gather(label, tf.where(tf.math.not_equal(label, padding_token)))
# Convert to string.
label = tf.strings.reduce_join(num_to_char(indices))
label = label.numpy().decode("utf-8")
ax[i // 4, i % 4].imshow(img, cmap="gray")
ax[i // 4, i % 4].set_title(label)
ax[i // 4, i % 4].axis("off")
plt.show()
Notice how the content of the original images is kept as faithful as possible, with padding applied accordingly.
Model
Our model uses CTC loss as an endpoint layer. For a deeper understanding of CTC loss, refer to this post.
class CTCLayer(keras.layers.Layer):
def __init__(self, name=None):
super().__init__(name=name)
self.loss_fn = keras.backend.ctc_batch_cost
def call(self, y_true, y_pred):
batch_len = tf.cast(tf.shape(y_true)[0], dtype="int64")
input_length = tf.cast(tf.shape(y_pred)[1], dtype="int64")
label_length = tf.cast(tf.shape(y_true)[1], dtype="int64")
input_length = input_length * tf.ones(shape=(batch_len, 1), dtype="int64")
label_length = label_length * tf.ones(shape=(batch_len, 1), dtype="int64")
loss = self.loss_fn(y_true, y_pred, input_length, label_length)
self.add_loss(loss)
# At test time, just return the computed predictions.
return y_pred
def build_model():
# Inputs to the model
input_img = keras.Input(shape=(image_width, image_height, 1), name="image")
labels = keras.layers.Input(name="label", shape=(None,))
# First conv block.
x = keras.layers.Conv2D(
32,
(3, 3),
activation="relu",
kernel_initializer="he_normal",
padding="same",
name="Conv1",
)(input_img)
x = keras.layers.MaxPooling2D((2, 2), name="pool1")(x)
# Second conv block.
x = keras.layers.Conv2D(
64,
(3, 3),
activation="relu",
kernel_initializer="he_normal",
padding="same",
name="Conv2",
)(x)
x = keras.layers.MaxPooling2D((2, 2), name="pool2")(x)
    # We have used two max pooling layers with pool size and strides of 2.
# Hence, downsampled feature maps are 4x smaller. The number of
# filters in the last layer is 64. Reshape accordingly before
# passing the output to the RNN part of the model.
new_shape = ((image_width // 4), (image_height // 4) * 64)
x = keras.layers.Reshape(target_shape=new_shape, name="reshape")(x)
x = keras.layers.Dense(64, activation="relu", name="dense1")(x)
x = keras.layers.Dropout(0.2)(x)
# RNNs.
x = keras.layers.Bidirectional(
keras.layers.LSTM(128, return_sequences=True, dropout=0.25)
)(x)
x = keras.layers.Bidirectional(
keras.layers.LSTM(64, return_sequences=True, dropout=0.25)
)(x)
# +2 is to account for the two special tokens introduced by the CTC loss.
# The recommendation comes here: https://git.io/J0eXP.
x = keras.layers.Dense(
len(char_to_num.get_vocabulary()) + 2, activation="softmax", name="dense2"
)(x)
# Add CTC layer for calculating CTC loss at each step.
output = CTCLayer(name="ctc_loss")(labels, x)
# Define the model.
model = keras.models.Model(
inputs=[input_img, labels], outputs=output, name="handwriting_recognizer"
)
# Optimizer.
opt = keras.optimizers.Adam()
# Compile the model and return.
model.compile(optimizer=opt)
return model
# Get the model.
model = build_model()
model.summary()
Model: "handwriting_recognizer" __________________________________________________________________________________________________ Layer (type) Output Shape Param # Connected to ================================================================================================== image (InputLayer) [(None, 128, 32, 1)] 0 __________________________________________________________________________________________________ Conv1 (Conv2D) (None, 128, 32, 32) 320 image[0][0] __________________________________________________________________________________________________ pool1 (MaxPooling2D) (None, 64, 16, 32) 0 Conv1[0][0] __________________________________________________________________________________________________ Conv2 (Conv2D) (None, 64, 16, 64) 18496 pool1[0][0] __________________________________________________________________________________________________ pool2 (MaxPooling2D) (None, 32, 8, 64) 0 Conv2[0][0] __________________________________________________________________________________________________ reshape (Reshape) (None, 32, 512) 0 pool2[0][0] __________________________________________________________________________________________________ dense1 (Dense) (None, 32, 64) 32832 reshape[0][0] __________________________________________________________________________________________________ dropout (Dropout) (None, 32, 64) 0 dense1[0][0] __________________________________________________________________________________________________ bidirectional (Bidirectional) (None, 32, 256) 197632 dropout[0][0] __________________________________________________________________________________________________ bidirectional_1 (Bidirectional) (None, 32, 128) 164352 bidirectional[0][0] __________________________________________________________________________________________________ label (InputLayer) [(None, None)] 0 __________________________________________________________________________________________________ dense2 (Dense) (None, 32, 81) 10449 bidirectional_1[0][0] __________________________________________________________________________________________________ ctc_loss (CTCLayer) (None, 32, 81) 0 label[0][0] dense2[0][0] ================================================================================================== Total params: 424,081 Trainable params: 424,081 Non-trainable params: 0 __________________________________________________________________________________________________
Evaluation metric
Edit distance is the most widely used metric for evaluating OCR models. In this section, we will implement it and use it as a callback to monitor our model.
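(Translator's note) To make the metric concrete before wiring it into a callback, here is a tiny worked example; tf.edit_distance expects sparse tensors, and normalize=False returns the raw number of edit operations:
# Turning "kitten" into "sitting" takes 3 edits (two substitutions, one insertion).
hypothesis = tf.sparse.from_dense(tf.constant([list("kitten")]))
truth = tf.sparse.from_dense(tf.constant([list("sitting")]))
print(tf.edit_distance(hypothesis, truth, normalize=False).numpy())  # [3.]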
We first segregate the validation images and their labels for convenience.
validation_images = []
validation_labels = []
for batch in validation_ds:
validation_images.append(batch["image"])
validation_labels.append(batch["label"])
Next, we create a callback to monitor the edit distances.
def calculate_edit_distance(labels, predictions):
# Get a single batch and convert its labels to sparse tensors.
    sparse_labels = tf.cast(tf.sparse.from_dense(labels), dtype=tf.int64)
# Make predictions and convert them to sparse tensors.
input_len = np.ones(predictions.shape[0]) * predictions.shape[1]
predictions_decoded = keras.backend.ctc_decode(
predictions, input_length=input_len, greedy=True
)[0][0][:, :max_len]
sparse_predictions = tf.cast(
tf.sparse.from_dense(predictions_decoded), dtype=tf.int64
)
# Compute individual edit distances and average them out.
edit_distances = tf.edit_distance(
        sparse_predictions, sparse_labels, normalize=False
)
return tf.reduce_mean(edit_distances)
class EditDistanceCallback(keras.callbacks.Callback):
def __init__(self, pred_model):
super().__init__()
self.prediction_model = pred_model
def on_epoch_end(self, epoch, logs=None):
edit_distances = []
for i in range(len(validation_images)):
labels = validation_labels[i]
predictions = self.prediction_model.predict(validation_images[i])
edit_distances.append(calculate_edit_distance(labels, predictions).numpy())
print(
f"Mean edit distance for epoch {epoch + 1}: {np.mean(edit_distances):.4f}"
)
Training
Now we are ready to kick off model training.
epochs = 10 # To get good results this should be at least 50.
model = build_model()
prediction_model = keras.models.Model(
model.get_layer(name="image").input, model.get_layer(name="dense2").output
)
edit_distance_callback = EditDistanceCallback(prediction_model)
# Train the model.
history = model.fit(
train_ds,
validation_data=validation_ds,
epochs=epochs,
callbacks=[edit_distance_callback],
)
Epoch 1/10
1357/1357 [==============================] - 89s 51ms/step - loss: 13.6670 - val_loss: 11.8041
Mean edit distance for epoch 1: 20.5117
Epoch 2/10
1357/1357 [==============================] - 48s 36ms/step - loss: 10.6864 - val_loss: 9.6994
Mean edit distance for epoch 2: 20.1167
Epoch 3/10
1357/1357 [==============================] - 48s 35ms/step - loss: 9.0437 - val_loss: 8.0355
Mean edit distance for epoch 3: 19.7270
Epoch 4/10
1357/1357 [==============================] - 48s 35ms/step - loss: 7.6098 - val_loss: 6.4239
Mean edit distance for epoch 4: 19.1106
Epoch 5/10
1357/1357 [==============================] - 48s 35ms/step - loss: 6.3194 - val_loss: 4.9814
Mean edit distance for epoch 5: 18.4894
Epoch 6/10
1357/1357 [==============================] - 48s 35ms/step - loss: 5.3417 - val_loss: 4.1307
Mean edit distance for epoch 6: 18.1909
Epoch 7/10
1357/1357 [==============================] - 48s 35ms/step - loss: 4.6396 - val_loss: 3.7706
Mean edit distance for epoch 7: 18.1224
Epoch 8/10
1357/1357 [==============================] - 48s 35ms/step - loss: 4.1926 - val_loss: 3.3682
Mean edit distance for epoch 8: 17.9387
Epoch 9/10
1357/1357 [==============================] - 48s 36ms/step - loss: 3.8532 - val_loss: 3.1829
Mean edit distance for epoch 9: 17.9074
Epoch 10/10
1357/1357 [==============================] - 49s 36ms/step - loss: 3.5769 - val_loss: 2.9221
Mean edit distance for epoch 10: 17.7960
(Translator's note: results of the translator's own run)
Epoch 1/10
1357/1357 [==============================] - ETA: 0s - loss: 13.6126
Mean edit distance for epoch 1: 20.4893
1357/1357 [==============================] - 142s 88ms/step - loss: 13.6126 - val_loss: 11.8357
Epoch 2/10
1357/1357 [==============================] - ETA: 0s - loss: 10.5591
Mean edit distance for epoch 2: 20.0652
1357/1357 [==============================] - 63s 46ms/step - loss: 10.5591 - val_loss: 9.5134
Epoch 3/10
1356/1357 [============================>.] - ETA: 0s - loss: 8.7765
Mean edit distance for epoch 3: 19.5805
1357/1357 [==============================] - 63s 46ms/step - loss: 8.7769 - val_loss: 7.6806
Epoch 4/10
1356/1357 [============================>.] - ETA: 0s - loss: 7.1568
Mean edit distance for epoch 4: 18.8701
1357/1357 [==============================] - 63s 47ms/step - loss: 7.1568 - val_loss: 5.8427
Epoch 5/10
1357/1357 [==============================] - ETA: 0s - loss: 5.9351
Mean edit distance for epoch 5: 18.4914
1357/1357 [==============================] - 64s 47ms/step - loss: 5.9351 - val_loss: 4.8136
Epoch 6/10
1356/1357 [============================>.] - ETA: 0s - loss: 5.1171
Mean edit distance for epoch 6: 18.2129
1357/1357 [==============================] - 63s 47ms/step - loss: 5.1172 - val_loss: 4.0588
Epoch 7/10
1357/1357 [==============================] - ETA: 0s - loss: 4.5114
Mean edit distance for epoch 7: 18.0457
1357/1357 [==============================] - 63s 47ms/step - loss: 4.5114 - val_loss: 3.5931
Epoch 8/10
1356/1357 [============================>.] - ETA: 0s - loss: 4.0710
Mean edit distance for epoch 8: 17.8817
1357/1357 [==============================] - 63s 47ms/step - loss: 4.0710 - val_loss: 3.2655
Epoch 9/10
1356/1357 [============================>.] - ETA: 0s - loss: 3.7387
Mean edit distance for epoch 9: 17.7877
1357/1357 [==============================] - 63s 47ms/step - loss: 3.7388 - val_loss: 2.9869
Epoch 10/10
1356/1357 [============================>.] - ETA: 0s - loss: 3.5015
Mean edit distance for epoch 10: 17.7893
1357/1357 [==============================] - 66s 48ms/step - loss: 3.5016 - val_loss: 2.9744
CPU times: user 16min 33s, sys: 1min 49s, total: 18min 23s
Wall time: 14min 27s
(50 epochs)
Epoch 1/50 1357/1357 [==============================] - ETA: 0s - loss: 13.4471Mean edit distance for epoch 1: 20.4373 1357/1357 [==============================] - 74s 50ms/step - loss: 13.4471 - val_loss: 11.8533 Epoch 2/50 1356/1357 [============================>.] - ETA: 0s - loss: 10.7246Mean edit distance for epoch 2: 20.0584 1357/1357 [==============================] - 65s 48ms/step - loss: 10.7249 - val_loss: 9.6146 Epoch 3/50 1357/1357 [==============================] - ETA: 0s - loss: 8.9240Mean edit distance for epoch 3: 19.7461 1357/1357 [==============================] - 65s 48ms/step - loss: 8.9240 - val_loss: 7.8377 Epoch 4/50 1356/1357 [============================>.] - ETA: 0s - loss: 7.3277Mean edit distance for epoch 4: 19.0892 1357/1357 [==============================] - 65s 48ms/step - loss: 7.3277 - val_loss: 6.1525 Epoch 5/50 1357/1357 [==============================] - ETA: 0s - loss: 6.0136Mean edit distance for epoch 5: 18.5805 1357/1357 [==============================] - 65s 48ms/step - loss: 6.0136 - val_loss: 4.8196 Epoch 6/50 1357/1357 [==============================] - ETA: 0s - loss: 5.1234Mean edit distance for epoch 6: 18.2658 1357/1357 [==============================] - 65s 48ms/step - loss: 5.1234 - val_loss: 4.0891 Epoch 7/50 1357/1357 [==============================] - ETA: 0s - loss: 4.5151Mean edit distance for epoch 7: 18.1098 1357/1357 [==============================] - 65s 48ms/step - loss: 4.5151 - val_loss: 3.6587 Epoch 8/50 1356/1357 [============================>.] - ETA: 0s - loss: 4.0568Mean edit distance for epoch 8: 17.8928 1357/1357 [==============================] - 66s 48ms/step - loss: 4.0570 - val_loss: 3.2468 Epoch 9/50 1357/1357 [==============================] - ETA: 0s - loss: 3.7304Mean edit distance for epoch 9: 17.8605 1357/1357 [==============================] - 66s 49ms/step - loss: 3.7304 - val_loss: 3.0323 Epoch 10/50 1357/1357 [==============================] - ETA: 0s - loss: 3.4922Mean edit distance for epoch 10: 17.7409 1357/1357 [==============================] - 66s 49ms/step - loss: 3.4922 - val_loss: 2.8043 Epoch 11/50 1356/1357 [============================>.] - ETA: 0s - loss: 3.2765Mean edit distance for epoch 11: 17.7393 1357/1357 [==============================] - 66s 49ms/step - loss: 3.2767 - val_loss: 2.6934 Epoch 12/50 1356/1357 [============================>.] - ETA: 0s - loss: 3.1172Mean edit distance for epoch 12: 17.6257 1357/1357 [==============================] - 68s 50ms/step - loss: 3.1174 - val_loss: 2.5190 Epoch 13/50 1357/1357 [==============================] - ETA: 0s - loss: 2.9876Mean edit distance for epoch 13: 17.6952 1357/1357 [==============================] - 66s 48ms/step - loss: 2.9876 - val_loss: 2.5399 Epoch 14/50 1356/1357 [============================>.] - ETA: 0s - loss: 2.8918Mean edit distance for epoch 14: 17.6334 1357/1357 [==============================] - 67s 50ms/step - loss: 2.8918 - val_loss: 2.4650 Epoch 15/50 1356/1357 [============================>.] - ETA: 0s - loss: 2.7731Mean edit distance for epoch 15: 17.5921 1357/1357 [==============================] - 66s 49ms/step - loss: 2.7731 - val_loss: 2.3714 Epoch 16/50 1356/1357 [============================>.] 
- ETA: 0s - loss: 2.6854Mean edit distance for epoch 16: 17.5724 1357/1357 [==============================] - 66s 48ms/step - loss: 2.6857 - val_loss: 2.2714 Epoch 17/50 1357/1357 [==============================] - ETA: 0s - loss: 2.6109Mean edit distance for epoch 17: 17.5473 1357/1357 [==============================] - 66s 48ms/step - loss: 2.6109 - val_loss: 2.1894 Epoch 18/50 1356/1357 [============================>.] - ETA: 0s - loss: 2.5395Mean edit distance for epoch 18: 17.5076 1357/1357 [==============================] - 66s 48ms/step - loss: 2.5398 - val_loss: 2.1756 Epoch 19/50 1356/1357 [============================>.] - ETA: 0s - loss: 2.4615Mean edit distance for epoch 19: 17.4859 1357/1357 [==============================] - 65s 48ms/step - loss: 2.4616 - val_loss: 2.1148 Epoch 20/50 1357/1357 [==============================] - ETA: 0s - loss: 2.4156Mean edit distance for epoch 20: 17.4874 1357/1357 [==============================] - 66s 49ms/step - loss: 2.4156 - val_loss: 2.0923 Epoch 21/50 1356/1357 [============================>.] - ETA: 0s - loss: 2.3655Mean edit distance for epoch 21: 17.4633 1357/1357 [==============================] - 65s 48ms/step - loss: 2.3655 - val_loss: 2.0256 Epoch 22/50 1357/1357 [==============================] - ETA: 0s - loss: 2.3188Mean edit distance for epoch 22: 17.4456 1357/1357 [==============================] - 65s 48ms/step - loss: 2.3188 - val_loss: 2.0074 Epoch 23/50 1357/1357 [==============================] - ETA: 0s - loss: 2.2649Mean edit distance for epoch 23: 17.4421 1357/1357 [==============================] - 65s 48ms/step - loss: 2.2649 - val_loss: 1.9864 Epoch 24/50 1357/1357 [==============================] - ETA: 0s - loss: 2.2271Mean edit distance for epoch 24: 17.4225 1357/1357 [==============================] - 65s 48ms/step - loss: 2.2271 - val_loss: 1.9584 Epoch 25/50 1357/1357 [==============================] - ETA: 0s - loss: 2.1922Mean edit distance for epoch 25: 17.4272 1357/1357 [==============================] - 64s 47ms/step - loss: 2.1922 - val_loss: 1.9683 Epoch 26/50 1356/1357 [============================>.] - ETA: 0s - loss: 2.1556Mean edit distance for epoch 26: 17.4341 1357/1357 [==============================] - 65s 48ms/step - loss: 2.1558 - val_loss: 1.9814 Epoch 27/50 1356/1357 [============================>.] - ETA: 0s - loss: 2.1258Mean edit distance for epoch 27: 17.4183 1357/1357 [==============================] - 65s 48ms/step - loss: 2.1258 - val_loss: 1.9473 Epoch 28/50 1356/1357 [============================>.] - ETA: 0s - loss: 2.0964Mean edit distance for epoch 28: 17.4343 1357/1357 [==============================] - 64s 47ms/step - loss: 2.0964 - val_loss: 1.9463 Epoch 29/50 1356/1357 [============================>.] - ETA: 0s - loss: 2.0501Mean edit distance for epoch 29: 17.4031 1357/1357 [==============================] - 65s 48ms/step - loss: 2.0501 - val_loss: 1.9043 Epoch 30/50 1356/1357 [============================>.] - ETA: 0s - loss: 2.0311Mean edit distance for epoch 30: 17.4161 1357/1357 [==============================] - 65s 48ms/step - loss: 2.0313 - val_loss: 1.8567 Epoch 31/50 1356/1357 [============================>.] - ETA: 0s - loss: 2.0114Mean edit distance for epoch 31: 17.4091 1357/1357 [==============================] - 66s 49ms/step - loss: 2.0116 - val_loss: 1.8696 Epoch 32/50 1356/1357 [============================>.] 
- ETA: 0s - loss: 1.9824Mean edit distance for epoch 32: 17.3840 1357/1357 [==============================] - 66s 49ms/step - loss: 1.9825 - val_loss: 1.9015 Epoch 33/50 1357/1357 [==============================] - ETA: 0s - loss: 1.9547Mean edit distance for epoch 33: 17.3754 1357/1357 [==============================] - 66s 49ms/step - loss: 1.9547 - val_loss: 1.8477 Epoch 34/50 1357/1357 [==============================] - ETA: 0s - loss: 1.9186Mean edit distance for epoch 34: 17.3707 1357/1357 [==============================] - 66s 48ms/step - loss: 1.9186 - val_loss: 1.8276 Epoch 35/50 1357/1357 [==============================] - ETA: 0s - loss: 1.9059Mean edit distance for epoch 35: 17.3801 1357/1357 [==============================] - 65s 48ms/step - loss: 1.9059 - val_loss: 1.8287 Epoch 36/50 1357/1357 [==============================] - ETA: 0s - loss: 1.8761Mean edit distance for epoch 36: 17.4086 1357/1357 [==============================] - 67s 49ms/step - loss: 1.8761 - val_loss: 1.8549 Epoch 37/50 1357/1357 [==============================] - ETA: 0s - loss: 1.8859Mean edit distance for epoch 37: 17.3773 1357/1357 [==============================] - 67s 49ms/step - loss: 1.8859 - val_loss: 1.7979 Epoch 38/50 1356/1357 [============================>.] - ETA: 0s - loss: 1.8336Mean edit distance for epoch 38: 17.3572 1357/1357 [==============================] - 66s 49ms/step - loss: 1.8335 - val_loss: 1.7572 Epoch 39/50 1357/1357 [==============================] - ETA: 0s - loss: 1.8296Mean edit distance for epoch 39: 17.3655 1357/1357 [==============================] - 66s 49ms/step - loss: 1.8296 - val_loss: 1.7855 Epoch 40/50 1356/1357 [============================>.] - ETA: 0s - loss: 1.8015Mean edit distance for epoch 40: 17.3752 1357/1357 [==============================] - 66s 49ms/step - loss: 1.8014 - val_loss: 1.7763 Epoch 41/50 1356/1357 [============================>.] - ETA: 0s - loss: 1.8819Mean edit distance for epoch 41: 17.3934 1357/1357 [==============================] - 66s 48ms/step - loss: 1.8819 - val_loss: 1.8190 Epoch 42/50 1357/1357 [==============================] - ETA: 0s - loss: 1.7901Mean edit distance for epoch 42: 17.3488 1357/1357 [==============================] - 65s 48ms/step - loss: 1.7901 - val_loss: 1.7245 Epoch 43/50 1357/1357 [==============================] - ETA: 0s - loss: 1.7541Mean edit distance for epoch 43: 17.3599 1357/1357 [==============================] - 67s 49ms/step - loss: 1.7541 - val_loss: 1.7311 Epoch 44/50 1356/1357 [============================>.] - ETA: 0s - loss: 1.7241Mean edit distance for epoch 44: 17.3524 1357/1357 [==============================] - 65s 48ms/step - loss: 1.7242 - val_loss: 1.7827 Epoch 45/50 1356/1357 [============================>.] - ETA: 0s - loss: 1.7227Mean edit distance for epoch 45: 17.3653 1357/1357 [==============================] - 65s 48ms/step - loss: 1.7228 - val_loss: 1.7457 Epoch 46/50 1356/1357 [============================>.] - ETA: 0s - loss: 1.7197Mean edit distance for epoch 46: 17.3513 1357/1357 [==============================] - 65s 48ms/step - loss: 1.7199 - val_loss: 1.7366 Epoch 47/50 1356/1357 [============================>.] - ETA: 0s - loss: 1.6876Mean edit distance for epoch 47: 17.3430 1357/1357 [==============================] - 65s 48ms/step - loss: 1.6875 - val_loss: 1.7244 Epoch 48/50 1356/1357 [============================>.] 
- ETA: 0s - loss: 1.7167Mean edit distance for epoch 48: 17.3273 1357/1357 [==============================] - 66s 48ms/step - loss: 1.7167 - val_loss: 1.6987 Epoch 49/50 1356/1357 [============================>.] - ETA: 0s - loss: 1.7589Mean edit distance for epoch 49: 17.3550 1357/1357 [==============================] - 65s 48ms/step - loss: 1.7588 - val_loss: 1.7602 Epoch 50/50 1356/1357 [============================>.] - ETA: 0s - loss: 1.6587Mean edit distance for epoch 50: 17.3387 1357/1357 [==============================] - 65s 48ms/step - loss: 1.6589 - val_loss: 1.6907 CPU times: user 1h 16min 31s, sys: 8min 38s, total: 1h 25min 9s Wall time: 1h 1min 48s
Inference
# A utility function to decode the output of the network.
def decode_batch_predictions(pred):
input_len = np.ones(pred.shape[0]) * pred.shape[1]
# Use greedy search. For complex tasks, you can use beam search.
results = keras.backend.ctc_decode(pred, input_length=input_len, greedy=True)[0][0][
:, :max_len
]
# Iterate over the results and get back the text.
output_text = []
for res in results:
res = tf.gather(res, tf.where(tf.math.not_equal(res, -1)))
res = tf.strings.reduce_join(num_to_char(res)).numpy().decode("utf-8")
output_text.append(res)
return output_text
# Let's check results on some test samples.
for batch in test_ds.take(1):
batch_images = batch["image"]
_, ax = plt.subplots(4, 4, figsize=(15, 8))
preds = prediction_model.predict(batch_images)
pred_texts = decode_batch_predictions(preds)
for i in range(16):
img = batch_images[i]
img = tf.image.flip_left_right(img)
img = tf.transpose(img, perm=[1, 0, 2])
img = (img * 255.0).numpy().clip(0, 255).astype(np.uint8)
img = img[:, :, 0]
title = f"Prediction: {pred_texts[i]}"
ax[i // 4, i % 4].imshow(img, cmap="gray")
ax[i // 4, i % 4].set_title(title)
ax[i // 4, i % 4].axis("off")
plt.show()
To get better results, the model should be trained for at least 50 epochs.
(50 epochs)
Final remarks
- The prediction_model is fully compatible with TensorFlow Lite. If you are interested, you can use it inside a mobile application; this notebook may be useful in that regard (see the conversion sketch after this list).
- Not all the training examples are perfectly aligned, as can be observed in this example. This can degrade model performance for complex sequences. To that end, we can leverage Spatial Transformer Networks (Jaderberg et al.), which can help the model learn affine transformations that maximize its performance.
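(Translator's note) As a pointer for the first remark, here is a minimal TensorFlow Lite conversion sketch; it is not part of the original example, and whether the SELECT_TF_OPS fallback is required depends on how the bidirectional LSTM layers get converted:
# Export the CTC-free prediction model to TensorFlow Lite.
converter = tf.lite.TFLiteConverter.from_keras_model(prediction_model)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # standard TFLite ops
    tf.lite.OpsSet.SELECT_TF_OPS,    # fall back to TF ops where needed
]
tflite_model = converter.convert()
with open("handwriting_recognizer.tflite", "wb") as f:
    f.write(tflite_model)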