TensorFlow 2.0 : 上級 Tutorials : テキスト :- RNN でテキスト生成 (翻訳/解説)

翻訳 : (株)クラスキャットセールスインフォメーション
作成日時 : 11/12/2019

* 本ページは、TensorFlow org サイトの TF 2.0 – Advanced Tutorials – Text の以下のページを翻訳した上で
適宜、補足説明したものです：

Text generation with an RNN

* サンプルコードの動作確認はしておりますが、必要な場合には適宜、追加改変しています。
* ご自由にリンクを張って頂いてかまいませんが、sales-info@classcat.com までご一報いただけると嬉しいです。

★ 無料セミナー開催中 ★ クラスキャット主催人工知能 & ビジネス Web セミナー

人工知能とビジネスをテーマにウェビナー (WEB セミナー) を定期的に開催しています。スケジュールは弊社公式 Web サイトでご確認頂けます。

お住まいの地域に関係なく Web ブラウザからご参加頂けます。事前登録 が必要ですのでご注意ください。
Windows PC のブラウザからご参加が可能です。スマートデバイスもご利用可能です。

◆ お問合せ : 本件に関するお問い合わせ先は下記までお願いいたします。

株式会社クラスキャット セールス・マーケティング本部セールス・インフォメーション

E-Mail：sales-info@classcat.com ; WebSite: https://www.classcat.com/

Facebook: https://www.facebook.com/ClassCatJP/

テキスト :- RNN でテキスト生成

このチュートリアルは文字ベース RNN を使用してテキストをどのように生成するかを実演します。私達は Andrej Karpathy の The Unreasonable Effectiveness of Recurrent Neural Networks から Shakespeare の著作のデータセットで作業します。このデータ (“Shakespear”) から文字のシークエンスが与えられたとき、シークエンスの次の文字 (“e”) を予測するモデルを訓練します。モデルを繰り返し呼び出すことによりテキストのより長いシークエンスが生成できます。

Note: このノートブックをより高速に実行するために GPU アクセラレーションを有効にしてください。Colab では : Runtime > Change runtime type > Hardware accelerator > GPU です。ローカルで実行する場合 TensorFlow version >= 1.11 を確実にしてください。

このチュートリアルは tf.keras と eager execution を使用して実装された実行可能なコードを含みます。次はこのチュートリアルのモデルが 30 エポックの間訓練されて、文字列 (= string) “Q” で開始されたときのサンプル出力です :

QUEENE:
I had thought thou hadst a Roman; for the oracle,
Thus by All bids the man against the word,
Which are so weak of care, by old care done;
Your children were in your holy love,
And the precipitation through the bleeding throne.

BISHOP OF ELY:
Marry, and will, my lord, to weep in such a one were prettiest;
Yet now I was adopted heir
Of the world's lamentable day,
To watch the next way with his father with his face?

ESCALUS:
The cause why then we are all resolved more sons.

VOLUMNIA:
O, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, it is no sin it should be dead,
And love and pale as any will to that word.

QUEEN ELIZABETH:
But how long have I heard the soul for this world,
And show his hands of life be proved to stand.

PETRUCHIO:
I say he look'd on, if I must be content
To stay him from the fatal of our country's bliss.
His lordship pluck'd from this sentence then for prey,
And then let us twain, being the moon,
were she such a case as fills m

センテンスの幾つかが文法的に正しい一方で、多くは意味を成しません。モデルは単語の意味を学習していませんが、以下を考えてください :

私達のモデルは文字ベースです訓練を始めたとき、モデルは英単語をどのようにスペルするか、あるいは単語がテキストのユニットであることさえ知りませんでした。
出力の構造は演劇に類似しています — テキストのブロックは一般に話者名で始まり、データセットと同様に総て大文字です。
下で実演されるように、モデルはテキストの小さいバッチ (それぞれ 100 文字) 上で訓練され、そしてコヒーレント構造を持つテキストのより長いシークエンスを依然として生成することが可能です。

セットアップ

TensorFlow と他のライブラリをインポートする

from __future__ import absolute_import, division, print_function, unicode_literals

import tensorflow as tf

import numpy as np
import os
import time

Shakespeare データセットをダウンロードする

貴方自身のデータ上でこのコードを実行するためには次の行を変更します。

path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt
1122304/1115394 [==============================] - 0s 0us/step

データを読む

最初にテキストを少し見てみます :

# Read, then decode for py2 compat.
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# length of text is the number of characters in it
print ('Length of text: {} characters'.format(len(text)))

Length of text: 1115394 characters

# Take a look at the first 250 characters in text
print(text[:250])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.

# The unique characters in the file
vocab = sorted(set(text))
print ('{} unique characters'.format(len(vocab)))

65 unique characters

テキストを処理する

テキストをベクトル化する

訓練前に、文字列を数字表現にマップする必要があります。2 つの検索テーブルを作成します: 一つは文字を数字にマップし、そしてもう一つは数字から文字のためです。

# Creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

text_as_int = np.array([char2idx[c] for c in text])

今では各文字に対する整数表現を持ちます。文字を 0 から len(unique) のインデックスとしてマップしたことが分かるでしょう。

print('{')
for char,_ in zip(char2idx, range(20)):
    print('  {:4s}: {:3d},'.format(repr(char), char2idx[char]))
print('  ...\n}')

{
  '&' :   4,
  'E' :  17,
  't' :  58,
  'x' :  62,
  'F' :  18,
  'T' :  32,
  'C' :  15,
  'V' :  34,
  'z' :  64,
  '-' :   7,
  'd' :  42,
  '\n':   0,
  'u' :  59,
  'Y' :  37,
  'p' :  54,
  ' ' :   1,
  'q' :  55,
  'P' :  28,
  'o' :  53,
  'm' :  51,
  ...
}

# Show how the first 13 characters from the text are mapped to integers
print ('{} ---- characters mapped to int ---- > {}'.format(repr(text[:13]), text_as_int[:13]))

'First Citizen' ---- characters mapped to int ---- > [18 47 56 57 58  1 15 47 58 47 64 43 52]

予測タスク

文字、あるいは文字のシークエンスが与えられたとき、最も確からしい次の文字は何でしょう？これが私達がモデルを訓練して遂行するタスクです。モデルへの入力は文字のシークエンスで、出力 — 各時間ステップにおける次の文字を予測するためにモデルを訓練します。

RNN は前に見た要素に依拠する内部状態を維持しますので、この瞬間までに計算された総ての文字が与えられたとき、次の文字は何でしょう？

訓練サンプルとターゲットを作成する

次にテキストをサンプル・シークエンスに分割します。各入力シークエンスはテキストから seq_length 文字を含みます。

各入力シークエンスに対して、対応するターゲットはテキストの同じ長さを含みます、一つの文字が右にシフトされていることを除いて。

そしてテキストを seq_length+1 のチャンクに分解します。例えば、seq_length が 4 そしてテキストが “Hello” であるとします。入力シークエンスは “Hell” で、そしてターゲット・シークエンスは “ello” です。

これを行なうために最初にテキストベクトルを文字インデックスのストリームに変換するために tf.data.Dataset.from_tensor_slices 関数を使用します。

# The maximum length sentence we want for a single input in characters
seq_length = 100
examples_per_epoch = len(text)//(seq_length+1)

# Create training examples / targets
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

for i in char_dataset.take(5):
  print(idx2char[i.numpy()])

F
i
r
s
t

batch メソッドはこれらの個々の文字を望まれるサイズのシークエンスに容易に変換させます。

sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

for item in sequences.take(5):
  print(repr(''.join(idx2char[item.numpy()])))

'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '
'are all resolved rather to die than to famish?\n\nAll:\nResolved. resolved.\n\nFirst Citizen:\nFirst, you k'
"now Caius Marcius is chief enemy to the people.\n\nAll:\nWe know't, we know't.\n\nFirst Citizen:\nLet us ki"
"ll him, and we'll have corn at our own price.\nIs't a verdict?\n\nAll:\nNo more talking on't; let it be d"
'one: away, away!\n\nSecond Citizen:\nOne word, good citizens.\n\nFirst Citizen:\nWe are accounted poor citi'

各シークエンスについて、各バッチに単純な関数を適用するために map メソッドを使用して入力とターゲットテキストを形成するためにそれを複製してシフトします :

def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)

最初のサンプル入力とターゲット値をプリントします :

for input_example, target_example in  dataset.take(1):
  print ('Input data: ', repr(''.join(idx2char[input_example.numpy()])))
  print ('Target data:', repr(''.join(idx2char[target_example.numpy()])))

Input data:  'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou'
Target data: 'irst Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '

これらのベクトルの各インデックスは一つの時間ステップとして処理されます。時間ステップ 0 における入力については、モデルは “F” のためのインデックスを受け取りそして次の文字として “i” のためのインデックスを予測しようとします。次の時間ステップでは、それは同じことを行ないますが RNN は現在の入力文字に加えて前のステップのコンテキストも考慮します。

for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):
    print("Step {:4d}".format(i))
    print("  input: {} ({:s})".format(input_idx, repr(idx2char[input_idx])))
    print("  expected output: {} ({:s})".format(target_idx, repr(idx2char[target_idx])))

Step    0
  input: 18 ('F')
  expected output: 47 ('i')
Step    1
  input: 47 ('i')
  expected output: 56 ('r')
Step    2
  input: 56 ('r')
  expected output: 57 ('s')
Step    3
  input: 57 ('s')
  expected output: 58 ('t')
Step    4
  input: 58 ('t')
  expected output: 1 (' ')

訓練バッチを作成する

テキストを扱いやすいシークエンスに分割するために tf.data を使用しました。しかしこのデータをモデルに供給する前に、データをシャッフルしてそれをバッチにパックする必要があります。

# Batch size
BATCH_SIZE = 64

# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

dataset

<BatchDataset shapes: ((64, 100), (64, 100)), types: (tf.int64, tf.int64)>

モデルを構築する

モデルを定義するために tf.keras.Sequential を使用します。この単純なサンプルのためにモデルを定義するために 3 つの層が使用されます :

tf.keras.layers.Embedding: 入力層。各文字の数字を embedding_dim 次元を持つベクトルにマップする訓練可能な検索テーブルです ;
tf.keras.layers.GRU: サイズ units=rnn_units を持つ RNN の一種です (ここで LSTM 層を使用することもできます)。
tf.keras.layers.Dense: vocab_size 出力を持つ、出力層。

# Length of the vocabulary in chars
vocab_size = len(vocab)

# The embedding dimension
embedding_dim = 256

# Number of RNN units
rnn_units = 1024

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              batch_input_shape=[batch_size, None]),
    tf.keras.layers.GRU(rnn_units,
                        return_sequences=True,
                        stateful=True,
                        recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size)
  ])
  return model

model = build_model(
  vocab_size = len(vocab),
  embedding_dim=embedding_dim,
  rnn_units=rnn_units,
  batch_size=BATCH_SIZE)

各文字についてモデルは埋め込みを検索し、埋め込みを入力として 1 時間ステップ GRU を実行し、そして次の文字の log-尤度を予測するロジットを生成するために dense 層を適用します :

モデルを通過するデータの図

モデルを試す

今はモデルをそれが期待どおりに動作するかを見るために実行します。

最初に出力の shape を確認します :

for input_example_batch, target_example_batch in dataset.take(1):
  example_batch_predictions = model(input_example_batch)
  print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

(64, 100, 65) # (batch_size, sequence_length, vocab_size)

上の例では入力のシークエンス長は 100 ですがモデルは任意の長さの入力の上で実行できます :

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (64, None, 256)           16640     
_________________________________________________________________
gru (GRU)                    (64, None, 1024)          3938304   
_________________________________________________________________
dense (Dense)                (64, None, 65)            66625     
=================================================================
Total params: 4,021,569
Trainable params: 4,021,569
Non-trainable params: 0
_________________________________________________________________

モデルから実際の予測を得るためには、実際の文字インデックスを得るために出力分布からサンプリングする必要があります。この分布は文字語彙に渡るロジットにより定義されます。

Note: この分布からサンプリングすることは重要です、というのは分布の argmax を取ることはループでモデルを容易に行き詰まらせる可能性があるためです。

バッチの最初のサンプルについてそれを試します :

sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()

これは、各時間ステップにおける、次の文字インデックスの予測を与えます :

sampled_indices

array([37, 31, 41, 40, 56, 41,  4, 29, 48, 31,  7, 14, 43,  1, 49, 32, 39,
       21, 62,  2, 11, 47, 63, 22,  3, 60, 30, 58,  7, 59, 18, 50, 25, 28,
       52, 45, 15,  1, 43, 28, 23,  5, 59, 49, 56, 30, 20, 13, 64, 53, 63,
       64, 27, 32, 44, 44,  2, 28,  0, 53, 50, 34, 26, 16, 23, 47, 49, 48,
       14, 32,  2, 23, 34, 33, 42,  2,  0, 31,  2, 62,  2,  7, 48,  5, 15,
       31, 43, 37, 60,  3, 49,  7, 51, 15, 34, 13,  1, 51,  9, 55])

この未訓練モデルにより予測されたテキストを見るためにこれらをデコードします :

print("Input: \n", repr("".join(idx2char[input_example_batch[0]])))
print()
print("Next Char Predictions: \n", repr("".join(idx2char[sampled_indices ])))

Input: 
 "state\nOf that integrity which should become't,\nNot having the power to do the good it would,\nFor the"

Next Char Predictions: 
 "YScbrc&QjS-Be kTaIx!;iyJ$vRt-uFlMPngC ePK'ukrRHAzoyzOTff!P\nolVNDKikjBT!KVUd!\nS!x!-j'CSeYv$k-mCVA m3q"

モデルを訓練する

この時点で問題は標準的な分類問題として扱うことができます。前の RNN 状態、そしてこの時間ステップで入力が与えられたとき、次の文字のクラスを予測します。

optimizer、そして損失関数を装着する

この場合には標準的な tf.keras.losses.sparse_categorical_crossentropy 損失関数が動作します、何故ならばそれは予測の最後の次元に渡り適用されるからです。

私達のモデルはロジットを返しますので、from_logits フラグを設定する必要があります。

def loss(labels, logits):
  return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

example_batch_loss  = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("scalar_loss:      ", example_batch_loss.numpy().mean())

Prediction shape:  (64, 100, 65)  # (batch_size, sequence_length, vocab_size)
scalar_loss:       4.175463

tf.keras.Model.compile メソッドを使用して訓練手続きを configure します。デフォルト引数を持つ tf.keras.optimizers.Adam と損失関数を使用します。

model.compile(optimizer='adam', loss=loss)

チェックポイントを構成する

チェックポイントが訓練の間にセーブされることを確実にするために tf.keras.callbacks.ModelCheckpoint を使用します :

# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

訓練を実行する

訓練時間を合理的なものに保持するために、モデルを訓練するために 10 エポックを使用します。Colab では、より高速な訓練のためにランタイムを GPU に設定します。

EPOCHS=10

history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

Epoch 1/10
172/172 [==============================] - 7s 43ms/step - loss: 2.6656
Epoch 2/10
172/172 [==============================] - 6s 33ms/step - loss: 1.9586
Epoch 3/10
172/172 [==============================] - 6s 33ms/step - loss: 1.6907
Epoch 4/10
172/172 [==============================] - 6s 33ms/step - loss: 1.5427
Epoch 5/10
172/172 [==============================] - 6s 34ms/step - loss: 1.4543
Epoch 6/10
172/172 [==============================] - 6s 34ms/step - loss: 1.3929
Epoch 7/10
172/172 [==============================] - 6s 33ms/step - loss: 1.3485
Epoch 8/10
172/172 [==============================] - 6s 33ms/step - loss: 1.3083
Epoch 9/10
172/172 [==============================] - 6s 34ms/step - loss: 1.2738
Epoch 10/10
172/172 [==============================] - 6s 33ms/step - loss: 1.2424

テキストを生成する

予測ループ

次のコードブロックがテキストを生成します :

それは開始文字列を選択し、RNN 状態を初期化してそして生成する文字数を設定することから始まります。
開始文字列と RNN 状態を使用して次の文字の予測分布を得ます。
それから、予測された文字のインデックスを計算するために categorical 分布を使用します。この予測された文字をモデルへの次の入力として使用します。
モデルにより返された RNN 状態はモデルに供給し戻されます、その結果それは今ではただ一つの単語よりも代わりに多くのコンテキストを持ちます。次の単語を予測した後、変更された RNN 状態が再度モデルに供給し戻されます、これがそれが前に予測された単語からより多くのコンテキストを得るときにどのように学習するかです。

生成されたテキストを見れば、モデルがいつ大文字にするべきかを知り、パラグラフを作成してそして Shakespeare 的な著述語彙を模倣するのを見るでしょう。小さい数の訓練エポックでは、それは首尾一貫したセンテンスを形成することをまだ学習していません。

def generate_text(model, start_string):
  # Evaluation step (generating text using the learned model)

  # Number of characters to generate
  num_generate = 1000

  # Converting our start string to numbers (vectorizing)
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)

  # Empty string to store our results
  text_generated = []

  # Low temperatures results in more predictable text.
  # Higher temperatures results in more surprising text.
  # Experiment to find the best setting.
  temperature = 1.0

  # Here batch size == 1
  model.reset_states()
  for i in range(num_generate):
      predictions = model(input_eval)
      # remove the batch dimension
      predictions = tf.squeeze(predictions, 0)

      # using a categorical distribution to predict the word returned by the model
      predictions = predictions / temperature
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

      # We pass the predicted word as the next input to the model
      # along with the previous hidden state
      input_eval = tf.expand_dims([predicted_id], 0)

      text_generated.append(idx2char[predicted_id])

  return (start_string + ''.join(text_generated))

print(generate_text(model, start_string=u"ROMEO: "))

ROMEO: he hath the dispointed scronghy
name of adied to speak. Heaven fram so for to thee;
For what said for this damn'd glory?
Either to thee well.

NARCLIFL:
What, is the water, no more imagine man?

PETRUCHIO:
First, kill the greater livith never face
Turned toep prevailor, which broke here from the deee of a wetcherment,
They shall have he roundly for anorn; Signior Blother Gloucester
'fore the heads of me.

CAMILLO:
Away with him addiving?

NORTHUMBERLAND:
Yea, as tell the drack han to jelieve
A proper king in work was not a gentleman,
Madies of war. It is importune us given to strike
Your trumpetested womd of comes your two,
And match next to Romeo?
O Blint come; thou wilts point too what e'er farewell.

Servant:
Amey you, Gremio: O wonder, tell him, he hath
A horse fair spot commission,
Is her and looks in our commonwealth says by the honour.
What fanish put's colour hath this drop the need at Deat men's master London, I
will tell thee, for this once dogether.

POMPEY:
Ad, Cominius.

F

結果を改善するために貴方ができる最も容易なことはそれをより長く訓練することです (EPOCHS=30 を試してください)。

貴方はまた異なる開始文字列で実験したり、モデル精度を改良するためにもう一つの RNN 層を追加して試したり、あるいは多少のランダム予測を生成するために temperature パラメータを調整することができます。

Advanced: カスタマイズされた訓練

上の訓練手続きは単純ですが、多くの制御を貴方に与えません。

そこでモデルを手動でどのように実行するかを見た今、訓練ループをアンパックしてそれを私達自身で実装しましょう。これは例えば、モデルの open-loop 出力を安定させる手助けをするためにカリキュラム学習を実装するための開始点を与えます。

勾配を追跡するために tf.GradientTape を使用します。eager execution ガイドを読むことによりこのアプローチについてより多く学習できます。

手続きは次のように動作します :

最初に、RNN 状態を初期化します。tf.keras.Model.reset_states メソッドを呼び出すことによりこれを行ないます。
次に、データセットに渡り (バッチ毎に) iterate して各々に関連する予測を計算します。
tf.GradientTape をオープンして、そしてそのコンテキストで予測と損失を計算します。
tf.GradientTape.grads メソッドを使用してモデル変数に関する損失の勾配を計算します。
最後に、optimizer の tf.train.Optimizer.apply_gradients メソッドを使用してステップを下方に取ります。

model = build_model(
  vocab_size = len(vocab),
  embedding_dim=embedding_dim,
  rnn_units=rnn_units,
  batch_size=BATCH_SIZE)

optimizer = tf.keras.optimizers.Adam()

@tf.function
def train_step(inp, target):
  with tf.GradientTape() as tape:
    predictions = model(inp)
    loss = tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(
            target, predictions, from_logits=True))
  grads = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(grads, model.trainable_variables))

  return loss

# Training step
EPOCHS = 10

for epoch in range(EPOCHS):
  start = time.time()

  # initializing the hidden state at the start of every epoch
  # initally hidden is None
  hidden = model.reset_states()

  for (batch_n, (inp, target)) in enumerate(dataset):
    loss = train_step(inp, target)

    if batch_n % 100 == 0:
      template = 'Epoch {} Batch {} Loss {}'
      print(template.format(epoch+1, batch_n, loss))

  # saving (checkpoint) the model every 5 epochs
  if (epoch + 1) % 5 == 0:
    model.save_weights(checkpoint_prefix.format(epoch=epoch))

  print ('Epoch {} Loss {:.4f}'.format(epoch+1, loss))
  print ('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

model.save_weights(checkpoint_prefix.format(epoch=epoch))

WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.iter
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.beta_1
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.beta_2
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.decay
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.learning_rate
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-0.embeddings
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-2.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-2.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-1.cell.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-1.cell.recurrent_kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-1.cell.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-0.embeddings
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-2.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-2.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-1.cell.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-1.cell.recurrent_kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-1.cell.bias
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/alpha/guide/checkpoints#loading_mechanics for details.
Epoch 1 Batch 0 Loss 4.174566268920898
Epoch 1 Batch 100 Loss 2.355130672454834
Epoch 1 Loss 2.1473
Time taken for 1 epoch 6.901602268218994 sec

Epoch 2 Batch 0 Loss 2.1369407176971436
Epoch 2 Batch 100 Loss 1.9057503938674927
Epoch 2 Loss 1.8186
Time taken for 1 epoch 5.320220232009888 sec

Epoch 3 Batch 0 Loss 1.7763842344284058
Epoch 3 Batch 100 Loss 1.6552444696426392
Epoch 3 Loss 1.5838
Time taken for 1 epoch 5.368591547012329 sec

Epoch 4 Batch 0 Loss 1.5609817504882812
Epoch 4 Batch 100 Loss 1.5019890069961548
Epoch 4 Loss 1.4598
Time taken for 1 epoch 5.411896467208862 sec

Epoch 5 Batch 0 Loss 1.4785513877868652
Epoch 5 Batch 100 Loss 1.4962271451950073
Epoch 5 Loss 1.4789
Time taken for 1 epoch 5.458000421524048 sec

Epoch 6 Batch 0 Loss 1.3991695642471313
Epoch 6 Batch 100 Loss 1.3599822521209717
Epoch 6 Loss 1.3643
Time taken for 1 epoch 5.345423936843872 sec

Epoch 7 Batch 0 Loss 1.3120907545089722
Epoch 7 Batch 100 Loss 1.3584436178207397
Epoch 7 Loss 1.3505
Time taken for 1 epoch 5.293265104293823 sec

Epoch 8 Batch 0 Loss 1.26876699924469
Epoch 8 Batch 100 Loss 1.2855678796768188
Epoch 8 Loss 1.3192
Time taken for 1 epoch 5.329954147338867 sec

Epoch 9 Batch 0 Loss 1.1951823234558105
Epoch 9 Batch 100 Loss 1.2914996147155762
Epoch 9 Loss 1.2724
Time taken for 1 epoch 5.299607753753662 sec

Epoch 10 Batch 0 Loss 1.2261385917663574
Epoch 10 Batch 100 Loss 1.231096625328064
Epoch 10 Loss 1.2388
Time taken for 1 epoch 5.462940454483032 sec

以上

2019年11月
月	火	水	木	金	土	日
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30