TensorFlow 2.0 Beta : 上級 Tutorials : テキストとシークエンス :- RNN でテキスト生成 (翻訳/解説)

翻訳 : (株)クラスキャットセールスインフォメーション
作成日時 : 07/05/2019

* 本ページは、TensorFlow の本家サイトの TF 2.0 Beta – Advanced Tutorials – Text and sequences の以下のページを翻訳した上で適宜、補足説明したものです：

Text generation with an RNN

* サンプルコードの動作確認はしておりますが、必要な場合には適宜、追加改変しています。
* ご自由にリンクを張って頂いてかまいませんが、sales-info@classcat.com までご一報いただけると嬉しいです。

テキストとシークエンス :- RNN でテキスト生成

このチュートリアルは文字ベース RNN を使用してテキストをどのように生成するかを実演します。私達は Andrej Karpathy の The Unreasonable Effectiveness of Recurrent Neural Networks から Shakespeare の著作のデータセットで作業します。このデータ (“Shakespear”) から文字のシークエンスが与えられたとき、シークエンスの次の文字 (“e”) を予測するモデルを訓練します。モデルを繰り返し呼び出すことによりテキストのより長いシークエンスが生成できます。

Note: このノートブックをより高速に実行するために GPU アクセラレーションを有効にしてください。Colab では : Runtime > runtime type を変更 > Hardware acclerator > GPU です。ローカルで実行する場合 TensorFlow version >= 1.11 を確実にしてください。

このチュートリアルは tf.keras と eager execution を使用して実装された実行可能なコードを含みます。次は 30 エポックの間訓練されたこのチュートリアルのモデルが文字列 (= string) “Q” で開始されたときのサンプル出力です :

QUEENE:
I had thought thou hadst a Roman; for the oracle,
Thus by All bids the man against the word,
Which are so weak of care, by old care done;
Your children were in your holy love,
And the precipitation through the bleeding throne.

BISHOP OF ELY:
Marry, and will, my lord, to weep in such a one were prettiest;
Yet now I was adopted heir
Of the world's lamentable day,
To watch the next way with his father with his face?

ESCALUS:
The cause why then we are all resolved more sons.

VOLUMNIA:
O, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, it is no sin it should be dead,
And love and pale as any will to that word.

QUEEN ELIZABETH:
But how long have I heard the soul for this world,
And show his hands of life be proved to stand.

PETRUCHIO:
I say he look'd on, if I must be content
To stay him from the fatal of our country's bliss.
His lordship pluck'd from this sentence then for prey,
And then let us twain, being the moon,
were she such a case as fills m

センテンスの幾つかが文法的に正しい一方で、多くは意味を成しません。モデルは単語の意味を学習していませんが、以下を考えてください :

私達のモデルは文字ベースです訓練を始めたとき、モデルは英単語をどのようにスペルするか、あるいは単語がテキストのユニットであることさえ知りませんでした。
出力の構造は演劇に類似しています — テキストのブロックは一般に話者名で始まり、元のデータセットと同様に総て大文字です。
下で実演されるように、モデルはテキストの小さいバッチ (それぞれ 100 文字) 上で訓練され、コヒーレント構造を持つテキストのより長いシークエンスを依然として生成することが可能です。

Setup

TensorFlow と他のライブラリをインポートする

from __future__ import absolute_import, division, print_function, unicode_literals

!pip install -q tensorflow-gpu==2.0.0-beta1
import tensorflow as tf

import numpy as np
import os
import time

Shakespeare データセットをダウンロードする

貴方自身のデータ上でこのコードを実行するためには次の行を変更します。

path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt
1122304/1115394 [==============================] - 0s 0us/step

データを読む

最初にテキストを少し見てみます :

# Read, then decode for py2 compat.
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# length of text is the number of characters in it
print ('Length of text: {} characters'.format(len(text)))

Length of text: 1115394 characters

# Take a look at the first 250 characters in text
print(text[:250])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.

# The unique characters in the file
vocab = sorted(set(text))
print ('{} unique characters'.format(len(vocab)))

65 unique characters

テキストを処理する

テキストをベクトル化する

訓練前に、文字列を数字表現にマップする必要があります。2 つの検索テーブルを作成します: 一つは文字を数字にマップし、そしてもう一つは数字から文字のためです。

# Creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

text_as_int = np.array([char2idx[c] for c in text])

今では各文字に対する整数表現を持ちます。文字を 0 から len(unique) のインデックスとしてマップしたことが分かるでしょう。

print('{')
for char,_ in zip(char2idx, range(20)):
    print('  {:4s}: {:3d},'.format(repr(char), char2idx[char]))
print('  ...\n}')

{
  'Z' :  38,
  'n' :  52,
  'x' :  62,
  'd' :  42,
  'k' :  49,
  'i' :  47,
  'z' :  64,
  'W' :  35,
  'Y' :  37,
  'V' :  34,
  'Q' :  29,
  'X' :  36,
  ';' :  11,
  'K' :  23,
  'D' :  16,
  'E' :  17,
  'b' :  40,
  '.' :   8,
  'P' :  28,
  'u' :  59,
  ...
}

# Show how the first 13 characters from the text are mapped to integers
print ('{} ---- characters mapped to int ---- > {}'.format(repr(text[:13]), text_as_int[:13]))

'First Citizen' ---- characters mapped to int ---- > [18 47 56 57 58  1 15 47 58 47 64 43 52]

予測タスク

文字、あるいは文字のシークエンスが与えられたとき、最もありそうな次の文字は何でしょう？これが私達がモデルを訓練して遂行するタスクです。モデルへの入力は文字のシークエンスで、出力 — 各時間ステップにおける次の文字を予測するためにモデルを訓練します。

RNN は前に見た要素に依拠する内部状態を維持しますので、この瞬間までに計算された総ての文字が与えられたとき、次の文字は何でしょう？

訓練サンプルとターゲットを作成する

次にテキストをサンプル・シークエンスに分割します。各入力シークエンスはテキストからの seq_length 文字を含みます。

各入力シークエンスに対して、対応するターゲットはテキストの同じ長さを含みます、一つの文字が右にシフトされていることを除いて。

そこでテキストを seq_length+1のチャンクに分解します。例えば、seq_length が 4 そしてテキストが “Hello” であるとします。入力シークエンスは “Hell” で、そしてターゲット・シークエンスは “ello” です。

これを行なうために最初にテキストベクトルを文字インデックスのストリームに変換するために tf.data.Dataset.from_tensor_slices 関数を使用します。

# The maximum length sentence we want for a single input in characters
seq_length = 100
examples_per_epoch = len(text)//seq_length

# Create training examples / targets
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

for i in char_dataset.take(5):
  print(idx2char[i.numpy()])

F
i
r
s
t

batch メソッドはこれらの個々の文字を望まれるサイズのシークエンスに容易に変換させます。

sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

for item in sequences.take(5):
  print(repr(''.join(idx2char[item.numpy()])))

'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '
'are all resolved rather to die than to famish?\n\nAll:\nResolved. resolved.\n\nFirst Citizen:\nFirst, you k'
"now Caius Marcius is chief enemy to the people.\n\nAll:\nWe know't, we know't.\n\nFirst Citizen:\nLet us ki"
"ll him, and we'll have corn at our own price.\nIs't a verdict?\n\nAll:\nNo more talking on't; let it be d"
'one: away, away!\n\nSecond Citizen:\nOne word, good citizens.\n\nFirst Citizen:\nWe are accounted poor citi'

各シークエンスについて、各バッチに単純な関数を適用するために map メソッドを使用して入力とターゲットテキストを形成するためにそれを複製してシフトします :

def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)

最初のサンプル入力とターゲット値をプリントします :

for input_example, target_example in  dataset.take(1):
  print ('Input data: ', repr(''.join(idx2char[input_example.numpy()])))
  print ('Target data:', repr(''.join(idx2char[target_example.numpy()])))

Input data:  'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou'
Target data: 'irst Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '

これらのベクトルの各インデックスは一つの時間ステップとして処理されます。時間ステップ 0 における入力については、モデルは “F” のためのインデックスを受け取りそして次の文字として “i” のためのインデックスを予測しようとします。次の時間ステップでは、それは同じことを行ないますが RNN は現在の入力文字に加えて前のステップのコンテキストも考慮します。

for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):
    print("Step {:4d}".format(i))
    print("  input: {} ({:s})".format(input_idx, repr(idx2char[input_idx])))
    print("  expected output: {} ({:s})".format(target_idx, repr(idx2char[target_idx])))

Step    0
  input: 18 ('F')
  expected output: 47 ('i')
Step    1
  input: 47 ('i')
  expected output: 56 ('r')
Step    2
  input: 56 ('r')
  expected output: 57 ('s')
Step    3
  input: 57 ('s')
  expected output: 58 ('t')
Step    4
  input: 58 ('t')
  expected output: 1 (' ')

訓練バッチを作成する

テキストを扱いやすいシークエンスに分割するために tf.data を使用しました。しかしこのデータをモデルに供給する前に、データをシャッフルしてそれをバッチにパックする必要があります。

# Batch size
BATCH_SIZE = 64

# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

dataset

<BatchDataset shapes: ((64, 100), (64, 100)), types: (tf.int64, tf.int64)>

モデルを構築する

モデルを定義するために tf.keras.Sequential を使用します。この単純なサンプルのためにモデルを定義するために 3 つの層が使用されます :

tf.keras.layers.Embedding: 入力層。各文字の数字を embedding_dim 次元を持つベクトルにマップする訓練可能な検索テーブルです;
tf.keras.layers.GRU: サイズ units=rnn_units を持つ RNN の型です (ここで LSTM 層を使用することもできます)。
tf.keras.layers.Dense: vocab_size 出力を持つ出力層。

# Length of the vocabulary in chars
vocab_size = len(vocab)

# The embedding dimension
embedding_dim = 256

# Number of RNN units
rnn_units = 1024

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              batch_input_shape=[batch_size, None]),
    tf.keras.layers.LSTM(rnn_units,
                        return_sequences=True,
                        stateful=True,
                        recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size)
  ])
  return model

model = build_model(
  vocab_size = len(vocab),
  embedding_dim=embedding_dim,
  rnn_units=rnn_units,
  batch_size=BATCH_SIZE)

WARNING: Logging before flag parsing goes to stderr.
W0304 03:48:46.706135 140067035297664 tf_logging.py:161] <tensorflow.python.keras.layers.recurrent.UnifiedLSTM object at 0x7f637273ccf8>: Note that this layer is not optimized for performance. Please use tf.keras.layers.CuDNNLSTM for better performance on GPU.

各文字についてモデルは埋め込みを検索し、埋め込みを入力として 1 時間ステップ GRU を実行し、そして次の文字の log-尤度を予測するロジットを生成するために dense 層を適用します :

モデルを通過するデータの図

モデルを試す

さてモデルをそれが期待どおりに動作するかを見るために実行します。

最初に出力の shape を確認します :

for input_example_batch, target_example_batch in dataset.take(1):
  example_batch_predictions = model(input_example_batch)
  print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

(64, 100, 65) # (batch_size, sequence_length, vocab_size)

上の例では入力のシークエンス長は 100 ですがモデルは任意の長さの入力上で実行できます :

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (64, None, 256)           16640     
_________________________________________________________________
lstm (LSTM)                  (64, None, 1024)          5246976   
_________________________________________________________________
dense (Dense)                (64, None, 65)            66625     
=================================================================
Total params: 5,330,241
Trainable params: 5,330,241
Non-trainable params: 0
_________________________________________________________________

モデルから実際の予測を得るためには、実際の文字インデックスを得るために出力分布からサンプリングする必要があります。この分布は文字語彙に渡るロジットにより定義されます。

Note: この分布からサンプリングすることは重要です、というのは分布の argmax を取ることはループでモデルを容易に stuck させる可能性があるためです。

バッチの最初のサンプルのためにそれを試します :

sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()

これは、各時間ステップにおける、次の文字インデックスの予測を与えます :

sampled_indices

array([45, 37,  3, 51, 29,  6, 42, 49, 16,  0,  8, 37, 10, 32, 15, 28, 13,
       24, 11, 59, 35, 59, 38, 36, 21, 63, 34, 56, 48, 23, 28,  8, 30, 28,
       25,  3, 47, 34, 54,  7, 39, 15, 40, 46, 40, 16, 13,  8, 34, 61, 59,
       20, 27, 25,  7, 43, 17, 51, 41, 33, 10, 60,  2, 21, 63, 48, 24, 56,
       43, 39,  9, 46, 14, 38, 55, 40, 20, 32, 38, 29, 46, 12,  4, 60, 11,
       24, 23, 55, 10, 59, 47, 33, 40, 13, 43,  7, 34, 17, 31, 28])

この未訓練モデルにより予測されたテキストを見るためにこれらをデコードします :

print("Input: \n", repr("".join(idx2char[input_example_batch[0]])))
print()
print("Next Char Predictions: \n", repr("".join(idx2char[sampled_indices ])))

Input: 
 'ilent judgment tried it,\nWithout more overture.\n\nLEONTES:\nHow could that be?\nEither thou art most ig'

Next Char Predictions: 
 'gY$mQ,dkD\n.Y:TCPAL;uWuZXIyVrjKP.RPM$iVp-aCbhbDA.VwuHOM-eEmcU:v!IyjLrea3hBZqbHTZQh?&v;LKq:uiUbAe-VESP'

モデルを訓練する

この時点で問題は標準的な分類問題として扱うことができます。前の RNN 状態、そしてこの時間ステップで入力が与えられたとき、次の文字のクラスを予測します。

optimizer、そして損失関数を装着する

この場合には標準的な tf.keras.losses.sparse_softmax_crossentropy 損失関数が動作します、何故ならばそれは予測の最後の次元に渡り適用されるからです。

私達のモデルはロジットを返しますので、from_logits フラグを設定する必要があります。

def loss(labels, logits):
  return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

example_batch_loss  = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("scalar_loss:      ", example_batch_loss.numpy().mean())

Prediction shape:  (64, 100, 65)  # (batch_size, sequence_length, vocab_size)
scalar_loss:       4.174535

tf.keras.Model.compile メソッドを使用して訓練手続きを configure します。デフォルト引数を持つ tf.keras.optimizers.Adam と損失関数を使用します。

model.compile(optimizer='adam', loss=loss)

チェックポイントを構成する

チェックポイントが訓練の間にセーブされることを確実にするために tf.keras.callbacks.ModelCheckpoint を使用します :

# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

訓練を実行する

訓練時間を合理的なものに保持するために、モデルを訓練するために 10 エポックを使用します。

EPOCHS=10

history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

Epoch 1/10
172/172 [==============================] - 8s 47ms/step - loss: 2.5719
Epoch 2/10
172/172 [==============================] - 7s 39ms/step - loss: 1.8664
Epoch 3/10
172/172 [==============================] - 7s 43ms/step - loss: 1.6230
Epoch 4/10
172/172 [==============================] - 7s 43ms/step - loss: 1.4920
Epoch 5/10
172/172 [==============================] - 7s 40ms/step - loss: 1.4106
Epoch 6/10
172/172 [==============================] - 7s 40ms/step - loss: 1.3502
Epoch 7/10
172/172 [==============================] - 7s 40ms/step - loss: 1.2989
Epoch 8/10
172/172 [==============================] - 7s 40ms/step - loss: 1.2519
Epoch 9/10
172/172 [==============================] - 7s 41ms/step - loss: 1.2077
Epoch 10/10
172/172 [==============================] - 7s 40ms/step - loss: 1.1659

テキストを生成する

予測ループ

次のコードブロックがテキストを生成します :

それは開始文字列を選択し、RNN 状態を初期化してそして生成する文字数を設定することから始まります。
開始文字列と RNN 状態を使用して次の文字の予測分布を得ます。
それから、予測された文字のインデックスを計算するために categorical 分布を使用します。この予測された文字をモデルへの次の入力として使用します。
モデルにより返された RNN 状態はモデルに供給し戻されます、その結果それは今ではただ一つの単語よりも多くのコンテキストを代わりに持ちます。次の単語を予測した後、変更された RNN 状態が再度モデルに供給し戻されます、これがそれが前に予測された単語からより多くのコンテキストを得るときにどのように学習するかです。

生成されたテキストを見れば、モデルがいつ大文字にするべきかを知り、パラグラフを作成してそして Shakespeare 的な著述語彙を模倣するのを見るでしょう。小さい数の訓練エポックでは、それは首尾一貫したセンテンスを形成することをまだ学習していません。

def generate_text(model, start_string):
  # Evaluation step (generating text using the learned model)

  # Number of characters to generate
  num_generate = 1000

  # Converting our start string to numbers (vectorizing)
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)

  # Empty string to store our results
  text_generated = []

  # Low temperatures results in more predictable text.
  # Higher temperatures results in more surprising text.
  # Experiment to find the best setting.
  temperature = 1.0

  # Here batch size == 1
  model.reset_states()
  for i in range(num_generate):
      predictions = model(input_eval)
      # remove the batch dimension
      predictions = tf.squeeze(predictions, 0)

      # using a categorical distribution to predict the word returned by the model
      predictions = predictions / temperature
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

      # We pass the predicted word as the next input to the model
      # along with the previous hidden state
      input_eval = tf.expand_dims([predicted_id], 0)

      text_generated.append(idx2char[predicted_id])

  return (start_string + ''.join(text_generated))

print(generate_text(model, start_string=u"ROMEO: "))

ROMEO: where he skill
That ever we fear their looks.

LEONTES:
S stooph of ournd
The dukes o' the selferur is it of her;
Forses thereof willingly no wealth!

TRANIO:
Mine ears, sir, to them speak. whth ever young with one there her to
do than thy heart? There is a fie,
The reverse honoural gives my dather's death
That waves we'll sovereignty of thy aughts,
Your keels with us, save axhousinous: gentle brifferent, for an intent
I did forbilling ougly age on my sin,
With such man est in thy father's father's sky,
As tells me, Yorknishall and Prittend Henry, and her part withan this place
I'll speak of soldier; then I was; to report on
how horse spokes ready the nie, I knew thee,
Pray you, indeed. You were not 'quirence will wear-fallentut and frazed.

YORK:
Is no beg-of this to the Tower.

KATHARINA:
Here is a war.

Third Citizen:
You have professed them and crse trunk blow his chlician.

QUEEN ELIZABETH:
So sovereour a gentlemen, 'tis a book-so sort,
Amen, unwilling,
your love and wars their pa

結果を改善するために貴方ができる最も容易なことはそれをより長く訓練することです (EPOCHS=30 を試してください)。

貴方はまた異なる開始文字列で実験したり、モデル精度を改良するためにもう一つの RNN 層を追加して試す、あるいは多少のランダム予測を生成するために temperature パラメータを調整することができます。

Advanced: カスタマイズされた訓練

上の訓練手続きは単純ですが、多くの制御を貴方に与えません。

そこでモデルを手動でどのように実行するかを見た今、訓練ループをアンパックしてそれを私達自身で実装しましょう。これは例えば、モデルの open-loop 出力を安定させる手助けをするためにカリキュラム学習を実装するための開始点を与えます。

勾配を追跡するために tf.GradientTape を使用します。eager execution ガイドを読むことによりこのアプローチについてより多く学習できます。

手続きは次のように動作します :

最初に、RNN 状態を初期化します。tf.keras.Model.reset_states メソッドを呼び出すことによりこれを行ないます。
次に、データセットに渡り (バッチ毎に) iterate して各々に関連する予測を計算します。
tf.GradientTape をオープンして、そしてそのコンテキストで予測と損失を計算します。
tf.GradientTape.grads メソッドを使用してモデル変数に関する損失の勾配を計算します。
最後に、optimizer の tf.train.Optimizer.apply_gradients メソッドを使用してステップを下方に取ります。

model = build_model(
  vocab_size = len(vocab),
  embedding_dim=embedding_dim,
  rnn_units=rnn_units,
  batch_size=BATCH_SIZE)

optimizer = tf.keras.optimizers.Adam()

@tf.function
def train_step(inp, target):
  with tf.GradientTape() as tape:
    predictions = model(inp)
    loss = tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(
            target, predictions, from_logits=True))
  grads = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(grads, model.trainable_variables))

  return loss

# Training step
EPOCHS = 10

for epoch in range(EPOCHS):
  start = time.time()

  # initializing the hidden state at the start of every epoch
  # initally hidden is None
  hidden = model.reset_states()

  for (batch_n, (inp, target)) in enumerate(dataset):
    loss = train_step(inp, target)

    if batch_n % 100 == 0:
      template = 'Epoch {} Batch {} Loss {}'
      print(template.format(epoch+1, batch_n, loss))

  # saving (checkpoint) the model every 5 epochs
  if (epoch + 1) % 5 == 0:
    model.save_weights(checkpoint_prefix.format(epoch=epoch))

  print ('Epoch {} Loss {:.4f}'.format(epoch+1, loss))
  print ('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

model.save_weights(checkpoint_prefix.format(epoch=epoch))

WARNING: Logging before flag parsing goes to stderr.
W0628 06:41:29.934798 140162012227328 util.py:244] Unresolved object in checkpoint: (root).optimizer
W0628 06:41:29.936547 140162012227328 util.py:244] Unresolved object in checkpoint: (root).optimizer.iter
W0628 06:41:29.937477 140162012227328 util.py:244] Unresolved object in checkpoint: (root).optimizer.beta_1
W0628 06:41:29.938759 140162012227328 util.py:244] Unresolved object in checkpoint: (root).optimizer.beta_2
W0628 06:41:29.939390 140162012227328 util.py:244] Unresolved object in checkpoint: (root).optimizer.decay
W0628 06:41:29.940099 140162012227328 util.py:244] Unresolved object in checkpoint: (root).optimizer.learning_rate
W0628 06:41:29.940868 140162012227328 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-0.embeddings
W0628 06:41:29.941582 140162012227328 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-2.kernel
W0628 06:41:29.942270 140162012227328 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-2.bias
W0628 06:41:29.944056 140162012227328 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-1.cell.kernel
W0628 06:41:29.944905 140162012227328 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-1.cell.recurrent_kernel
W0628 06:41:29.945559 140162012227328 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-1.cell.bias
W0628 06:41:29.946222 140162012227328 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-0.embeddings
W0628 06:41:29.947816 140162012227328 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-2.kernel
W0628 06:41:29.948982 140162012227328 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-2.bias
W0628 06:41:29.949806 140162012227328 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-1.cell.kernel
W0628 06:41:29.950479 140162012227328 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-1.cell.recurrent_kernel
W0628 06:41:29.951171 140162012227328 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-1.cell.bias
W0628 06:41:29.952606 140162012227328 util.py:252] A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/alpha/guide/checkpoints#loading_mechanics for details.

Epoch 1 Batch 0 Loss 4.172821998596191
Epoch 1 Batch 100 Loss 2.391324758529663
Epoch 1 Loss 2.0715
Time taken for 1 epoch 9.32474422454834 sec

Epoch 2 Batch 0 Loss 2.1118874549865723
Epoch 2 Batch 100 Loss 1.9127529859542847
Epoch 2 Loss 1.7389
Time taken for 1 epoch 6.282292127609253 sec

Epoch 3 Batch 0 Loss 1.7284945249557495
Epoch 3 Batch 100 Loss 1.6665358543395996
Epoch 3 Loss 1.5530
Time taken for 1 epoch 6.350098133087158 sec

Epoch 4 Batch 0 Loss 1.5307190418243408
Epoch 4 Batch 100 Loss 1.528407335281372
Epoch 4 Loss 1.4468
Time taken for 1 epoch 6.355969429016113 sec

Epoch 5 Batch 0 Loss 1.4143530130386353
Epoch 5 Batch 100 Loss 1.451310396194458
Epoch 5 Loss 1.3683
Time taken for 1 epoch 6.345163583755493 sec

Epoch 6 Batch 0 Loss 1.3415405750274658
Epoch 6 Batch 100 Loss 1.3894364833831787
Epoch 6 Loss 1.3122
Time taken for 1 epoch 6.310795068740845 sec

Epoch 7 Batch 0 Loss 1.2863143682479858
Epoch 7 Batch 100 Loss 1.3341155052185059
Epoch 7 Loss 1.2637
Time taken for 1 epoch 6.189915895462036 sec

Epoch 8 Batch 0 Loss 1.2400826215744019
Epoch 8 Batch 100 Loss 1.2844866514205933
Epoch 8 Loss 1.2265
Time taken for 1 epoch 6.186691999435425 sec

Epoch 9 Batch 0 Loss 1.2004377841949463
Epoch 9 Batch 100 Loss 1.2391895055770874
Epoch 9 Loss 1.1848
Time taken for 1 epoch 6.293030500411987 sec

Epoch 10 Batch 0 Loss 1.1650713682174683
Epoch 10 Batch 100 Loss 1.19230055809021
Epoch 10 Loss 1.1492
Time taken for 1 epoch 6.376950740814209 sec

以上

2019年7月
月	火	水	木	金	土	日
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30	31