TensorFlow 2.0 Alpha : 上級 Tutorials : テキストとシークエンス :- RNN でテキスト生成 (翻訳/解説)

翻訳 : (株)クラスキャットセールスインフォメーション
作成日時 : 04/06/2019

* 本ページは、TensorFlow の本家サイトの TF 2.0 Alpha – Advanced Tutorials – Text and sequences の以下のページを翻訳した上で適宜、補足説明したものです：

Text generation with an RNN

* サンプルコードの動作確認はしておりますが、必要な場合には適宜、追加改変しています。
* ご自由にリンクを張って頂いてかまいませんが、sales-info@classcat.com までご一報いただけると嬉しいです。

テキストとシークエンス :- RNN でテキスト生成

このチュートリアルは文字ベース RNN を使用してテキストをどのように生成するかを実演します。私達は Andrej Karpathy の The Unreasonable Effectiveness of Recurrent Neural Networks から Shakespeare の著作のデータセットで作業します。このデータ (“Shakespear”) から文字のシークエンスが与えられたとき、シークエンスの次の文字 (“e”) を予測するモデルを訓練します。モデルを繰り返し呼び出すことによりテキストのより長いシークエンスが生成できます。

Note: このノートブックを高速に実行するために GPU アクセラレーションを有効にしてください。Colab では : Runtime > Change runtime type > Hardware acclerator > GPU です。ローカルで実行する場合 TensorFlow version >= 1.11 を確実にしてください。

このチュートリアルは tf.keras と eager execution を使用して実装された実行可能なコードを含みます。次は 30 エポックの間訓練されたこのチュートリアルのモデルが文字列 (= string) “Q” で開始されたときのサンプル出力です :

QUEENE:
I had thought thou hadst a Roman; for the oracle,
Thus by All bids the man against the word,
Which are so weak of care, by old care done;
Your children were in your holy love,
And the precipitation through the bleeding throne.

BISHOP OF ELY:
Marry, and will, my lord, to weep in such a one were prettiest;
Yet now I was adopted heir
Of the world's lamentable day,
To watch the next way with his father with his face?

ESCALUS:
The cause why then we are all resolved more sons.

VOLUMNIA:
O, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, it is no sin it should be dead,
And love and pale as any will to that word.

QUEEN ELIZABETH:
But how long have I heard the soul for this world,
And show his hands of life be proved to stand.

PETRUCHIO:
I say he look'd on, if I must be content
To stay him from the fatal of our country's bliss.
His lordship pluck'd from this sentence then for prey,
And then let us twain, being the moon,
were she such a case as fills m

センテンスの幾つかが文法的に正しい一方で、多くは意味を成しません。モデルは単語の意味を学習していませんが、以下を考えてください :

私達のモデルは文字ベースです訓練を始めたとき、モデルは英単語をどのようにスペルするか、あるいは単語がテキストのユニットであることさえ知りませんでした。
出力の構造は演劇に類似しています — テキストのブロックは一般に話者名で始まり、元のデータセットと同様に総て大文字です。
下で実演されるように、モデルはテキストの小さいバッチ (それぞれ 100 文字) 上で訓練され、コヒーレント構造を持つテキストのより長いシークエンスを依然として生成することが可能です。

Setup

TensorFlow と他のライブラリをインポートする

from __future__ import absolute_import, division, print_function

!pip install -q tensorflow-gpu==2.0.0-alpha0
import tensorflow as tf

import numpy as np
import os
import time

Collecting tensorflow-gpu==2.0.0-alpha0
Successfully installed google-pasta-0.1.4 tb-nightly-1.14.0a20190303 tensorflow-estimator-2.0-preview-1.14.0.dev2019030300 tensorflow-gpu==2.0.0-alpha0-2.0.0.dev20190303

Shakespeare データセットをダウンロードする

貴方自身のデータ上でこのコードを実行するためには次の行を変更します。

path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt
1122304/1115394 [==============================] - 0s 0us/step

データを読む

最初にテキストを少し見てみます :

# Read, then decode for py2 compat.
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# length of text is the number of characters in it
print ('Length of text: {} characters'.format(len(text)))

Length of text: 1115394 characters

# Take a look at the first 250 characters in text
print(text[:250])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.

# The unique characters in the file
vocab = sorted(set(text))
print ('{} unique characters'.format(len(vocab)))

65 unique characters

テキストを処理する

テキストをベクトル化する

訓練前に、文字列を数字表現にマップする必要があります。2 つの検索テーブルを作成します: 一つは文字を数字にマップし、そしてもう一つは数字から文字のためです。

# Creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

text_as_int = np.array([char2idx[c] for c in text])

今では各文字に対する整数表現を持ちます。文字を 0 から len(unique) のインデックスとしてマップしたことが分かるでしょう。

print('{')
for char,_ in zip(char2idx, range(20)):
    print('  {:4s}: {:3d},'.format(repr(char), char2idx[char]))
print('  ...\n}')

{
  '\n':   0,
  ' ' :   1,
  '!' :   2,
  '$' :   3,
  '&' :   4,
  "'" :   5,
  ',' :   6,
  '-' :   7,
  '.' :   8,
  '3' :   9,
  ':' :  10,
  ';' :  11,
  '?' :  12,
  'A' :  13,
  'B' :  14,
  'C' :  15,
  'D' :  16,
  'E' :  17,
  'F' :  18,
  'G' :  19,
  ...
}

# Show how the first 13 characters from the text are mapped to integers
print ('{} ---- characters mapped to int ---- > {}'.format(repr(text[:13]), text_as_int[:13]))

'First Citizen' ---- characters mapped to int ---- > [18 47 56 57 58  1 15 47 58 47 64 43 52]

予測タスク

文字、あるいは文字のシークエンスが与えられたとき、最もありそうな次の文字は何でしょう？これが私達がモデルを訓練して遂行するタスクです。モデルへの入力は文字のシークエンスで、出力 — 各時間ステップにおける次の文字を予測するためにモデルを訓練します。RNN は前に見た要素に依拠する内部状態を維持しますので、この瞬間までに計算された総ての文字が与えられたとき、次の文字は何でしょう？

訓練サンプルとターゲットを作成する

次にテキストをサンプル・シークエンスに分割します。各入力シークエンスはテキストから seq_length 文字を含みます。

各入力シークエンスに対して、対応するターゲットはテキストの同じ長さを含みます、一つの文字が右にシフトされていることを除いて。

そこでテキストを seq_length+1のチャンクに分解します。例えば、seq_length が 4 そしてテキストが “Hello” であるとします。入力シークエンスは “Hell” で、そしてターゲット・シークエンスは “ello” です。

これを行なうために最初にテキストベクトルを文字インデックスのストリームに変換するために tf.data.Dataset.from_tensor_slices 関数を使用します。

# The maximum length sentence we want for a single input in characters
seq_length = 100
examples_per_epoch = len(text)//seq_length

# Create training examples / targets
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

for i in char_dataset.take(5):
  print(idx2char[i.numpy()])

F
i
r
s
t

batch メソッドはこれらの個々の文字を望まれるサイズのシークエンスに容易に変換させます。

sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

for item in sequences.take(5):
  print(repr(''.join(idx2char[item.numpy()])))

'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '
'are all resolved rather to die than to famish?\n\nAll:\nResolved. resolved.\n\nFirst Citizen:\nFirst, you k'
"now Caius Marcius is chief enemy to the people.\n\nAll:\nWe know't, we know't.\n\nFirst Citizen:\nLet us ki"
"ll him, and we'll have corn at our own price.\nIs't a verdict?\n\nAll:\nNo more talking on't; let it be d"
'one: away, away!\n\nSecond Citizen:\nOne word, good citizens.\n\nFirst Citizen:\nWe are accounted poor citi'

各シークエンスについて、各バッチに単純な関数を適用するために map メソッドを使用して入力とターゲットテキストを形成するためにそれを複製してシフトします :

def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)

最初のサンプル入力とターゲット値をプリントします :

for input_example, target_example in  dataset.take(1):
  print ('Input data: ', repr(''.join(idx2char[input_example.numpy()])))
  print ('Target data:', repr(''.join(idx2char[target_example.numpy()])))

Input data:  'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou'
Target data: 'irst Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '

これらのベクトルの各インデックスは一つの時間ステップとして処理されます。時間ステップ 0 における入力については、モデルは “F” のためのインデックスを受け取りそして次の文字として “i” のためのインデックスを予測しようとします。次の時間ステップでは、それは同じことを行ないますが RNN は現在の入力文字に加えて前のステップのコンテキストも考慮します。

for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):
    print("Step {:4d}".format(i))
    print("  input: {} ({:s})".format(input_idx, repr(idx2char[input_idx])))
    print("  expected output: {} ({:s})".format(target_idx, repr(idx2char[target_idx])))

Step    0
  input: 18 ('F')
  expected output: 47 ('i')
Step    1
  input: 47 ('i')
  expected output: 56 ('r')
Step    2
  input: 56 ('r')
  expected output: 57 ('s')
Step    3
  input: 57 ('s')
  expected output: 58 ('t')
Step    4
  input: 58 ('t')
  expected output: 1 (' ')

訓練バッチを作成する

テキストを扱いやすいシークエンスに分割するために tf.data を使用しました。しかしこのデータをモデルに供給する前に、データをシャッフルしてそれをバッチにパックする必要があります。

# Batch size 
BATCH_SIZE = 64

# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences, 
# so it doesn't attempt to shuffle the entire sequence in memory. Instead, 
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

dataset

<BatchDataset shapes: ((64, 100), (64, 100)), types: (tf.int64, tf.int64)>

モデルを構築する

モデルを定義するために tf.keras.Sequential を使用します。この単純なサンプルのためにモデルを定義するために 3 つの層が使用されます :

tf.keras.layers.Embedding: 入力層。各文字の数字を embedding_dim 次元を持つベクトルにマップする訓練可能な検索テーブルです;
tf.keras.layers.GRU: サイズ units=rnn_units を持つ RNN の型です (ここで LSTM 層を使用することもできます)。
tf.keras.layers.Dense: vocab_size 出力を持つ出力層。

# Length of the vocabulary in chars
vocab_size = len(vocab)

# The embedding dimension 
embedding_dim = 256

# Number of RNN units
rnn_units = 1024

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, 
                              batch_input_shape=[batch_size, None]),
    tf.keras.layers.LSTM(rnn_units, 
                        return_sequences=True, 
                        stateful=True, 
                        recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size)
  ])
  return model

model = build_model(
  vocab_size = len(vocab), 
  embedding_dim=embedding_dim, 
  rnn_units=rnn_units, 
  batch_size=BATCH_SIZE)

WARNING: Logging before flag parsing goes to stderr.
W0304 03:48:46.706135 140067035297664 tf_logging.py:161] <tensorflow.python.keras.layers.recurrent.UnifiedLSTM object at 0x7f637273ccf8>: Note that this layer is not optimized for performance. Please use tf.keras.layers.CuDNNLSTM for better performance on GPU.

各文字についてモデルは埋め込みを検索し、埋め込みを入力として 1 時間ステップ GRU を実行し、そして次の文字の log-尤度を予測するロジットを生成するために dense 層を適用します :

モデルを試す

さてモデルをそれが期待どおりに挙動するかを見るために実行します。

最初に出力の shape を確認します :

for input_example_batch, target_example_batch in dataset.take(1): 
  example_batch_predictions = model(input_example_batch)
  print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

(64, 100, 65) # (batch_size, sequence_length, vocab_size)

上の例では入力のシークエンス長は 100 ですがモデルは任意の長さの入力上で実行できます :

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (64, None, 256)           16640     
_________________________________________________________________
unified_lstm (UnifiedLSTM)   (64, None, 1024)          5246976   
_________________________________________________________________
dense (Dense)                (64, None, 65)            66625     
=================================================================
Total params: 5,330,241
Trainable params: 5,330,241
Non-trainable params: 0
_________________________________________________________________

モデルから実際の予測を得るためには、実際の文字インデックスを得るために出力分布からサンプリングする必要があります。この分布は文字語彙に渡るロジットにより定義されます。

Note: この分布からサンプリングすることは重要です、というのは分布の argmax を取ることはループでモデルを容易に stuck させる可能性があるためです。

バッチの最初のサンプルのためにそれを試します :

sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()

これは、各時間ステップにおける、次の文字インデックスの予測を与えます :

sampled_indices

array([21,  2, 58, 40, 42, 32, 39,  7, 18, 38, 30, 58, 23, 58, 37, 10, 23,
       16, 52, 14, 43,  8, 32, 49, 62, 41, 53, 38, 17, 36, 24, 59, 41, 38,
        4, 27, 33, 59, 54, 34, 14,  1,  1, 56, 55, 40, 37,  4, 32, 44, 62,
       59,  1, 10, 20, 29,  2, 48, 37, 26, 10, 22, 58,  5, 26,  9, 23, 26,
       54, 43, 46, 36, 62, 57,  8, 53, 52, 23, 57, 42, 60, 10, 43, 11, 45,
       12, 28, 46, 46, 15, 51,  9, 56,  7, 53, 51,  2,  1, 10, 58])

この未訓練モデルにより予測されたテキストを見るためにこれらをデコードします :

print("Input: \n", repr("".join(idx2char[input_example_batch[0]])))
print()
print("Next Char Predictions: \n", repr("".join(idx2char[sampled_indices ])))

Input: 
 'to it far before thy time?\nWarwick is chancellor and the lord of Calais;\nStern Falconbridge commands'

Next Char Predictions: 
 "I!tbdTa-FZRtKtY:KDnBe.TkxcoZEXLucZ&OUupVB  rqbY&Tfxu :HQ!jYN:Jt'N3KNpehXxs.onKsdv:e;g?PhhCm3r-om! :t"

モデルを訓練する

この時点で問題は標準的な分類問題として扱うことができます。前の RNN 状態、そしてこの時間ステップで入力が与えられたとき、次の文字のクラスを予測します。

optimizer、そして損失関数を張り付ける

この場合では標準的な tf.keras.losses.sparse_softmax_crossentropy 損失関数が動作します、何故ならばそれは予測の最後の次元に渡り適用されるからです。

私達のモデルはロジットを返しますので、from_logits フラグを設定する必要があります。

def loss(labels, logits):
  return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

example_batch_loss  = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)") 
print("scalar_loss:      ", example_batch_loss.numpy().mean())

Prediction shape:  (64, 100, 65)  # (batch_size, sequence_length, vocab_size)
scalar_loss:       4.174188

tf.keras.Model.compile メソッドを使用して訓練手続きを configure します。デフォルト引数を伴う tf.keras.optimizers.Adam と損失関数を使用します。

model.compile(optimizer='adam', loss=loss)

チェックポイントを構成する

チェックポイントが訓練の間にセーブされることを確実にするために tf.keras.callbacks.ModelCheckpoint を使用します :

# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

訓練を実行する

訓練時間を合理的なものに保持するために、モデルを訓練するために 3 エポック (訳注: 原文ママ) を使用します。

EPOCHS=10

history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

Epoch 1/10
172/172 [==============================] - 31s 183ms/step - loss: 2.7052
Epoch 2/10
172/172 [==============================] - 31s 180ms/step - loss: 2.0039
Epoch 3/10
172/172 [==============================] - 31s 180ms/step - loss: 1.7375
Epoch 4/10
172/172 [==============================] - 31s 179ms/step - loss: 1.5772
Epoch 5/10
172/172 [==============================] - 31s 179ms/step - loss: 1.4772
Epoch 6/10
172/172 [==============================] - 31s 180ms/step - loss: 1.4087
Epoch 7/10
172/172 [==============================] - 31s 179ms/step - loss: 1.3556
Epoch 8/10
172/172 [==============================] - 31s 179ms/step - loss: 1.3095
Epoch 9/10
172/172 [==============================] - 31s 179ms/step - loss: 1.2671
Epoch 10/10
172/172 [==============================] - 31s 180ms/step - loss: 1.2276

テキストを生成する

予測ループ

次のコードブロックがテキストを生成します :

それは開始文字列を選択し、RNN 状態を初期化してそして生成する文字数を設定することから始まります。
開始文字列と RNN 状態を使用して次の文字の予測分布を得ます。
それから、予測された文字のインデックスを計算するために categorical 分布を使用します。この予測された文字をモデルへの次の入力として使用します。
モデルにより返された RNN 状態はモデルに供給し戻されます、その結果それは今ではただ一つの単語よりも多くのコンテキストを代わりに持ちます。次の単語を予測した後、変更された RNN 状態が再度モデルに供給し戻されます、これがそれが前に予測された単語からより多くのコンテキストを得るときにどのように学習されるかです。

生成されたテキストを見れば、モデルがいつ大文字にするべきかを知り、パラグラフを作成してそして Shakespeare-like な著述語彙を模倣するのを見るでしょう。小さい数の訓練エポックでは、それは首尾一貫したセンテンスを形成することをまだ学習していません。

def generate_text(model, start_string):
  # Evaluation step (generating text using the learned model)

  # Number of characters to generate
  num_generate = 1000

  # Converting our start string to numbers (vectorizing) 
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)

  # Empty string to store our results
  text_generated = []

  # Low temperatures results in more predictable text.
  # Higher temperatures results in more surprising text.
  # Experiment to find the best setting.
  temperature = 1.0

  # Here batch size == 1
  model.reset_states()
  for i in range(num_generate):
      predictions = model(input_eval)
      # remove the batch dimension
      predictions = tf.squeeze(predictions, 0)

      # using a categorical distribution to predict the word returned by the model
      predictions = predictions / temperature
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()
      
      # We pass the predicted word as the next input to the model
      # along with the previous hidden state
      input_eval = tf.expand_dims([predicted_id], 0)
      
      text_generated.append(idx2char[predicted_id])

  return (start_string + ''.join(text_generated))

print(generate_text(model, start_string=u"ROMEO: "))

ROMEO: now to have weth hearten sonce,
No more than the thing stand perfect your self,
Love way come. Up, this is d so do in friends:
If I fear e this, I poisple
My gracious lusty, born once for readyus disguised:
But that a pry; do it sure, thou wert love his cause;
My mind is come too!

POMPEY:
Serve my master's him: he hath extreme over his hand in the
where they shall not hear they right for me.

PROSSPOLUCETER:
I pray you, mistress, I shall be construted
With one that you shall that we know it, in this gentleasing earls of daiberkers now
he is to look upon this face, which leadens from his master as
you should not put what you perciploce backzat of cast,
Nor fear it sometime but for a pit
a world of Hantua?

First Gentleman:
That we can fall of bastards my sperial;
O, she Go seeming that which I have
what enby oar own best injuring them,
Or thom I do now, I, in heart is nothing gone,
Leatt the bark which was done born.

BRUTUS:
Both Margaret, he is sword of the house person. If born,

結果を改善するために貴方ができる最も容易なことはそれをより長く訓練することです (EPOCHS=30 を試してください)。

貴方はまた異なる開始文字列で実験したり、モデル精度を改良するためにもう一つの RNN 層を追加して試す、あるいは多少のランダム予測を生成するために temperature パラメータを調整することができます。

Advanced: カスタマイズされた訓練

上の訓練手続きは単純ですが、多くの制御を貴方に与えません。

そこでモデルを手動でどのように実行するかを見た今、訓練ループをアンパックしてそれを私達自身で実装しましょう。これは例えば、モデルの open-loop 出力を安定させる手助けをするためにカリキュラム学習を実装するための開始点を与えます。

勾配を追跡するために tf.GradientTape を使用します。eager execution ガイドを読むことによりこのアプローチについてより多く学習できます。

手続きは次のように動作します :

最初に、RNN 状態を初期化します。tf.keras.Model.reset_states メソッドを呼び出すことによりこれを行ないます。
次に、データセットに渡り (バッチ毎に) iterate して各々に関連する予測を計算します。
tf.GradientTape をオープンして、そしてそのコンテキストで予測と損失を計算します。
tf.GradientTape.grads メソッドを使用してモデル変数に関する損失の勾配を計算します。
最期に、optimizer の tf.train.Optimizer.apply_gradients メソッドを使用してステップを下方に取ります。

model = build_model(
  vocab_size = len(vocab), 
  embedding_dim=embedding_dim, 
  rnn_units=rnn_units, 
  batch_size=BATCH_SIZE)

W0304 03:54:08.030432 140067035297664 tf_logging.py:161] <tensorflow.python.keras.layers.recurrent.UnifiedLSTM object at 0x7f63635efe80>: Note that this layer is not optimized for performance. Please use tf.keras.layers.CuDNNLSTM for better performance on GPU.

optimizer = tf.keras.optimizers.Adam()

@tf.function
def train_step(inp, target):
  with tf.GradientTape() as tape:
    predictions = model(inp)
    loss = tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(target, predictions))
  grads = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(grads, model.trainable_variables))
  
  return loss

# Training step
EPOCHS = 10

for epoch in range(EPOCHS):
  start = time.time()

  for (batch_n, (inp, target)) in enumerate(dataset):
    loss = train_step(inp, target)

    if batch_n % 100 == 0:
      template = 'Epoch {} Batch {} Loss {}'
      print(template.format(epoch+1, batch_n, loss))

  # saving (checkpoint) the model every 5 epochs
  if (epoch + 1) % 5 == 0:
    model.save_weights(checkpoint_prefix.format(epoch=epoch))

  print ('Epoch {} Loss {:.4f}'.format(epoch+1, loss))
  print ('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

model.save_weights(checkpoint_prefix.format(epoch=epoch))

Epoch 1 Batch 0 Loss 8.132774353027344
Epoch 1 Batch 100 Loss 3.5028388500213623
Epoch 1 Loss 3.7314
Time taken for 1 epoch 31.78906798362732 sec

Epoch 2 Batch 0 Loss 3.766866445541382
Epoch 2 Batch 100 Loss 3.985184669494629
Epoch 2 Loss 3.9137
Time taken for 1 epoch 29.776747703552246 sec

Epoch 3 Batch 0 Loss 4.023300647735596
Epoch 3 Batch 100 Loss 3.921215534210205
Epoch 3 Loss 3.8976
Time taken for 1 epoch 30.094752311706543 sec

Epoch 4 Batch 0 Loss 3.916696071624756
Epoch 4 Batch 100 Loss 3.900864362716675
Epoch 4 Loss 3.9048
Time taken for 1 epoch 30.09034276008606 sec

Epoch 5 Batch 0 Loss 3.9154434204101562
Epoch 5 Batch 100 Loss 3.9020049571990967
Epoch 5 Loss 3.9725
Time taken for 1 epoch 30.17358922958374 sec

Epoch 6 Batch 0 Loss 3.9781394004821777
Epoch 6 Batch 100 Loss 3.920198917388916
Epoch 6 Loss 3.9269
Time taken for 1 epoch 30.19426202774048 sec

Epoch 7 Batch 0 Loss 3.9400787353515625
Epoch 7 Batch 100 Loss 3.8473968505859375
Epoch 7 Loss 3.8438
Time taken for 1 epoch 30.107476234436035 sec

Epoch 8 Batch 0 Loss 3.852555513381958
Epoch 8 Batch 100 Loss 3.8410544395446777
Epoch 8 Loss 3.8218
Time taken for 1 epoch 30.084821462631226 sec

Epoch 9 Batch 0 Loss 3.843691349029541
Epoch 9 Batch 100 Loss 3.829458236694336
Epoch 9 Loss 3.8420
Time taken for 1 epoch 30.13308310508728 sec

Epoch 10 Batch 0 Loss 3.8553621768951416
Epoch 10 Batch 100 Loss 3.7812960147857666
Epoch 10 Loss 3.7726
Time taken for 1 epoch 30.14617133140564 sec

以上

2019年4月
月	火	水	木	金	土	日
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30