TensorFlow 2.0 Beta: Guide: Eager Essentials (Translation and Commentary)

Translation: ClassCat Co., Ltd. Sales Information
Created: 06/08/2019 (beta0)

* This page is a translation, with supplementary explanations where appropriate, of the following TF 2.0 Beta page on the official TensorFlow site:

Eager essentials

* The sample code has been checked to run, but it has been extended or modified where necessary.
* Feel free to link to this page; we would appreciate a note at sales-info@classcat.com if you do.

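Guide: Eager Essentials

TensorFlow's eager execution is an imperative programming environment that evaluates operations immediately, without building graphs: operations return concrete values instead of constructing a computational graph to run later. This makes it easy to get started with TensorFlow and to debug models, and it also reduces boilerplate. To follow along with this guide, run the code samples below in an interactive python interpreter.

Eager execution is a flexible machine learning platform for research and experimentation, providing:

* An intuitive interface: structure your code naturally and use Python data structures. Iterate quickly on small models and small data.
* Easier debugging: call ops directly to inspect running models and test changes. Use standard Python debugging tools for immediate error reporting.
* Natural control flow: use Python control flow instead of graph control flow, which simplifies the specification of dynamic models.

Eager execution supports most TensorFlow operations and GPU acceleration.

Note: Some models may experience increased overhead with eager execution enabled. Performance improvements are ongoing, but if you find a problem, please file a bug and share your benchmarks.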

Setup and basic usage

Upgrade to the latest version of TensorFlow:

from __future__ import absolute_import, division, print_function, unicode_literals

!pip install -q tensorflow==2.0.0-beta0
import tensorflow as tf

import cProfile

In TensorFlow 2.0, eager execution is enabled by default.

tf.executing_eagerly()

True

Now you can run TensorFlow operations, and the results are returned immediately:

x = [[2.]]
m = tf.matmul(x, x)
print("hello, {}".format(m))

hello, [[4.]]

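Enabling eager execution changes how TensorFlow operations behave: they now evaluate immediately and return their values to Python. tf.Tensor objects reference concrete values instead of symbolic handles to nodes in a computational graph. Since there is no computational graph to build and run later in a session, it is easy to inspect results using print() or a debugger. Evaluating, printing, and checking tensor values does not break the flow for computing gradients.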
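The last point can be checked with a small sketch: an intermediate tensor is read and printed inside a tf.GradientTape block, and the gradient is still computed afterwards. The variable names (x, y, z, dz_dx) are chosen here for illustration and are not part of the original guide.

```python
import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as tape:
  tape.watch(x)  # plain tensors are only tracked when watched explicitly
  y = x * x
  # Reading the intermediate value does not interrupt gradient recording.
  print("intermediate y =", y.numpy())
  z = y + 1.0

dz_dx = tape.gradient(z, x)  # d(x^2 + 1)/dx = 2x
print("dz/dx =", dz_dx.numpy())
```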

Eager execution works well with NumPy. NumPy operations accept tf.Tensor arguments. TensorFlow math operations convert Python objects and NumPy arrays to tf.Tensor objects. The tf.Tensor.numpy method returns the object's value as a NumPy ndarray.

a = tf.constant([[1, 2],
                 [3, 4]])
print(a)

tf.Tensor(
[[1 2]
 [3 4]], shape=(2, 2), dtype=int32)

# Broadcasting support
b = tf.add(a, 1)
print(b)

tf.Tensor(
[[2 3]
 [4 5]], shape=(2, 2), dtype=int32)

# Operator overloading is supported
print(a * b)

tf.Tensor(
[[ 2  6]
 [12 20]], shape=(2, 2), dtype=int32)

# Use NumPy values
import numpy as np

c = np.multiply(a, b)
print(c)

[[ 2  6]
 [12 20]]

# Obtain numpy value from a tensor:
print(a.numpy())
# => [[1 2]
#     [3 4]]

[[1 2]
 [3 4]]

Dynamic control flow

A major benefit of eager execution is that all the functionality of the host language is available while your model is executing. So, for example, it is easy to write fizzbuzz:

def fizzbuzz(max_num):
  counter = tf.constant(0)
  max_num = tf.convert_to_tensor(max_num)
  for num in range(1, max_num.numpy()+1):
    num = tf.constant(num)
    if int(num % 3) == 0 and int(num % 5) == 0:
      print('FizzBuzz')
    elif int(num % 3) == 0:
      print('Fizz')
    elif int(num % 5) == 0:
      print('Buzz')
    else:
      print(num.numpy())
    counter += 1

fizzbuzz(15)

1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz

This code has conditionals that depend on tensor values, and it prints those values at runtime.

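Eager training

Computing gradients

Automatic differentiation is useful for implementing machine learning algorithms such as backpropagation for training neural networks. During eager execution, use tf.GradientTape to trace operations so that gradients can be computed later.

You can use tf.GradientTape to train and/or compute gradients in eager mode. It is especially useful for complicated training loops.

Since different operations can occur during each call, all forward-pass operations are recorded on a "tape". To compute the gradient, play the tape backwards and then discard it. A particular tf.GradientTape can compute only one gradient; subsequent calls throw a runtime error.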

w = tf.Variable([[1.0]])
with tf.GradientTape() as tape:
  loss = w * w

grad = tape.gradient(loss, w)
print(grad)  # => tf.Tensor([[ 2.]], shape=(1, 1), dtype=float32)

tf.Tensor([[2.]], shape=(1, 1), dtype=float32)
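The single-use behaviour described above can be observed directly. The sketch below calls tape.gradient twice on a default tape and catches the resulting RuntimeError, then uses the persistent=True argument of tf.GradientTape, which is not covered in this guide but is part of the TensorFlow API, to compute two gradients from one recording. The names second_call_failed, ptape, g1, and g2 are illustrative only.

```python
import tensorflow as tf

w = tf.Variable([[1.0]])

# A default (non-persistent) tape supports a single gradient() call.
with tf.GradientTape() as tape:
  loss = w * w
first = tape.gradient(loss, w)
try:
  tape.gradient(loss, w)
  second_call_failed = False
except RuntimeError:
  second_call_failed = True
print("second call failed:", second_call_failed)

# A persistent tape can be queried several times; delete it when done
# so the recorded resources are released.
with tf.GradientTape(persistent=True) as ptape:
  loss = w * w        # w^2
  loss2 = loss * loss  # w^4
g1 = ptape.gradient(loss, w)   # 2 * w
g2 = ptape.gradient(loss2, w)  # 4 * w^3
del ptape
print(g1.numpy(), g2.numpy())
```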

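Train a model

The following example creates a multi-layer model that classifies the standard MNIST handwritten digits. It demonstrates the optimizer and layer APIs used to build trainable graphs in an eager execution environment.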

# Fetch and format the mnist data
(mnist_images, mnist_labels), _ = tf.keras.datasets.mnist.load_data()

dataset = tf.data.Dataset.from_tensor_slices(
  (tf.cast(mnist_images[...,tf.newaxis]/255, tf.float32),
   tf.cast(mnist_labels,tf.int64)))
dataset = dataset.shuffle(1000).batch(32)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 0s 0us/step

# Build the model
mnist_model = tf.keras.Sequential([
  tf.keras.layers.Conv2D(16,[3,3], activation='relu',
                         input_shape=(None, None, 1)),
  tf.keras.layers.Conv2D(16,[3,3], activation='relu'),
  tf.keras.layers.GlobalAveragePooling2D(),
  tf.keras.layers.Dense(10)
])

Even without training, you can call the model and inspect its output in eager execution:

for images,labels in dataset.take(1):
  print("Logits: ", mnist_model(images[0:1]).numpy())

Logits:  [[-0.02539074 -0.01439482  0.00780122 -0.00887529 -0.01578783  0.02660074
   0.03275762 -0.01570328 -0.03225745  0.0271067 ]]

While keras models have a built-in training loop (using the fit method), sometimes you need more customization. Here is an example of a training loop implemented with eager execution:

optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

loss_history = []

Note: Use the assert functions in tf.debugging to check whether a condition holds. They work in both eager and graph execution.
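As a rough illustration of what a failed check looks like in eager mode, the sketch below feeds tf.debugging.assert_equal a deliberately wrong shape and catches the error. The batch of 8 and the variable names are hypothetical; only the expected shape (32, 10) comes from the training step in this guide.

```python
import tensorflow as tf

# Hypothetical logits for a batch of 8 instead of the expected 32.
logits = tf.zeros([8, 10])

try:
  tf.debugging.assert_equal(tf.shape(logits), [32, 10],
                            message="unexpected logits shape")
  assertion_failed = False
except tf.errors.InvalidArgumentError as err:
  # In eager execution the failure surfaces immediately as an exception.
  assertion_failed = True
  print("assertion failed:", type(err).__name__)
```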

def train_step(images, labels):
  with tf.GradientTape() as tape:
    logits = mnist_model(images, training=True)
    
    # Add asserts to check the shape of the output.
    tf.debugging.assert_equal(logits.shape, (32, 10))
    
    loss_value = loss_object(labels, logits)

  loss_history.append(loss_value.numpy().mean())
  grads = tape.gradient(loss_value, mnist_model.trainable_variables)
  optimizer.apply_gradients(zip(grads, mnist_model.trainable_variables))

def train():
  for epoch in range(3):
    for (batch, (images, labels)) in enumerate(dataset):
      train_step(images, labels)
    print('Epoch {} finished'.format(epoch))

train()

Epoch 0 finished
Epoch 1 finished
Epoch 2 finished

import matplotlib.pyplot as plt

plt.plot(loss_history)
plt.xlabel('Batch #')
plt.ylabel('Loss [entropy]')

Text(0, 0.5, 'Loss [entropy]')

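Variables and optimizers

tf.Variable objects store mutable tf.Tensor values that are accessed during training, which makes automatic differentiation easier. The parameters of a model can be encapsulated in classes as variables.

Model parameters can be better encapsulated by using tf.Variable together with tf.GradientTape. For example, the automatic differentiation example above can be rewritten as follows: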

class Model(tf.keras.Model):
  def __init__(self):
    super(Model, self).__init__()
    self.W = tf.Variable(5., name='weight')
    self.B = tf.Variable(10., name='bias')
  def call(self, inputs):
    return inputs * self.W + self.B

# A toy dataset of points around 3 * x + 2
NUM_EXAMPLES = 2000
training_inputs = tf.random.normal([NUM_EXAMPLES])
noise = tf.random.normal([NUM_EXAMPLES])
training_outputs = training_inputs * 3 + 2 + noise

# The loss function to be optimized
def loss(model, inputs, targets):
  error = model(inputs) - targets
  return tf.reduce_mean(tf.square(error))

def grad(model, inputs, targets):
  with tf.GradientTape() as tape:
    loss_value = loss(model, inputs, targets)
  return tape.gradient(loss_value, [model.W, model.B])

# Define:
# 1. A model.
# 2. Derivatives of a loss function with respect to model parameters.
# 3. A strategy for updating the variables based on the derivatives.
model = Model()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

print("Initial loss: {:.3f}".format(loss(model, training_inputs, training_outputs)))

# Training loop
for i in range(300):
  grads = grad(model, training_inputs, training_outputs)
  optimizer.apply_gradients(zip(grads, [model.W, model.B]))
  if i % 20 == 0:
    print("Loss at step {:03d}: {:.3f}".format(i, loss(model, training_inputs, training_outputs)))

print("Final loss: {:.3f}".format(loss(model, training_inputs, training_outputs)))
print("W = {}, B = {}".format(model.W.numpy(), model.B.numpy()))

Initial loss: 68.494
Loss at step 000: 65.867
Loss at step 020: 30.344
Loss at step 040: 14.302
Loss at step 060: 7.052
Loss at step 080: 3.773
Loss at step 100: 2.289
Loss at step 120: 1.616
Loss at step 140: 1.311
Loss at step 160: 1.173
Loss at step 180: 1.110
Loss at step 200: 1.082
Loss at step 220: 1.069
Loss at step 240: 1.063
Loss at step 260: 1.060
Loss at step 280: 1.059
Final loss: 1.059
W = 2.968806028366089, B = 1.990594506263733

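Use objects for state during eager execution

With TF 1.x graph execution, program state (such as variables) is stored in global collections, and its lifetime is managed by the tf.Session object. In contrast, during eager execution the lifetime of state objects is determined by the lifetime of their corresponding Python objects.

Variables are objects

During eager execution, a variable persists until the last reference to the object is removed, and is then deleted.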

if tf.test.is_gpu_available():
  with tf.device("gpu:0"):
    v = tf.Variable(tf.random.normal([1000, 1000]))
    v = None  # v no longer takes up GPU memory

Object-based saving

This section is an abbreviated version of the guide to training checkpoints.

tf.train.Checkpoint can save and restore tf.Variables to and from checkpoints:

x = tf.Variable(10.)
checkpoint = tf.train.Checkpoint(x=x)

x.assign(2.)   # Assign a new value to the variables and save.
checkpoint_path = './ckpt/'
checkpoint.save(checkpoint_path)

'./ckpt/-1'

x.assign(11.)  # Change the variable after saving.

# Restore values from the checkpoint
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_path))

print(x)  # => 2.0

<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=2.0>

To save and load models, tf.train.Checkpoint stores the internal state of objects, without requiring hidden variables. To record the state of a model, an optimizer, and a global step, pass them to a tf.train.Checkpoint:

import os

model = tf.keras.Sequential([
  tf.keras.layers.Conv2D(16,[3,3], activation='relu'),
  tf.keras.layers.GlobalAveragePooling2D(),
  tf.keras.layers.Dense(10)
])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
checkpoint_dir = 'path/to/model_dir'
if not os.path.exists(checkpoint_dir):
  os.makedirs(checkpoint_dir)
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
root = tf.train.Checkpoint(optimizer=optimizer,
                           model=model)

root.save(checkpoint_prefix)
root.restore(tf.train.latest_checkpoint(checkpoint_dir))

<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7f9976568828>

Note: In many training loops, variables are created after tf.train.Checkpoint.restore is called. These variables are restored as soon as they are created, and assertions are available to ensure that a checkpoint has been fully loaded. See the guide to training checkpoints for details.
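One such assertion is assert_consumed on the status object returned by restore. The minimal sketch below saves a single variable to a temporary directory and restores it into a fresh variable; it shows only the assertion, not deferred variable creation. The temporary directory and the names save_path, restored_x, and status are assumptions made for this illustration.

```python
import os
import tempfile

import tensorflow as tf

# Save one variable into a throwaway directory (illustrative location).
ckpt_dir = tempfile.mkdtemp()
x = tf.Variable(3.0)
save_path = tf.train.Checkpoint(x=x).save(os.path.join(ckpt_dir, "ckpt"))

# Restore into a different variable registered under the same name "x".
restored_x = tf.Variable(0.0)
status = tf.train.Checkpoint(x=restored_x).restore(save_path)

# Raises an AssertionError if some checkpointed value was left unmatched.
status.assert_consumed()
print(restored_x.numpy())
```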

Object-oriented metrics

tf.keras.metrics are stored as objects. Update a metric by passing new data to the callable, and retrieve the result with the metric's result method (for example tf.keras.metrics.Mean.result):

m = tf.keras.metrics.Mean("loss")
m(0)
m(5)
m.result()  # => 2.5
m([8, 9])
m.result()  # => 5.5

<tf.Tensor: id=1036628, shape=(), dtype=float32, numpy=5.5>

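Advanced automatic differentiation topics

Dynamic models

tf.GradientTape can also be used in dynamic models. This example of a backtracking line search algorithm looks like ordinary NumPy code, except that it has gradients and is differentiable, despite the complex control flow: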

def line_search_step(fn, init_x, rate=1.0):
  with tf.GradientTape() as tape:
    # Variables are automatically recorded, but manually watch a tensor
    tape.watch(init_x)
    value = fn(init_x)
  grad = tape.gradient(value, init_x)
  grad_norm = tf.reduce_sum(grad * grad)
  init_value = value
  while value > init_value - rate * grad_norm:
    x = init_x - rate * grad
    value = fn(x)
    rate /= 2.0
  return x, value
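The guide defines line_search_step without calling it. The sketch below repeats the function so that it is self-contained and applies it to a simple quadratic; the objective quadratic and the starting point x0 are hypothetical choices made for this illustration.

```python
import tensorflow as tf

def line_search_step(fn, init_x, rate=1.0):
  with tf.GradientTape() as tape:
    # Variables are automatically recorded, but manually watch a tensor
    tape.watch(init_x)
    value = fn(init_x)
  grad = tape.gradient(value, init_x)
  grad_norm = tf.reduce_sum(grad * grad)
  init_value = value
  # The loop condition is an ordinary Python comparison on tensor values.
  while value > init_value - rate * grad_norm:
    x = init_x - rate * grad
    value = fn(x)
    rate /= 2.0
  return x, value

def quadratic(x):
  # Hypothetical objective: sum of squares, minimized at the origin.
  return tf.reduce_sum(x * x)

x0 = tf.constant([3.0, -4.0])
x1, value1 = line_search_step(quadratic, x0)
print("start value:", quadratic(x0).numpy())
print("after one step:", x1.numpy(), value1.numpy())
```

Because the step size is halved in plain Python until the objective decreases enough, the number of executed operations depends on the data, which is the kind of dynamic model the text refers to.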

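Custom gradients

Custom gradients are an easy way to override gradients. Within the forward function, define the gradient with respect to the inputs, outputs, or intermediate results. For example, here is an easy way to clip the norm of the gradients in the backward pass: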

@tf.custom_gradient
def clip_gradient_by_norm(x, norm):
  y = tf.identity(x)
  def grad_fn(dresult):
    return [tf.clip_by_norm(dresult, norm), None]
  return y, grad_fn
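A usage sketch may help here, since the function above is defined but not called. It repeats the definition, passes a tensor through it, and checks that the gradient arriving from downstream is rescaled to the requested norm. The input values, the weights tensor, and the clip norm of 1.0 are hypothetical.

```python
import tensorflow as tf

@tf.custom_gradient
def clip_gradient_by_norm(x, norm):
  y = tf.identity(x)
  def grad_fn(dresult):
    # No gradient is defined with respect to `norm`, hence None.
    return [tf.clip_by_norm(dresult, norm), None]
  return y, grad_fn

x = tf.constant([3.0, 4.0])
weights = tf.constant([30.0, 40.0])  # upstream gradient with L2 norm 50

with tf.GradientTape() as tape:
  tape.watch(x)
  y = clip_gradient_by_norm(x, 1.0)  # forward pass is the identity
  loss = tf.reduce_sum(y * weights)

grad = tape.gradient(loss, x)
# The incoming gradient [30, 40] is rescaled to norm 1: roughly [0.6, 0.8].
print(grad.numpy())
```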

Custom gradients are commonly used to provide a numerically stable gradient for a sequence of operations:

def log1pexp(x):
  return tf.math.log(1 + tf.exp(x))

def grad_log1pexp(x):
  with tf.GradientTape() as tape:
    tape.watch(x)
    value = log1pexp(x)
  return tape.gradient(value, x)

# The gradient computation works fine at x = 0.
grad_log1pexp(tf.constant(0.)).numpy()

0.5

# However, x = 100 fails because of numerical instability.
grad_log1pexp(tf.constant(100.)).numpy()

nan

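Here, the log1pexp function can be analytically simplified with a custom gradient. The implementation below reuses the value of tf.exp(x) computed during the forward pass, which makes it more efficient by eliminating redundant calculations: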

@tf.custom_gradient
def log1pexp(x):
  e = tf.exp(x)
  def grad(dy):
    return dy * (1 - 1 / (1 + e))
  return tf.math.log(1 + e), grad

def grad_log1pexp(x):
  with tf.GradientTape() as tape:
    tape.watch(x)
    value = log1pexp(x)
  return tape.gradient(value, x)

# As before, the gradient computation works fine at x = 0.
grad_log1pexp(tf.constant(0.)).numpy()

0.5

# And the gradient computation also works at x = 100.
grad_log1pexp(tf.constant(100.)).numpy()

1.0
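The factor used in the custom gradient follows from d/dx log(1 + e^x) = e^x / (1 + e^x) = 1 - 1 / (1 + e^x), which is the sigmoid function. As a sanity check under that observation, the sketch below compares the custom gradient with tf.sigmoid at a few sample points; the sample values and the name max_diff are illustrative.

```python
import tensorflow as tf

@tf.custom_gradient
def log1pexp(x):
  e = tf.exp(x)
  def grad(dy):
    # 1 - 1 / (1 + e^x) stays finite even when e overflows to inf.
    return dy * (1 - 1 / (1 + e))
  return tf.math.log(1 + e), grad

xs = tf.constant([-5.0, 0.0, 5.0, 100.0])
with tf.GradientTape() as tape:
  tape.watch(xs)
  ys = log1pexp(xs)
grads = tape.gradient(ys, xs)

# The analytic derivative of log(1 + e^x) is sigmoid(x).
max_diff = tf.reduce_max(tf.abs(grads - tf.sigmoid(xs)))
print(grads.numpy(), max_diff.numpy())
```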

Performance

Computation is automatically offloaded to GPUs during eager execution. If you want control over where a computation runs, you can enclose it in a tf.device('/gpu:0') block (or the CPU equivalent):

import time

def measure(x, steps):
  # TensorFlow initializes a GPU the first time it's used, exclude from timing.
  tf.matmul(x, x)
  start = time.time()
  for i in range(steps):
    x = tf.matmul(x, x)
  # tf.matmul can return before completing the matrix multiplication
  # (e.g., can return after enqueuing the operation on a CUDA stream).
  # The x.numpy() call below will ensure that all enqueued operations
  # have completed (and will also copy the result to host memory,
  # so we're including a little more than just the matmul operation
  # time).
  _ = x.numpy()
  end = time.time()
  return end - start

shape = (1000, 1000)
steps = 200
print("Time to multiply a {} matrix by itself {} times:".format(shape, steps))

# Run on CPU:
with tf.device("/cpu:0"):
  print("CPU: {} secs".format(measure(tf.random.normal(shape), steps)))

# Run on GPU, if available:
if tf.test.is_gpu_available():
  with tf.device("/gpu:0"):
    print("GPU: {} secs".format(measure(tf.random.normal(shape), steps)))
else:
  print("GPU: not found")

Time to multiply a (1000, 1000) matrix by itself 200 times:
CPU: 0.963911771774292 secs
GPU: 0.03915715217590332 secs

A tf.Tensor object can be copied to a different device in order to execute its operations there:

if tf.test.is_gpu_available():
  x = tf.random.normal([10, 10])

  x_gpu0 = x.gpu()
  x_cpu = x.cpu()

  _ = tf.matmul(x_cpu, x_cpu)    # Runs on CPU
  _ = tf.matmul(x_gpu0, x_gpu0)  # Runs on GPU:0
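In case the .gpu() and .cpu() tensor methods are not available in your TensorFlow version, a similar copy can be sketched with tf.identity inside a tf.device block. This is an alternative suggested here, not the guide's own code, and it assumes tf.config.list_physical_devices exists in the installed release.

```python
import tensorflow as tf

x = tf.random.normal([10, 10])

# Copy to the CPU by creating the identity op under a CPU device scope.
with tf.device("/cpu:0"):
  x_cpu = tf.identity(x)
  _ = tf.matmul(x_cpu, x_cpu)  # Runs on CPU

# Only attempt the GPU copy when a GPU is actually visible.
if tf.config.list_physical_devices("GPU"):
  with tf.device("/gpu:0"):
    x_gpu0 = tf.identity(x)
    _ = tf.matmul(x_gpu0, x_gpu0)  # Runs on GPU:0

print(x_cpu.device)
```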

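Benchmarks

For compute-heavy models, such as ResNet50 training on a GPU, eager execution performance is comparable to tf.function execution. The gap grows larger for models with less computation, however, and work remains to be done to optimize hot code paths for models with many small operations.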

Work with functions

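While eager execution makes development and debugging more interactive, TensorFlow 1.x style graph execution has advantages for distributed training, performance optimization, and production deployment. To bridge this gap, TensorFlow 2.0 introduces functions via the tf.function API. For more information, see the Autograph guide.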
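As a minimal sketch of what this looks like in practice, the example below wraps a small computation in tf.function and compares its result with the same computation run eagerly. The function name dense_step and the input shapes are made up for this illustration.

```python
import tensorflow as tf

@tf.function
def dense_step(x, w, b):
  # Traced into a graph on the first call, then reused for later calls.
  return tf.nn.relu(tf.matmul(x, w) + b)

x = tf.ones([2, 3])
w = tf.ones([3, 4])
b = tf.zeros([4])

eager_out = tf.nn.relu(tf.matmul(x, w) + b)  # evaluated op by op
graph_out = dense_step(x, w, b)              # evaluated as a traced function

print(eager_out.numpy())
print(graph_out.numpy())
```

Both calls return ordinary tensors, so code written for eager execution can usually call a tf.function-decorated function without other changes.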

End
