TensorFlow 2.0 : Advanced Tutorials : Customization :- Automatic differentiation and gradient tape (translation/commentary)

Translation : ClassCat Co., Ltd. Sales Information
Created : 10/28/2019

* This page is a translation, with supplementary explanation added where appropriate, of the following page from TF 2.0 – Advanced Tutorials – Customization on the TensorFlow org site:

Automatic differentiation and gradient tape

* The sample code has been tested, and has been extended or modified where necessary.
* Feel free to link to this page; we would appreciate a note at sales-info@classcat.com if you do.

Customization :- Automatic differentiation and gradient tape

The previous tutorial introduced Tensors and the operations on them. This tutorial covers automatic differentiation, a key technique for optimizing machine learning models.

Setup

from __future__ import absolute_import, division, print_function, unicode_literals

import tensorflow as tf

Gradient tape

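TensorFlow provides the tf.GradientTape API for automatic differentiation, that is, for computing the gradient of a computation with respect to its input variables. TensorFlow "records" all operations executed inside the context of a tf.GradientTape onto a "tape". It then uses that tape, together with the gradients associated with each recorded operation, to compute the gradients of the "recorded" computation using reverse-mode differentiation.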

For example:

x = tf.ones((2, 2))

with tf.GradientTape() as t:
  t.watch(x)
  y = tf.reduce_sum(x)
  z = tf.multiply(y, y)

# Derivative of z with respect to the original input tensor x
dz_dx = t.gradient(z, x)
for i in [0, 1]:
  for j in [0, 1]:
    assert dz_dx[i][j].numpy() == 8.0
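
The example above calls t.watch(x) because x is a plain tensor. As a minimal sketch of why that call matters, the snippet below compares an unwatched constant with a tf.Variable, which a tape watches automatically when it is trainable. The names x_const, x_var, grad_unwatched and grad_var are illustrative and do not appear in the original tutorial.

```python
import tensorflow as tf

x_const = tf.ones((2, 2))               # plain tensor, not watched by default
x_var = tf.Variable(tf.ones((2, 2)))    # trainable variable, watched automatically

# Without t.watch(), operations on a constant tensor are not recorded.
with tf.GradientTape() as t:
    z = tf.square(tf.reduce_sum(x_const))
grad_unwatched = t.gradient(z, x_const)  # no recorded path, so the result is None

# The same computation on a variable is recorded without an explicit watch.
with tf.GradientTape() as t:
    z = tf.square(tf.reduce_sum(x_var))
grad_var = t.gradient(z, x_var)          # 2x2 tensor, every entry 2 * 4 = 8

print(grad_unwatched)
print(grad_var.numpy())
```

If the gradient with respect to a plain tensor is needed, watching it explicitly, as the tutorial code does, is the straightforward option.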

You can also request gradients of the output with respect to intermediate values computed inside a "recorded" tf.GradientTape context.

x = tf.ones((2, 2))

with tf.GradientTape() as t:
  t.watch(x)
  y = tf.reduce_sum(x)
  z = tf.multiply(y, y)

# Use the tape to compute the derivative of z with respect to the
# intermediate value y.
dz_dy = t.gradient(z, y)
assert dz_dy.numpy() == 8.0
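
The value 8.0 asserted in both examples can be checked by hand. The short derivation below assumes only the definitions used in the code: x is a 2x2 tensor of ones, y is the sum of its entries, and z = y^2.

```latex
y = \sum_{i,j} x_{ij} = 4, \qquad z = y^2, \qquad
\frac{\partial z}{\partial y} = 2y = 8, \qquad
\frac{\partial z}{\partial x_{ij}}
  = \frac{\partial z}{\partial y}\,\frac{\partial y}{\partial x_{ij}}
  = 8 \cdot 1 = 8
```

Each entry of dz_dx equals dz_dy because every x_{ij} contributes to y with coefficient 1.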

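By default, the resources held by a GradientTape are released as soon as the GradientTape.gradient() method is called. To compute multiple gradients over the same computation, create a persistent gradient tape. This allows multiple calls to the gradient() method, because the resources are released only when the tape object is garbage collected. For example: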

x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as t:
  t.watch(x)
  y = x * x
  z = y * y
dz_dx = t.gradient(z, x)  # 108.0 (4*x^3 at x = 3)
dy_dx = t.gradient(y, x)  # 6.0
del t  # Drop the reference to the tape
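
To make the difference concrete, here is a small sketch that contrasts the two kinds of tape on the same computation. It assumes that a second gradient() call on a non-persistent tape raises RuntimeError, which is the behavior of TensorFlow 2.x as far as we know; the names pt and second_call_failed are illustrative.

```python
import tensorflow as tf

x = tf.constant(3.0)

# Non-persistent tape: the first gradient() call releases the tape's resources.
with tf.GradientTape() as t:
    t.watch(x)
    y = x * x
    z = y * y
dz_dx = t.gradient(z, x)          # 4 * x^3 = 108.0

second_call_failed = False
try:
    t.gradient(y, x)              # second call on the same non-persistent tape
except RuntimeError:
    second_call_failed = True

# Persistent tape: several gradients can be taken from one recording.
with tf.GradientTape(persistent=True) as pt:
    pt.watch(x)
    y = x * x
    z = y * y
dz_dy = pt.gradient(z, y)         # 2 * y = 18.0
dy_dx = pt.gradient(y, x)         # 2 * x = 6.0
dz_dx_chain = dz_dy * dy_dx       # chain rule: 18.0 * 6.0 = 108.0
del pt                            # drop the reference so resources can be freed

print(second_call_failed, dz_dx.numpy(), dz_dx_chain.numpy())
```

The product of the two persistent-tape gradients matches the directly computed dz_dx, which is the chain rule applied to z = (x^2)^2.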

Recording control flow

Because a tape records operations as they are executed, Python control flow (using if and while, for example) is handled naturally:

def f(x, y):
  output = 1.0
  for i in range(y):
    if i > 1 and i < 5:
      output = tf.multiply(output, x)
  return output

def grad(x, y):
  with tf.GradientTape() as t:
    t.watch(x)
    out = f(x, y)
  return t.gradient(out, x)

x = tf.convert_to_tensor(2.0)

assert grad(x, 6).numpy() == 12.0
assert grad(x, 5).numpy() == 12.0
assert grad(x, 4).numpy() == 4.0
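
The asserted values follow from how many times the loop actually multiplies by x: once for each i in {2, 3, 4} that range(y) reaches, so f(x, y) = x^n and the gradient is n * x^(n-1). The sketch below repeats f from the tutorial and compares the tape result with that closed form. The helpers tape_grad and expected_grad are hypothetical names introduced for illustration, and y is kept at 3 or more so that at least one multiplication is recorded.

```python
import tensorflow as tf

def f(x, y):
    # Same function as in the tutorial: multiply by x for i = 2, 3, 4 only.
    output = 1.0
    for i in range(y):
        if i > 1 and i < 5:
            output = tf.multiply(output, x)
    return output

def tape_grad(x, y):
    # Gradient of f with respect to x, as recorded by the tape.
    with tf.GradientTape() as t:
        t.watch(x)
        out = f(x, y)
    return t.gradient(out, x)

def expected_grad(x_value, y):
    # Closed form: f = x^n, so df/dx = n * x^(n - 1),
    # where n counts the loop iterations that multiply by x.
    n = sum(1 for i in range(y) if 1 < i < 5)
    return n * x_value ** (n - 1)

x = tf.constant(2.0)
results = {y: float(tape_grad(x, y).numpy()) for y in (3, 4, 5, 6, 10)}

for y, g in results.items():
    print(y, g, expected_grad(2.0, y))
```

Only the branches that actually ran contribute to the gradient, which is why y = 5, y = 6 and y = 10 all give the same result.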

Higher-order gradients

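Operations inside a GradientTape context manager are recorded for automatic differentiation. If gradients are computed in that context, the gradient computation is recorded as well. As a result, the exact same API works for higher-order gradients too. For example: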

x = tf.Variable(1.0)  # Create a TensorFlow variable initialized to 1.0

with tf.GradientTape() as t:
  with tf.GradientTape() as t2:
    y = x * x * x
  # Compute the gradient inside the 't' context manager
  # which means the gradient computation is differentiable as well.
  dy_dx = t2.gradient(y, x)
d2y_dx2 = t.gradient(dy_dx, x)

assert dy_dx.numpy() == 3.0
assert d2y_dx2.numpy() == 6.0
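
At x = 1.0 the first derivative 3x^2 and the value 3 coincide, so the example does not distinguish the formula from a constant. As a small variation, the sketch below evaluates the same nested-tape pattern at x = 3.0, where y = x^3 gives dy/dx = 3x^2 = 27 and d2y/dx2 = 6x = 18. The tape names outer and inner are illustrative.

```python
import tensorflow as tf

x = tf.Variable(3.0)

with tf.GradientTape() as outer:
    with tf.GradientTape() as inner:
        y = x * x * x
    # Taking the gradient inside the outer context records the
    # gradient computation itself, so it can be differentiated again.
    dy_dx = inner.gradient(y, x)      # 3 * x^2
d2y_dx2 = outer.gradient(dy_dx, x)    # 6 * x

print(dy_dx.numpy(), d2y_dx2.numpy())
```

The inner gradient call has to sit inside the outer with block; a gradient computed after the outer context has exited is not recorded by that tape.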

End
