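TensorFlow 2.4 : Guide : Basics – NumPy API on TensorFlow (translation/commentary)

Translation : ClassCat Co., Ltd. Sales Information
Created : 01/08/2021

* This page is a translation, with supplementary explanations added where appropriate, of the following page from Guide – TensorFlow Basics on the TensorFlow org site:

NumPy API on TensorFlow

* The sample code has been verified to run; where necessary it has been extended or modified.
* Feel free to link to this page; we would appreciate a note to sales-info@classcat.com if you do.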

Guide : Basics – NumPy API on TensorFlow

Overview

TensorFlow implements a subset of the NumPy API, available as tf.experimental.numpy. This lets you run NumPy code accelerated by TensorFlow while still having access to all of TensorFlow's APIs.

Setup

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow.experimental.numpy as tnp
import timeit

print("Using TensorFlow version %s" % tf.__version__)

Using TensorFlow version 2.4.0

TensorFlow NumPy ND array

An instance of tf.experimental.numpy.ndarray, called an ND array, represents a multidimensional dense array of a given dtype placed on a certain device. Each of these objects internally wraps a tf.Tensor. See the ND array class for useful methods such as ndarray.T, ndarray.reshape, ndarray.ravel and others.

First create an ND array object, then invoke various methods.

# Create an ND array and check out different attributes.
ones = tnp.ones([5, 3], dtype=tnp.float32)
print("Created ND array with shape = %s, rank = %s, "
      "dtype = %s on device = %s\n" % (
          ones.shape, ones.ndim, ones.dtype, ones.data.device))

# Check out the internally wrapped `tf.Tensor` object.
print("The ND array wraps a tf.Tensor: %s\n" % ones.data)

# Try commonly used member functions.
print("ndarray.T has shape %s" % str(ones.T.shape))
print("ndarray.reshape(-1) has shape %s" % str(ones.reshape(-1).shape))

Created ND array with shape = (5, 3), rank = 2, dtype = float32 on device = /job:localhost/replica:0/task:0/device:CPU:0

The ND array wraps a tf.Tensor: tf.Tensor(
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]], shape=(5, 3), dtype=float32)

ndarray.T has shape (3, 5)
ndarray.reshape(-1) has shape (15,)

Type promotion

The TensorFlow NumPy API has well-defined semantics for converting literals to ND arrays and for performing type promotion on ND array inputs. See np.result_type for more details. When converting literals to ND arrays, NumPy prefers wide types such as tnp.int64 and tnp.float64.

In contrast, tf.convert_to_tensor prefers the tf.int32 and tf.float32 types for converting constants to tf.Tensor. TensorFlow APIs leave tf.Tensor inputs unchanged and do not perform type promotion on them.

The next example performs type promotion. First, it runs addition on ND array inputs of different types and notes the output types. None of these type promotions would be allowed on plain tf.Tensor objects. Finally, it converts literals to ND arrays using tnp.asarray and notes the resulting types.

print("Type promotion for operations")
values = [tnp.asarray(1, dtype=d) for d in
          (tnp.int32, tnp.int64, tnp.float32, tnp.float64)]
for i, v1 in enumerate(values):
  for v2 in values[i+1:]:
    print("%s + %s => %s" % (v1.dtype, v2.dtype, (v1 + v2).dtype))

print("Type inference during array creation")
print("tnp.asarray(1).dtype == tnp.%s" % tnp.asarray(1).dtype)
print("tnp.asarray(1.).dtype == tnp.%s\n" % tnp.asarray(1.).dtype)

Type promotion for operations
int32 + int64 => int64
int32 + float32 => float64
int32 + float64 => float64
int64 + float32 => float64
int64 + float64 => float64
float32 + float64 => float64
Type inference during array creation
tnp.asarray(1).dtype == tnp.int64
tnp.asarray(1.).dtype == tnp.float64
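
The promotion results above match NumPy's own rules, which can be inspected directly with np.result_type. The following is a minimal sketch in plain NumPy (not tf.experimental.numpy), assuming only that NumPy is installed; the `promoted` dictionary is a name introduced here for illustration.

```python
import numpy as np

# Plain NumPy dtypes mirroring the ones used in the example above.
dtypes = (np.int32, np.int64, np.float32, np.float64)

promoted = {}
for i, d1 in enumerate(dtypes):
    for d2 in dtypes[i + 1:]:
        # np.result_type applies NumPy's promotion rules to the two dtypes.
        key = (d1.__name__, d2.__name__)
        promoted[key] = np.result_type(d1, d2).name
        print("%s + %s => %s" % (key[0], key[1], promoted[key]))
```

Note that int32 combined with float32 promotes to float64, because float32 cannot represent every int32 value exactly.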

Broadcasting

Like TensorFlow, NumPy defines rich semantics for "broadcasting" values. You can check the NumPy broadcasting guide for more information and compare it with the TensorFlow broadcasting semantics.

x = tnp.ones([2, 3])
y = tnp.ones([3])
z = tnp.ones([1, 2, 1])
print("Broadcasting shapes %s, %s and %s gives shape %s" % (
    x.shape, y.shape, z.shape, (x + y + z).shape))

Broadcasting shapes (2, 3), (3,) and (1, 2, 1) gives shape (1, 2, 3)
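
The shape rule behind this result can be written out by hand. The sketch below is an illustrative re-implementation (the helper `broadcast_shape` is hypothetical, not a TensorFlow or NumPy function) that is cross-checked against plain NumPy.

```python
import numpy as np

def broadcast_shape(*shapes):
    """Returns the broadcast result shape, or raises ValueError."""
    ndim = max(len(s) for s in shapes)
    # Left-pad every shape with 1s so that all shapes have the same rank.
    padded = [(1,) * (ndim - len(s)) + tuple(s) for s in shapes]
    result = []
    for dims in zip(*padded):
        # Along each axis, all sizes other than 1 must agree.
        sizes = {d for d in dims if d != 1}
        if len(sizes) > 1:
            raise ValueError("shapes %s are not broadcastable" % (shapes,))
        result.append(sizes.pop() if sizes else 1)
    return tuple(result)

shapes = [(2, 3), (3,), (1, 2, 1)]
manual = broadcast_shape(*shapes)
# Cross-check against NumPy's own broadcasting machinery.
reference = np.broadcast(*[np.empty(s) for s in shapes]).shape
print(manual, reference)
```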

Indexing

NumPy defines very sophisticated indexing rules. See the NumPy indexing guide. Note the use of ND arrays as indices below.

x = tnp.arange(24).reshape(2, 3, 4)

print("Basic indexing")
print(x[1, tnp.newaxis, 1:3, ...], "\n")

print("Boolean indexing")
print(x[:, (True, False, True)], "\n")

print("Advanced indexing")
print(x[1, (0, 0, 1), tnp.asarray([0, 1, 1])])

Basic indexing
ndarray<tf.Tensor(
[[[16 17 18 19]
  [20 21 22 23]]], shape=(1, 2, 4), dtype=int64)> 

Boolean indexing
ndarray<tf.Tensor(
[[[ 0  1  2  3]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [20 21 22 23]]], shape=(2, 2, 4), dtype=int64)> 

Advanced indexing
ndarray<tf.Tensor([12 13 17], shape=(3,), dtype=int64)>
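
For comparison, the same three indexing forms can be tried in plain NumPy. This is a sketch for cross-checking the results above, not part of the original guide; it uses np.ndarray rather than ND arrays.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

# Basic indexing: integer, newaxis, slice and ellipsis.
basic = x[1, np.newaxis, 1:3, ...]

# Boolean indexing: keep entries 0 and 2 along axis 1.
boolean = x[:, np.array([True, False, True])]

# Advanced indexing: picks x[1, 0, 0], x[1, 0, 1] and x[1, 1, 1].
advanced = x[1, [0, 0, 1], [0, 1, 1]]

print(basic.shape, boolean.shape, advanced)
```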

# Mutation is currently not supported
try:
  tnp.arange(6)[1] = -1
except TypeError:
  print("Currently, TensorFlow NumPy does not support mutation.")

Currently, TensorFlow NumPy does not support mutation.
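
Since in-place assignment is rejected, one option is to build an updated copy instead. The sketch below is a workaround that is not from the original guide: it uses the core TensorFlow op tf.tensor_scatter_nd_update on a plain tf.Tensor and assumes TensorFlow 2.x.

```python
import tensorflow as tf

values = tf.range(6)  # int32 tensor holding 0..5

# Build a new tensor in which index 1 is replaced by -1.
# `indices` has shape [num_updates, rank]; `updates` has shape [num_updates].
updated = tf.tensor_scatter_nd_update(values, indices=[[1]], updates=[-1])

# The original tensor is left untouched.
print(values.numpy(), updated.numpy())
```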

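Sample model

Next, you can see how to create a model and run inference on it. This simple model applies a relu layer followed by a linear projection. Later sections show how to compute gradients for this model using TensorFlow's GradientTape.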

class Model(object):
  """Model with a dense and a linear layer."""

  def __init__(self):
    self.weights = None

  def predict(self, inputs):
    if self.weights is None:
      size = inputs.shape[1]
      # Note that type `tnp.float32` is used for performance.
      stddev = tnp.sqrt(size).astype(tnp.float32)
      w1 = tnp.random.randn(size, 64).astype(tnp.float32) / stddev
      bias = tnp.random.randn(64).astype(tnp.float32)
      w2 = tnp.random.randn(64, 2).astype(tnp.float32) / 8
      self.weights = (w1, bias, w2)
    else:
      w1, bias, w2 = self.weights
    y = tnp.matmul(inputs, w1) + bias
    y = tnp.maximum(y, 0)  # Relu
    return tnp.matmul(y, w2)  # Linear projection

model = Model()
# Create input data and compute predictions.
print(model.predict(tnp.ones([2, 32], dtype=tnp.float32)))

ndarray<tf.Tensor(
[[-0.31255645  0.00103381]
 [-0.31255645  0.00103381]], shape=(2, 2), dtype=float32)>

TensorFlow NumPy and NumPy

TensorFlow NumPy implements a subset of the full NumPy spec. While more symbols will be added over time, there are systematic features that will not be supported in the near future. These include NumPy C API support, Swig integration, Fortran storage order, views and stride_tricks, and some dtypes (such as np.recarray and np.object). For more details, see the TensorFlow NumPy API documentation.

NumPy interoperability

TensorFlow ND arrays can interoperate with NumPy functions. These objects implement the __array__ interface. NumPy uses this interface to convert function arguments to np.ndarray values before processing them.

Similarly, TensorFlow NumPy functions can accept inputs of various types, including tf.Tensor and np.ndarray. These inputs are converted to ND arrays by calling ndarray.asarray on them.

Converting an ND array to and from np.ndarray may trigger actual data copies. See the section on buffer copies for more details.

# ND array passed into NumPy function.
np_sum = np.sum(tnp.ones([2, 3]))
print("sum = %s. Class: %s" % (float(np_sum), np_sum.__class__))

# `np.ndarray` passed into TensorFlow NumPy function.
tnp_sum = tnp.sum(np.ones([2, 3]))
print("sum = %s. Class: %s" % (float(tnp_sum), tnp_sum.__class__))

sum = 6.0. Class: <class 'numpy.float64'>
sum = 6.0. Class: <class 'tensorflow.python.ops.numpy_ops.np_arrays.ndarray'>
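
How the __array__ interface works can be seen with any Python object, independent of TensorFlow. The sketch below defines a hypothetical container class, `Wrapped`, which is introduced here for illustration only, and passes it to NumPy functions.

```python
import numpy as np

class Wrapped:
    """Hypothetical container exposing its data through `__array__`."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook to obtain an np.ndarray for the object.
        return np.asarray(self._values, dtype=dtype)

w = Wrapped([1.0, 2.0, 3.0])
as_array = np.asarray(w)   # Explicit conversion goes through __array__.
total = np.sum(w)          # NumPy functions convert the argument implicitly.
print(type(as_array).__name__, float(total))
```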

# It is easy to plot ND arrays, given the __array__ interface.
labels = 15 + 2 * tnp.random.randn(1000)
_ = plt.hist(labels)

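Buffer copies

Mixing TensorFlow NumPy with NumPy code may trigger data copies. This is because TensorFlow NumPy has stricter requirements on memory alignment than NumPy does.

When an np.ndarray is passed to TensorFlow NumPy, it checks the alignment requirements and triggers a copy if needed. When passing an ND array CPU buffer to NumPy, the buffer generally satisfies the alignment requirements and NumPy does not need to create a copy.

ND arrays can refer to buffers placed on devices other than local CPU memory. In such cases, invoking a NumPy function triggers copies across the network or device as needed.

Given this, intermixing with NumPy API calls should generally be done with caution, and the user should watch out for the overhead of copying data. Interleaving TensorFlow NumPy calls with TensorFlow calls is generally safe and avoids copying data. See the section on TensorFlow interoperability for more details.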
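
Whether a given conversion copied data can be checked in plain NumPy with np.shares_memory. The sketch below does not involve TensorFlow; it only illustrates the difference between a conversion that reuses a buffer and one that allocates a new one.

```python
import numpy as np

source = np.ones([2, 3], dtype=np.float64)

# np.asarray returns the input itself when no conversion is needed.
same_buffer = np.asarray(source)

# Changing the dtype forces a new buffer to be allocated.
converted = source.astype(np.float32)

reused = np.shares_memory(source, same_buffer)
copied = not np.shares_memory(source, converted)
print(reused, copied)
```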

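Operator precedence

TensorFlow NumPy defines an __array_priority__ higher than NumPy's. This means that for operators involving both an ND array and an np.ndarray, the former takes precedence, i.e. the np.ndarray input is converted to an ND array and the TensorFlow NumPy implementation of the operator is invoked.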

x = tnp.ones([2]) + np.ones([2])
print("x = %s\nclass = %s" % (x, x.__class__))

x = ndarray<tf.Tensor([2. 2.], shape=(2,), dtype=float64)>
class = <class 'tensorflow.python.ops.numpy_ops.np_arrays.ndarray'>
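
The same mechanism can be observed with a small class of your own. The sketch below uses plain NumPy and a hypothetical class, `HighPriority`, introduced here for illustration; it relies on np.ndarray deferring to a right-hand operand that declares a higher __array_priority__.

```python
import numpy as np

class HighPriority:
    """Hypothetical array-like that outranks np.ndarray in binary operators."""

    # np.ndarray has priority 0.0; a larger value makes ndarray defer to us.
    __array_priority__ = 1000

    def __init__(self, values):
        self.values = np.asarray(values, dtype=np.float64)

    def __add__(self, other):
        other_values = other.values if isinstance(other, HighPriority) else other
        return HighPriority(self.values + np.asarray(other_values))

    # Called for `np.ndarray + HighPriority` once ndarray.__add__ defers.
    __radd__ = __add__

result = np.ones([2]) + HighPriority([1.0, 1.0])
print(type(result).__name__, result.values)
```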

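TF NumPy and TensorFlow

TensorFlow NumPy is built on top of TensorFlow and hence interoperates seamlessly with TensorFlow.

tf.Tensor and ND array

An ND array is a thin wrapper around tf.Tensor. These types can be converted cheaply to one another without triggering actual data copies.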

x = tf.constant([1, 2])

# Convert `tf.Tensor` to `ndarray`.
tnp_x = tnp.asarray(x)
print(tnp_x)

# Converting `ndarray` to `tf.Tensor` can be done in the following ways.
print(tnp_x.data)
print(tf.convert_to_tensor(tnp_x))

# Note that tf.Tensor.numpy() will continue to return `np.ndarray`.
print(x.numpy(), x.numpy().__class__)

ndarray<tf.Tensor([1 2], shape=(2,), dtype=int32)>
tf.Tensor([1 2], shape=(2,), dtype=int32)
tf.Tensor([1 2], shape=(2,), dtype=int32)
[1 2] <class 'numpy.ndarray'>

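TensorFlow interoperability

An ND array can be passed to TensorFlow APIs. These calls internally convert ND array inputs to tf.Tensor. As mentioned earlier, such a conversion does not actually copy data, even for data placed on accelerators or remote devices.

Conversely, tf.Tensor objects can be passed to tf.experimental.numpy APIs. These inputs are internally converted to ND arrays without performing data copies.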

# ND array passed into TensorFlow function.
# This returns a `tf.Tensor`.
tf_sum = tf.reduce_sum(tnp.ones([2, 3], tnp.float32))
print("Output = %s" % tf_sum)

# `tf.Tensor` passed into TensorFlow NumPy function.
# This returns an ND array.
tnp_sum = tnp.sum(tf.ones([2, 3]))
print("Output = %s" % tnp_sum)

Output = tf.Tensor(6.0, shape=(), dtype=float32)
Output = ndarray<tf.Tensor(6.0, shape=(), dtype=float32)>

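Operator precedence

When ND arrays and tf.Tensor objects are combined using operators, a precedence rule determines which object executes the operator. This is controlled by the __array_priority__ value defined by these classes.

tf.Tensor defines an __array_priority__ higher than that of ND arrays. This means that the ND array input is converted to tf.Tensor and the tf.Tensor version of the operator is called.

The code below demonstrates how that affects the output type.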

x = tnp.ones([2, 2]) + tf.ones([2, 1])
print("x = %s\nClass = %s" % (x, x.__class__))

x = tf.Tensor(
[[2. 2.]
 [2. 2.]], shape=(2, 2), dtype=float32)
Class = <class 'tensorflow.python.framework.ops.EagerTensor'>

Gradients and Jacobians: tf.GradientTape

TensorFlow's GradientTape can be used for backpropagation through TensorFlow and TensorFlow NumPy code. GradientTape APIs can also return ND array outputs.

Use the model created in the sample model section and compute gradients and Jacobians.

def create_batch(batch_size=32):
  """Creates a batch of input and labels."""
  return (tnp.random.randn(batch_size, 32).astype(tnp.float32),
          tnp.random.randn(batch_size, 2).astype(tnp.float32))

def compute_gradients(model, inputs, labels):
  """Computes gradients of squared loss between model prediction and labels."""
  with tf.GradientTape() as tape:
    assert model.weights is not None
    # Note that `model.weights` need to be explicitly watched since they
    # are not tf.Variables.
    tape.watch(model.weights)
    # Compute prediction and loss
    prediction = model.predict(inputs)
    loss = tnp.sum(tnp.square(prediction - labels))
  # This call computes the gradient through the computation above.
  return tape.gradient(loss, model.weights)

inputs, labels = create_batch()
gradients = compute_gradients(model, inputs, labels)

# Inspect the shapes of returned gradients to verify they match the
# parameter shapes.
print("Parameter shapes:", [w.shape for w in model.weights])
print("Gradient shapes:", [g.shape for g in gradients])
# Verify that gradients are of type ND array.
assert isinstance(gradients[0], tnp.ndarray)

Parameter shapes: [(32, 64), (64,), (64, 2)]
Gradient shapes: [(32, 64), (64,), (64, 2)]

# Computes a batch of jacobians. Each row is the jacobian of an element in the
# batch of outputs w.r.t the corresponding input batch element.
def prediction_batch_jacobian(inputs):
  with tf.GradientTape() as tape:
    tape.watch(inputs)
    prediction = model.predict(inputs)
  return prediction, tape.batch_jacobian(prediction, inputs)

inp_batch = tnp.ones([16, 32], tnp.float32)
output, batch_jacobian = prediction_batch_jacobian(inp_batch)
# Note how the batch jacobian shape relates to the input and output shapes.
print("Output shape: %s, input shape: %s" % (output.shape, inp_batch.shape))
print("Batch jacobian shape:", batch_jacobian.shape)

Output shape: (16, 2), input shape: (16, 32)
Batch jacobian shape: (16, 2, 32)
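
The need to watch non-variable inputs explicitly can be checked on a case small enough to verify by hand. The sketch below uses plain tf.Tensor values and a one-parameter squared loss; the names `x`, `y` and `w` are illustrative and not taken from the model above.

```python
import numpy as np
import tensorflow as tf

x = tf.constant([1.0, 2.0, 3.0])
y = tf.constant([2.0, 4.0, 7.0])
w = tf.constant(1.5)  # A plain tensor, not a tf.Variable.

with tf.GradientTape() as tape:
    # Plain tensors are not tracked automatically, so watch `w` explicitly.
    tape.watch(w)
    loss = tf.reduce_sum(tf.square(w * x - y))

grad = tape.gradient(loss, w)

# Analytic gradient of sum((w*x - y)^2) with respect to w.
expected = np.sum(2.0 * x.numpy() * (w.numpy() * x.numpy() - y.numpy()))
print(float(grad), float(expected))
```

Without the tape.watch call, tape.gradient would return None for `w`, because the tape only records operations on watched tensors and trainable variables.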

Trace compilation: tf.function

TensorFlow's tf.function works by "trace compiling" the code and then optimizing these traces for much faster performance. See the Introduction to Graphs and Functions.

tf.function can be used to optimize TensorFlow NumPy code as well. Here is a simple example to demonstrate the speedups. Note that the body of the tf.function code includes calls to TensorFlow NumPy APIs, and that the inputs and outputs are ND arrays.

inputs, labels = create_batch(512)
print("Eager performance")
compute_gradients(model, inputs, labels)
print(timeit.timeit(lambda: compute_gradients(model, inputs, labels),
                    number=10)* 100, "ms")

print("\ntf.function compiled performance")
compiled_compute_gradients = tf.function(compute_gradients)
compiled_compute_gradients(model, inputs, labels)  # warmup
print(timeit.timeit(lambda: compiled_compute_gradients(model, inputs, labels),
                    number=10) * 100, "ms")

(Translator's note: output from the original article)

Eager performance
1.7211115999998583 ms

tf.function compiled performance
0.8105368999849816 ms

(Translator's note: output from the translator's own run)

Eager performance
1.4130064000028142 ms

tf.function compiled performance
0.7711892000088483 ms
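
Tracing can be made visible with a Python side effect, which runs only while a trace is being built. The sketch below is an illustration with hypothetical names (`trace_log`, `scaled_sum`) and assumes the default tf.function retracing behavior in TensorFlow 2.x.

```python
import tensorflow as tf

trace_log = []  # Records each time the Python body is actually executed.

@tf.function
def scaled_sum(x):
    # This Python side effect runs only while tracing, not on every call.
    trace_log.append(tuple(x.shape.as_list()))
    return tf.reduce_sum(x * 2.0)

first = scaled_sum(tf.ones([3]))
second = scaled_sum(tf.ones([3]))  # Same shape and dtype: the trace is reused.
third = scaled_sum(tf.ones([4]))   # New input shape: a new trace is made.

print(float(first), float(second), float(third), trace_log)
```

Three calls produce only two entries in `trace_log`, which is why a warmup call is made before timing the compiled function.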

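Vectorization: tf.vectorized_map

TensorFlow has built-in support for vectorizing parallel loops, which allows speedups of one to two orders of magnitude. These speedups are accessible via the tf.vectorized_map API and apply to TensorFlow NumPy code as well.

It is sometimes useful to compute the gradient of each output in a batch with respect to the corresponding input batch element. Such computation can be done efficiently using tf.vectorized_map as shown below.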

@tf.function
def vectorized_per_example_gradients(inputs, labels):
  def single_example_gradient(arg):
    inp, label = arg
    return compute_gradients(model,
                             tnp.expand_dims(inp, 0),
                             tnp.expand_dims(label, 0))
  # Note that a call to `tf.vectorized_map` semantically maps
  # `single_example_gradient` over each row of `inputs` and `labels`.
  # The interface is similar to `tf.map_fn`.
  # The underlying machinery vectorizes away this map loop which gives
  # nice speedups.
  return tf.vectorized_map(single_example_gradient, (inputs, labels))

batch_size = 128
inputs, labels = create_batch(batch_size)

per_example_gradients = vectorized_per_example_gradients(inputs, labels)
for w, p in zip(model.weights, per_example_gradients):
  print("Weight shape: %s, batch size: %s, per example gradient shape: %s " % (
      w.shape, batch_size, p.shape))

Weight shape: (32, 64), batch size: 128, per example gradient shape: (128, 32, 64) 
Weight shape: (64,), batch size: 128, per example gradient shape: (128, 64) 
Weight shape: (64, 2), batch size: 128, per example gradient shape: (128, 64, 2)
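
The mapping semantics can be checked on a much smaller case. The sketch below applies a hypothetical per-row function, `squared_norm`, with both tf.vectorized_map and tf.map_fn and confirms that the two give the same values; it uses plain tf.Tensor inputs.

```python
import tensorflow as tf

def squared_norm(row):
    """Computes the squared L2 norm of a single row."""
    return tf.reduce_sum(row * row)

# Rows are [0, 1], [2, 3] and [4, 5].
rows = tf.reshape(tf.range(6, dtype=tf.float32), [3, 2])

# Both calls apply `squared_norm` to every row along the first axis.
vectorized = tf.vectorized_map(squared_norm, rows)
sequential = tf.map_fn(squared_norm, rows)

print(vectorized.numpy(), sequential.numpy())
```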

# Benchmark the vectorized computation above and compare with
# unvectorized sequential computation using `tf.map_fn`.
@tf.function
def unvectorized_per_example_gradients(inputs, labels):
  def single_example_gradient(arg):
    inp, label = arg
    output = compute_gradients(model,
                               tnp.expand_dims(inp, 0),
                               tnp.expand_dims(label, 0))
    return output

  return tf.map_fn(single_example_gradient, (inputs, labels),
                   fn_output_signature=(tf.float32, tf.float32, tf.float32))

print("Running vectorized computation")
print(timeit.timeit(lambda: vectorized_per_example_gradients(inputs, labels),
                    number=10) * 100, "ms")

print("\nRunning unvectorized computation")
per_example_gradients = unvectorized_per_example_gradients(inputs, labels)
print(timeit.timeit(lambda: unvectorized_per_example_gradients(inputs, labels),
                    number=5) * 200, "ms")

(Translator's note: output from the original article)

Running vectorized computation
0.8167428999968251 ms

Running unvectorized computation
10.86823519999598 ms

(Translator's note: output from the translator's own run)

Running vectorized computation
1.645320699981312 ms

Running unvectorized computation
7.68111739998858 ms

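Device placement

TensorFlow NumPy can place operations on CPUs, GPUs, TPUs and remote devices. It uses standard TensorFlow mechanisms for device placement. Below, a simple example shows how to list all devices and then place some computation on a particular device.

TensorFlow also has APIs for replicating computation across devices and performing collective reductions, which will not be covered here.

List devices

tf.config.list_logical_devices and tf.config.list_physical_devices can be used to find out which devices are available.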

print("All logical devices:", tf.config.list_logical_devices())
print("All physical devices:", tf.config.list_physical_devices())

# Try to get the GPU device. If unavailable, fallback to CPU.
try:
  device = tf.config.list_logical_devices(device_type="GPU")[0]
except IndexError:
  device = "/device:CPU:0"

All logical devices: [LogicalDevice(name='/device:CPU:0', device_type='CPU'), LogicalDevice(name='/device:GPU:0', device_type='GPU')]
All physical devices: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Placing operations: tf.device

Operations can be placed on a device by calling them inside a tf.device scope.

print("Using device: %s" % str(device))
# Run operations in the `tf.device` scope.
# If a GPU is available, these operations execute on the GPU and outputs are
# placed on the GPU memory.
with tf.device(device):
  prediction = model.predict(create_batch(5)[0])

print("prediction is placed on %s" % prediction.data.device)

Using device: LogicalDevice(name='/device:GPU:0', device_type='GPU')
prediction is placed on /job:localhost/replica:0/task:0/device:GPU:0
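
The same pattern works with plain tf.Tensor values and no model. The sketch below selects a device with a CPU fallback and inspects where the result ends up; the variable names are illustrative, and the printed device depends on the machine it runs on.

```python
import tensorflow as tf

# Prefer the first GPU if one is visible, otherwise fall back to the CPU.
gpus = tf.config.list_logical_devices("GPU")
device_name = gpus[0].name if gpus else "/device:CPU:0"

with tf.device(device_name):
    result = tf.ones([2, 2]) * 3.0

# `Tensor.device` holds the fully qualified name of the device.
print("requested:", device_name)
print("placed on:", result.device)
```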

Copying ND arrays across devices: tnp.copy

A call to tnp.copy, placed in a certain device scope, copies the data to that device, unless the data is already on that device.

with tf.device("/device:CPU:0"):
  prediction_cpu = tnp.copy(prediction)
print(prediction.data.device)
print(prediction_cpu.data.device)

/job:localhost/replica:0/task:0/device:GPU:0
/job:localhost/replica:0/task:0/device:CPU:0

Performance comparisons

TensorFlow NumPy uses highly optimized TensorFlow kernels that can be dispatched on CPUs, GPUs and TPUs. TensorFlow also performs many compiler optimizations, such as operation fusion, which translate to performance and memory improvements. See TensorFlow graph optimization with Grappler to learn more.

However, TensorFlow has higher overheads for dispatching operations than NumPy. For workloads composed of small operations (less than about 10 microseconds), these overheads can dominate the runtime and NumPy could provide better performance. For other cases, TensorFlow should generally provide better performance.

Below, a benchmark is run to compare NumPy and TensorFlow NumPy performance for different input sizes.

def benchmark(f, inputs, number=30, force_gpu_sync=False):
  """Utility to benchmark `f` on each value in `inputs`."""
  times = []
  for inp in inputs:
    def _g():
      if force_gpu_sync:
        one = tnp.asarray(1)
      f(inp)
      if force_gpu_sync:
        with tf.device("CPU:0"):
          tnp.copy(one)  # Force a sync for GPU case

    _g()  # warmup
    t = timeit.timeit(_g, number=number)
    times.append(t * 1000. / number)
  return times


def plot(np_times, tnp_times, compiled_tnp_times, has_gpu, tnp_times_gpu):
  """Plot the different runtimes."""
  plt.xlabel("size")
  plt.ylabel("time (ms)")
  plt.title("Sigmoid benchmark: TF NumPy vs NumPy")
  plt.plot(sizes, np_times, label="NumPy")
  plt.plot(sizes, tnp_times, label="TF NumPy (CPU)")
  plt.plot(sizes, compiled_tnp_times, label="Compiled TF NumPy (CPU)")
  if has_gpu:
    plt.plot(sizes, tnp_times_gpu, label="TF NumPy (GPU)")
  plt.legend()

# Define a simple implementation of `sigmoid`, and benchmark it using
# NumPy and TensorFlow NumPy for different input sizes.

def np_sigmoid(y):
  return 1. / (1. + np.exp(-y))

def tnp_sigmoid(y):
  return 1. / (1. + tnp.exp(-y))

@tf.function
def compiled_tnp_sigmoid(y):
  return tnp_sigmoid(y)

sizes = (2**0, 2 ** 5, 2 ** 10, 2 ** 15, 2 ** 20)
np_inputs = [np.random.randn(size).astype(np.float32) for size in sizes]
np_times = benchmark(np_sigmoid, np_inputs)

with tf.device("/device:CPU:0"):
  tnp_inputs = [tnp.random.randn(size).astype(np.float32) for size in sizes]
  tnp_times = benchmark(tnp_sigmoid, tnp_inputs)
  compiled_tnp_times = benchmark(compiled_tnp_sigmoid, tnp_inputs)

has_gpu = len(tf.config.list_logical_devices("GPU"))
if has_gpu:
  with tf.device("/device:GPU:0"):
    tnp_inputs = [tnp.random.randn(size).astype(np.float32) for size in sizes]
    tnp_times_gpu = benchmark(compiled_tnp_sigmoid, tnp_inputs, 100, True)
else:
  tnp_times_gpu = None
plot(np_times, tnp_times, compiled_tnp_times, has_gpu, tnp_times_gpu)

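(Figure: sigmoid benchmark plot, time (ms) versus size, from the original article)

(Figure: sigmoid benchmark plot, time (ms) versus size, from the translator's own run)

End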
