TensorFlow 2.0 Alpha : Advanced Tutorials : Customization :- Tensors and Operations (Translation/Commentary)

Translated by: ClassCat Co., Ltd. Sales Information
Created: 04/05/2019

* This page translates the following page from TF 2.0 Alpha – Advanced Tutorials – Customization on the official TensorFlow site, with supplementary explanations added where appropriate:

Tensors and Operations

* The sample code has been tested, and additions or modifications have been made where necessary.
* You are welcome to link to this page, but we would appreciate a note at sales-info@classcat.com.

Customization :- Tensors and Operations

This is an introductory TensorFlow tutorial that shows how to:

Import the required package
Create and use tensors
Use GPU acceleration
Demonstrate tf.data.Dataset

from __future__ import absolute_import, division, print_function

!pip install -q tensorflow==2.0.0-alpha0

Import TensorFlow

To get started, import the tensorflow module. As of TensorFlow 2.0, eager execution is enabled by default. This gives TensorFlow a more interactive frontend, which is discussed in more detail later.

import tensorflow as tf
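You can confirm that eager execution is active by calling tf.executing_eagerly(). The snippet below is a small sketch. It assumes a TF 2.x installation in which eager mode has not been switched off.

```python
import tensorflow as tf

# In TF 2.x eager execution is on by default, so this should print True.
print(tf.executing_eagerly())

# In eager mode an op runs immediately, and its value can be read back right away.
result = tf.add(1, 2)
print(result.numpy())
```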

Tensors

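A Tensor is a multi-dimensional array. Like NumPy ndarray objects, tf.Tensor objects have a data type and a shape. In addition, a tf.Tensor can reside in accelerator memory (such as a GPU). TensorFlow provides a rich library of operations (tf.add, tf.matmul, tf.linalg.inv, etc.) that consume and produce tf.Tensor objects. These operations automatically convert native Python types, for example: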

print(tf.add(1, 2))
print(tf.add([1, 2], [3, 4]))
print(tf.square(5))
print(tf.reduce_sum([1, 2, 3]))

# Operator overloading is also supported
print(tf.square(2) + tf.square(3))

tf.Tensor(3, shape=(), dtype=int32)
tf.Tensor([4 6], shape=(2,), dtype=int32)
tf.Tensor(25, shape=(), dtype=int32)
tf.Tensor(6, shape=(), dtype=int32)
tf.Tensor(13, shape=(), dtype=int32)
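When a tensor is created from native Python values, its dtype is inferred: Python ints become int32 and Python floats become float32, as the outputs above show. TensorFlow does not silently promote between dtypes, so tensors of different types must be converted explicitly, for example with tf.cast. The sketch below uses illustrative values only.

```python
import numpy as np
import tensorflow as tf

# Python ints are converted to int32, Python floats to float32.
ints = tf.constant([1, 2, 3])
floats = tf.constant([0.5, 0.5, 0.5])
print(ints.dtype, floats.dtype)

# Cast explicitly before mixing dtypes; int32 * float32 would raise an error.
scaled = tf.cast(ints, tf.float32) * floats
print(scaled)

# Operator overloading also covers *, -, / and friends.
print(ints * 2 - 1)
```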

Each tf.Tensor has a shape and a data type:

x = tf.matmul([[1]], [[2, 3]])
print(x.shape)
print(x.dtype)

(1, 2)
<dtype: 'int32'>
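The shape is a static tf.TensorShape. It can be inspected, and tf.reshape produces a tensor with the same elements arranged in a different shape. The following sketch uses an arbitrary rank-3 example.

```python
import tensorflow as tf

# A rank-3 tensor of zeros; tf.zeros defaults to float32.
x = tf.zeros([2, 3, 4])
print(x.shape)          # static shape (a tf.TensorShape)
print(x.shape.rank)     # number of dimensions
print(tf.size(x))       # total number of elements, returned as a tensor

# tf.reshape returns a new tensor with the same 24 elements in a 6 x 4 layout.
y = tf.reshape(x, [6, 4])
print(y.shape)
```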

The most obvious differences between NumPy arrays and tf.Tensor objects are:

Tensors can be backed by accelerator memory (such as GPU or TPU memory).
Tensors are immutable.
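Because tensors are immutable, you cannot change a tensor in place. Operations return new tensors instead. The sketch below uses tf.tensor_scatter_nd_update as one way to get a modified copy. That choice is illustrative; any op that produces a new tensor would make the same point.

```python
import tensorflow as tf

t = tf.constant([1, 2, 3])

# Item assignment is not supported on a tf.Tensor because tensors are immutable.
try:
    t[0] = 10
except TypeError as err:
    print("TypeError:", err)

# An op builds a new tensor with the change; the original stays as it was.
updated = tf.tensor_scatter_nd_update(t, indices=[[0]], updates=[10])
print(updated.numpy(), t.numpy())
```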

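NumPy Compatibility

Converting between TensorFlow tf.Tensor objects and NumPy ndarrays is easy:

TensorFlow operations automatically convert NumPy ndarrays to Tensors.
NumPy operations automatically convert Tensors to NumPy ndarrays.

Tensors are explicitly converted to NumPy ndarrays with the .numpy() method. These conversions are typically cheap, because the array and the tf.Tensor share the underlying memory representation when possible. Sharing is not always possible, however. A tf.Tensor may be hosted in GPU memory, while NumPy arrays are always backed by host memory, so in that case the conversion involves a copy from GPU to host memory.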

import numpy as np

ndarray = np.ones([3, 3])

print("TensorFlow operations convert numpy arrays to Tensors automatically")
tensor = tf.multiply(ndarray, 42)
print(tensor)


print("And NumPy operations convert Tensors to numpy arrays automatically")
print(np.add(tensor, 1))

print("The .numpy() method explicitly converts a Tensor to a numpy array")
print(tensor.numpy())

TensorFlow operations convert numpy arrays to Tensors automatically
tf.Tensor(
[[42. 42. 42.]
 [42. 42. 42.]
 [42. 42. 42.]], shape=(3, 3), dtype=float64)
And NumPy operations convert Tensors to numpy arrays automatically
[[43. 43. 43.]
 [43. 43. 43.]
 [43. 43. 43.]]
The .numpy() method explicitly converts a Tensor to a numpy array
[[42. 42. 42.]
 [42. 42. 42.]
 [42. 42. 42.]]
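Conversion also works explicitly in both directions, and the dtype is carried over. NumPy's default float64 stays float64, which is why the output above shows dtype=float64. Whether memory is actually shared is an implementation detail, so this sketch only checks values and dtypes.

```python
import numpy as np
import tensorflow as tf

ndarray = np.ones([2, 2])                # NumPy defaults to float64
tensor = tf.convert_to_tensor(ndarray)   # explicit NumPy -> Tensor conversion
print(tensor.dtype)

back = np.asarray(tensor)                # Tensor -> NumPy ndarray
print(type(back), back.dtype)

# If float32 is wanted on the TensorFlow side, cast explicitly.
tensor32 = tf.cast(tensor, tf.float32)
print(tensor32.dtype)
```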

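GPU Acceleration

Many TensorFlow operations can use the GPU to speed up computation. Without any annotations, TensorFlow automatically decides whether to run an operation on the GPU or the CPU, and copies tensors between CPU and GPU memory if necessary. A tensor produced by an operation is typically backed by the memory of the device on which the operation ran, for example: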

x = tf.random.uniform([3, 3])

print("Is there a GPU available: ")
print(tf.test.is_gpu_available())

print("Is the Tensor on GPU #0:  ")
print(x.device.endswith('GPU:0'))

Is there a GPU available: 
False
Is the Tensor on GPU #0:  
False
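Instead of only asking whether a GPU exists, you can list the devices TensorFlow can see. This sketch assumes TensorFlow 2.1 or later, where tf.config.list_physical_devices is available. Earlier 2.x releases placed it under tf.config.experimental, and tf.test.is_gpu_available, used above, is deprecated in later releases.

```python
import tensorflow as tf

# Physical devices visible to this TensorFlow process, by type.
cpus = tf.config.list_physical_devices("CPU")
gpus = tf.config.list_physical_devices("GPU")
print("CPUs:", [d.name for d in cpus])
print("GPUs:", [d.name for d in gpus])

# An op with no tf.device annotation is placed automatically;
# x.device reports where its result lives.
x = tf.random.uniform([3, 3])
print("Placed on:", x.device)
print("GPU available:", len(gpus) > 0)
```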

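Device Names

The Tensor.device property returns the fully qualified string name of the device that hosts the tensor's contents. This name encodes many details, such as an identifier of the network address of the host running the program and of the device within that host. This information is needed for distributed execution of a TensorFlow program. The string ends with GPU:<N> if the tensor is placed on the N-th GPU of the host.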
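You can split the fully qualified name into its parts with tf.DeviceSpec.from_string. The sketch assumes a single-machine eager run. The exact string, including the job name, depends on the setup.

```python
import tensorflow as tf

x = tf.constant([1.0, 2.0])
print(x.device)  # e.g. ".../device:CPU:0" on a CPU-only machine

# Parse the name into job, replica, task, device type and device index.
spec = tf.DeviceSpec.from_string(x.device)
print(spec.job, spec.replica, spec.task, spec.device_type, spec.device_index)
```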

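Explicit Device Placement

In TensorFlow, placement refers to how individual operations are assigned to (placed on) a device for execution. As mentioned, when no explicit guidance is provided, TensorFlow automatically decides which device should run an operation and copies tensors to that device if needed. However, TensorFlow operations can be placed on specific devices explicitly with the tf.device context manager, for example: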

import time

def time_matmul(x):
  start = time.time()
  for loop in range(10):
    tf.matmul(x, x)

  result = time.time()-start
    
  print("10 loops: {:0.2f}ms".format(1000*result))

# Force execution on CPU
print("On CPU:")
with tf.device("CPU:0"):
  x = tf.random.uniform([1000, 1000])
  assert x.device.endswith("CPU:0")
  time_matmul(x)

# Force execution on GPU #0 if available
if tf.test.is_gpu_available():
  with tf.device("GPU:0"): # Or GPU:1 for the 2nd GPU, GPU:2 for the 3rd etc.
    x = tf.random.uniform([1000, 1000])
    assert x.device.endswith("GPU:0")
    time_matmul(x)

On CPU:
10 loops: 91.15ms
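The timing above includes any one-time startup cost of the first call. On a GPU, kernels may also be launched asynchronously, so the loop can return before the work has finished. The helper below is a hypothetical, slightly more careful variant and is not part of the original tutorial. It does one warm-up run and reads the last result back with .numpy() before stopping the clock.

```python
import time
import tensorflow as tf

def time_matmul_synced(x, loops=10):
    """Hypothetical helper: average time of tf.matmul(x, x) in milliseconds (loops >= 1)."""
    tf.matmul(x, x).numpy()          # warm-up run, excluded from the measurement
    start = time.perf_counter()
    for _ in range(loops):
        result = tf.matmul(x, x)
    result.numpy()                   # fetching the value waits for the last matmul to finish
    elapsed_ms = 1000 * (time.perf_counter() - start)
    return elapsed_ms / loops

with tf.device("CPU:0"):
    x = tf.random.uniform([1000, 1000])
    print("CPU: {:0.2f}ms per matmul".format(time_matmul_synced(x)))
```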

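Datasets

This section uses the tf.data.Dataset API to build a pipeline that feeds data to your model. The tf.data.Dataset API lets you assemble high-performance, complex input pipelines from simple, reusable pieces, which then feed your model's training or evaluation loops.

Create a source Dataset

Create a source dataset with one of the factory functions such as Dataset.from_tensors or Dataset.from_tensor_slices, or with an object that reads from files, such as TextLineDataset or TFRecordDataset. See the TensorFlow Dataset guide for more information.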

ds_tensors = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])

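# Create a small text file with one record per line
import os
import tempfile
fd, filename = tempfile.mkstemp()
os.close(fd)  # mkstemp also returns an open file descriptor; close it since the file is reopened below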

with open(filename, 'w') as f:
  f.write("""Line 1
Line 2
Line 3
  """)

ds_file = tf.data.TextLineDataset(filename)
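The two factory functions differ in how they treat their input. from_tensors yields the whole input as a single element. from_tensor_slices splits the input along its first dimension, and it also slices tuples such as (features, labels) pairs together. A sketch with made-up data:

```python
import tensorflow as tf

data = [[1, 2], [3, 4], [5, 6]]

# from_tensors: the entire input becomes one element.
whole = tf.data.Dataset.from_tensors(data)
# from_tensor_slices: one element per row along the first dimension.
rows = tf.data.Dataset.from_tensor_slices(data)

print([e.numpy().tolist() for e in whole])
print([e.numpy().tolist() for e in rows])

# Tuples are sliced component-wise, keeping features and labels aligned.
pairs = tf.data.Dataset.from_tensor_slices(([1, 2, 3], ["a", "b", "c"]))
for feature, label in pairs:
    print(feature.numpy(), label.numpy())
```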

Apply transformations

Use transformation functions such as map, batch, and shuffle to transform the records of a dataset.

ds_tensors = ds_tensors.map(tf.square).shuffle(2).batch(2)

ds_file = ds_file.batch(2)
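Transformations run in the order they are chained. shuffle(2) keeps a buffer of only two elements, so it mixes the order only locally. Leaving shuffle out makes the pipeline deterministic, which makes the effect of batch easier to see. The sketch below also shows the drop_remainder option:

```python
import tensorflow as tf

ds = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])

# map is applied element-wise, then batch groups consecutive elements.
squared = ds.map(tf.square).batch(4)
print([b.numpy().tolist() for b in squared])

# drop_remainder=True discards a final batch smaller than the batch size.
full_only = ds.map(tf.square).batch(4, drop_remainder=True)
print([b.numpy().tolist() for b in full_only])
```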

Iterate

tf.data.Dataset objects support iteration, so you can loop over their records:

print('Elements of ds_tensors:')
for x in ds_tensors:
  print(x)

print('\nElements in ds_file:')
for x in ds_file:
  print(x)

Elements of ds_tensors:
tf.Tensor([1 4], shape=(2,), dtype=int32)
tf.Tensor([ 9 25], shape=(2,), dtype=int32)
tf.Tensor([36 16], shape=(2,), dtype=int32)

Elements in ds_file:
tf.Tensor([b'Line 1' b'Line 2'], shape=(2,), dtype=string)
tf.Tensor([b'Line 3' b'  '], shape=(2,), dtype=string)
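Each element yielded by the loop is a tf.Tensor. .numpy() turns it into a NumPy value, and string tensors come back as bytes, which is why the output shows b'Line 1'. Dataset.take limits how many records are read. The sketch below uses its own temporary file, so its contents are an assumption chosen for illustration:

```python
import os
import tempfile
import tensorflow as tf

# Write a small three-line text file to read back.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("Line 1\nLine 2\nLine 3")

ds = tf.data.TextLineDataset(path)

# take(2) stops after two records; each record is a scalar tf.string tensor holding bytes.
lines = [x.numpy().decode("utf-8") for x in ds.take(2)]
print(lines)

os.remove(path)
```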

That's all.
