TensorFlow : Tutorials : GPU を利用する

TensorFlow : Tutorials : GPU を利用する（翻訳/解説）
翻訳 : (株)クラスキャットセールスインフォメーション
更新日時 : 09/15/2017
作成日時 : 02/12/2016

* 本ページは、TensorFlow 本家サイトの Tutorials – Using GPUs を翻訳した上で適宜、補足説明したものです：

https://www.tensorflow.org/tutorials/using_gpu

* (obsolete, リンク切れ) 本ページは、TensorFlow の本家サイトの How To – Using GPUs を翻訳した上で
適宜、補足説明したものです：
https://www.tensorflow.org/versions/master/how_tos/using_gpu/index.html#using-gpus
* サンプルコードの動作確認はしておりますが、適宜、追加改変しています。
* ご自由にリンクを張って頂いてかまいませんが、sales-info@classcat.com までご一報いただけると嬉しいです。

サポートされるデバイス

典型的なシステムでは、複数の計算デバイスがあります。TensorFlow では、サポートされるデバイスのタイプは CPU と GPU です。これらは文字列として表されます。例えば :

“/cpu:0”: 貴方のマシンの CPU。
“/gpu:0”: 貴方のマシンの GPU、もし一つあれば。
“/gpu:1”: 貴方のマシンの２つ目の GPU、etc.

もし TensorFlow 演算が CPU と GPU 両方の実装を持つならば、演算がデバイスに割り当てられる時 GPU デバイスに優先順位が与えられます。例えば、matmul は CPU と GPU kernel を持ちます。cpu:0 と gpu:0 を持つシステム上、matmul を実行するために gpu:0 が選択されます。

デバイス割り当て (placement) をロギングする

どのデバイスが演算とテンソルに割り当てられたかを見つけ出すためには、セッションを log_device_placement 構成オプションを True にして作成します。

# グラフを作成します。
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# log_device_placement を True にしてセッションを作成します。
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# OP を実行します。
print sess.run(c)

次のような出力が見れるはずです :

Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/gpu:0
a: /job:localhost/replica:0/task:0/gpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
[[ 22.  28.]
 [ 49.  64.]]

手動のデバイス割り当て

もし貴方が特定の演算を自動的に選択されたものの代わりに貴方の選択したデバイス上で実行させたいのであれば、コンテキスト内で全ての演算が同じデバイス割り当てを持つようなデバイスコンテキストを作成するために tf.device が使用できます。

# グラフを作成します。
with tf.device('/cpu:0'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# log_device_placement を True にしてセッションを作成します。
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# OP を実行します。
print sess.run(c)

今 a と b が cpu:0 に割り当てられたことが見れるでしょう。

Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
[[ 22.  28.]
 [ 49.  64.]]

マルチ-GPU システムで単一の GPU を使う

システムに１つ以上の GPU を持つならば、最小の ID を持つ GPU がデフォルトで選択されるでしょう。異なる GPU 上で実行したいのであれば、明示的に好みの選択を指定する必要があります :

# グラフを作成します。
with tf.device('/gpu:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# log_device_placement を True にしてセッションを作成します。
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# OP を実行します。
print sess.run(c)

もし貴方が指定したデバイスが存在しないのであれば、InvalidArgumentError を得るでしょう :

InvalidArgumentError: Invalid argument: Cannot assign a device to node 'b':
Could not satisfy explicit device specification '/gpu:2'
   [[Node: b = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [3,2]
   values: 1 2 3...>, _device="/gpu:2"]()]]

もし貴方が TensorFlow に指定した一つが存在しない場合に演算を実行させるために存在しサポートされているデバイスを自動的に選択させたいのであれば、セッションを作成時の構成オプションにおいて allow_soft_placement を True に設定することができます。

# グラフを作成します。
with tf.device('/gpu:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# allow_soft_placement と log_device_placement を True にしてセッションを作成します。
sess = tf.Session(config=tf.ConfigProto(
      allow_soft_placement=True, log_device_placement=True))
# OP を実行します。
print sess.run(c)

複数の GPU を使用する

もし貴方が TensorFlow を複数の GPU 上で実行したいのであれば、各タワーは異なる GPU に割り当てられる、マルチ・タワー流儀 (fashion)でモデルを構築することができます。例えば :

# グラフを作成します。
c = []
for d in ['/gpu:2', '/gpu:3']:
  with tf.device(d):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
    c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
  sum = tf.add_n(c)
# log_device_placement を True にしてセッションを作成します。
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# OP を実行します。
print sess.run(sum)

貴方は次のような出力を見るでしょう。

Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K20m, pci bus
id: 0000:02:00.0
/job:localhost/replica:0/task:0/gpu:1 -> device: 1, name: Tesla K20m, pci bus
id: 0000:03:00.0
/job:localhost/replica:0/task:0/gpu:2 -> device: 2, name: Tesla K20m, pci bus
id: 0000:83:00.0
/job:localhost/replica:0/task:0/gpu:3 -> device: 3, name: Tesla K20m, pci bus
id: 0000:84:00.0
Const_3: /job:localhost/replica:0/task:0/gpu:3
Const_2: /job:localhost/replica:0/task:0/gpu:3
MatMul_1: /job:localhost/replica:0/task:0/gpu:3
Const_1: /job:localhost/replica:0/task:0/gpu:2
Const: /job:localhost/replica:0/task:0/gpu:2
MatMul: /job:localhost/replica:0/task:0/gpu:2
AddN: /job:localhost/replica:0/task:0/cpu:0
[[  44.   56.]
 [  98.  128.]]

cifar 10 tutorial は複数の GPU でどのように訓練するかをデモする良い例です。

以上

月	火	水	木	金	土	日
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29