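TensorFlow : Guide : Accelerators : Using GPUs (Translation/Commentary)

Translation : ClassCat Co., Ltd. Sales Information
Last updated : 07/14/2018
Created : 04/21/2018

* Adjusted to follow the documentation reorganization in TensorFlow 1.9.
* The material on using GPUs used to be part of the Tutorials. It was revised and moved to the Programmer's Guide, so we have retranslated it.
* This page is a translation of Guide – Accelerators – Using GPUs from the official TensorFlow site, with supplementary explanations added where appropriate: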

https://www.tensorflow.org/guide/using_gpu

* The sample code has been verified to run, but in some places it has been extended or modified.
* You are welcome to link to this page. We would appreciate a note at sales-info@classcat.com if you do.

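Supported devices

A typical system has multiple computing devices. In TensorFlow, the supported device types are CPU and GPU. They are represented as strings. For example:

"/cpu:0": The CPU of your machine.
"/gpu:0": The GPU of your machine, if you have one.
"/gpu:1": The second GPU of your machine, etc.

If a TensorFlow operation has both CPU and GPU implementations, GPU devices are given priority when the operation is assigned to a device. For example, matmul has both CPU and GPU kernels. On a system with devices cpu:0 and gpu:0, gpu:0 is selected to run matmul.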
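
The priority rule described above can be sketched in plain Python. This is only an illustration of the rule as stated here, not TensorFlow's actual placement algorithm: the function choose_device, the KERNELS table, and the op name cpu_only_op are hypothetical.

```python
# Illustrative sketch only: mimics the "GPU first" rule described above.
# choose_device and KERNELS are hypothetical names, not TensorFlow APIs.

KERNELS = {
    "matmul": {"cpu", "gpu"},   # has both CPU and GPU kernels (per the text)
    "cpu_only_op": {"cpu"},     # made-up op with only a CPU kernel
}


def choose_device(op, available):
    """Return the first GPU that can run `op`, else the first usable CPU."""
    supported = KERNELS[op]
    for wanted in ("gpu", "cpu"):          # GPU devices get priority
        if wanted not in supported:
            continue
        for name in available:             # e.g. "/cpu:0", "/gpu:0"
            if name.startswith("/" + wanted + ":"):
                return name
    raise ValueError("no available device supports " + op)


print(choose_device("matmul", ["/cpu:0", "/gpu:0"]))       # /gpu:0
print(choose_device("cpu_only_op", ["/cpu:0", "/gpu:0"]))  # /cpu:0
```

With only "/cpu:0" available, the same call for matmul would return "/cpu:0", since no GPU device matches.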

Logging device placement

To find out which devices your operations and tensors are assigned to, create the session with the log_device_placement configuration option set to True.

import tensorflow as tf

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

You should see output like the following:

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/device:GPU:0
a: /job:localhost/replica:0/task:0/device:GPU:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
[[ 22.  28.]
 [ 49.  64.]]
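
When the log gets long, it can help to collect the placements into a dictionary. The following is a minimal sketch, assuming the log lines have the "name: /job:..." shape shown above; parse_placements is a hypothetical helper and not part of TensorFlow.

```python
# Minimal sketch: turn device-placement log lines into an {op: device} dict.
# parse_placements is a hypothetical helper, not part of TensorFlow.

LOG = """\
b: /job:localhost/replica:0/task:0/device:GPU:0
a: /job:localhost/replica:0/task:0/device:GPU:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
"""


def parse_placements(text):
    placements = {}
    for line in text.splitlines():
        name, sep, device = line.partition(": ")
        # Keep only "<op>: /job:..." lines; skip headers and result rows.
        if sep and device.startswith("/job:"):
            placements[name] = device.rsplit("/", 1)[-1]  # e.g. "device:GPU:0"
    return placements


print(parse_placements(LOG))
# {'b': 'device:GPU:0', 'a': 'device:GPU:0', 'MatMul': 'device:GPU:0'}
```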

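Manual device placement

If you would like a particular operation to run on a device of your choice instead of the one selected automatically, you can use tf.device to create a device context in which all operations have the same device assignment.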

import tensorflow as tf

# Creates a graph.
with tf.device('/cpu:0'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

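You will now see that a and b are assigned to cpu:0. Since no device was explicitly specified for the MatMul operation, the TensorFlow runtime chooses one based on the operation and the available devices (gpu:0 in this example) and automatically copies tensors between devices if required.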

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
[[ 22.  28.]
 [ 49.  64.]]

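Allowing GPU memory growth

By default, TensorFlow maps nearly all of the GPU memory of all GPUs visible to the process (subject to CUDA_VISIBLE_DEVICES). This is done to reduce memory fragmentation and so use the relatively precious GPU memory resources on the devices more efficiently.

In some cases it is desirable for the process to allocate only a subset of the available memory, or to grow its memory usage only as the process needs it. TensorFlow provides two Config options on the Session to control this.

The first is the allow_growth option, which attempts to allocate only as much GPU memory as runtime allocations require: it starts out allocating very little memory, and as Sessions get run and more GPU memory is needed, the GPU memory region used by the TensorFlow process is extended. Note that memory is not released, since releasing it can lead to even worse memory fragmentation. To turn this option on, set the option in the ConfigProto by: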

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)

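The second method is the per_process_gpu_memory_fraction option, which determines the fraction of the overall amount of memory that each visible GPU should be allocated. For example, you can tell TensorFlow to allocate only 40% of the total memory of each GPU by: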

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)

This is useful if you want to truly bound the amount of GPU memory available to the TensorFlow process.
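
To make the fraction concrete, the arithmetic can be worked through in plain Python. This is a sketch under stated assumptions: the 12 GiB card size is chosen only for illustration, the helper names memory_cap_bytes and fraction_for_budget are hypothetical, and the sketch restricts the fraction to the range (0, 1].

```python
# Worked arithmetic for per_process_gpu_memory_fraction.
# The 12 GiB card size is an assumption chosen for illustration.

GIB = 1024 ** 3


def memory_cap_bytes(total_bytes, fraction):
    """Upper bound on GPU memory for one process at the given fraction."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_bytes * fraction)


def fraction_for_budget(total_bytes, budget_bytes):
    """Fraction to request so the process stays within budget_bytes."""
    return budget_bytes / total_bytes


total = 12 * GIB
print(round(memory_cap_bytes(total, 0.4) / GIB, 2))  # 4.8 (GiB)
print(fraction_for_budget(total, 3 * GIB))           # 0.25
```

Going the other way, a 3 GiB budget on such a card corresponds to a fraction of 0.25, which is the value you would assign to per_process_gpu_memory_fraction.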

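Using a single GPU on a multi-GPU system

If you have more than one GPU in your system, the GPU with the lowest ID is selected by default. If you would like to run on a different GPU, you need to specify the preference explicitly: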

import tensorflow as tf

# Creates a graph.
with tf.device('/device:GPU:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))
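
An alternative used in some setups is to restrict which GPUs the process can see through the CUDA_VISIBLE_DEVICES environment variable mentioned earlier. The following is a minimal sketch, assuming the variable is set before TensorFlow is imported and that a physical GPU with ID 2 exists on the machine; the single visible GPU is then addressed as GPU:0 inside the process.

```python
import os

# Must be set before TensorFlow is imported; otherwise it may have no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "2"  # expose only physical GPU 2 (assumed to exist)

# Import TensorFlow only after setting the variable, for example:
#   import tensorflow as tf
#   with tf.device('/device:GPU:0'):   # the one visible GPU is now GPU:0
#       c = tf.matmul(a, b)

print(os.environ["CUDA_VISIBLE_DEVICES"])  # 2
```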

If the device you have specified does not exist, you get an InvalidArgumentError:

InvalidArgumentError: Invalid argument: Cannot assign a device to node 'b':
Could not satisfy explicit device specification '/device:GPU:2'
   [[Node: b = Const[dtype=DT_FLOAT, value=Tensor, _device="/device:GPU:2"]()]]

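If you would like TensorFlow to automatically choose an existing and supported device to run the operations when the specified one does not exist, you can set allow_soft_placement to True in the configuration option when creating the session.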

import tensorflow as tf

# Creates a graph.
with tf.device('/device:GPU:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Creates a session with allow_soft_placement and log_device_placement set
# to True.
sess = tf.Session(config=tf.ConfigProto(
      allow_soft_placement=True, log_device_placement=True))
# Runs the op.
print(sess.run(c))
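
The difference between strict and soft placement can be sketched in plain Python. This only illustrates the behaviour described above and is not TensorFlow's placer: resolve_device is a hypothetical helper, and preferring a GPU when falling back is an assumption made for the sketch.

```python
# Illustrative sketch of the fallback behaviour described above.
# resolve_device is a hypothetical helper, not TensorFlow's placer.


def resolve_device(requested, available, allow_soft_placement=False):
    """Return the device an op would run on under the rule sketched here."""
    if requested in available:
        return requested
    if allow_soft_placement and available:
        # Fall back to an existing device; preferring a GPU is an assumption.
        gpus = [d for d in available if "GPU" in d]
        return gpus[0] if gpus else available[0]
    raise ValueError("Could not satisfy explicit device specification "
                     + repr(requested))


devices = ["/device:CPU:0", "/device:GPU:0"]
print(resolve_device("/device:GPU:2", devices, allow_soft_placement=True))
# /device:GPU:0
try:
    resolve_device("/device:GPU:2", devices)
except ValueError as err:
    print(err)  # Could not satisfy explicit device specification '/device:GPU:2'
```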

Using multiple GPUs

If you would like to run TensorFlow on multiple GPUs, you can construct your model in a multi-tower fashion, where each tower is assigned to a different GPU. For example:

import tensorflow as tf

# Creates a graph.
c = []
for d in ['/device:GPU:2', '/device:GPU:3']:
  with tf.device(d):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
    c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
  # Named total rather than sum to avoid shadowing the Python built-in.
  total = tf.add_n(c)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(total))

You will see output like the following.

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K20m, pci bus
id: 0000:02:00.0
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: Tesla K20m, pci bus
id: 0000:03:00.0
/job:localhost/replica:0/task:0/device:GPU:2 -> device: 2, name: Tesla K20m, pci bus
id: 0000:83:00.0
/job:localhost/replica:0/task:0/device:GPU:3 -> device: 3, name: Tesla K20m, pci bus
id: 0000:84:00.0
Const_3: /job:localhost/replica:0/task:0/device:GPU:3
Const_2: /job:localhost/replica:0/task:0/device:GPU:3
MatMul_1: /job:localhost/replica:0/task:0/device:GPU:3
Const_1: /job:localhost/replica:0/task:0/device:GPU:2
Const: /job:localhost/replica:0/task:0/device:GPU:2
MatMul: /job:localhost/replica:0/task:0/device:GPU:2
AddN: /job:localhost/replica:0/task:0/cpu:0
[[  44.   56.]
 [  98.  128.]]
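
The numbers in this output can be checked by hand. The following plain-Python sketch (no TensorFlow required) recomputes the per-tower product and the summed result; the helpers matmul and add_n are written here only for the check and are not the TensorFlow functions.

```python
# Pure-Python check of the numbers in the log above (no TensorFlow needed).


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]


def add_n(matrices):
    """Element-wise sum of equally shaped matrices."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*matrices)]


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]      # shape [2, 3]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]    # shape [3, 2]

towers = [matmul(a, b) for _ in range(2)]   # one product per tower
print(towers[0])      # [[22.0, 28.0], [49.0, 64.0]]
print(add_n(towers))  # [[44.0, 56.0], [98.0, 128.0]]
```

Each tower computes the same 2x2 product, so the value summed on the CPU is exactly twice the single-GPU result shown earlier.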

The CIFAR-10 tutorial is a good example demonstrating how to train with multiple GPUs.

End