Keras Stable Diffusion: Mixed Precision Performance (Translation/Commentary)

Translation: ClassCat Sales Information, ClassCat Co., Ltd.
Created: 12/28/2022

* This page is a translated and reorganized version of the Colab notebook referenced in the following document of the divamgupta/stable-diffusion-tensorflow repository on GitHub. Some parts have been modified:

divamgupta/stable-diffusion-tensorflow/README.md

* The sample code has been tested, but has been extended or modified where necessary.
* Feel free to link to this page, but we would appreciate a note to sales-info@classcat.com.


Keras Stable Diffusion: Mixed Precision Performance

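This notebook runs Keras Stable Diffusion with mixed precision and compares its image generation time against the default (float32) configuration. We first measure execution time with the default settings, then enable mixed precision, generate the same images again, and compare the results.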

Using the default environment

We start with the default settings unchanged. The GPU is a Tesla T4.

Checking the GPU and installing requirements

!nvidia-smi

Tue Dec 27 16:56:19 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   67C    P0    27W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

!pip install git+https://github.com/divamgupta/stable-diffusion-tensorflow --upgrade --quiet
!pip install tensorflow tensorflow_addons ftfy --upgrade --quiet

Instantiate the generator (StableDiffusion, formerly Text2Image) and create the first image

First, create a StableDiffusion instance:

from stable_diffusion_tf.stable_diffusion import StableDiffusion
from PIL import Image
from IPython.display import display  # already available in Jupyter/Colab

# Older releases of stable_diffusion_tf exposed this class as Text2Image
# (from stable_diffusion_tf.stable_diffusion import Text2Image)
generator = StableDiffusion(
    img_height=512,
    img_width=512,
    jit_compile=False,  # True is also worth trying (different performance profile)
)

The first image generation incurs some extra compilation overhead.

%%time

img = generator.generate(
    "DSLR photograph of an astronaut riding a horse",
    num_steps=50,
    unconditional_guidance_scale=7.5,
    temperature=1,
    batch_size=1,
)
pil_img = Image.fromarray(img[0])
display(pil_img)

0   1: 100%|██████████| 50/50 [01:09<00:00,  1.38s/it]

CPU times: user 59.5 s, sys: 17.5 s, total: 1min 16s
Wall time: 1min 22s

From the second run onward, generation is faster:

%%time

img = generator.generate(
    "DSLR photograph of an astronaut riding a horse",
    num_steps=50,
    unconditional_guidance_scale=7.5,
    temperature=1,
    batch_size=1,
)
pil_img = Image.fromarray(img[0])
display(pil_img)

  0   1: 100%|██████████| 50/50 [00:55<00:00,  1.11s/it]

CPU times: user 37.4 s, sys: 14.7 s, total: 52.1 s
Wall time: 56.4 s
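Because the first call includes compilation overhead, a fair comparison times only the calls made after a warm-up. The following is a minimal sketch of that idea using time.perf_counter instead of the %%time magic; benchmark is a hypothetical helper, not part of stable_diffusion_tf.

```python
import statistics
import time
from typing import Callable, List


def benchmark(fn: Callable[[], object], warmup: int = 1, repeats: int = 3) -> List[float]:
    """Hypothetical helper: call fn() `warmup` times untimed, then return the
    wall-clock durations (seconds) of `repeats` further calls."""
    for _ in range(warmup):
        fn()  # these calls absorb tracing/compilation overhead and are not timed
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return times


if __name__ == "__main__":
    # Stand-in workload; in the notebook this would wrap generator.generate(...)
    times = benchmark(lambda: time.sleep(0.01), warmup=1, repeats=3)
    print(f"median {statistics.median(times):.3f} s over {len(times)} runs")
```

In the notebook, wrapping the call above as benchmark(lambda: generator.generate("DSLR photograph of an astronaut riding a horse", num_steps=50, unconditional_guidance_scale=7.5, temperature=1, batch_size=1), repeats=2) would time two post-warm-up runs. Since each 50-step call takes close to a minute on the T4, keeping repeats small is sensible.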

Let's try batched generation (4 images per call):

%%time

img = generator.generate(
    "An epic unicorn riding in the sunset, artstation concept art",
    num_steps=50,
    unconditional_guidance_scale=7.5,
    temperature=1,
    batch_size=4,
)
pil_img = Image.fromarray(img[0])
display(pil_img)

  0   1: 100%|██████████| 50/50 [02:46<00:00,  3.33s/it]

CPU times: user 59.5 s, sys: 5.75 s, total: 1min 5s
Wall time: 3min 10s
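The batched cell above displays only img[0], the first of the four generated images. A minimal sketch for viewing the whole batch as one grid, assuming generate returns a (batch, height, width, 3) uint8 NumPy array, as the Image.fromarray(img[0]) call implies; make_grid is a hypothetical helper.

```python
import numpy as np
from PIL import Image


def make_grid(images, cols: int) -> Image.Image:
    """Hypothetical helper: tile a (batch, H, W, C) uint8 array into one PIL image, row by row."""
    images = np.asarray(images)
    batch, h, w, c = images.shape
    rows = -(-batch // cols)  # ceiling division
    grid = np.zeros((rows * h, cols * w, c), dtype=images.dtype)
    for i, image in enumerate(images):
        r, col = divmod(i, cols)
        grid[r * h:(r + 1) * h, col * w:(col + 1) * w] = image
    return Image.fromarray(grid)
```

With the batch above, display(make_grid(img, cols=2)) would show all four 512x512 images as a single 1024x1024 grid.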

Using mixed precision

Next, we reset the Colab runtime and repeat the same steps under the same conditions with mixed precision enabled. The GPU is again a Tesla T4.

Checking the GPU and installing requirements

!nvidia-smi

Tue Dec 27 16:56:19 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   67C    P0    27W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
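Mixed float16 mainly pays off on NVIDIA GPUs with Tensor Cores, that is, compute capability 7.0 or higher; the Tesla T4 used here has compute capability 7.5. A quick sketch for checking this from TensorFlow, using tf.config.experimental.get_device_details (available since TF 2.3); has_tensor_cores is a hypothetical helper.

```python
import tensorflow as tf


def has_tensor_cores(compute_capability) -> bool:
    """Hypothetical helper: True if the (major, minor) compute capability is >= 7.0."""
    return compute_capability is not None and tuple(compute_capability) >= (7, 0)


# On a machine without a GPU this loop simply prints nothing
for gpu in tf.config.list_physical_devices("GPU"):
    details = tf.config.experimental.get_device_details(gpu)
    cc = details.get("compute_capability")
    print(details.get("device_name"), cc, has_tensor_cores(cc))
```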

!pip install git+https://github.com/divamgupta/stable-diffusion-tensorflow --upgrade --quiet
!pip install tensorflow tensorflow_addons ftfy --upgrade --quiet

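Configuring mixed precision

Enabling mixed precision takes a single call. Because Keras applies the global policy to layers when they are created, it must be set before the StableDiffusion model is instantiated. Under "mixed_float16", layers compute in float16 while keeping their weights in float32: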

from tensorflow import keras
keras.mixed_precision.set_global_policy("mixed_float16")
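To see what the policy does, the sketch below (not part of the original notebook, and best run in a fresh session) builds one Dense layer before the call and one after it. Only the second picks up mixed_float16: it computes in float16 while its weights remain float32. This is also why the runtime was reset and the policy set before the generator was created.

```python
import tensorflow as tf
from tensorflow import keras

# Created before the policy change: keeps the default float32 policy
layer_before = keras.layers.Dense(4)

keras.mixed_precision.set_global_policy("mixed_float16")

# Created after the call: picks up mixed_float16
layer_after = keras.layers.Dense(4)
y = layer_after(tf.ones((1, 3)))

policy = keras.mixed_precision.global_policy()
print(policy.name)                 # mixed_float16
print(policy.compute_dtype)        # float16 (dtype used for computation)
print(policy.variable_dtype)       # float32 (dtype used to store weights)
print(layer_before.compute_dtype)  # float32
print(layer_after.compute_dtype)   # float16
print(y.dtype.name)                # float16
```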

Instantiate the generator (StableDiffusion, formerly Text2Image) and create the first image

First, create a StableDiffusion instance:

from stable_diffusion_tf.stable_diffusion import StableDiffusion
from PIL import Image
from IPython.display import display  # already available in Jupyter/Colab

# Older releases of stable_diffusion_tf exposed this class as Text2Image
# (from stable_diffusion_tf.stable_diffusion import Text2Image)
generator = StableDiffusion(
    img_height=512,
    img_width=512,
    jit_compile=False,  # True is also worth trying (different performance profile)
)

The first run incurs some extra compilation overhead.

%%time

img = generator.generate(
    "DSLR photograph of an astronaut riding a horse",
    num_steps=50,
    unconditional_guidance_scale=7.5,
    temperature=1,
    batch_size=1,
)
pil_img = Image.fromarray(img[0])
display(pil_img)

  0   1: 100%|██████████| 50/50 [00:52<00:00,  1.05s/it]

CPU times: user 44.6 s, sys: 9.58 s, total: 54.1 s
Wall time: 1min 1s

From the second run onward, generation is faster:

%%time

img = generator.generate(
    "DSLR photograph of an astronaut riding a horse",
    num_steps=50,
    unconditional_guidance_scale=7.5,
    temperature=1,
    batch_size=1,
)
pil_img = Image.fromarray(img[0])
display(pil_img)

  0   1: 100%|██████████| 50/50 [00:36<00:00,  1.36it/s]

CPU times: user 25 s, sys: 8.04 s, total: 33 s
Wall time: 37.5 s

Let's try batched generation (4 images per call):

%%time

img = generator.generate(
    "An epic unicorn riding in the sunset, artstation concept art",
    num_steps=50,
    unconditional_guidance_scale=7.5,
    temperature=1,
    batch_size=4,
)
pil_img = Image.fromarray(img[0])
display(pil_img)

  0   1: 100%|██████████| 50/50 [01:43<00:00,  2.06s/it]

CPU times: user 51.4 s, sys: 4.09 s, total: 55.5 s
Wall time: 1min 57s

Summary

Wall times on a Tesla T4 GPU (512x512 images, 50 steps):

Default environment

1st run: 1min 22s
2nd run: 56.4 s
Batch (4 images): 3min 10s (47.5 s/image)

With mixed precision

1st run: 1min 1s
2nd run: 37.5 s
Batch (4 images): 1min 57s (29.25 s/image)
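The relative gain can be computed directly from the wall times listed above, converted to seconds. A small sketch:

```python
# Wall times in seconds, copied from the measurements above (Tesla T4, 512x512, 50 steps)
default_s = {"1st run": 82.0, "2nd run": 56.4, "batch of 4": 190.0}
mixed_s = {"1st run": 61.0, "2nd run": 37.5, "batch of 4": 117.0}

speedups = {name: default_s[name] / mixed_s[name] for name in default_s}

for name, speedup in speedups.items():
    saved = 1 - mixed_s[name] / default_s[name]  # fraction of wall time saved
    print(f"{name}: {speedup:.2f}x faster ({saved:.0%} less wall time)")

# Per-image time for the batched run
for label, times in (("default", default_s), ("mixed_float16", mixed_s)):
    print(f"{label}: {times['batch of 4'] / 4:.2f} s/image")
```

By this calculation, mixed precision was about 1.34x faster on the first run, 1.50x on the second run, and 1.62x for the batch of four, i.e. roughly 26%, 34%, and 38% less wall time. Each configuration was measured only once, so these figures are indicative rather than precise.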

That's all.
