TensorFlow Quantum 0.2.0 Tutorials : MNIST Classification (Translation/Commentary)
Translation: ClassCat Sales Information Co., Ltd.
Created: 03/14/2020 (0.2.0)

* This page is a translation of the following TensorFlow Quantum page, with supplementary explanations added where appropriate:

Tutorials : MNIST classification

* The sample code has been verified to run, but it has been extended or modified where necessary.
* Feel free to link to this page, but we would appreciate a note at sales-info@classcat.com if you do.

Tutorials : MNIST Classification

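This tutorial builds a quantum neural network (QNN) to classify a simplified version of MNIST, similar to the approach used in Farhi et al. The performance of the quantum neural network on this classical data problem is compared with that of a classical neural network.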

Setup

Install TensorFlow Quantum:

pip install -q tensorflow-quantum

Now import TensorFlow and the module dependencies:

import tensorflow as tf
import tensorflow_quantum as tfq

import cirq
import sympy
import numpy as np
import seaborn as sns
import collections

# visualization tools
%matplotlib inline
import matplotlib.pyplot as plt
from cirq.contrib.svg import SVGCircuit

1. Load the data

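In this tutorial you build a binary classifier that distinguishes between the digits 3 and 6, following Farhi et al. This section covers the following data handling:

Load the raw data from Keras.
Filter the dataset to only 3s and 6s.
Downscale the images so they fit in a quantum computer.
Remove any contradictory examples.
Convert the binary images to Cirq circuits.
Convert the Cirq circuits to TensorFlow Quantum circuits.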

1.1 Load the raw data

Load the MNIST dataset distributed with Keras.

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Rescale the images from [0,255] to the [0.0,1.0] range.
x_train, x_test = x_train[..., np.newaxis]/255.0, x_test[..., np.newaxis]/255.0

print("Number of original training examples:", len(x_train))
print("Number of original test examples:", len(x_test))

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 0s 0us/step
Number of original training examples: 60000
Number of original test examples: 10000

Filter the dataset to keep just the 3s and 6s, removing the other classes. At the same time, convert the label y to boolean: True for 3 and False for 6.

def filter_36(x, y):
    keep = (y == 3) | (y == 6)
    x, y = x[keep], y[keep]
    y = y == 3
    return x,y

x_train, y_train = filter_36(x_train, y_train)
x_test, y_test = filter_36(x_test, y_test)

print("Number of filtered training examples:", len(x_train))
print("Number of filtered test examples:", len(x_test))

Number of filtered training examples: 12049
Number of filtered test examples: 1968

Show the first example:

print(y_train[0])

plt.imshow(x_train[0, :, :, 0])
plt.colorbar()

True


1.2 Downscale the images

An image size of 28×28 is much too large for current quantum computers. Resize the images down to 4×4:

x_train_small = tf.image.resize(x_train, (4,4)).numpy()
x_test_small = tf.image.resize(x_test, (4,4)).numpy()
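To make the jump from 28×28 to 4×4 concrete, here is a small NumPy sketch that downscales by averaging non-overlapping 7×7 blocks (28 / 4 = 7). This box-averaging scheme is an assumption swapped in for illustration: tf.image.resize uses bilinear interpolation by default, so its pixel values will generally differ. The helper name block_average_downscale is hypothetical.

```python
import numpy as np

def block_average_downscale(image, factor):
    """Hypothetical helper: average non-overlapping factor x factor blocks of a 2-D array.

    Box averaging is used only for illustration; tf.image.resize defaults to bilinear.
    """
    h, w = image.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    # Split rows and columns into (block index, offset within block), then average offsets.
    return image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

# A hypothetical 28x28 image: a bright 14x14 square in the top-left quadrant.
img = np.zeros((28, 28))
img[:14, :14] = 1.0

small = block_average_downscale(img, 7)
print(small.shape)  # (4, 4)
print(small)
```

Each output pixel summarizes 49 input pixels, which is why fine stroke detail that separates a 3 from a 6 can disappear at this resolution.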

Show the first training example again, this time after resizing:

print(y_train[0])

plt.imshow(x_train_small[0,:,:,0], vmin=0, vmax=1)
plt.colorbar()

True


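1.3 Remove contradictory examples

Following section 3.3, "Learning to Distinguish Digits", of Farhi et al., filter the dataset to remove images that are labeled as belonging to both classes.

This is not a standard machine-learning procedure, but it is included in the interest of following the paper.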

def remove_contradicting(xs, ys):
    mapping = collections.defaultdict(set)
    # Determine the set of labels for each unique image:
    for x, y in zip(xs, ys):
        mapping[tuple(x.flatten())].add(y)

    new_x = []
    new_y = []
    for x, y in zip(xs, ys):
        labels = mapping[tuple(x.flatten())]
        if len(labels) == 1:
            new_x.append(x)
            new_y.append(list(labels)[0])
        else:
            # Throw out images that match more than one label.
            pass

    num_3 = sum(1 for value in mapping.values() if True in value)
    num_6 = sum(1 for value in mapping.values() if False in value)
    num_both = sum(1 for value in mapping.values() if len(value) == 2)

    print("Number of unique images:", len(mapping.values()))
    print("Number of 3s: ", num_3)
    print("Number of 6s: ", num_6)
    print("Number of contradictory images: ", num_both)
    print()
    print("Initial number of examples: ", len(xs))
    print("Remaining non-contradictory examples: ", len(new_x))

    return np.array(new_x), np.array(new_y)

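The resulting counts do not closely match the reported values, but the exact procedure is not specified. It is also worth noting that applying the contradictory-example filter at this point does not entirely prevent the model from receiving contradictory training examples: the next step binarizes the data, which will cause more collisions.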
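Why binarizing can create new contradictions: two grayscale images that differ slightly are distinct dictionary keys, but after thresholding they can become the same binary image. The sketch below reuses the same "set of labels per unique image" idea as remove_contradicting; the images a and b and the helpers binarize and count_contradictions are hypothetical.

```python
import collections
import numpy as np

def binarize(image, threshold=0.5):
    # Same rule as the tutorial's next step: 1.0 if the pixel exceeds the threshold, else 0.0.
    return np.array(image > threshold, dtype=np.float32)

def count_contradictions(xs, ys):
    """Hypothetical helper: number of unique images that carry more than one label."""
    mapping = collections.defaultdict(set)
    for x, y in zip(xs, ys):
        mapping[tuple(x.flatten())].add(y)
    return sum(1 for labels in mapping.values() if len(labels) > 1)

# Two hypothetical 4x4 grayscale images that differ in a few pixels...
a = np.array([[0.1, 0.2, 0.0, 0.0],
              [0.0, 0.9, 0.6, 0.0],
              [0.0, 0.7, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.3]])
b = a.copy()
b[0, 0], b[1, 1], b[3, 3] = 0.4, 0.51, 0.0

# ...are distinct before binarization, so labeling them 3 and 6 is not a contradiction,
print(count_contradictions([a, b], [True, False]))
# ...but collapse to the same binary image afterwards, creating one.
print(count_contradictions([binarize(a), binarize(b)], [True, False]))
```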

x_train_nocon, y_train_nocon = remove_contradicting(x_train_small, y_train)

Number of unique images: 10387
Number of 3s:  4961
Number of 6s:  5475
Number of contradictory images:  49

Initial number of examples:  12049
Remaining non-contradictory examples:  11520

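1.4 Encode the data as quantum circuits

To process images using a quantum computer, Farhi et al. proposed representing each pixel with a qubit, with the state depending on the value of the pixel. The first step is to convert to a binary encoding.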

THRESHOLD = 0.5

x_train_bin = np.array(x_train_nocon > THRESHOLD, dtype=np.float32)
x_test_bin = np.array(x_test_small > THRESHOLD, dtype=np.float32)

The qubits at pixel indices with values that exceed the threshold are rotated through an $X$ gate.

def convert_to_circuit(image):
    """Encode truncated classical image into quantum datapoint."""
    values = np.ndarray.flatten(image)
    qubits = cirq.GridQubit.rect(4, 4)
    circuit = cirq.Circuit()
    for i, value in enumerate(values):
        if value:
            circuit.append(cirq.X(qubits[i]))
    return circuit


x_train_circ = [convert_to_circuit(x) for x in x_train_bin]
x_test_circ = [convert_to_circuit(x) for x in x_test_bin]
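The pixel-to-qubit mapping in convert_to_circuit depends on two orderings agreeing: np.ndarray.flatten walks the image row by row (C order), and, as we understand Cirq, cirq.GridQubit.rect(4, 4) also lists qubits row by row. Under that assumption, flat index i corresponds to grid position (i // 4, i % 4). The hypothetical helper below reproduces this bookkeeping in plain NumPy, without Cirq, so you can check which qubits receive an X gate.

```python
import numpy as np

def active_qubit_coords(binary_image, cols=4):
    """Hypothetical helper: (row, col) of every qubit that would receive an X gate."""
    values = np.ndarray.flatten(binary_image)  # row-major, exactly as in convert_to_circuit
    # Assumes the qubit list is also row-major, so flat index i -> (i // cols, i % cols).
    return [(i // cols, i % cols) for i, value in enumerate(values) if value]

# A hypothetical binarized 4x4x1 image with two "on" pixels.
img = np.zeros((4, 4, 1), dtype=np.float32)
img[2, 2, 0] = 1.0
img[3, 1, 0] = 1.0

coords = active_qubit_coords(img)
print(coords)  # [(2, 2), (3, 1)]
```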

Here is the circuit created for the first example (circuit diagrams do not show qubits that have no gates):

SVGCircuit(x_train_circ[0])


Compare this circuit to the indices where the image value exceeds the threshold:

bin_img = x_train_bin[0,:,:,0]
indices = np.array(np.where(bin_img)).T
indices

array([[2, 2],
       [3, 1]])

Convert these Cirq circuits to tensors for tfq:

x_train_tfcirc = tfq.convert_to_tensor(x_train_circ)
x_test_tfcirc = tfq.convert_to_tensor(x_test_circ)

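2. Quantum neural network

There is little guidance on a quantum circuit structure that classifies images. Since the classification is based on the expectation of the readout qubit, Farhi et al. propose using two-qubit gates, with the readout qubit always acted upon. This is similar in some ways to running a small Unitary RNN across the pixels.

2.1 Build the model circuit

The following example shows this layered approach. Each layer uses n instances of the same gate, with each of the data qubits acting on the readout qubit.

Start with a simple class that adds a layer of these gates to a circuit: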

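class CircuitLayerBuilder():
    """Appends layers of parameterized two-qubit gates to a circuit."""
    def __init__(self, data_qubits, readout):
        self.data_qubits = data_qubits
        self.readout = readout

    def add_layer(self, circuit, gate, prefix):
        # One trainable symbol per data qubit, e.g. "xx1-0", "xx1-1", ...
        for i, qubit in enumerate(self.data_qubits):
            symbol = sympy.Symbol(prefix + '-' + str(i))
            # Couple each data qubit to the readout qubit; the symbol is the gate's exponent.
            circuit.append(gate(qubit, self.readout)**symbol)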
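What does gate(qubit, readout)**symbol mean? Each symbol becomes a trainable exponent t. Our reading of Cirq's convention for XX**t (with the default global_shift of 0) is that XX has eigenvalues +1 and -1, and raising it to the power t leaves the +1 eigenspace alone while multiplying the -1 eigenspace by exp(iπt). The NumPy sketch below builds that matrix directly; it is an illustration under that assumption, not Cirq's own code, and xx_pow is a hypothetical name.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
XX = np.kron(X, X)              # two-qubit XX operator (data qubit, readout qubit)
I4 = np.eye(4, dtype=complex)

def xx_pow(t):
    """Hypothetical helper: unitary of XX**t via its eigen-decomposition."""
    p_plus = (I4 + XX) / 2      # projector onto the +1 eigenspace of XX
    p_minus = (I4 - XX) / 2     # projector onto the -1 eigenspace of XX
    return p_plus + np.exp(1j * np.pi * t) * p_minus

print(np.round(xx_pow(0.5), 3))  # halfway between the identity (t=0) and XX (t=1)
```

At t = 0 the gate is the identity, so an untrained symbol set to 0 makes its gate do nothing; at t = 1 it is the full XX gate. The same construction applies to ZZ**t with Z in place of X.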

Build an example circuit layer to see what it looks like:

demo_builder = CircuitLayerBuilder(data_qubits = cirq.GridQubit.rect(4,1),
                                   readout=cirq.GridQubit(-1,-1))

circuit = cirq.Circuit()
demo_builder.add_layer(circuit, gate = cirq.XX, prefix='xx')
SVGCircuit(circuit)

Now build a two-layered model, matching the data-circuit size, and include the preparation and readout operations.

def create_quantum_model():
    """Create a QNN model circuit and readout operation to go along with it."""
    data_qubits = cirq.GridQubit.rect(4, 4)  # a 4x4 grid.
    readout = cirq.GridQubit(-1, -1)         # a single qubit at [-1,-1]
    circuit = cirq.Circuit()
    
    # Prepare the readout qubit.
    circuit.append(cirq.X(readout))
    circuit.append(cirq.H(readout))
    
    builder = CircuitLayerBuilder(
        data_qubits = data_qubits,
        readout=readout)

    # Then add layers (experiment by adding more).
    builder.add_layer(circuit, cirq.XX, "xx1")
    builder.add_layer(circuit, cirq.ZZ, "zz1")

    # Finally, prepare the readout qubit.
    circuit.append(cirq.H(readout))

    return circuit, cirq.Z(readout)

model_circuit, model_readout = create_quantum_model()

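2.2 Wrap the model circuit in a tfq-keras model

Build the Keras model with the quantum components. This model is fed the "quantum data" from x_train_circ (converted above to the tensor x_train_tfcirc), which encodes the classical data. It uses a Parametrized Quantum Circuit layer, tfq.layers.PQC, to train the model circuit on the quantum data.

To classify these images, Farhi et al. proposed taking the expectation of a readout qubit in a parameterized circuit. The expectation returns a value between -1 and 1.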
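The expectation of Z on a single qubit state a|0⟩ + b|1⟩ is |a|² - |b|², which is why it always lies in [-1, 1]. The NumPy sketch below checks this on the readout qubit alone, following the preparation in create_quantum_model (X, then H, then a final H). It ignores the data qubits, which is exact only when every layer parameter is 0: as sketched above for XX**t, each gate is then the identity, so the model outputs -1 for every image. The helper expect_z is hypothetical.

```python
import numpy as np

# Single-qubit matrices and the |0> state.
X = np.array([[0, 1], [1, 0]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
ket0 = np.array([1, 0], dtype=complex)

def expect_z(state):
    """Hypothetical helper: <psi|Z|psi> for a normalized single-qubit state."""
    return float(np.real(np.vdot(state, Z @ state)))

# Readout preparation: X first, then H (matrices apply right to left), giving |->.
prepared = H @ X @ ket0
# With all layer parameters at 0 the XX/ZZ layers are identities; only the final H remains.
final = H @ prepared  # back to |1>

print(expect_z(ket0), expect_z(prepared), expect_z(final))
```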

# Build the Keras model.
model = tf.keras.Sequential([
    # The input is the data-circuit, encoded as a tf.string
    tf.keras.layers.Input(shape=(), dtype=tf.string),
    # The PQC layer returns the expected value of the readout gate, range [-1,1].
    tfq.layers.PQC(model_circuit, model_readout),
])

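Next, describe the training procedure to the model, using the compile method.

Since the expected readout is in the range [-1, 1], optimizing the hinge loss is a somewhat natural fit.

★ Note: Another valid approach would be to shift the output range to [0, 1] and treat it as the probability the model assigns to class 3. This could be used with a standard tf.losses.BinaryCrossentropy loss.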
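A minimal sketch of that alternative, assuming a simple linear shift p = (⟨Z⟩ + 1) / 2 and computing binary cross-entropy by hand in NumPy rather than through Keras. The helper names expectation_to_prob and binary_crossentropy are hypothetical, and the clipping constant is our choice.

```python
import numpy as np

def expectation_to_prob(expectation):
    """Hypothetical helper: map a <Z> readout in [-1, 1] linearly to [0, 1]."""
    return (np.asarray(expectation, dtype=np.float64) + 1.0) / 2.0

def binary_crossentropy(y_true, p, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y = np.asarray(y_true, dtype=np.float64)
    p = np.clip(np.asarray(p, dtype=np.float64), eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))

probs = expectation_to_prob([-1.0, 0.0, 1.0])   # -1 -> "surely a 6", +1 -> "surely a 3"
print(probs)
print(binary_crossentropy([1, 0], [0.5, 0.5]))  # an undecided model: ln 2
```

With this mapping the boolean labels (True for 3) can be used directly, so the [-1, 1] label conversion and the custom metric are not needed.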

To use the hinge loss here, you need to make two small adjustments. First, convert the labels, y_train, from boolean to [-1, 1], as expected by the hinge loss.

y_train_hinge = 2.0*y_train-1.0
y_test_hinge = 2.0*y_test-1.0
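For reference, here is our NumPy reading of the hinge loss as tf.keras.losses.Hinge documents it, mean(max(1 - y_true * y_pred, 0)), together with the label conversion above. Because the PQC output lies in [-1, 1], the term 1 - y_true * y_pred is never negative, so the loss reduces to 1 - mean(y_true * y_pred). A mean loss near 1.0 therefore means the predictions are, on average, uncorrelated with the labels. The function name hinge_loss is hypothetical.

```python
import numpy as np

def hinge_loss(y_true, y_pred):
    """Hypothetical helper: mean of max(0, 1 - y_true * y_pred), labels in {-1, +1}."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return float(np.mean(np.maximum(0.0, 1.0 - y_true * y_pred)))

y_bool = np.array([True, False, True])
y_hinge = 2.0 * y_bool - 1.0  # True -> 1.0, False -> -1.0, the same trick as above
print(y_hinge)

print(hinge_loss([1, -1], [1.0, -1.0]))  # confident and correct: 0.0
print(hinge_loss([1, -1], [0.0, 0.0]))   # predictions stuck at 0: 1.0
print(hinge_loss([1, -1], [-1.0, 1.0]))  # confident and wrong: 2.0
```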

Second, use a custom hinge_accuracy metric that correctly handles [-1, 1] as the y_true labels argument. tf.keras.metrics.BinaryAccuracy(threshold=0.0) expects y_true to be boolean (0 or 1), so it can't be used with the hinge loss.

def hinge_accuracy(y_true, y_pred):
    y_true = tf.squeeze(y_true) > 0.0
    y_pred = tf.squeeze(y_pred) > 0.0
    result = tf.cast(y_true == y_pred, tf.float32)

    return tf.reduce_mean(result)
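The same logic in NumPy shows two details of hinge_accuracy: a prediction of exactly 0 counts as class 6 (False), and the squeeze calls matter because the PQC layer outputs shape (batch, 1), while labels may arrive with shape (batch,). Without squeezing, those shapes broadcast into a (batch, batch) comparison matrix. hinge_accuracy_np is a hypothetical mirror of the metric, not part of TFQ.

```python
import numpy as np

def hinge_accuracy_np(y_true, y_pred):
    """Hypothetical NumPy mirror of hinge_accuracy: compare signs after squeezing."""
    y_true = np.squeeze(np.asarray(y_true)) > 0.0
    y_pred = np.squeeze(np.asarray(y_pred)) > 0.0
    return float(np.mean(y_true == y_pred))

labels = np.array([1.0, -1.0, 1.0, -1.0])         # hinge-style labels, shape (4,)
preds = np.array([[0.3], [-0.2], [-0.1], [0.0]])  # shape (4, 1), like the PQC output
print(hinge_accuracy_np(labels, preds))  # 0.75

# Without the squeeze, (4,) and (4, 1) broadcast to a (4, 4) comparison.
print(((labels > 0.0) == (preds > 0.0)).shape)  # (4, 4)
```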

model.compile(
    loss=tf.keras.losses.Hinge(),
    optimizer=tf.keras.optimizers.Adam(),
    metrics=[hinge_accuracy])

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
pqc (PQC)                    (None, 1)                 32        
=================================================================
Total params: 32
Trainable params: 32
Non-trainable params: 0
_________________________________________________________________

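2.3 Train the quantum model

Now train the model. This takes about 45 minutes. If you don't want to wait that long, use a small subset of the data (set NUM_EXAMPLES=500 below). This doesn't really affect the model's progress during training (it only has 32 parameters and doesn't need much data to constrain them). Using fewer examples just ends training earlier (about 5 minutes), but it runs long enough to show progress in the validation logs.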

EPOCHS = 3
BATCH_SIZE = 32

NUM_EXAMPLES = len(x_train_tfcirc)

x_train_tfcirc_sub = x_train_tfcirc[:NUM_EXAMPLES]
y_train_hinge_sub = y_train_hinge[:NUM_EXAMPLES]
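A quick back-of-the-envelope check of what NUM_EXAMPLES buys you, using the roughly 35 ms per sample reported in the training log below. That per-sample time is an assumption for any other machine; it depends on hardware and the simulator backend. The helper names are hypothetical.

```python
import math

def steps_per_epoch(num_examples, batch_size):
    """Batches needed to see every example once (the last batch may be partial)."""
    return math.ceil(num_examples / batch_size)

def estimated_epoch_seconds(num_examples, seconds_per_sample):
    """Rough linear estimate of wall-clock time for one epoch."""
    return num_examples * seconds_per_sample

for n in (11520, 500):
    print(n, steps_per_epoch(n, 32), round(estimated_epoch_seconds(n, 0.035), 1))
```

With the full 11,520 examples that is 360 steps and roughly 400 seconds per epoch; with 500 examples it drops to 16 steps and under 20 seconds per epoch.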

Training this model to convergence should achieve >85% accuracy on the test set.

qnn_history = model.fit(
    x_train_tfcirc_sub, y_train_hinge_sub,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    verbose=1,
    validation_data=(x_test_tfcirc, y_test_hinge))

# Evaluate with the same [-1, 1] labels used for validation.
qnn_results = model.evaluate(x_test_tfcirc, y_test_hinge)

Train on 11520 samples, validate on 1968 samples
Epoch 1/3
11520/11520 [==============================] - 404s 35ms/sample - loss: 1.0000 - hinge_accuracy: 0.4987 - val_loss: 0.9994 - val_hinge_accuracy: 0.6033
Epoch 2/3
11520/11520 [==============================] - 397s 34ms/sample - loss: 1.0000 - hinge_accuracy: 0.4977 - val_loss: 0.9996 - val_hinge_accuracy: 0.6069
Epoch 3/3
11520/11520 [==============================] - 397s 34ms/sample - loss: 1.0000 - hinge_accuracy: 0.5016 - val_loss: 0.9997 - val_hinge_accuracy: 0.5917
1968/1968 [==============================] - 3s 1ms/sample - loss: 0.9997 - hinge_accuracy: 0.5917

★ Note: The training accuracy reports the average over the epoch. The validation accuracy is evaluated at the end of each epoch.
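A toy illustration of why the two numbers can disagree: if the model improves during an epoch, averaging over all of its batches mixes early (worse) and late (better) weights, so the reported training accuracy lags behind the end-of-epoch model that validation sees. The per-batch values below are hypothetical, and equal-size batches are assumed.

```python
import numpy as np

# Hypothetical per-batch training accuracies that improve during one epoch.
batch_acc = np.array([0.50, 0.60, 0.70, 0.80])

epoch_average = float(batch_acc.mean())  # what an epoch-averaged training metric reports
latest_batch = float(batch_acc[-1])      # closer to the model as it stands at epoch end
print(epoch_average, latest_batch)
```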

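3. Classical neural network

While the quantum neural network works for this simplified MNIST problem, a basic classical neural network can easily outperform a QNN on this task. After a single epoch, a classical neural network can achieve >98% accuracy on the holdout set.

In the following example, a classical neural network is used for the 3-vs-6 classification problem using the entire 28×28 image instead of subsampling it. This easily converges to nearly 100% accuracy on the test set.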

def create_classical_model():
    # A simple model based off LeNet from https://keras.io/examples/mnist_cnn/
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Conv2D(32, [3, 3], activation='relu', input_shape=(28,28,1)))
    model.add(tf.keras.layers.Conv2D(64, [3, 3], activation='relu'))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(tf.keras.layers.Dropout(0.25))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(128, activation='relu'))
    model.add(tf.keras.layers.Dropout(0.5))
    model.add(tf.keras.layers.Dense(1))
    return model


model = create_classical_model()
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])

model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 24, 24, 64)        18496     
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 12, 12, 64)        0         
_________________________________________________________________
dropout (Dropout)            (None, 12, 12, 64)        0         
_________________________________________________________________
flatten (Flatten)            (None, 9216)              0         
_________________________________________________________________
dense (Dense)                (None, 128)               1179776   
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 129       
=================================================================
Total params: 1,198,721
Trainable params: 1,198,721
Non-trainable params: 0

model.fit(x_train,
          y_train,
          batch_size=128,
          epochs=1,
          verbose=1,
          validation_data=(x_test, y_test))

cnn_results = model.evaluate(x_test, y_test)

Train on 12049 samples, validate on 1968 samples
12049/12049 [==============================] - 4s 318us/sample - loss: 0.0404 - accuracy: 0.9832 - val_loss: 0.0033 - val_accuracy: 0.9980
1968/1968 [==============================] - 0s 130us/sample - loss: 0.0033 - accuracy: 0.9980

The above model has nearly 1.2M parameters. For a fairer comparison, try a 37-parameter model on the subsampled images:
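The parameter counts in the model summaries follow from simple arithmetic: a Conv2D layer has kernel_h × kernel_w × in_channels × filters weights plus one bias per filter, and a Dense layer has in_features × units weights plus one bias per unit. For the QNN, add_layer creates one symbol per data qubit, and create_quantum_model adds two layers over 16 data qubits. The helper names below are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(in_features, units):
    """Weight matrix plus one bias per unit."""
    return in_features * units + units

cnn_params = [
    conv2d_params(3, 3, 1, 32),       # 28x28x1 -> 26x26x32
    conv2d_params(3, 3, 32, 64),      # -> 24x24x64, then 2x2 max-pooling -> 12x12x64
    dense_params(12 * 12 * 64, 128),  # Flatten gives 9216 features
    dense_params(128, 1),
]
fair_params = [dense_params(16, 2), dense_params(2, 1)]  # 4x4x1 flattened to 16 inputs
qnn_params = 16 * 2  # one symbol per data qubit in each of the two layers (xx1, zz1)

print(sum(cnn_params), sum(fair_params), qnn_params)  # 1198721 37 32
```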

def create_fair_classical_model():
    # A tiny fully connected model on the 4x4 binarized images: 16 inputs -> 2 ReLU units -> 1 logit.
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Flatten(input_shape=(4,4,1)))
    model.add(tf.keras.layers.Dense(2, activation='relu'))
    model.add(tf.keras.layers.Dense(1))
    return model


model = create_fair_classical_model()
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])

model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_1 (Flatten)          (None, 16)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 34        
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 3         
=================================================================
Total params: 37
Trainable params: 37
Non-trainable params: 0

model.fit(x_train_bin,
          y_train_nocon,
          batch_size=128,
          epochs=20,
          verbose=2,
          validation_data=(x_test_bin, y_test))

fair_nn_results = model.evaluate(x_test_bin, y_test)

Train on 11520 samples, validate on 1968 samples
Epoch 1/20
11520/11520 - 0s - loss: 0.7243 - accuracy: 0.5182 - val_loss: 0.6564 - val_accuracy: 0.5056
Epoch 2/20
11520/11520 - 0s - loss: 0.6264 - accuracy: 0.5395 - val_loss: 0.5690 - val_accuracy: 0.6174
Epoch 3/20
11520/11520 - 0s - loss: 0.5395 - accuracy: 0.7294 - val_loss: 0.4882 - val_accuracy: 0.7734
Epoch 4/20
11520/11520 - 0s - loss: 0.4624 - accuracy: 0.8238 - val_loss: 0.4203 - val_accuracy: 0.7947
Epoch 5/20
11520/11520 - 0s - loss: 0.3999 - accuracy: 0.8484 - val_loss: 0.3672 - val_accuracy: 0.8425
Epoch 6/20
11520/11520 - 0s - loss: 0.3522 - accuracy: 0.8635 - val_loss: 0.3277 - val_accuracy: 0.8491
Epoch 7/20
11520/11520 - 0s - loss: 0.3158 - accuracy: 0.8664 - val_loss: 0.2982 - val_accuracy: 0.8486
Epoch 8/20
11520/11520 - 0s - loss: 0.2885 - accuracy: 0.8914 - val_loss: 0.2767 - val_accuracy: 0.9070
Epoch 9/20
11520/11520 - 0s - loss: 0.2684 - accuracy: 0.9012 - val_loss: 0.2611 - val_accuracy: 0.9141
Epoch 10/20
11520/11520 - 0s - loss: 0.2536 - accuracy: 0.9059 - val_loss: 0.2497 - val_accuracy: 0.9141
Epoch 11/20
11520/11520 - 0s - loss: 0.2426 - accuracy: 0.9082 - val_loss: 0.2417 - val_accuracy: 0.9141
Epoch 12/20
11520/11520 - 0s - loss: 0.2343 - accuracy: 0.9099 - val_loss: 0.2352 - val_accuracy: 0.9141
Epoch 13/20
11520/11520 - 0s - loss: 0.2280 - accuracy: 0.9100 - val_loss: 0.2308 - val_accuracy: 0.9141
Epoch 14/20
11520/11520 - 0s - loss: 0.2231 - accuracy: 0.9100 - val_loss: 0.2271 - val_accuracy: 0.9151
Epoch 15/20
11520/11520 - 0s - loss: 0.2193 - accuracy: 0.9104 - val_loss: 0.2246 - val_accuracy: 0.9151
Epoch 16/20
11520/11520 - 0s - loss: 0.2163 - accuracy: 0.9107 - val_loss: 0.2226 - val_accuracy: 0.9151
Epoch 17/20
11520/11520 - 0s - loss: 0.2139 - accuracy: 0.9107 - val_loss: 0.2210 - val_accuracy: 0.9151
Epoch 18/20
11520/11520 - 0s - loss: 0.2120 - accuracy: 0.9108 - val_loss: 0.2198 - val_accuracy: 0.9151
Epoch 19/20
11520/11520 - 0s - loss: 0.2104 - accuracy: 0.9109 - val_loss: 0.2188 - val_accuracy: 0.9151
Epoch 20/20
11520/11520 - 0s - loss: 0.2091 - accuracy: 0.9109 - val_loss: 0.2181 - val_accuracy: 0.9151
1968/1968 [==============================] - 0s 25us/sample - loss: 0.2181 - accuracy: 0.9151

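4. Comparison

Higher-resolution input and a more powerful model make this problem easy for the CNN, while a classical model of similar size (37 parameters, versus the QNN's 32) reaches about 91% test accuracy (see the log above) in a fraction of the time. Either way, the classical neural network easily outperforms the quantum neural network. For classical data, it is difficult to beat a classical neural network.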

qnn_accuracy = qnn_results[1]
cnn_accuracy = cnn_results[1]
fair_nn_accuracy = fair_nn_results[1]

# Pass x and y as keywords; newer seaborn versions no longer accept them positionally.
sns.barplot(x=["Quantum", "Classical, full", "Classical, fair"],
            y=[qnn_accuracy, cnn_accuracy, fair_nn_accuracy])


That's all.
