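TensorFlow Probability : Tutorials : Understanding TensorFlow Distributions Shapes (translation / commentary)

Translated by : (株)クラスキャットセールスインフォメーション
Created : 10/25/2018

* This page is a translation, with supplementary explanation added where appropriate, of the following page
from TensorFlow Probability – Tutorials on the official TensorFlow site:

Understanding TensorFlow Distributions Shapes

* The sample code has been checked for correct operation, and has been extended or modified where necessary.
* Feel free to link to this page; we would appreciate a short note to sales-info@classcat.com if you do.

TensorFlow Probability : Understanding TensorFlow Distributions Shapes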

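# __future__ imports must come before any other statement in a module.
from __future__ import print_function

import collections

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions
tfb = tfp.bijectors

# TF 1.x only: tf.contrib was removed in TF 2.x, where eager execution is the default.
tfe = tf.contrib.eager
tfe.enable_eager_execution()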

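Basics

There are three important concepts associated with TensorFlow Distributions shapes:

Event shape describes the shape of a single draw from the distribution; the draw may be dependent across dimensions. For scalar distributions, the event shape is []. For a 5-dimensional MultivariateNormal, the event shape is [5].
Batch shape describes independent, not identically distributed draws, aka a "batch" of distributions.
Sample shape describes independent, identically distributed draws of batches from the distribution family.

The event shape and the batch shape are properties of a Distribution object, whereas the sample shape is associated with a specific call to sample or log_prob.

The purpose of this notebook is to illustrate these concepts through examples, so if this is not immediately obvious, don't worry!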

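A note on TensorFlow Eager

This entire notebook is written using TensorFlow Eager. None of the concepts presented rely on Eager. With Eager, however, the batch and event shapes of a distribution are evaluated (and therefore known) when the Distribution object is created in Python, whereas in graph (non-Eager) mode it is possible to define distributions whose event and batch shapes are undetermined until the graph is run.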

Scalar distributions

As noted above, a Distribution object has defined event and batch shapes. We'll start with a utility to describe distributions:

def describe_distributions(distributions):
  print('\n'.join([str(d) for d in distributions]))

In this section we'll explore scalar distributions: distributions with an event shape of []. A typical example is the Poisson distribution, specified by a rate:

poisson_distributions = [
    tfd.Poisson(rate=1., name='One Poisson Scalar Batch'),
    tfd.Poisson(rate=[1., 10., 100.], name='Three Poissons'),
    tfd.Poisson(rate=[[1., 10., 100.,], [2., 20., 200.]],
                name='Two-by-Three Poissons'),
    tfd.Poisson(rate=[1.], name='One Poisson Vector Batch'),
    tfd.Poisson(rate=[[1.]], name='One Poisson Expanded Batch')
]

describe_distributions(poisson_distributions)

tfp.distributions.Poisson("One Poisson Scalar Batch/", batch_shape=(), event_shape=(), dtype=float32)
tfp.distributions.Poisson("Three Poissons/", batch_shape=(3,), event_shape=(), dtype=float32)
tfp.distributions.Poisson("Two-by-Three Poissons/", batch_shape=(2, 3), event_shape=(), dtype=float32)
tfp.distributions.Poisson("One Poisson Vector Batch/", batch_shape=(1,), event_shape=(), dtype=float32)
tfp.distributions.Poisson("One Poisson Expanded Batch/", batch_shape=(1, 1), event_shape=(), dtype=float32)

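The Poisson distribution is a scalar distribution, so its event shape is always []. If we specify more rates, they show up in the batch shape. The final pair of examples is interesting: there is only a single rate, but because that rate is embedded in an array with a non-empty shape, that shape becomes the batch shape.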
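To make the link between the shape of rate and the batch shape concrete, here is a minimal pure-Python sketch. It does not use TensorFlow Probability; the helper shape_of is a hypothetical name introduced only for this illustration, and it simply measures the nesting of a Python list the way an array library would.

```python
def shape_of(value):
    """Return the shape of a scalar or a rectangular nested list as a tuple."""
    if not isinstance(value, (list, tuple)):
        return ()  # a bare number has the empty shape
    if len(value) == 0:
        return (0,)
    inner = shape_of(value[0])
    # Every element must have the same shape, otherwise the list is ragged.
    if any(shape_of(v) != inner for v in value[1:]):
        raise ValueError("ragged nested list")
    return (len(value),) + inner

# The five rate arguments used for the Poisson examples above.
rates = [1., [1., 10., 100.], [[1., 10., 100.], [2., 20., 200.]], [1.], [[1.]]]
batch_shapes = [shape_of(r) for r in rates]
print(batch_shapes)  # [(), (3,), (2, 3), (1,), (1, 1)]
```

The printed shapes agree with the batch_shape values reported by describe_distributions, including the (1,) and (1, 1) cases for the single rate wrapped in lists.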

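The standard Normal distribution is also a scalar distribution. Its event shape is [], just like the Poisson, but we'll play with it to see our first example of broadcasting. The Normal is specified using loc and scale parameters: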

normal_distributions = [
    tfd.Normal(loc=0., scale=1., name='Standard'),
    tfd.Normal(loc=[0.], scale=1., name='Standard Vector Batch'),
    tfd.Normal(loc=[0., 1., 2., 3.], scale=1., name='Different Locs'),
    tfd.Normal(loc=[0., 1., 2., 3.], scale=[[1.], [5.]],
               name='Broadcasting Scale')
]

describe_distributions(normal_distributions)

tfp.distributions.Normal("Standard/", batch_shape=(), event_shape=(), dtype=float32)
tfp.distributions.Normal("Standard Vector Batch/", batch_shape=(1,), event_shape=(), dtype=float32)
tfp.distributions.Normal("Different Locs/", batch_shape=(4,), event_shape=(), dtype=float32)
tfp.distributions.Normal("Broadcasting Scale/", batch_shape=(2, 4), event_shape=(), dtype=float32)

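The interesting example above is the Broadcasting Scale distribution. The loc parameter has shape [4], and the scale parameter has shape [2, 1]. Using Numpy broadcasting rules, the batch shape is [2, 4]. An equivalent (but less elegant and not recommended) way to define the "Broadcasting Scale" distribution would be: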

describe_distributions(
    [tfd.Normal(loc=[[0., 1., 2., 3.], [0., 1., 2., 3.]],
                scale=[[1., 1., 1., 1.], [5., 5., 5., 5.]])])

tfp.distributions.Normal("Normal/", batch_shape=(2, 4), event_shape=(), dtype=float32)

We can see why the broadcasting notation is useful, although it is also a source of headaches and bugs.
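The broadcasting rule that turns a loc of shape [4] and a scale of shape [2, 1] into a batch shape of [2, 4] can be sketched in a few lines of plain Python. This is an illustration of the NumPy-style rule only, not the code TensorFlow Probability runs; broadcast_shapes is a name chosen for this sketch.

```python
def broadcast_shapes(shape_a, shape_b):
    """Broadcast two shapes following NumPy's rules; raise ValueError if incompatible."""
    a, b = list(shape_a), list(shape_b)
    # Left-pad the shorter shape with 1s so both have the same rank.
    rank = max(len(a), len(b))
    a = [1] * (rank - len(a)) + a
    b = [1] * (rank - len(b)) + b
    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError("cannot broadcast %r with %r" % (shape_a, shape_b))
    return tuple(result)

loc_shape = (4,)      # loc=[0., 1., 2., 3.]
scale_shape = (2, 1)  # scale=[[1.], [5.]]
batch_shape = broadcast_shapes(loc_shape, scale_shape)
print(batch_shape)  # (2, 4)
```

Incompatible shapes such as (2,) and (3,) raise a ValueError in this sketch, which mirrors the kind of bug the text warns about.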

Sampling scalar distributions

There are two main things we can do with distributions: we can sample from them and we can compute log_probs. Let's explore sampling first. The basic rule is that when we sample from a distribution, the resulting Tensor has shape [sample_shape, batch_shape, event_shape], where batch_shape and event_shape are provided by the Distribution object, and sample_shape is provided by the call to sample. For scalar distributions, event_shape = [], so the Tensor returned from sample has shape [sample_shape, batch_shape]. Let's try it:

def describe_sample_tensor_shape(sample_shape, distribution):
  # Print the requested sample shape and the shape of the tensor sample() returns.
  print('Sample shape:', sample_shape)
  print('Returned sample tensor shape:',
        distribution.sample(sample_shape).shape)

def describe_sample_tensor_shapes(distributions, sample_shapes):
  # Describe every (distribution, sample_shape) combination.
  for distribution in distributions:
    print(distribution)
    for sample_shape in sample_shapes:
      describe_sample_tensor_shape(sample_shape, distribution)
    print()

sample_shapes = [1, 2, [1, 5], [3, 4, 5]]
describe_sample_tensor_shapes(poisson_distributions, sample_shapes)

tfp.distributions.Poisson("One Poisson Scalar Batch/", batch_shape=(), event_shape=(), dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1,)
Sample shape: 2
Returned sample tensor shape: (2,)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5)

tfp.distributions.Poisson("Three Poissons/", batch_shape=(3,), event_shape=(), dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 3)
Sample shape: 2
Returned sample tensor shape: (2, 3)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 3)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 3)

tfp.distributions.Poisson("Two-by-Three Poissons/", batch_shape=(2, 3), event_shape=(), dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 2, 3)
Sample shape: 2
Returned sample tensor shape: (2, 2, 3)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 2, 3)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 2, 3)

tfp.distributions.Poisson("One Poisson Vector Batch/", batch_shape=(1,), event_shape=(), dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 1)
Sample shape: 2
Returned sample tensor shape: (2, 1)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 1)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 1)

tfp.distributions.Poisson("One Poisson Expanded Batch/", batch_shape=(1, 1), event_shape=(), dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 1, 1)
Sample shape: 2
Returned sample tensor shape: (2, 1, 1)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 1, 1)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 1, 1)

describe_sample_tensor_shapes(normal_distributions, sample_shapes)

tfp.distributions.Normal("Standard/", batch_shape=(), event_shape=(), dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1,)
Sample shape: 2
Returned sample tensor shape: (2,)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5)

tfp.distributions.Normal("Standard Vector Batch/", batch_shape=(1,), event_shape=(), dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 1)
Sample shape: 2
Returned sample tensor shape: (2, 1)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 1)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 1)

tfp.distributions.Normal("Different Locs/", batch_shape=(4,), event_shape=(), dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 4)
Sample shape: 2
Returned sample tensor shape: (2, 4)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 4)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 4)

tfp.distributions.Normal("Broadcasting Scale/", batch_shape=(2, 4), event_shape=(), dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 2, 4)
Sample shape: 2
Returned sample tensor shape: (2, 2, 4)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 2, 4)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 2, 4)

That's about all there is to say about sample: the returned sample tensors have shape [sample_shape, batch_shape, event_shape].
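The sampling rule is just a concatenation of three shapes, which the following pure-Python sketch spells out. It is not part of TensorFlow Probability; sample_tensor_shape is a hypothetical helper, and the treatment of a bare integer as a one-element shape is inferred from the outputs listed above.

```python
def sample_tensor_shape(sample_shape, batch_shape, event_shape=()):
    """Shape of the tensor returned by sample(): sample + batch + event dimensions."""
    # A bare integer n is treated like the one-element shape [n].
    if isinstance(sample_shape, int):
        sample_shape = [sample_shape]
    return tuple(sample_shape) + tuple(batch_shape) + tuple(event_shape)

# Reproduce the shapes reported for the 'Two-by-Three Poissons' distribution.
for s in [1, 2, [1, 5], [3, 4, 5]]:
    print(s, '->', sample_tensor_shape(s, batch_shape=(2, 3)))
```

For scalar distributions the event_shape argument can be left at its default of (), which gives the [sample_shape, batch_shape] form described in the text.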

Computing log_prob for scalar distributions

Now let's take a look at log_prob, which is somewhat trickier. log_prob takes as input a (non-empty) tensor representing the location(s) at which to compute the log_prob for the distribution. In the most straightforward case, this tensor has a shape of the form [sample_shape, batch_shape, event_shape], where batch_shape and event_shape match the batch and event shapes of the distribution. Recall once more that for scalar distributions, event_shape = [], so the input tensor has shape [sample_shape, batch_shape]. In this case, we get back a tensor of shape [sample_shape, batch_shape]:

three_poissons = tfd.Poisson(rate=[1., 10., 100.], name='Three Poissons')
three_poissons

<tfp.distributions.Poisson 'Three Poissons/' batch_shape=(3,) event_shape=() dtype=float32>

three_poissons.log_prob([[1., 10., 100.], [100., 10., 1.]])  # sample_shape is [2].

<tf.Tensor: id=594, shape=(2, 3), dtype=float32, numpy=
array([[  -1.       ,   -2.0785608,   -3.222351 ],
       [-364.73938  ,   -2.0785608,  -95.39483  ]], dtype=float32)>

three_poissons.log_prob([[[[1., 10., 100.], [100., 10., 1.]]]])  # sample_shape is [1, 1, 2].

<tf.Tensor: id=601, shape=(1, 1, 2, 3), dtype=float32, numpy=
array([[[[  -1.       ,   -2.0785608,   -3.222351 ],
         [-364.73938  ,   -2.0785608,  -95.39483  ]]]], dtype=float32)>

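Note how in the first example the input and output have shape [2, 3], and in the second example they have shape [1, 1, 2, 3].

That would be all there was to say, if it weren't for broadcasting. Here are the rules once we take broadcasting into account. We describe them in full generality and note the simplifications for scalar distributions: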

Define n = len(batch_shape) + len(event_shape). (For scalar distributions, len(event_shape) = 0.)
If the input tensor t has fewer than n dimensions, pad its shape by adding dimensions of size 1 on the left until it has exactly n dimensions. Call the resulting tensor t'.
Broadcast the n rightmost dimensions of t' against the [batch_shape, event_shape] of the distribution you're computing a log_prob for. In more detail: for the dimensions where t' already matches the distribution, do nothing, and for the dimensions where t' has a singleton, replicate that singleton the appropriate number of times. Any other situation is an error. (For scalar distributions, we only broadcast against batch_shape, since event_shape = [].)
Now we're finally able to compute the log_prob. The resulting tensor has shape [sample_shape, batch_shape], where sample_shape is defined to be any dimensions of t or t' to the left of the n rightmost dimensions: sample_shape = shape(t)[:-n].
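The four rules above can be sketched as a small shape calculator. This is a pure-Python illustration that follows the rules exactly as stated in the text; it is not TensorFlow Probability code, log_prob_output_shape is a hypothetical name, and it deliberately does not model any broadcasting behavior beyond what the rules describe.

```python
def log_prob_output_shape(t_shape, batch_shape, event_shape=()):
    """Shape returned by log_prob for an input of shape t_shape, per the stated rules."""
    t_shape = tuple(t_shape)
    dist_shape = tuple(batch_shape) + tuple(event_shape)
    # Rule 1: n counts the batch and event dimensions of the distribution.
    n = len(dist_shape)
    # Rule 2: left-pad the input shape with 1s up to n dimensions (this is t').
    padded = (1,) * max(0, n - len(t_shape)) + t_shape
    # Rule 3: each of the n rightmost dims of t' must match the distribution or be 1.
    rightmost = padded[len(padded) - n:]
    for t_dim, d_dim in zip(rightmost, dist_shape):
        if t_dim != d_dim and t_dim != 1:
            raise ValueError("input shape %r does not broadcast against %r"
                             % (t_shape, dist_shape))
    # Rule 4: whatever lies to the left of the n rightmost dims is the sample shape.
    sample_shape = padded[:len(padded) - n]
    return sample_shape + tuple(batch_shape)

print(log_prob_output_shape((1,), batch_shape=(3,)))        # (3,)
print(log_prob_output_shape((2, 2, 1), batch_shape=(3,)))   # (2, 2, 3)
print(log_prob_output_shape((2, 1, 1), batch_shape=(2, 3))) # (2, 2, 3)
```

Passing a non-empty event_shape covers the multivariate case: the event dimensions take part in the broadcast check but do not appear in the result.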

This is a mess if you don't know what it means, so let's work through some examples:

three_poissons.log_prob([10.])

<tf.Tensor: id=608, shape=(3,), dtype=float32, numpy=array([-16.104412 ,  -2.0785608, -69.052704 ], dtype=float32)>

The tensor [10.] (with shape [1]) is broadcast across the batch_shape of 3, so we evaluate the log probability of all three Poissons at the value 10.
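The three numbers returned above can be checked by hand with the Poisson log probability formula, log P(K = k) = k log(rate) - rate - log(k!). The sketch below uses only the Python standard library and is an independent check, not the implementation used by tfd.Poisson; poisson_log_prob is a name introduced for this illustration.

```python
import math

def poisson_log_prob(k, rate):
    """log P(K = k) for a Poisson with the given rate: k*log(rate) - rate - log(k!)."""
    return k * math.log(rate) - rate - math.lgamma(k + 1.0)

rates = [1., 10., 100.]   # batch_shape = [3]
value = 10.               # the single input value, broadcast to every rate
log_probs = [poisson_log_prob(value, r) for r in rates]
print(log_probs)  # approximately [-16.104, -2.079, -69.053]
```

Broadcasting here amounts to reusing the same value for each rate in the batch.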

three_poissons.log_prob([[[1.], [10.]], [[100.], [1000.]]])

<tf.Tensor: id=615, shape=(2, 2, 3), dtype=float32, numpy=
array([[[-1.0000000e+00, -7.6974149e+00, -9.5394829e+01],
        [-1.6104412e+01, -2.0785608e+00, -6.9052704e+01]],

       [[-3.6473938e+02, -1.4348087e+02, -3.2223511e+00],
        [-5.9131279e+03, -3.6195427e+03, -1.4069575e+03]]], dtype=float32)>

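In the above example, the input tensor has shape [2, 2, 1], while the distributions object has a batch shape of 3. So for each of the [2, 2] sample dimensions, the single value provided gets broadcast to each of the three Poissons.

A possibly useful way to think of it: because three_poissons has batch_shape = [3], a call to log_prob must take a Tensor whose last dimension is either 1 or 3; anything else is an error. (The numpy broadcasting rules treat the special case of a scalar as being totally equivalent to a Tensor of shape [1].)

Let's test our chops by playing with the more complex Poisson distribution with batch_shape = [2, 3]: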

poisson_2_by_3 = tfd.Poisson(
    rate=[[1., 10., 100.,], [2., 20., 200.]],
    name='Two-by-Three Poissons')

poisson_2_by_3.log_prob(1.)

<tf.Tensor: id=623, shape=(2, 3), dtype=float32, numpy=
array([[  -1.       ,   -7.697415 ,  -95.39483  ],
       [  -1.3068528,  -17.004269 , -194.70168  ]], dtype=float32)>

poisson_2_by_3.log_prob([1.])  # Exactly equivalent to above, demonstrating the scalar special case.

<tf.Tensor: id=630, shape=(2, 3), dtype=float32, numpy=
array([[  -1.       ,   -7.697415 ,  -95.39483  ],
       [  -1.3068528,  -17.004269 , -194.70168  ]], dtype=float32)>

poisson_2_by_3.log_prob([[1., 1., 1.], [1., 1., 1.]])  # Another way to write the same thing. No broadcasting.

<tf.Tensor: id=637, shape=(2, 3), dtype=float32, numpy=
array([[  -1.       ,   -7.697415 ,  -95.39483  ],
       [  -1.3068528,  -17.004269 , -194.70168  ]], dtype=float32)>

poisson_2_by_3.log_prob([[1., 10., 100.]])  # Input is [1, 3] broadcast to [2, 3].

<tf.Tensor: id=644, shape=(2, 3), dtype=float32, numpy=
array([[ -1.       ,  -2.0785608,  -3.222351 ],
       [ -1.3068528,  -5.14709  , -33.907654 ]], dtype=float32)>

poisson_2_by_3.log_prob([[1., 10., 100.], [1., 10., 100.]])  # Equivalent to above. No broadcasting.

<tf.Tensor: id=651, shape=(2, 3), dtype=float32, numpy=
array([[ -1.       ,  -2.0785608,  -3.222351 ],
       [ -1.3068528,  -5.14709  , -33.907654 ]], dtype=float32)>

poisson_2_by_3.log_prob([[1., 1., 1.], [2., 2., 2.]])  # No broadcasting.

<tf.Tensor: id=658, shape=(2, 3), dtype=float32, numpy=
array([[  -1.       ,   -7.697415 ,  -95.39483  ],
       [  -1.3068528,  -14.701683 , -190.09651  ]], dtype=float32)>

poisson_2_by_3.log_prob([[1.], [2.]])  # Equivalent to above. Input shape [2, 1] broadcast to [2, 3].

<tf.Tensor: id=665, shape=(2, 3), dtype=float32, numpy=
array([[  -1.       ,   -7.697415 ,  -95.39483  ],
       [  -1.3068528,  -14.701683 , -190.09651  ]], dtype=float32)>

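The above examples involved broadcasting over the batch, but the sample shape was empty. Suppose we have a collection of values, and we want to get the log probability of each value at each point in the batch. We could do it manually: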

poisson_2_by_3.log_prob([[[1., 1., 1.], [1., 1., 1.]], [[2., 2., 2.], [2., 2., 2.]]])  # Input shape [2, 2, 3].

<tf.Tensor: id=672, shape=(2, 2, 3), dtype=float32, numpy=
array([[[  -1.       ,   -7.697415 ,  -95.39483  ],
        [  -1.3068528,  -17.004269 , -194.70168  ]],

       [[  -1.6931472,   -6.087977 ,  -91.4828   ],
        [  -1.3068528,  -14.701683 , -190.09651  ]]], dtype=float32)>

Or we could let broadcasting handle the last batch dimension:

poisson_2_by_3.log_prob([[[1.], [1.]], [[2.], [2.]]])  # Input shape [2, 2, 1].

<tf.Tensor: id=679, shape=(2, 2, 3), dtype=float32, numpy=
array([[[  -1.       ,   -7.697415 ,  -95.39483  ],
        [  -1.3068528,  -17.004269 , -194.70168  ]],

       [[  -1.6931472,   -6.087977 ,  -91.4828   ],
        [  -1.3068528,  -14.701683 , -190.09651  ]]], dtype=float32)>

We can also (perhaps somewhat less naturally) let broadcasting handle just the first batch dimension:

poisson_2_by_3.log_prob([[[1., 1., 1.]], [[2., 2., 2.]]])  # Input shape [2, 1, 3].

<tf.Tensor: id=686, shape=(2, 2, 3), dtype=float32, numpy=
array([[[  -1.       ,   -7.697415 ,  -95.39483  ],
        [  -1.3068528,  -17.004269 , -194.70168  ]],

       [[  -1.6931472,   -6.087977 ,  -91.4828   ],
        [  -1.3068528,  -14.701683 , -190.09651  ]]], dtype=float32)>

Or we could let broadcasting handle both batch dimensions:

poisson_2_by_3.log_prob([[[1.]], [[2.]]])  # Input shape [2, 1, 1].

<tf.Tensor: id=693, shape=(2, 2, 3), dtype=float32, numpy=
array([[[  -1.       ,   -7.697415 ,  -95.39483  ],
        [  -1.3068528,  -17.004269 , -194.70168  ]],

       [[  -1.6931472,   -6.087977 ,  -91.4828   ],
        [  -1.3068528,  -14.701683 , -190.09651  ]]], dtype=float32)>

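The above worked fine when we had only two values we wanted, but suppose we had a long list of values that we wanted to evaluate at every batch point. For that, the following notation, which adds extra dimensions of size 1 to the right side of the shape, is extremely useful: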

poisson_2_by_3.log_prob(tf.constant([1., 2.])[..., tf.newaxis, tf.newaxis])

<tf.Tensor: id=704, shape=(2, 2, 3), dtype=float32, numpy=
array([[[  -1.       ,   -7.697415 ,  -95.39483  ],
        [  -1.3068528,  -17.004269 , -194.70168  ]],

       [[  -1.6931472,   -6.087977 ,  -91.4828   ],
        [  -1.3068528,  -14.701683 , -190.09651  ]]], dtype=float32)>

This is an instance of strided slice notation, which is worth knowing.
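What [..., tf.newaxis, tf.newaxis] does to a flat list of values can be imitated with nested Python lists. The sketch below is only an analogy for the shape change, assuming plain lists instead of tensors; add_trailing_axes and shape_of are hypothetical helpers written for this illustration.

```python
def add_trailing_axes(values, num_axes):
    """Wrap each element of a flat list in num_axes levels of one-element lists."""
    def wrap(x, depth):
        return x if depth == 0 else [wrap(x, depth - 1)]
    return [wrap(v, num_axes) for v in values]

def shape_of(value):
    """Shape of a scalar or a non-empty rectangular nested list (no raggedness check)."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0]
    return tuple(shape)

expanded = add_trailing_axes([1., 2.], 2)
print(expanded)            # [[[1.0]], [[2.0]]]
print(shape_of(expanded))  # (2, 1, 1)
```

The result has the same nesting as the hand-written input [[[1.]], [[2.]]] of shape [2, 1, 1] used earlier, which is why both calls return the same log probabilities.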

Going back to three_poissons for completeness, the same example looks like:

three_poissons.log_prob([[1.], [10.], [50.], [100.]])

<tf.Tensor: id=711, shape=(4, 3), dtype=float32, numpy=
array([[  -1.       ,   -7.697415 ,  -95.39483  ],
       [ -16.104412 ,   -2.0785608,  -69.052704 ],
       [-149.47777  ,  -43.34851  ,  -18.219254 ],
       [-364.73938  , -143.48087  ,   -3.222351 ]], dtype=float32)>

three_poissons.log_prob(tf.constant([1., 10., 50., 100.])[..., tf.newaxis])  # Equivalent to above.

<tf.Tensor: id=722, shape=(4, 3), dtype=float32, numpy=
array([[  -1.       ,   -7.697415 ,  -95.39483  ],
       [ -16.104412 ,   -2.0785608,  -69.052704 ],
       [-149.47777  ,  -43.34851  ,  -18.219254 ],
       [-364.73938  , -143.48087  ,   -3.222351 ]], dtype=float32)>

Multivariate distributions

We now turn to multivariate distributions, which have a non-empty event shape. Let's look at multinomial distributions.

multinomial_distributions = [
    # Multinomial is a vector-valued distribution: if we have k classes,
    # an individual sample from the distribution has k values in it, so the
    # event_shape is `[k]`.
    tfd.Multinomial(total_count=100., probs=[.5, .4, .1],
                    name='One Multinomial'),
    tfd.Multinomial(total_count=[100., 1000.], probs=[.5, .4, .1],
                    name='Two Multinomials Same Probs'),
    tfd.Multinomial(total_count=100., probs=[[.5, .4, .1], [.1, .2, .7]],
                    name='Two Multinomials Same Counts'),
    tfd.Multinomial(total_count=[100., 1000.],
                    probs=[[.5, .4, .1], [.1, .2, .7]],
                    name='Two Multinomials Different Everything')

]

describe_distributions(multinomial_distributions)

tfp.distributions.Multinomial("One Multinomial/", batch_shape=(), event_shape=(3,), dtype=float32)
tfp.distributions.Multinomial("Two Multinomials Same Probs/", batch_shape=(2,), event_shape=(3,), dtype=float32)
tfp.distributions.Multinomial("Two Multinomials Same Counts/", batch_shape=(2,), event_shape=(3,), dtype=float32)
tfp.distributions.Multinomial("Two Multinomials Different Everything/", batch_shape=(2,), event_shape=(3,), dtype=float32)

Note that in the last three examples the batch_shape is always [2], but we can use broadcasting to have either a shared total_count or a shared probs (or neither), because under the hood they are broadcast to have the same shape.
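One way to reason about the four Multinomial examples is to treat the last dimension of probs as the event shape and broadcast everything else into the batch shape. The pure-Python sketch below encodes that reading; it is an assumption-labeled illustration consistent with the outputs above, not the library's source, and both helper names are introduced here.

```python
def broadcast_shapes(shape_a, shape_b):
    """Broadcast two shapes following NumPy's rules; raise ValueError if incompatible."""
    a, b = list(shape_a), list(shape_b)
    rank = max(len(a), len(b))
    a = [1] * (rank - len(a)) + a
    b = [1] * (rank - len(b)) + b
    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError("cannot broadcast %r with %r" % (shape_a, shape_b))
    return tuple(result)

def multinomial_shapes(total_count_shape, probs_shape):
    """Return (batch_shape, event_shape) for the given parameter shapes."""
    probs_shape = tuple(probs_shape)
    event_shape = probs_shape[-1:]  # k classes give event_shape [k]
    # The remaining dims of probs broadcast against total_count to form the batch.
    batch_shape = broadcast_shapes(total_count_shape, probs_shape[:-1])
    return batch_shape, event_shape

print(multinomial_shapes((), (3,)))      # ((), (3,))   'One Multinomial'
print(multinomial_shapes((2,), (3,)))    # ((2,), (3,)) 'Two Multinomials Same Probs'
print(multinomial_shapes((), (2, 3)))    # ((2,), (3,)) 'Two Multinomials Same Counts'
print(multinomial_shapes((2,), (2, 3)))  # ((2,), (3,)) 'Two Multinomials Different Everything'
```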

Sampling is straightforward, given what we already know:

describe_sample_tensor_shapes(multinomial_distributions, sample_shapes)

tfp.distributions.Multinomial("One Multinomial/", batch_shape=(), event_shape=(3,), dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 3)
Sample shape: 2
Returned sample tensor shape: (2, 3)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 3)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 3)

tfp.distributions.Multinomial("Two Multinomials Same Probs/", batch_shape=(2,), event_shape=(3,), dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 2, 3)
Sample shape: 2
Returned sample tensor shape: (2, 2, 3)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 2, 3)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 2, 3)

tfp.distributions.Multinomial("Two Multinomials Same Counts/", batch_shape=(2,), event_shape=(3,), dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 2, 3)
Sample shape: 2
Returned sample tensor shape: (2, 2, 3)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 2, 3)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 2, 3)

tfp.distributions.Multinomial("Two Multinomials Different Everything/", batch_shape=(2,), event_shape=(3,), dtype=float32)
Sample shape: 1
Returned sample tensor shape: (1, 2, 3)
Sample shape: 2
Returned sample tensor shape: (2, 2, 3)
Sample shape: [1, 5]
Returned sample tensor shape: (1, 5, 2, 3)
Sample shape: [3, 4, 5]
Returned sample tensor shape: (3, 4, 5, 2, 3)

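Computing log probabilities is equally straightforward. Let's work an example with diagonal Multivariate Normal distributions. (Multinomials are not very broadcast friendly, since the constraints on the counts and probabilities mean that broadcasting will often produce inadmissible values.) We'll use a batch of two 3-dimensional distributions with the same mean but different scales (standard deviations):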

two_multivariate_normals = tfd.MultivariateNormalDiag(loc=[1., 2., 3.], scale_identity_multiplier=[1., 2.])
two_multivariate_normals

<tfp.distributions.MultivariateNormalDiag 'MultivariateNormalDiag/' batch_shape=(2,) event_shape=(3,) dtype=float32>

(Note that although we used distributions where the scale is a multiple of the identity, this is not a restriction; we could pass scale_diag instead of scale_identity_multiplier.)

Now let's evaluate the log probability of each batch point at its mean and at a shifted mean:

two_multivariate_normals.log_prob([[[1., 2., 3.]], [[3., 4., 5.]]])  # Input has shape [2,1,3].

<tf.Tensor: id=2394, shape=(2, 2), dtype=float32, numpy=
array([[-2.7568154, -4.836257 ],
       [-8.756816 , -6.336257 ]], dtype=float32)>

Exactly equivalently, we can use strided slice notation (https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/strided-slice) to insert an extra shape=1 dimension in the middle of a constant:

two_multivariate_normals.log_prob(
    tf.constant([[1., 2., 3.], [3., 4., 5.]])[:, tf.newaxis, :])  # Equivalent to above.

<tf.Tensor: id=2475, shape=(2, 2), dtype=float32, numpy=
array([[-2.7568154, -4.836257 ],
       [-8.756816 , -6.336257 ]], dtype=float32)>

On the other hand, if we don't insert the extra dimension, we pass [1., 2., 3.] to the first batch point and [3., 4., 5.] to the second:

two_multivariate_normals.log_prob(tf.constant([[1., 2., 3.], [3., 4., 5.]]))

<tf.Tensor: id=2552, shape=(2,), dtype=float32, numpy=array([-2.7568154, -6.336257 ], dtype=float32)>
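The difference between the [2, 1, 3] input and the [2, 3] input can be reproduced with a hand-written diagonal Normal log density. The following standard-library sketch is an independent check under the assumption that every coordinate shares one scalar standard deviation, as in the example; mvn_diag_log_prob is a hypothetical helper, not the TFP implementation.

```python
import math

def mvn_diag_log_prob(x, loc, scale):
    """Log density of a Normal with independent coordinates and a shared scalar scale."""
    return sum(
        -0.5 * ((xi - mi) / scale) ** 2 - math.log(scale) - 0.5 * math.log(2 * math.pi)
        for xi, mi in zip(x, loc))

loc = [1., 2., 3.]
scales = [1., 2.]                      # batch_shape = [2]
points = [[1., 2., 3.], [3., 4., 5.]]  # two event-shaped inputs

# Every point against every batch member: mirrors the [2, 1, 3] input, result shape [2, 2].
all_pairs = [[mvn_diag_log_prob(x, loc, s) for s in scales] for x in points]
# Point i against batch member i only: mirrors the [2, 3] input, result shape [2].
paired = [mvn_diag_log_prob(x, loc, s) for x, s in zip(points, scales)]

print(all_pairs)  # approximately [[-2.757, -4.836], [-8.757, -6.336]]
print(paired)     # approximately [-2.757, -6.336]
```

The paired result is the diagonal of the all-pairs table, which is what dropping the extra size-1 dimension selects.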

Shape manipulation techniques

The Reshape Bijector

The Reshape bijector can be used to reshape the event_shape of a distribution. Let's see an example:

six_way_multinomial = tfd.Multinomial(total_count=1000., probs=[.3, .25, .2, .15, .08, .02])
six_way_multinomial

<tfp.distributions.Multinomial 'Multinomial/' batch_shape=() event_shape=(6,) dtype=float32>

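We created a multinomial with an event shape of [6]. The Reshape Bijector allows us to treat this as a distribution with an event shape of [2, 3].

A Bijector represents a differentiable, one-to-one function on an open subset of ${\mathbb R}^n$. Bijectors are used in conjunction with TransformedDistribution, which models a distribution $p(y)$ in terms of a base distribution $p(x)$ and a Bijector that represents $Y = g(X)$. Let's see it in action: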

transformed_multinomial = tfd.TransformedDistribution(
    distribution=six_way_multinomial,
    bijector=tfb.Reshape(event_shape_out=[2, 3]))
transformed_multinomial

<tfp.distributions.TransformedDistribution 'reshapeMultinomial/' batch_shape=() event_shape=(2, 3) dtype=float32>

six_way_multinomial.log_prob([500., 100., 100., 150., 100., 50.])

<tf.Tensor: id=2628, shape=(), dtype=float32, numpy=-178.21973>

transformed_multinomial.log_prob([[500., 100., 100.], [150., 100., 50.]])

<tf.Tensor: id=2703, shape=(), dtype=float32, numpy=-178.21973>

This is the only thing the Reshape bijector can do: it cannot turn event dimensions into batch dimensions or vice versa.
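Because reshaping only rearranges how the six counts are laid out, the log probability of a [2, 3] event equals that of the same counts flattened back to [6]. The standard-library sketch below checks this with a hand-written multinomial log probability; it assumes row-major flattening, and multinomial_log_prob and flatten are names introduced only for this illustration.

```python
import math

def multinomial_log_prob(counts, total_count, probs):
    """log of n! / (x_1! ... x_k!) * p_1^x_1 ... p_k^x_k."""
    coeff = math.lgamma(total_count + 1.0) - sum(math.lgamma(c + 1.0) for c in counts)
    return coeff + sum(c * math.log(p) for c, p in zip(counts, probs))

def flatten(rows):
    """Row-major flattening of a list of rows."""
    return [x for row in rows for x in row]

probs = [.3, .25, .2, .15, .08, .02]
flat_event = [500., 100., 100., 150., 100., 50.]          # event_shape [6]
reshaped_event = [[500., 100., 100.], [150., 100., 50.]]  # event_shape [2, 3]

flat_lp = multinomial_log_prob(flat_event, 1000., probs)
reshaped_lp = multinomial_log_prob(flatten(reshaped_event), 1000., probs)
print(flat_lp, reshaped_lp)  # both approximately -178.22
```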

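The Independent Distribution

The Independent distribution is used to treat a collection of independent, not necessarily identical distributions (aka a batch of distributions) as a single distribution. More concisely, Independent allows us to convert dimensions in batch_shape to dimensions in event_shape. We'll illustrate by example: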

two_by_five_bernoulli = tfd.Bernoulli(
    probs=[[.05, .1, .15, .2, .25], [.3, .35, .4, .45, .5]],
    name="Two By Five Bernoulli")
two_by_five_bernoulli

<tfp.distributions.Bernoulli 'Two By Five Bernoulli/' batch_shape=(2, 5) event_shape=() dtype=int32>

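We can think of this as a two-by-five array of coins with the associated probabilities of heads. Let's evaluate the probability of a particular, arbitrary set of ones and zeros: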

pattern = [[1., 0., 0., 1., 0.], [0., 0., 1., 1., 1.]]
two_by_five_bernoulli.log_prob(pattern)

<tf.Tensor: id=2722, shape=(2, 5), dtype=float32, numpy=
array([[-2.9957323 , -0.10536051, -0.16251893, -1.609438  , -0.28768206],
       [-0.35667494, -0.43078294, -0.9162907 , -0.7985077 , -0.6931472 ]],
      dtype=float32)>

We can use Independent to turn this into two different "sets of five Bernoullis", which is useful if we want to consider a "row" of coin flips coming up in a given pattern as a single outcome:

two_sets_of_five = tfd.Independent(
    distribution=two_by_five_bernoulli,
    reinterpreted_batch_ndims=1,
    name="Two Sets Of Five")
two_sets_of_five

<tfp.distributions.Independent 'Two Sets Of Five/' batch_shape=(2,) event_shape=(5,) dtype=int32>

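Mathematically, we're computing the log probability of each "set" of five by summing the log probabilities of the five "independent" coin flips in the set, which is where the distribution gets its name: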

two_sets_of_five.log_prob(pattern)

<tf.Tensor: id=2739, shape=(2,), dtype=float32, numpy=array([-5.160732 , -3.1954036], dtype=float32)>

We can go even further and use Independent to create a distribution where individual events are a set of two-by-five Bernoullis:

one_set_of_two_by_five = tfd.Independent(
    distribution=two_by_five_bernoulli, reinterpreted_batch_ndims=2,
    name="One Set Of Two By Five")
one_set_of_two_by_five.log_prob(pattern)

<tf.Tensor: id=2756, shape=(), dtype=float32, numpy=-8.356134>
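The summation that Independent performs can be verified by hand. This standard-library sketch recomputes the per-coin Bernoulli log probabilities and then sums them over the last axis (as with reinterpreted_batch_ndims=1) and over both axes (as with reinterpreted_batch_ndims=2); it is an independent check rather than TFP code, and bernoulli_log_prob is a hypothetical helper.

```python
import math

def bernoulli_log_prob(x, p):
    """log p if the coin shows 1, log(1 - p) if it shows 0."""
    return math.log(p) if x == 1 else math.log(1.0 - p)

probs = [[.05, .1, .15, .2, .25], [.3, .35, .4, .45, .5]]
pattern = [[1, 0, 0, 1, 0], [0, 0, 1, 1, 1]]

# Shape [2, 5]: one log probability per coin, as returned by the plain Bernoulli batch.
elementwise = [[bernoulli_log_prob(x, p) for x, p in zip(x_row, p_row)]
               for x_row, p_row in zip(pattern, probs)]
# Shape [2]: sum over the last axis, one value per "set of five".
per_row = [sum(row) for row in elementwise]
# Shape []: sum over both axes, one value for the whole two-by-five event.
total = sum(per_row)

print(per_row)  # approximately [-5.161, -3.195]
print(total)    # approximately -8.356
```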

It's worth noting that from the perspective of sample, using Independent changes nothing:

describe_sample_tensor_shapes(
    [two_by_five_bernoulli,
     two_sets_of_five,
     one_set_of_two_by_five],
    [[3, 5]])

tfp.distributions.Bernoulli("Two By Five Bernoulli/", batch_shape=(2, 5), event_shape=(), dtype=int32)
Sample shape: [3, 5]
Returned sample tensor shape: (3, 5, 2, 5)

tfp.distributions.Independent("Two Sets Of Five/", batch_shape=(2,), event_shape=(5,), dtype=int32)
Sample shape: [3, 5]
Returned sample tensor shape: (3, 5, 2, 5)

tfp.distributions.Independent("One Set Of Two By Five/", batch_shape=(), event_shape=(2, 5), dtype=int32)
Sample shape: [3, 5]
Returned sample tensor shape: (3, 5, 2, 5)
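The reason the sample shapes agree is that Independent only moves the boundary between batch and event dimensions; the concatenated shape stays the same. A minimal pure-Python sketch of that bookkeeping follows. It is an illustration consistent with the outputs above, not the library's implementation, and independent_shapes is a name introduced here.

```python
def independent_shapes(batch_shape, event_shape, reinterpreted_batch_ndims):
    """Move the rightmost k batch dims into the event shape; return (batch, event)."""
    batch_shape, event_shape = tuple(batch_shape), tuple(event_shape)
    k = reinterpreted_batch_ndims
    if k not in range(len(batch_shape) + 1):
        raise ValueError("reinterpreted_batch_ndims must be between 0 and the batch rank")
    split = len(batch_shape) - k
    return batch_shape[:split], batch_shape[split:] + event_shape

sample_shape = (3, 5)
for k in [0, 1, 2]:
    batch, event = independent_shapes((2, 5), (), k)
    # The sample tensor shape is the concatenation, which does not depend on k.
    print(k, batch, event, sample_shape + batch + event)
```

Each iteration prints the same final shape, (3, 5, 2, 5), while the batch and event parts change with k.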

Exercise: As a parting exercise for the reader, we suggest considering the differences and similarities between a vector batch of Normal distributions and a MultivariateNormalDiag distribution from a sampling and log probability perspective. How can we use Independent to construct a MultivariateNormalDiag from a batch of Normals? (Note that MultivariateNormalDiag is not actually implemented this way.)

End
