TensorFlow 2.4: Guide: Basics – Ragged Tensors, Part I (translation and commentary)

Translation: ClassCat Co., Ltd. Sales Information
Created: 01/04/2021

* This page translates the first half of the following page from Guide – TensorFlow Basics on the TensorFlow org site, with supplementary explanations added where appropriate:

Ragged tensors

* The sample code has been verified to run, but it has been extended or modified where necessary.
* Feel free to link to this page; we would appreciate a short note at sales-info@classcat.com if you do.

Guide: Basics – Ragged Tensors, Part I

API documentation: tf.RaggedTensor, tf.ragged

Setup

import math
import tensorflow as tf

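Overview

Your data comes in many shapes; your tensors should too. Ragged tensors are the TensorFlow equivalent of nested variable-length lists. They make it easy to store and process data with non-uniform shapes, including:

Variable-length features, such as the set of actors in a movie.
Batches of variable-length sequential inputs, such as sentences or video clips.
Hierarchical inputs, such as text documents that are subdivided into sections, paragraphs, sentences, and words.
Individual fields in structured inputs, such as protocol buffers.

What you can do with a ragged tensor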

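Ragged tensors are supported by more than a hundred TensorFlow operations, including math operations (such as tf.add and tf.reduce_mean), array operations (such as tf.concat and tf.tile), string manipulation ops (such as tf.strings.substr), control flow operations (such as tf.while_loop and tf.map_fn), and many others: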

digits = tf.ragged.constant([[3, 1, 4, 1], [], [5, 9, 2], [6], []])
words = tf.ragged.constant([["So", "long"], ["thanks", "for", "all", "the", "fish"]])
print(tf.add(digits, 3))
print(tf.reduce_mean(digits, axis=1))
print(tf.concat([digits, [[5, 3]]], axis=0))
print(tf.tile(digits, [1, 2]))
print(tf.strings.substr(words, 0, 2))
print(tf.map_fn(tf.math.square, digits))

<tf.RaggedTensor [[6, 4, 7, 4], [], [8, 12, 5], [9], []]>
tf.Tensor([2.25              nan 5.33333333 6.                nan], shape=(5,), dtype=float64)
<tf.RaggedTensor [[3, 1, 4, 1], [], [5, 9, 2], [6], [], [5, 3]]>
<tf.RaggedTensor [[3, 1, 4, 1, 3, 1, 4, 1], [], [5, 9, 2, 5, 9, 2], [6, 6], []]>
<tf.RaggedTensor [[b'So', b'lo'], [b'th', b'fo', b'al', b'th', b'fi']]>
<tf.RaggedTensor [[9, 1, 16, 1], [], [25, 81, 4], [36], []]>

There are also a number of methods and operations that are specific to ragged tensors, including factory methods, conversion methods, and value-mapping operations.
For a list of supported ops, see the tf.ragged package documentation.

Ragged tensors are supported by many TensorFlow APIs, including Keras, Datasets, tf.function, SavedModels, and tf.Example. For more information, see the TensorFlow APIs section below.

As with normal tensors, you can use Python-style indexing to access specific slices of a ragged tensor. For more information, see the Indexing section below.

print(digits[0])       # First row

tf.Tensor([3 1 4 1], shape=(4,), dtype=int32)

print(digits[:, :2])   # First two values in each row.

<tf.RaggedTensor [[3, 1], [], [5, 9], [6], []]>

print(digits[:, -2:])  # Last two values in each row.

<tf.RaggedTensor [[4, 1], [], [9, 2], [6], []]>
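
To make the slicing semantics above concrete, here is a minimal plain-Python sketch that mimics the three indexing expressions on an ordinary nested list. It is only an analogy for what each expression selects, not how TensorFlow implements ragged indexing; the variable names are chosen for illustration.

```python
# Plain-Python stand-in for the ragged tensor `digits` used above.
digits = [[3, 1, 4, 1], [], [5, 9, 2], [6], []]

first_row = digits[0]                      # analogous to digits[0]
first_two = [row[:2] for row in digits]    # analogous to digits[:, :2]
last_two = [row[-2:] for row in digits]    # analogous to digits[:, -2:]

print(first_row)
print(first_two)
print(last_two)
```

Slicing a row that is shorter than the requested range simply yields whatever values exist, which is why the empty rows stay empty and the one-element row keeps its single value.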

And just like normal tensors, you can use Python arithmetic and comparison operators to perform elementwise operations. For more information, see the Overloaded Operators section below.

print(digits + 3)

<tf.RaggedTensor [[6, 4, 7, 4], [], [8, 12, 5], [9], []]>

print(digits + tf.ragged.constant([[1, 2, 3, 4], [], [5, 6, 7], [8], []]))

<tf.RaggedTensor [[4, 3, 7, 5], [], [10, 15, 9], [14], []]>

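If you need to perform an elementwise transformation on the values of a RaggedTensor, you can use tf.ragged.map_flat_values, which takes a function plus one or more arguments, and applies the function to transform the RaggedTensor's values.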

times_two_plus_one = lambda x: x * 2 + 1
print(tf.ragged.map_flat_values(times_two_plus_one, digits))

<tf.RaggedTensor [[7, 3, 9, 3], [], [11, 19, 5], [13], []]>
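
The idea behind tf.ragged.map_flat_values can be sketched in plain Python: the function is applied once to the flat list of values, and the row structure is then reattached unchanged. The helper name map_flat_values_sketch and the explicit row_splits argument are assumptions made for this illustration; this is not TensorFlow's implementation.

```python
def map_flat_values_sketch(fn, values, row_splits):
    """Call fn once on the flat values, then rebuild rows from row_splits."""
    new_values = fn(values)
    return [new_values[start:end]
            for start, end in zip(row_splits[:-1], row_splits[1:])]

# Flat form of digits = [[3, 1, 4, 1], [], [5, 9, 2], [6], []].
values = [3, 1, 4, 1, 5, 9, 2, 6]
row_splits = [0, 4, 4, 7, 8, 8]

times_two_plus_one = lambda xs: [x * 2 + 1 for x in xs]
result = map_flat_values_sketch(times_two_plus_one, values, row_splits)
print(result)
```

Because the row boundaries are untouched, the function must return one output value per input value for the rows to line up again.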

Ragged tensors can be converted to nested Python lists and numpy arrays:

digits.to_list()

[[3, 1, 4, 1], [], [5, 9, 2], [6], []]

digits.numpy()

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/ragged/ragged_tensor.py:2012: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return np.array(rows)
array([array([3, 1, 4, 1], dtype=int32), array([], dtype=int32),
       array([5, 9, 2], dtype=int32), array([6], dtype=int32),
       array([], dtype=int32)], dtype=object)

Constructing a ragged tensor

The simplest way to construct a ragged tensor is with tf.ragged.constant, which builds the RaggedTensor corresponding to a given nested Python list or numpy array:

sentences = tf.ragged.constant([
    ["Let's", "build", "some", "ragged", "tensors", "!"],
    ["We", "can", "use", "tf.ragged.constant", "."]])
print(sentences)

<tf.RaggedTensor [[b"Let's", b'build', b'some', b'ragged', b'tensors', b'!'], [b'We', b'can', b'use', b'tf.ragged.constant', b'.']]>

paragraphs = tf.ragged.constant([
    [['I', 'have', 'a', 'cat'], ['His', 'name', 'is', 'Mat']],
    [['Do', 'you', 'want', 'to', 'come', 'visit'], ["I'm", 'free', 'tomorrow']],
])
print(paragraphs)

<tf.RaggedTensor [[[b'I', b'have', b'a', b'cat'], [b'His', b'name', b'is', b'Mat']], [[b'Do', b'you', b'want', b'to', b'come', b'visit'], [b"I'm", b'free', b'tomorrow']]]>

Ragged tensors can also be constructed by pairing a flat values tensor with a row-partitioning tensor that indicates how those values should be divided into rows, using factory classmethods such as tf.RaggedTensor.from_value_rowids, tf.RaggedTensor.from_row_lengths, and tf.RaggedTensor.from_row_splits.

tf.RaggedTensor.from_value_rowids

If you know which row each value belongs to, you can build a RaggedTensor using a value_rowids row-partitioning tensor:

print(tf.RaggedTensor.from_value_rowids(
    values=[3, 1, 4, 1, 5, 9, 2],
    value_rowids=[0, 0, 0, 0, 2, 2, 3]))

<tf.RaggedTensor [[3, 1, 4, 1], [], [5, 9], [2]]>

tf.RaggedTensor.from_row_lengths

If you know how long each row is, you can use a row_lengths row-partitioning tensor:

print(tf.RaggedTensor.from_row_lengths(
    values=[3, 1, 4, 1, 5, 9, 2],
    row_lengths=[4, 0, 2, 1]))

<tf.RaggedTensor [[3, 1, 4, 1], [], [5, 9], [2]]>

tf.RaggedTensor.from_row_splits

If you know the index where each row starts and ends, you can use a row_splits row-partitioning tensor:

print(tf.RaggedTensor.from_row_splits(
    values=[3, 1, 4, 1, 5, 9, 2],
    row_splits=[0, 4, 4, 6, 7]))

<tf.RaggedTensor [[3, 1, 4, 1], [], [5, 9], [2]]>
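
The three row-partitioning encodings above describe the same row structure, and the following plain-Python sketch shows how one can be derived from another. The helper names are hypothetical and written only for this illustration; they are not part of the TensorFlow API.

```python
from itertools import accumulate

def lengths_to_splits(row_lengths):
    """Row boundaries are the running total of the row lengths, starting at 0."""
    return [0] + list(accumulate(row_lengths))

def splits_to_lengths(row_splits):
    """Each row length is the gap between two consecutive boundaries."""
    return [end - start for start, end in zip(row_splits[:-1], row_splits[1:])]

def lengths_to_rowids(row_lengths):
    """Repeat each row index once per value stored in that row."""
    return [row for row, n in enumerate(row_lengths) for _ in range(n)]

def rows_from_splits(values, row_splits):
    """Cut the flat values into rows at the given boundaries."""
    return [values[s:e] for s, e in zip(row_splits[:-1], row_splits[1:])]

values = [3, 1, 4, 1, 5, 9, 2]
row_lengths = [4, 0, 2, 1]

row_splits = lengths_to_splits(row_lengths)
value_rowids = lengths_to_rowids(row_lengths)
rows = rows_from_splits(values, row_splits)

print(row_splits)
print(value_rowids)
print(rows)
```

Note that value_rowids alone cannot express trailing empty rows, since no value points at them.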

See the tf.RaggedTensor class documentation for a full list of factory methods.

Note: By default, these factory methods add assertions that the row-partitioning tensor is well-formed and consistent with the number of values. If you can guarantee that the inputs are well-formed and consistent, you can pass validate=False to skip these checks.
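
As a rough illustration of what "well-formed and consistent" can mean for row_splits, the sketch below checks three conditions implied by the example above: the splits start at 0, never decrease, and end at the number of values. The function check_row_splits is hypothetical, and the exact assertions TensorFlow adds are not spelled out in this guide.

```python
def check_row_splits(values, row_splits):
    """Raise ValueError if row_splits cannot partition values into rows."""
    if not row_splits or row_splits[0] != 0:
        raise ValueError("row_splits must start with 0")
    if any(b < a for a, b in zip(row_splits[:-1], row_splits[1:])):
        raise ValueError("row_splits must be non-decreasing")
    if row_splits[-1] != len(values):
        raise ValueError("last row_split must equal the number of values")

values = [3, 1, 4, 1, 5, 9, 2]
check_row_splits(values, [0, 4, 4, 6, 7])   # well-formed: passes silently

for bad_splits in ([1, 4, 4, 6, 7], [0, 4, 3, 6, 7], [0, 4, 4, 6, 8]):
    try:
        check_row_splits(values, bad_splits)
    except ValueError as error:
        print(bad_splits, "->", error)
```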

What you can store in a ragged tensor

As with normal Tensors, the values in a RaggedTensor must all have the same type, and the values must all be at the same nesting depth (the rank of the tensor):

print(tf.ragged.constant([["Hi"], ["How", "are", "you"]]))  # ok: type=string, rank=2

<tf.RaggedTensor [[b'Hi'], [b'How', b'are', b'you']]>

print(tf.ragged.constant([[[1, 2], [3]], [[4, 5]]]))        # ok: type=int32, rank=3

<tf.RaggedTensor [[[1, 2], [3]], [[4, 5]]]>

try:
  tf.ragged.constant([["one", "two"], [3, 4]])              # bad: multiple types
except ValueError as exception:
  print(exception)

Can't convert Python sequence with mixed types to Tensor.

try:
  tf.ragged.constant(["A", ["B", "C"]])                     # bad: multiple nesting depths
except ValueError as exception:
  print(exception)

all scalar values must have the same nesting depth
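
The nesting-depth rule can be checked on a nested Python list before handing it to tf.ragged.constant. The following is a minimal sketch under the assumption that only lists and tuples count as nesting; the helper scalar_depths is hypothetical and is not how TensorFlow performs the check.

```python
def scalar_depths(pylist, depth=0):
    """Return the set of nesting depths at which scalar values occur."""
    if not isinstance(pylist, (list, tuple)):
        return {depth}
    depths = set()
    for item in pylist:
        depths |= scalar_depths(item, depth + 1)
    return depths

ok_rank2 = sorted(scalar_depths([["Hi"], ["How", "are", "you"]]))
ok_rank3 = sorted(scalar_depths([[[1, 2], [3]], [[4, 5]]]))
bad_mixed = sorted(scalar_depths(["A", ["B", "C"]]))

print(ok_rank2)    # a single depth: usable as a ragged tensor of that rank
print(ok_rank3)
print(bad_mixed)   # more than one depth: rejected by tf.ragged.constant
```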

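Example use case

The following example demonstrates how RaggedTensors can be used to construct and combine unigram and bigram embeddings for a batch of variable-length queries, using special markers for the beginning and end of each sentence. For more details on the ops used in this example, see the tf.ragged package documentation.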

queries = tf.ragged.constant([['Who', 'is', 'Dan', 'Smith'],
                              ['Pause'],
                              ['Will', 'it', 'rain', 'later', 'today']])

# Create an embedding table.
num_buckets = 1024
embedding_size = 4
embedding_table = tf.Variable(
    tf.random.truncated_normal([num_buckets, embedding_size],
                       stddev=1.0 / math.sqrt(embedding_size)))

# Look up the embedding for each word.
word_buckets = tf.strings.to_hash_bucket_fast(queries, num_buckets)
word_embeddings = tf.nn.embedding_lookup(embedding_table, word_buckets)     # ①

# Add markers to the beginning and end of each sentence.
marker = tf.fill([queries.nrows(), 1], '#')
padded = tf.concat([marker, queries, marker], axis=1)                       # ②

# Build word bigrams & look up embeddings.
bigrams = tf.strings.join([padded[:, :-1], padded[:, 1:]], separator='+')   # ③

bigram_buckets = tf.strings.to_hash_bucket_fast(bigrams, num_buckets)
bigram_embeddings = tf.nn.embedding_lookup(embedding_table, bigram_buckets) # ④

# Find the average embedding for each sentence
all_embeddings = tf.concat([word_embeddings, bigram_embeddings], axis=1)    # ⑤
avg_embedding = tf.reduce_mean(all_embeddings, axis=1)                      # ⑥
print(avg_embedding)

tf.Tensor(
[[ 0.05365231 -0.0595057   0.12319323  0.10320764]
 [ 0.19690476 -0.35085988  0.34308517  0.3926835 ]
 [ 0.2679962  -0.1582186   0.15801464  0.05089492]], shape=(3, 4), dtype=float32)
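
The marker-padding and bigram steps of the example are easier to follow on plain lists. The sketch below reproduces only those two string steps in plain Python (no hashing and no embedding lookup); it is an illustration of what the ragged ops compute per row, not a replacement for them.

```python
queries = [['Who', 'is', 'Dan', 'Smith'],
           ['Pause'],
           ['Will', 'it', 'rain', 'later', 'today']]

# Add a '#' marker to the beginning and end of each sentence.
padded = [['#'] + query + ['#'] for query in queries]

# Pair every token with its right-hand neighbour, joined by '+'.
bigrams = [[left + '+' + right for left, right in zip(row[:-1], row[1:])]
           for row in padded]

for row in bigrams:
    print(row)
```

A query with n words yields n + 1 bigrams, so each sentence average in the example is taken over 2n + 1 embedding vectors.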

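Ragged and uniform dimensions

A ragged dimension is a dimension whose slices may have different lengths. For example, the inner (column) dimension of rt=[[3, 1, 4, 1], [], [5, 9, 2], [6], []] is ragged, since the column slices (rt[0, :], …, rt[4, :]) have different lengths. Dimensions whose slices all have the same length are called uniform dimensions.

The outermost dimension of a ragged tensor is always uniform, since it consists of a single slice (and so there is no possibility of differing slice lengths). The remaining dimensions may be either ragged or uniform. For example, you might store the word embeddings for each word in a batch of sentences using a ragged tensor with shape [num_sentences, (num_words), embedding_size], where the parentheses around (num_words) indicate that the dimension is ragged.

Ragged tensors may have multiple ragged dimensions. For example, you could store a batch of structured text documents using a tensor with shape [num_documents, (num_paragraphs), (num_sentences), (num_words)] (where again parentheses indicate ragged dimensions).

As with tf.Tensor, the rank of a ragged tensor is its total number of dimensions (including both ragged and uniform dimensions). A potentially ragged tensor is a value that might be either a tf.Tensor or a tf.RaggedTensor.

When describing the shape of a RaggedTensor, ragged dimensions are conventionally indicated by enclosing them in parentheses. For example, as seen above, the shape of a 3-D RaggedTensor that stores word embeddings for each word in a batch of sentences can be written as [num_sentences, (num_words), embedding_size].

The RaggedTensor.shape attribute returns a tf.TensorShape for a ragged tensor, where ragged dimensions have size None: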

tf.ragged.constant([["Hi"], ["How", "are", "you"]]).shape

TensorShape([2, None])

The method tf.RaggedTensor.bounding_shape can be used to find a tight bounding shape for a given RaggedTensor:

print(tf.ragged.constant([["Hi"], ["How", "are", "you"]]).bounding_shape())

tf.Tensor([2 3], shape=(2,), dtype=int64)
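
For a 2-D ragged value, the tight bounding shape is simply the number of rows and the length of the longest row. A minimal plain-Python sketch, with the hypothetical helper bounding_shape_2d standing in for the method on nested lists:

```python
def bounding_shape_2d(rows):
    """Smallest [n_rows, n_cols] box that can hold every row."""
    return [len(rows), max((len(row) for row in rows), default=0)]

words_shape = bounding_shape_2d([["Hi"], ["How", "are", "you"]])
digits_shape = bounding_shape_2d([[3, 1, 4, 1], [], [5, 9, 2], [6], []])

print(words_shape)
print(digits_shape)
```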

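Ragged vs. sparse

A ragged tensor should not be thought of as a type of sparse tensor. In particular, sparse tensors are efficient encodings for tf.Tensor that model the same data in a compact format; a ragged tensor, by contrast, is an extension of tf.Tensor that models an expanded class of data. This difference is crucial when defining operations:

Applying an op to a sparse or dense tensor should always give the same result.
Applying an op to a ragged or sparse tensor may give different results.

As an illustrative example, consider how array operations such as concat, stack, and tile are defined for ragged vs. sparse tensors. Concatenating ragged tensors joins each row to form a single row with the combined length: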

ragged_x = tf.ragged.constant([["John"], ["a", "big", "dog"], ["my", "cat"]])
ragged_y = tf.ragged.constant([["fell", "asleep"], ["barked"], ["is", "fuzzy"]])
print(tf.concat([ragged_x, ragged_y], axis=1))

<tf.RaggedTensor [[b'John', b'fell', b'asleep'], [b'a', b'big', b'dog', b'barked'], [b'my', b'cat', b'is', b'fuzzy']]>

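But concatenating sparse tensors is equivalent to concatenating the corresponding dense tensors, as illustrated by the following example (where missing values are filled with the empty string b''):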

sparse_x = ragged_x.to_sparse()
sparse_y = ragged_y.to_sparse()
sparse_result = tf.sparse.concat(sp_inputs=[sparse_x, sparse_y], axis=1)
print(tf.sparse.to_dense(sparse_result, ''))

tf.Tensor(
[[b'John' b'' b'' b'fell' b'asleep']
 [b'a' b'big' b'dog' b'barked' b'']
 [b'my' b'cat' b'' b'is' b'fuzzy']], shape=(3, 5), dtype=string)
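
The difference between the two concatenations can be reproduced with plain Python lists: the ragged version joins rows directly, while the sparse (dense-equivalent) version first pads every row to a common width. This is only a sketch of the semantics; the pad helper and the empty-string fill value are assumptions chosen to match the output above.

```python
ragged_x = [["John"], ["a", "big", "dog"], ["my", "cat"]]
ragged_y = [["fell", "asleep"], ["barked"], ["is", "fuzzy"]]

# Ragged semantics: each row is joined with its counterpart, no padding.
ragged_concat = [x + y for x, y in zip(ragged_x, ragged_y)]

def pad(rows, fill=""):
    """Pad every row on the right to the width of the longest row."""
    width = max(len(row) for row in rows)
    return [row + [fill] * (width - len(row)) for row in rows]

# Sparse/dense semantics: pad each input first, then join the padded rows.
dense_concat = [x + y for x, y in zip(pad(ragged_x), pad(ragged_y))]

print(ragged_concat)
print(dense_concat)
```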

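As another example of why this distinction matters, consider the definition of "the mean value of each row" for an op such as tf.reduce_mean. For a ragged tensor, the mean value of a row is the sum of the row's values divided by the row's width. For a sparse tensor, however, the mean value of a row is the sum of the row's values divided by the sparse tensor's overall width (which is greater than or equal to the width of the longest row).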
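
The two definitions of a row mean can be compared on a small made-up example. The numbers below are illustrative only, and the sketch assumes the sparse tensor's overall width equals the longest row; it demonstrates the arithmetic, not any particular TensorFlow op.

```python
rows = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]

# Ragged definition: divide each row's sum by that row's own width.
ragged_means = [sum(row) / len(row) for row in rows]

# Sparse definition: divide each row's sum by the overall width
# (assumed here to be the width of the longest row).
overall_width = max(len(row) for row in rows)
sparse_means = [sum(row) / overall_width for row in rows]

print(ragged_means)
print(sparse_means)
```

Only the longest row gets the same mean under both definitions; every shorter row is pulled toward zero by the missing entries in the sparse case.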

TensorFlow APIs

Keras

tf.keras is TensorFlow's high-level API for building and training deep learning models. Ragged tensors may be passed as inputs to a Keras model by setting ragged=True on tf.keras.Input or tf.keras.layers.InputLayer. Ragged tensors may also be passed between Keras layers and returned by Keras models. The following example shows a toy LSTM model that is trained using ragged tensors.

# Task: predict whether each sentence is a question or not.
sentences = tf.constant(
    ['What makes you think she is a witch?',
     'She turned me into a newt.',
     'A newt?',
     'Well, I got better.'])
is_question = tf.constant([True, False, True, False])

# Preprocess the input strings.
hash_buckets = 1000
words = tf.strings.split(sentences, ' ')
hashed_words = tf.strings.to_hash_bucket_fast(words, hash_buckets)

# Build the Keras model.
keras_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=[None], dtype=tf.int64, ragged=True),
    tf.keras.layers.Embedding(hash_buckets, 16),
    tf.keras.layers.LSTM(32, use_bias=False),
    tf.keras.layers.Dense(32),
    tf.keras.layers.Activation(tf.nn.relu),
    tf.keras.layers.Dense(1)
])

keras_model.compile(loss='binary_crossentropy', optimizer='rmsprop')
keras_model.fit(hashed_words, is_question, epochs=5)
print(keras_model.predict(hashed_words))

Epoch 1/5
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/indexed_slices.py:437: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradient_tape/sequential/lstm/RaggedToTensor/boolean_mask_1/GatherV2:0", shape=(None,), dtype=int32), values=Tensor("gradient_tape/sequential/lstm/RaggedToTensor/boolean_mask/GatherV2:0", shape=(None, 16), dtype=float32), dense_shape=Tensor("gradient_tape/sequential/lstm/RaggedToTensor/Shape:0", shape=(2,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
1/1 [==============================] - 2s 2s/step - loss: 3.0093
Epoch 2/5
1/1 [==============================] - 0s 6ms/step - loss: 1.9501
Epoch 3/5
1/1 [==============================] - 0s 8ms/step - loss: 1.8567
Epoch 4/5
1/1 [==============================] - 0s 6ms/step - loss: 1.7799
Epoch 5/5
1/1 [==============================] - 0s 6ms/step - loss: 1.7253
[[0.03396465]
 [0.01080929]
 [0.03685516]
 [0.02207647]]

tf.Example

tf.Example is a standard protobuf encoding for TensorFlow data. Data encoded with tf.Example often includes variable-length features. For example, the following code defines a batch of four tf.Example messages with different feature lengths:

import google.protobuf.text_format as pbtext

def build_tf_example(s):
  return pbtext.Merge(s, tf.train.Example()).SerializeToString()

example_batch = [
  build_tf_example(r'''
    features {
      feature {key: "colors" value {bytes_list {value: ["red", "blue"]} } }
      feature {key: "lengths" value {int64_list {value: [7]} } } }'''),
  build_tf_example(r'''
    features {
      feature {key: "colors" value {bytes_list {value: ["orange"]} } }
      feature {key: "lengths" value {int64_list {value: []} } } }'''),
  build_tf_example(r'''
    features {
      feature {key: "colors" value {bytes_list {value: ["black", "yellow"]} } }
      feature {key: "lengths" value {int64_list {value: [1, 3]} } } }'''),
  build_tf_example(r'''
    features {
      feature {key: "colors" value {bytes_list {value: ["green"]} } }
      feature {key: "lengths" value {int64_list {value: [3, 5, 2]} } } }''')]

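You can parse this encoded data using tf.io.parse_example, which takes a tensor of serialized strings and a feature specification dictionary, and returns a dictionary mapping feature names to tensors. To read the variable-length features into ragged tensors, simply use tf.io.RaggedFeature in the feature specification dictionary: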

feature_specification = {
    'colors': tf.io.RaggedFeature(tf.string),
    'lengths': tf.io.RaggedFeature(tf.int64),
}
feature_tensors = tf.io.parse_example(example_batch, feature_specification)
for name, value in feature_tensors.items():
  print("{}={}".format(name, value))

colors=<tf.RaggedTensor [[b'red', b'blue'], [b'orange'], [b'black', b'yellow'], [b'green']]>
lengths=<tf.RaggedTensor [[7], [], [1, 3], [3, 5, 2]]>

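tf.io.RaggedFeature can also be used to read features with multiple ragged dimensions. For details, see the API documentation.

Datasets

tf.data is an API that lets you build complex input pipelines from simple, reusable pieces. Its core data structure is tf.data.Dataset, which represents a sequence of elements, where each element consists of one or more components.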

# Helper function used to print datasets in the examples below.
def print_dictionary_dataset(dataset):
  for i, element in enumerate(dataset):
    print("Element {}:".format(i))
    for (feature_name, feature_value) in element.items():
      print('{:>14} = {}'.format(feature_name, feature_value))

Building datasets with ragged tensors

Datasets can be built from ragged tensors using the same methods that are used to build them from tf.Tensors or numpy arrays, such as Dataset.from_tensor_slices:

dataset = tf.data.Dataset.from_tensor_slices(feature_tensors)
print_dictionary_dataset(dataset)

Element 0:
        colors = [b'red' b'blue']
       lengths = [7]
Element 1:
        colors = [b'orange']
       lengths = []
Element 2:
        colors = [b'black' b'yellow']
       lengths = [1 3]
Element 3:
        colors = [b'green']
       lengths = [3 5 2]

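Note: As of this release, Dataset.from_generator does not yet support ragged tensors; support is expected to be added soon.

Batching and unbatching datasets with ragged tensors

Datasets with ragged tensors can be batched (which combines n consecutive elements into a single element) using the Dataset.batch method.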

batched_dataset = dataset.batch(2)
print_dictionary_dataset(batched_dataset)

Element 0:
        colors = <tf.RaggedTensor [[b'red', b'blue'], [b'orange']]>
       lengths = <tf.RaggedTensor [[7], []]>
Element 1:
        colors = <tf.RaggedTensor [[b'black', b'yellow'], [b'green']]>
       lengths = <tf.RaggedTensor [[1, 3], [3, 5, 2]]>

Conversely, a batched dataset can be transformed into a flat dataset using Dataset.unbatch.

unbatched_dataset = batched_dataset.unbatch()
print_dictionary_dataset(unbatched_dataset)

Element 0:
        colors = [b'red' b'blue']
       lengths = [7]
Element 1:
        colors = [b'orange']
       lengths = []
Element 2:
        colors = [b'black' b'yellow']
       lengths = [1 3]
Element 3:
        colors = [b'green']
       lengths = [3 5 2]

Batching datasets with variable-length non-ragged tensors

If you have a Dataset that contains non-ragged tensors, and tensor lengths vary across elements, you can batch those non-ragged tensors into ragged tensors by applying the dense_to_ragged_batch transformation:

non_ragged_dataset = tf.data.Dataset.from_tensor_slices([1, 5, 3, 2, 8])
non_ragged_dataset = non_ragged_dataset.map(tf.range)
batched_non_ragged_dataset = non_ragged_dataset.apply(
    tf.data.experimental.dense_to_ragged_batch(2))
for element in batched_non_ragged_dataset:
  print(element)

<tf.RaggedTensor [[0], [0, 1, 2, 3, 4]]>
<tf.RaggedTensor [[0, 1, 2], [0, 1]]>
<tf.RaggedTensor [[0, 1, 2, 3, 4, 5, 6, 7]]>
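
What the transformation produces can be mirrored in plain Python: each element becomes a range of the given length, and consecutive elements are grouped in pairs without padding, so the last batch may be smaller. This is a sketch of the resulting structure only, not of how tf.data builds it.

```python
lengths = [1, 5, 3, 2, 8]
batch_size = 2

# Each dataset element is a variable-length list, like tf.range(n).
elements = [list(range(n)) for n in lengths]

# Group consecutive elements; rows keep their own lengths (no padding).
batches = [elements[i:i + batch_size]
           for i in range(0, len(elements), batch_size)]

for batch in batches:
    print(batch)
```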

Transforming datasets with ragged tensors

Ragged tensors in Datasets can also be created or transformed using Dataset.map.

def transform_lengths(features):
  return {
      'mean_length': tf.math.reduce_mean(features['lengths']),
      'length_ranges': tf.ragged.range(features['lengths'])}
transformed_dataset = dataset.map(transform_lengths)
print_dictionary_dataset(transformed_dataset)

Element 0:
   mean_length = 7
 length_ranges = <tf.RaggedTensor [[0, 1, 2, 3, 4, 5, 6]]>
Element 1:
   mean_length = 0
 length_ranges = <tf.RaggedTensor []>
Element 2:
   mean_length = 2
 length_ranges = <tf.RaggedTensor [[0], [0, 1, 2]]>
Element 3:
   mean_length = 3
 length_ranges = <tf.RaggedTensor [[0, 1, 2], [0, 1, 2, 3, 4], [0, 1]]>

tf.function

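tf.function is a decorator that precomputes TensorFlow graphs for Python functions, which can substantially improve the performance of your TensorFlow code. Ragged tensors can be used transparently with @tf.function-decorated functions. For example, the following function works with both ragged and non-ragged tensors: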

@tf.function
def make_palindrome(x, axis):
  return tf.concat([x, tf.reverse(x, [axis])], axis)

make_palindrome(tf.constant([[1, 2], [3, 4], [5, 6]]), axis=1)

<tf.Tensor: shape=(3, 4), dtype=int32, numpy=
array([[1, 2, 2, 1],
       [3, 4, 4, 3],
       [5, 6, 6, 5]], dtype=int32)>

make_palindrome(tf.ragged.constant([[1, 2], [3], [4, 5, 6]]), axis=1)

<tf.RaggedTensor [[1, 2, 2, 1], [3, 3], [4, 5, 6, 6, 5, 4]]>

If you want to explicitly specify the input_signature for a tf.function, you can do so using tf.RaggedTensorSpec.

@tf.function(
    input_signature=[tf.RaggedTensorSpec(shape=[None, None], dtype=tf.int32)])
def max_and_min(rt):
  return (tf.math.reduce_max(rt, axis=-1), tf.math.reduce_min(rt, axis=-1))

max_and_min(tf.ragged.constant([[1, 2], [3], [4, 5, 6]]))

(<tf.Tensor: shape=(3,), dtype=int32, numpy=array([2, 3, 6], dtype=int32)>,
 <tf.Tensor: shape=(3,), dtype=int32, numpy=array([1, 3, 4], dtype=int32)>)

Concrete functions

Concrete functions encapsulate individual traced graphs that are built by tf.function. Starting with TensorFlow 2.3 (and in tf-nightly), ragged tensors can be used transparently with concrete functions.

# Preferred way to use ragged tensors with concrete functions (TF 2.3+):
try:
  @tf.function
  def increment(x):
    return x + 1

  rt = tf.ragged.constant([[1, 2], [3], [4, 5, 6]])
  cf = increment.get_concrete_function(rt)
  print(cf(rt))
except Exception as e:
  print(f"Not supported before TF 2.3: {type(e)}: {e}")

<tf.RaggedTensor [[2, 3], [4], [5, 6, 7]]>

If you need to use ragged tensors with concrete functions prior to TensorFlow 2.3, we recommend decomposing the ragged tensors into their components (values and row_splits) and passing them in as separate arguments.

# Backwards-compatible way to use ragged tensors with concrete functions:
@tf.function
def decomposed_ragged_increment(x_values, x_splits):
  x = tf.RaggedTensor.from_row_splits(x_values, x_splits)
  return x + 1

rt = tf.ragged.constant([[1, 2], [3], [4, 5, 6]])
cf = decomposed_ragged_increment.get_concrete_function(rt.values, rt.row_splits)
print(cf(rt.values, rt.row_splits))

<tf.RaggedTensor [[2, 3], [4], [5, 6, 7]]>

SavedModel

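A SavedModel is a serialized TensorFlow program, including both weights and computation. It can be built from a Keras model or from a custom model. In either case, ragged tensors can be used transparently with the functions and methods defined by a SavedModel.

Example: saving a Keras model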

import tempfile

keras_module_path = tempfile.mkdtemp()
tf.saved_model.save(keras_model, keras_module_path)
imported_model = tf.saved_model.load(keras_module_path)
imported_model(hashed_words)

INFO:tensorflow:Assets written to: /tmp/tmpesn18col/assets
<tf.Tensor: shape=(4, 1), dtype=float32, numpy=
array([[0.0279716 ],
       [0.00884632],
       [0.02188365],
       [0.01005459]], dtype=float32)>

Example: saving a custom model

class CustomModule(tf.Module):
  def __init__(self, variable_value):
    super(CustomModule, self).__init__()
    self.v = tf.Variable(variable_value)

  @tf.function
  def grow(self, x):
    return x * self.v

module = CustomModule(100.0)

# Before saving a custom model, we must ensure that concrete functions are
# built for each input signature that we will need.
module.grow.get_concrete_function(tf.RaggedTensorSpec(shape=[None, None],
                                                      dtype=tf.float32))

custom_module_path = tempfile.mkdtemp()
tf.saved_model.save(module, custom_module_path)
imported_model = tf.saved_model.load(custom_module_path)
imported_model.grow(tf.ragged.constant([[1.0, 4.0, 3.0], [2.0]]))

INFO:tensorflow:Assets written to: /tmp/tmpkx0mdt27/assets
<tf.RaggedTensor [[100.0, 400.0, 300.0], [200.0]]>

End
