ホーム » Keras » TensorFlow 2.0 Beta : Tutorials : ML 基本 :- Keras と TensorFlow Hub による映画レビューのテキスト分類

TensorFlow 2.0 Beta : Tutorials : ML 基本 :- Keras と TensorFlow Hub による映画レビューのテキスト分類

TensorFlow 2.0 Beta : Beginner Tutorials : ML 基本 :- Keras と TensorFlow Hub による映画レビューのテキスト分類 (翻訳/解説)

翻訳 : (株)クラスキャット セールスインフォメーション
作成日時 : 06/21/2019

* 本ページは、TensorFlow の本家サイトの TF 2.0 Beta – Beginner Tutorials – ML basics の以下のページを翻訳した上で
適宜、補足説明したものです:

* サンプルコードの動作確認はしておりますが、必要な場合には適宜、追加改変しています。
* ご自由にリンクを張って頂いてかまいませんが、sales-info@classcat.com までご一報いただけると嬉しいです。

 

Tutorials : ML 基本 :- Keras と TensorFlow Hub による映画レビューのテキスト分類

このノートブックは (映画) レビューのテキストを使用して映画レビューをポジティブ (肯定的) かネガティブ (否定的) として分類します。これは二値 — あるいは 2 クラス — 分類の例で、重要で広く利用可能な種類の機械学習問題です。

このチュートリアルは TensorFlow Hub と Keras による転移学習の基本的なアプリケーションを実演します。

私達は IMDB データセット を使用します、これは Internet Movie Database からの 50,000 映画レビューのテキストを含みます。これらは訓練のための 25,000 レビューとテストのための 25,000 レビューに分割されます。訓練とテストセットは均等です、つまりそれらがポジティブとネガティブ・レビューの同じ数を含むことを意味します。

この notebook は tf.keras、TensorFlow でモデルを構築して訓練するための高位 API、そして TensorFlow Hub、転移学習のためのライブラリとプラットフォームを使用します。tf.keras を使用したより進んだテキスト分類チュートリアルについては、MLCC テキスト分類ガイド を見てください。

from __future__ import absolute_import, division, print_function, unicode_literals

import numpy as np

!pip install -q tensorflow==2.0.0-beta1
import tensorflow as tf

import tensorflow_hub as hub
import tensorflow_datasets as tfds

print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("Hub version: ", hub.__version__)
print("GPU is", "available" if tf.test.is_gpu_available() else "NOT AVAILABLE")
Version:  2.0.0-beta1
Eager mode:  True
Hub version:  0.4.0
GPU is NOT AVAILABLE

 

IMDB データセットをダウンロードする

IMDB データセットは TensorFlow データセット 上で利用可能です。次のコードは IMDB データセットを貴方のマシン (または colab ランタイム) にダウンロードします :

# Split the training set into 60% and 40%, so we'll end up with 15,000 examples
# for training, 10,000 examples for validation and 25,000 examples for testing.
train_validation_split = tfds.Split.TRAIN.subsplit([6, 4])

(train_data, validation_data), test_data = tfds.load(
    name="imdb_reviews", 
    split=(train_validation_split, tfds.Split.TEST),
    as_supervised=True)
Downloading and preparing dataset imdb_reviews (80.23 MiB) to /home/kbuilder/tensorflow_datasets/imdb_reviews/plain_text/0.1.0...

HBox(children=(IntProgress(value=1, bar_style='info', description='Dl Completed...', max=1, style=ProgressStyl…
HBox(children=(IntProgress(value=1, bar_style='info', description='Dl Size...', max=1, style=ProgressStyle(des…




HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


HBox(children=(IntProgress(value=0, description='Shuffling...', max=10, style=ProgressStyle(description_width=…
WARNING: Logging before flag parsing goes to stderr.
W0614 17:32:56.426884 139846937216768 deprecation.py:323] From /home/kbuilder/.local/lib/python3.5/site-packages/tensorflow_datasets/core/file_format_adapter.py:209: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`

HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


HBox(children=(IntProgress(value=0, description='Shuffling...', max=10, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


HBox(children=(IntProgress(value=0, description='Shuffling...', max=20, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=2500, style=ProgressStyle(description_width=…
Dataset imdb_reviews downloaded and prepared to /home/kbuilder/tensorflow_datasets/imdb_reviews/plain_text/0.1.0. Subsequent calls will reuse this data.

 

データを調査する

データのフォーマットを理解するために少し時間をつかいましょう。各サンプルは映画レビューを表わすセンテンスと対応するラベルです。センテンスはどのような方法でも前処理されていません。各ラベルは 0 か 1 の整数値で、そこでは 0 は否定的なレビューで、1 は肯定的なレビューです。

最初の 10 サンプルをプリントしましょう。

train_examples_batch, train_labels_batch = next(iter(train_data.batch(10)))
train_examples_batch
<tf.Tensor: id=220, shape=(10,), dtype=string, numpy=
array([b"As a lifelong fan of Dickens, I have invariably been disappointed by adaptations of his novels.

Although his works presented an extremely accurate re-telling of human life at every level in Victorian Britain, throughout them all was a pervasive thread of humour that could be both playful or sarcastic as the narrative dictated. In a way, he was a literary caricaturist and cartoonist. He could be serious and hilarious in the same sentence. He pricked pride, lampooned arrogance, celebrated modesty, and empathised with loneliness and poverty. It may be a clich\xc3\xa9, but he was a people's writer.

And it is the comedy that is so often missing from his interpretations. At the time of writing, Oliver Twist is being dramatised in serial form on BBC television. All of the misery and cruelty is their, but non of the humour, irony, and savage lampoonery. The result is just a dark, dismal experience: the story penned by a journalist rather than a novelist. It's not really Dickens at all.

'Oliver!', on the other hand, is much closer to the mark. The mockery of officialdom is perfectly interpreted, from the blustering beadle to the drunken magistrate. The classic stand-off between the beadle and Mr Brownlow, in which the law is described as 'a ass, a idiot' couldn't have been better done. Harry Secombe is an ideal choice.

But the blinding cruelty is also there, the callous indifference of the state, the cold, hunger, poverty and loneliness are all presented just as surely as The Master would have wished.

And then there is crime. Ron Moody is a treasure as the sleazy Jewish fence, whilst Oliver Reid has Bill Sykes to perfection.

Perhaps not surprisingly, Lionel Bart - himself a Jew from London's east-end - takes a liberty with Fagin by re-interpreting him as a much more benign fellow than was Dicken's original. In the novel, he was utterly ruthless, sending some of his own boys to the gallows in order to protect himself (though he was also caught and hanged). Whereas in the movie, he is presented as something of a wayward father-figure, a sort of charitable thief rather than a corrupter of children, the latter being a long-standing anti-semitic sentiment. Otherwise, very few liberties are taken with Dickens's original. All of the most memorable elements are included. Just enough menace and violence is retained to ensure narrative fidelity whilst at the same time allowing for children' sensibilities. Nancy is still beaten to death, Bullseye narrowly escapes drowning, and Bill Sykes gets a faithfully graphic come-uppance.

Every song is excellent, though they do incline towards schmaltz. Mark Lester mimes his wonderfully. Both his and my favourite scene is the one in which the world comes alive to 'who will buy'. It's schmaltzy, but it's Dickens through and through.

I could go on. I could commend the wonderful set-pieces, the contrast of the rich and poor. There is top-quality acting from more British regulars than you could shake a stick at.

I ought to give it 10 points, but I'm feeling more like Scrooge today. Soak it up with your Christmas dinner. No original has been better realised.", b"Oh yeah! Jenna Jameson did it again! Yeah Baby! This movie rocks. It was one of the 1st movies i saw of her. And i have to say i feel in love with her, she was great in this move.

Her performance was outstanding and what i liked the most was the scenery and the wardrobe it was amazing you can tell that they put a lot into the movie the girls cloth were amazing.

I hope this comment helps and u can buy the movie, the storyline is awesome is very unique and i'm sure u are going to like it. Jenna amazed us once more and no wonder the movie won so many awards. Her make-up and wardrobe is very very sexy and the girls on girls scene is amazing. specially the one where she looks like an angel. It's a must see and i hope u share my interests", b"I saw this film on True Movies (which automatically made me sceptical) but actually - it was good. Why? Not because of the amazing plot twists or breathtaking dialogue (of which there is little) but because actually, despite what people say I thought the film was accurate in it's depiction of teenagers dealing with pregnancy.

It's NOT Dawson's Creek, they're not graceful, cool witty characters who breeze through sexuality with effortless knowledge. They're kids and they act like kids would.

They're blunt, awkward and annoyingly confused about everything. Yes, this could be by accident and they could just be bad actors but I don't think so. Dermot Mulroney gives (when not trying to be cool) a very believable performance and I loved him for it. Patricia Arquette IS whiny and annoying, but she was pregnant and a teenagers? The combination of the two isn't exactly lavender on your pillow. The plot was VERY predictable and but so what? I believed them, his stress and inability to cope - her brave, yet slightly misguided attempts to bring them closer together. I think the characters, acted by anyone else, WOULD indeed have been annoying and unbelievable but they weren't. It reflects the surreality of the situation they're in, that he's sitting in class and she walks on campus with the baby. I felt angry at her for that, I felt angry at him for being such a child and for blaming her. I felt it all.

In the end, I loved it and would recommend it.

Watch out for the scene where Dermot Mulroney runs from the disastrous counselling session - career performance.", b'This was a wonderfully clever and entertaining movie that I shall never tire of watching many, many times. The casting was magnificent in matching up the young with the older characters. There are those of us out here who really do appreciate good actors and an intelligent story format. As for Judi Dench, she is beautiful and a gift to any kind of production in which she stars. I always make a point to see Judi Dench in all her performances. She is a superb actress and a pleasure to watch as each transformation of her character comes to life. I can only be grateful when I see such an outstanding picture for most of the motion pictures made more recently lack good characters, good scripts and good acting. The movie public needs heroes, not deviant manikins, who lack ingenuity and talent. How wonderful to see old favorites like Leslie Caron, Olympia Dukakis and Cleo Laine. I would like to see this movie win the awards it deserves. Thank you again for a tremendous night of entertainment. I congratulate the writer, director, producer, and all those who did such a fine job.', b'I have no idea what the other reviewer is talking about- this was a wonderful movie, and created a sense of the era that feels like time travel. The characters are truly young, Mary is a strong match for Byron, Claire is juvenile and a tad annoying, Polidori is a convincing beaten-down sycophant... all are beautiful, curious, and decadent... not the frightening wrecks they are in Gothic.

Gothic works as an independent piece of shock film, and I loved it for different reasons, but this works like a Merchant and Ivory film, and was from my readings the best capture of what the summer must have felt like. Romantic, yes, but completely rekindles my interest in the lives of Shelley and Byron every time I think about the film. One of my all-time favorites.', b"This was soul-provoking! I am an Iranian, and living in th 21st century, I didn't know that such big tribes have been living in such conditions at the time of my grandfather!

You see that today, or even in 1925, on one side of the world a lady or a baby could have everything served for him or her clean and on-demand, but here 80 years ago, people ventured their life to go to somewhere with more grass. It's really interesting that these Persians bear those difficulties to find pasture for their sheep, but they lose many the sheep on their way.

I praise the Americans who accompanied this tribe, they were as tough as Bakhtiari people.", b'Just because someone is under the age of 10 does not mean they are stupid. If your child likes this film you\'d better have him/her tested. I am continually amazed at how so many people can be involved in something that turns out so bad. This "film" is a showcase for digital wizardry AND NOTHING ELSE. The writing is horrid. I can\'t remember when I\'ve heard such bad dialogue. The songs are beyond wretched. The acting is sub-par but then the actors were not given much. Who decided to employ Joey Fatone? He cannot sing and he is ugly as sin.

The worst thing is the obviousness of it all. It is as if the writers went out of their way to make it all as stupid as possible. Great children\'s movies are wicked, smart and full of wit - films like Shrek and Toy Story in recent years, Willie Wonka and The Witches to mention two of the past. But in the continual dumbing-down of American more are flocking to dreck like Finding Nemo (yes, that\'s right), the recent Charlie & The Chocolate Factory and eye-crossing trash like Red Riding Hood.', b"I absolutely LOVED this movie when I was a kid. I cried every time I watched it. It wasn't weird to me. I totally identified with the characters. I would love to see it again (and hope I wont be disappointed!). Pufnstuf rocks!!!! I was really drawn in to the fantasy world. And to me the movie was loooong. I wonder if I ever saw the series and have confused them? The acting I thought was strong. I loved Jack Wilde. He was so dreamy to an 10 year old (when I first saw the movie, not in 1970. I can still remember the characters vividly. The flute was totally believable and I can still 'feel' the evil woods. Witchy poo was scary - I wouldn't want to cross her path.", b'A very close and sharp discription of the bubbling and dynamic emotional world of specialy one 18year old guy, that makes his first experiences in his gay love to an other boy, during an vacation with a part of his family.

I liked this film because of his extremly clear and surrogated storytelling , with all this "Sound-close-ups" and quiet moments wich had been full of intensive moods.

', b"This is the most depressing film I have ever seen. I first saw it as a child and even thinking about it now really upsets me. I know it was set in a time when life was hard and I know these people were poor and the crops were vital. Yes, I get all that. What I find hard to take is I can't remember one single light moment in the entire film. Maybe it was true to life, I don't know. I'm quite sure the acting was top notch and the direction and quality of filming etc etc was wonderful and I know that every film can't have a happy ending but as a family film it is dire in my opinion.

I wouldn't recommend it to anyone who wants to be entertained by a film. I can't stress enough how this film affected me as a child. I was talking about it recently and all the sad memories came flooding back. I think it would have all but the heartless reaching for the Prozac."], dtype=object)>

 
最初の 10 ラベルもプリントしましょう。

train_labels_batch
<tf.Tensor: id=221, shape=(10,), dtype=int64, numpy=array([1, 1, 1, 1, 1, 1, 0, 1, 1, 0])>

 

モデルを構築する

ニューラルネットワークは層をスタックすることにより作成されます — これは 3 つの主要なアーキテクチャ的な決定を必要とします :

  • テキストをどのように表現するか?
  • モデルで幾つの層を使用するか?
  • 各層のために幾つの隠れユニットを使用するか?

このサンプルでは、入力データはセンテンスから成ります。予測するラベルは 0 か 1 です。

テキストを表わす一つの方法はセンテンスを埋め込みベクトルに変換することです。最初の層として事前訓練されたテキスト埋め込みを使用できます、これは 3 つの優位点を持ちます : * テキスト前処理について心配しなくてかまいません、* 転移学習からの恩恵を受けることができます、* 埋め込みは固定サイズを持ちますので、処理がより容易です。

このサンプルのために TensorFlow Hub から google/tf2-preview/gnews-swivel-20dim/1 と呼称される事前訓練されてテキスト埋め込みモデルを使用します。

このチュートリアルをテストするために 3 つの他の事前訓練されたモデルがあります :

最初にセンテンスを埋め込むために TensorFlow Hub モデルを使用する Keras 層を作成しましょう、そして 2, 3 の入力サンプル上でそれを試しましょう。入力テキストの長さがどうであれ、埋め込みの出力 shape は : (num_examples, embedding_dimension) であることに注意してください。

embedding = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
hub_layer = hub.KerasLayer(embedding, input_shape=[], 
                           dtype=tf.string, trainable=True)
hub_layer(train_examples_batch[:3])
<tf.Tensor: id=402, shape=(3, 20), dtype=float32, numpy=
array([[ 3.9819887 , -4.4838037 ,  5.177359  , -2.3643482 , -3.2938678 ,
        -3.5364532 , -2.4786978 ,  2.5525482 ,  6.688532  , -2.3076782 ,
        -1.9807833 ,  1.1315885 , -3.0339816 , -0.7604128 , -5.743445  ,
         3.4242578 ,  4.790099  , -4.03061   , -5.992149  , -1.7297493 ],
       [ 3.4232912 , -4.230874  ,  4.1488533 , -0.29553518, -6.802391  ,
        -2.5163853 , -4.4002395 ,  1.905792  ,  4.7512794 , -0.40538004,
        -4.3401685 ,  1.0361497 ,  0.9744097 ,  0.71507156, -6.2657013 ,
         0.16533905,  4.560262  , -1.3106939 , -3.1121316 , -2.1338716 ],
       [ 3.8508697 , -5.003031  ,  4.8700504 , -0.04324996, -5.893603  ,
        -5.2983093 , -4.004676  ,  4.1236343 ,  6.267754  ,  0.11632943,
        -3.5934832 ,  0.8023905 ,  0.56146765,  0.9192484 , -7.3066816 ,
         2.8202746 ,  6.2000837 , -3.5709393 , -4.564525  , -2.305622  ]],
      dtype=float32)>

さて完全なモデルを構築しましょう :

model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Dense(16, activation='relu'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
keras_layer (KerasLayer)     (None, 20)                400020    
_________________________________________________________________
dense (Dense)                (None, 16)                336       
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 17        
=================================================================
Total params: 400,373
Trainable params: 400,373
Non-trainable params: 0
_________________________________________________________________

分類器を構築するために層はシーケンシャルにスタックされます :

  1. 最初の層は TensorFlow Hub 層です。この層はセンテンスをその埋め込みベクトルにマップするために事前訓練された Saved Model を使用します。私達が使用している事前訓練されたテキスト埋め込みモデル (google/tf2-preview/gnews-swivel-20dim/1) はセンテンスをトークンに分割し、各トークンを埋め込んでから埋め込みを結合します。結果の次元は : (num_examples, embedding_dimension) です。
  2. この固定長出力ベクトルは 16 隠れユニットを持つ完全結合 (Dense) 層を通してパイプされます。
  3. 最後の層は単一の出力ノードに密に接続されています。sigmoid 活性化関数を使用します、この値は 0 と 1 の間の浮動小数点で、確率、または確信レベルを表します。

モデルを compile しましょう。

 

損失関数と optimizer

モデルは訓練のために損失関数と optimizer を必要とします。これは二値分類問題でモデルは確率を出力します (sigmoid 活性を持つシングルユニット層) ので、binary_crossentropy 損失関数を使用します。

これは損失関数のための唯一の選択ではありません、例えば、mean_squared_error を選択できるでしょう。しかし、一般に、binary_crossentropy は確率を扱うためにはより良いです — それは確率分布間、あるいは私達のケースでは、正解の分布と予測の間の「距離」を測ります。

後で、回帰問題 (例えば、家の価格を予測する) を調べているときに、mean squared error と呼ばれるもう一つの損失関数をどのように使用するかを見ます。

さて、optimizer と損失関数を使用するためにモデルを configure します :

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

 

モデルを訓練する

モデルを 512 サンプルのミニバッチで 20 エポック訓練します。これは x_train と y_train tensor の総てのサンプルに渡る 20 iteration です。訓練の間、検証セットからの 10,000 サンプル上でモデルの損失と精度を監視します :

history = model.fit(train_data.shuffle(10000).batch(512),
                    epochs=20,
                    validation_data=validation_data.batch(512),
                    verbose=1)
Epoch 1/20

W0614 17:33:39.907767 139846937216768 deprecation.py:323] From /tmpfs/src/tf_docs_env/lib/python3.5/site-packages/tensorflow/python/ops/math_grad.py:1250: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where

30/30 [==============================] - 6s 198ms/step - loss: 0.7173 - accuracy: 0.5801 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/20
30/30 [==============================] - 5s 156ms/step - loss: 0.6357 - accuracy: 0.6491 - val_loss: 0.6120 - val_accuracy: 0.6715
Epoch 3/20
30/30 [==============================] - 5s 161ms/step - loss: 0.5855 - accuracy: 0.6984 - val_loss: 0.5707 - val_accuracy: 0.7074
Epoch 4/20
30/30 [==============================] - 5s 158ms/step - loss: 0.5429 - accuracy: 0.7370 - val_loss: 0.5331 - val_accuracy: 0.7393
Epoch 5/20
30/30 [==============================] - 5s 159ms/step - loss: 0.5020 - accuracy: 0.7705 - val_loss: 0.4968 - val_accuracy: 0.7659
Epoch 6/20
30/30 [==============================] - 5s 151ms/step - loss: 0.4619 - accuracy: 0.7973 - val_loss: 0.4627 - val_accuracy: 0.7897
Epoch 7/20
30/30 [==============================] - 5s 157ms/step - loss: 0.4237 - accuracy: 0.8209 - val_loss: 0.4314 - val_accuracy: 0.8082
Epoch 8/20
30/30 [==============================] - 5s 155ms/step - loss: 0.3880 - accuracy: 0.8404 - val_loss: 0.4040 - val_accuracy: 0.8227
Epoch 9/20
30/30 [==============================] - 5s 157ms/step - loss: 0.3556 - accuracy: 0.8558 - val_loss: 0.3806 - val_accuracy: 0.8337
Epoch 10/20
30/30 [==============================] - 5s 160ms/step - loss: 0.3268 - accuracy: 0.8725 - val_loss: 0.3612 - val_accuracy: 0.8425
Epoch 11/20
30/30 [==============================] - 5s 156ms/step - loss: 0.3011 - accuracy: 0.8844 - val_loss: 0.3451 - val_accuracy: 0.8493
Epoch 12/20
30/30 [==============================] - 5s 155ms/step - loss: 0.2783 - accuracy: 0.8963 - val_loss: 0.3321 - val_accuracy: 0.8553
Epoch 13/20
30/30 [==============================] - 5s 155ms/step - loss: 0.2577 - accuracy: 0.9033 - val_loss: 0.3215 - val_accuracy: 0.8589
Epoch 14/20
30/30 [==============================] - 5s 159ms/step - loss: 0.2391 - accuracy: 0.9116 - val_loss: 0.3132 - val_accuracy: 0.8622
Epoch 15/20
30/30 [==============================] - 5s 153ms/step - loss: 0.2220 - accuracy: 0.9189 - val_loss: 0.3066 - val_accuracy: 0.8664
Epoch 16/20
30/30 [==============================] - 5s 166ms/step - loss: 0.2063 - accuracy: 0.9257 - val_loss: 0.3016 - val_accuracy: 0.8699
Epoch 17/20
30/30 [==============================] - 5s 160ms/step - loss: 0.1919 - accuracy: 0.9319 - val_loss: 0.2977 - val_accuracy: 0.8721
Epoch 18/20
30/30 [==============================] - 5s 156ms/step - loss: 0.1786 - accuracy: 0.9381 - val_loss: 0.2950 - val_accuracy: 0.8742
Epoch 19/20
30/30 [==============================] - 5s 156ms/step - loss: 0.1662 - accuracy: 0.9433 - val_loss: 0.2939 - val_accuracy: 0.8759
Epoch 20/20
30/30 [==============================] - 5s 160ms/step - loss: 0.1536 - accuracy: 0.9484 - val_loss: 0.2944 - val_accuracy: 0.8765

 

モデルを評価する

そしてモデルどのように遂行するか見ましょう。2 つの値が返されます。損失 (エラーを表わす数字です、より低ければより良いです)、そして精度です。

results = model.evaluate(test_data.batch(512), verbose=0)
for name, value in zip(model.metrics_names, results):
  print("%s: %.3f" % (name, value))
loss: 0.316
accuracy: 0.866

このかなり素朴なアプローチは約 87 % の精度を得ます。より進んだアプローチでは、モデルは 95 % に近づくはずです。

 

以上



AI導入支援 #2 ウェビナー

スモールスタートを可能としたAI導入支援   Vol.2
[無料 WEB セミナー] [詳細]
「画像認識 AI PoC スターターパック」の紹介
既に AI 技術を実ビジネスで活用し、成果を上げている日本企業も多く存在しており、競争優位なビジネスを展開しております。
しかしながら AI を導入したくとも PoC (概念実証) だけでも高額な費用がかかり取組めていない企業も少なくないようです。A I導入時には欠かせない PoC を手軽にしかも短期間で認知度を確認可能とするサービの紹介と共に、AI 技術の特性と具体的な導入プロセスに加え運用時のポイントについても解説いたします。
日時:2021年10月13日(水)
会場:WEBセミナー
共催:クラスキャット、日本FLOW(株)
後援:働き方改革推進コンソーシアム
参加費: 無料 (事前登録制)
人工知能開発支援
◆ クラスキャットは 人工知能研究開発支援 サービスを提供しています :
  • テクニカルコンサルティングサービス
  • 実証実験 (プロトタイプ構築)
  • アプリケーションへの実装
  • 人工知能研修サービス
◆ お問合せ先 ◆
(株)クラスキャット
セールス・インフォメーション
E-Mail:sales-info@classcat.com