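OpenAI Cookbook examples: Whisper: Whisper prompting guide (translation/commentary)
Translation: ClassCat Co., Ltd. Sales Information
Created: 08/16/2023
* This page is a translation of the following document from the OpenAI Cookbook repository, with supplementary explanations added where appropriate:
- examples: Whisper prompting guide
* The sample code has been checked to run, and has been extended or modified where necessary.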
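OpenAI Cookbook: Whisper: Whisper prompting guide
OpenAI's audio transcription API has an optional parameter called prompt.
The prompt is intended to help stitch together multiple audio segments. By submitting the previous segment's transcript as the prompt, you give the Whisper model context that it can use to better understand the speech and to maintain a consistent writing style.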
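The stitching idea described above can be sketched as a loop that feeds each segment's transcript into the next call as the prompt. This is a minimal sketch, not the API's own mechanism: `transcribe_in_sequence`, `fake_transcribe`, and the file names are hypothetical names introduced for illustration, and `transcribe_fn` is assumed to be any callable that takes `(audio_filepath, prompt)` and returns text, such as the `transcribe` wrapper defined later on this page.

```python
from typing import Callable, Iterable, List


def transcribe_in_sequence(
    segment_paths: Iterable[str],
    transcribe_fn: Callable[[str, str], str],
) -> List[str]:
    """Transcribe segments in order, prompting each one with the previous transcript."""
    transcripts: List[str] = []
    previous_text = ""  # the first segment has no prior context
    for path in segment_paths:
        text = transcribe_fn(path, previous_text)
        transcripts.append(text)
        previous_text = text  # becomes the prompt for the next segment
    return transcripts


# Stand-in for a real API call (hypothetical), used only to show the data flow.
def fake_transcribe(audio_filepath: str, prompt: str) -> str:
    return f"[{audio_filepath} | prompt={prompt!r}]"


for line in transcribe_in_sequence(["part1.wav", "part2.wav"], fake_transcribe):
    print(line)
```

Passing the transcription function in as an argument keeps the loop independent of any particular client library version.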
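However, prompts do not need to be genuine transcripts from previous audio segments. You can submit fictitious prompts to steer the model toward particular spellings or styles.
This notebook shares two techniques for using fictitious prompts to steer the model's output:
- Transcript generation: GPT can convert instructions into fictitious transcripts for Whisper to emulate.
- Spelling guide: a spelling guide can tell the model how to spell the names of people, products, companies, and so on.
These techniques are not especially reliable, but they can be useful in some situations.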
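Comparison with GPT prompting
Prompting Whisper is not the same as prompting GPT. For example, if you submit an attempted instruction such as "Format lists in Markdown format", the model will not comply, because it follows the style of the prompt rather than any instructions contained in it.
In addition, the prompt is limited to 224 tokens. If the prompt is longer than 224 tokens, only the final 224 tokens are considered; all earlier tokens are silently ignored. The tokenizer used is the multilingual Whisper tokenizer.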
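The truncation rule above (only the final 224 tokens count) can be illustrated with a small sketch. It operates on an already-tokenized list and uses synthetic placeholder tokens; real token counts come from the multilingual Whisper tokenizer, which is not loaded here. `tokens_considered` and `PROMPT_TOKEN_LIMIT` are hypothetical names chosen for this illustration.

```python
from typing import List, Sequence

PROMPT_TOKEN_LIMIT = 224  # limit stated in the text above


def tokens_considered(
    tokens: Sequence[str], limit: int = PROMPT_TOKEN_LIMIT
) -> List[str]:
    """Return the tokens that would still be considered: only the final `limit` tokens."""
    if limit <= 0:
        return []
    return list(tokens[-limit:])


# Synthetic placeholder tokens (assumption): a 300-token prompt.
tokens = [f"t{i}" for i in range(300)]
kept = tokens_considered(tokens)
print(len(tokens) - len(kept), "leading tokens would be silently ignored")
print("first token still considered:", kept[0])
```

One practical consequence is that the most important part of a long prompt belongs at the end, since the beginning is what gets dropped.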
To get good results, craft examples that portray your desired style.
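Setup
To get started, let's:
- Import the OpenAI Python library (if you don't have it, install it with: pip install "openai<1.0"; the code on this page uses the pre-1.0 interface, i.e. openai.Audio and openai.ChatCompletion)
- Download a few example audio files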
# imports
import os  # for creating the local download directory
import urllib.request  # for downloading example audio files (the submodule must be imported explicitly)
import openai  # for making OpenAI API calls (pre-1.0 interface)
# set download paths
up_first_remote_filepath = "https://cdn.openai.com/API/examples/data/upfirstpodcastchunkthree.wav"
bbq_plans_remote_filepath = "https://cdn.openai.com/API/examples/data/bbq_plans.wav"
product_names_remote_filepath = "https://cdn.openai.com/API/examples/data/product_names.wav"
# set local save locations
up_first_filepath = "data/upfirstpodcastchunkthree.wav"
bbq_plans_filepath = "data/bbq_plans.wav"
product_names_filepath = "data/product_names.wav"
# make sure the local directory exists, then download example audio files and save locally
os.makedirs("data", exist_ok=True)
urllib.request.urlretrieve(up_first_remote_filepath, up_first_filepath)
urllib.request.urlretrieve(bbq_plans_remote_filepath, bbq_plans_filepath)
urllib.request.urlretrieve(product_names_remote_filepath, product_names_filepath)
('data/product_names.wav', <http.client.HTTPMessage at 0x116984370>)
As a baseline, we'll transcribe an NPR podcast segment
The audio file for this example is a segment of the NPR podcast Up First.
Let's get a baseline transcription first, then introduce prompts.
# define a wrapper function for seeing how prompts affect transcriptions
def transcribe(audio_filepath, prompt: str) -> str:
    """Given a prompt, transcribe the audio file."""
    # open the file in a context manager so the handle is closed after the request
    with open(audio_filepath, "rb") as audio_file:
        transcript = openai.Audio.transcribe(
            file=audio_file,
            model="whisper-1",
            prompt=prompt,
        )
    return transcript["text"]
# baseline transcription with no prompt
transcribe(up_first_filepath, prompt="")
"I stick contacts in my eyes. Do you really? Yeah. That works okay? You don't have to, like, just kind of pain in the butt every day to do that? No, it is. It is. And I sometimes just kind of miss the eye. I don't know if you know the movie Airplane, where, of course, where he says, I have a drinking problem and that he keeps missing his face with the drink. That's me and the contact lens. Surely, you must know that I know the movie Airplane. I do. I do know that. Stop calling me Shirley. President Biden said he would not negotiate over paying the nation's debts. But he is meeting today with House Speaker Kevin McCarthy. Other leaders of Congress will also attend. So how much progress can they make? I'm E. Martinez with Steve Inskeep, and this is Up First from NPR News. Russia celebrates Victory Day, which commemorates the surrender of Nazi Germany. Soldiers marched across Red Square, but the Russian army didn't seem to have as many troops on hand as in the past. So what does this ritual say about the war Russia is fighting right now?"
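Transcripts follow the style of the prompt
In the unprompted transcript, 'President Biden' is capitalized. However, if we pass in a fictitious prompt of 'president biden' in lowercase, Whisper can match that style and generate an all-lowercase transcript. This behavior is not guaranteed: in the run reproduced below, the capitalization was retained, as the translator's note marks.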
# lowercase prompt
transcribe(up_first_filepath, prompt="president biden")
"I stick contacts in my eyes. Do you really? Yeah. That works okay? You don't have to, like, just kind of pain in the butt every day to do that? No, it is. It is. And I sometimes just kind of miss the eye. I don't know if you know the movie Airplane? Yes. Of course. Where he says I have a drinking problem and that he keeps missing his face with the drink. That's me and the contact lens. Surely, you must know that I know the movie Airplane. I do. I do know that. Don't call me Shirley. Stop calling me Shirley. President Biden (translator's note: sic) said he would not negotiate over paying the nation's debts. But he is meeting today with House Speaker Kevin McCarthy. Other leaders of Congress will also attend. So how much progress can they make? I'm E. Martinez with Steve Inskeep and this is Up First from NPR News. Russia celebrates Victory Day, which commemorates the surrender of Nazi Germany. Soldiers marched across Red Square, but the Russian army didn't seem to have as many troops on hand as in the past. So what does this ritual say about the war Russia is fighting right now?"
Be aware that when the prompt is short, Whisper may be less reliable at following its style.
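Because style adherence is not guaranteed, it can help to check a returned transcript programmatically rather than by eye. The following is a minimal sketch of such a check for the lowercase style used in this section; `follows_lowercase_style` is a hypothetical helper, and the sample strings are short excerpts in the two styles seen on this page.

```python
def follows_lowercase_style(transcript: str) -> bool:
    """Return True if the transcript contains letters and none of them are uppercase."""
    has_letters = any(ch.isalpha() for ch in transcript)
    return has_letters and transcript == transcript.lower()


# Short excerpts in the capitalized and lowercase styles.
capitalized = "President Biden said he would not negotiate over paying the nation's debts."
lowercased = "president biden said he would not negotiate over paying the nation's debts."

print(follows_lowercase_style(capitalized))
print(follows_lowercase_style(lowercased))
```

A check like this could be used to decide whether to retry with a longer prompt when a short one did not take effect.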
# short prompts are less reliable
transcribe(up_first_filepath, prompt="president biden.")
"I stick contacts in my eyes. Do you really? Yeah. That works okay? You don't have to, like, just kind of pain in the butt every day to do that? No, it is. It is. And I sometimes just kind of miss the eye. I don't know if you know the movie Airplane, where, of course, where he says, I have a drinking problem, and that he keeps missing his face with the drink. That's me and the contact lens. Surely, you must know that I know the movie Airplane. I do. I do know that. Stop calling me Shirley. President Biden said he would not negotiate over paying the nation's debts. But he is meeting today with House Speaker Kevin McCarthy. Other leaders of Congress will also attend. So how much progress can they make? I'm E. Martinez with Steve Inskeep, and this is Up First from NPR News. Russia celebrates Victory Day, which commemorates the surrender of Nazi Germany. Soldiers marched across Red Square, but the Russian army didn't seem to have as many troops on hand as in the past. So what does this ritual say about the war Russia is fighting right now?"
Long prompts may be more reliable at steering Whisper.
# long prompts are more reliable
transcribe(up_first_filepath, prompt="i have some advice for you. multiple sentences help establish a pattern. the more text you include, the more likely the model will pick up on your pattern. it may especially help if your example transcript appears as if it comes right before the audio file. in this case, that could mean mentioning the contacts i stick in my eyes.")
"i stick contacts in my eyes. do you really? yeah. that works okay? you don't have to, like, just kind of pain in the butt? no, it is. it is. and i sometimes just kind of miss the eye. i don't know if you know, um, the movie airplane? yes. of course. where he says i have a drinking problem. and that he keeps missing his face with the drink. that's me in the contact lens. surely, you must know that i know the movie airplane. i do. i do know that. don't call me surely. stop calling me surely. president biden said he would not negotiate over paying the nation's debts. but he is meeting today with house speaker kevin mccarthy. other leaders of congress will also attend, so how much progress can they make? i'm amy martinez with steve inskeep, and this is up first from npr news. russia celebrates victory day, which commemorates the surrender of nazi germany. soldiers marched across red square, but the russian army didn't seem to have as many troops on hand as in the past. so what does this ritual say about the war russia is fighting right now?"
Whisper is also less likely to follow rare or odd styles.
# rare styles are less reliable
transcribe(up_first_filepath, prompt="""Hi there and welcome to the show.
###
Today we are quite excited.
###
Let's jump right in.
###""")
"I stick contacts in my eyes. Do you really? Yeah. That works okay. You don't have to like, it's not a pain in the butt. It is. And I sometimes just kind of miss the eye. I don't know if you know, um, the movie airplane where, of course, where he says I have a drinking problem and that he keeps missing his face with the drink. That's me in the contact lens. Surely you must know that I know the movie airplane. Uh, I do. I do know that. Stop calling me Shirley. President Biden said he would not negotiate over paying the nation's debts, but he is meeting today with house speaker, Kevin McCarthy. Other leaders of Congress will also attend. So how much progress can they make? I mean, Martinez with Steve Inskeep, and this is up first from NPR news. Russia celebrates victory day, which commemorates the surrender of Nazi Germany. Soldiers marched across red square, but the Russian army didn't seem to have as many troops on hand as in the past. So what does this ritual say about the war? Russia is fighting right now."
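Pass names in the prompt to prevent misspellings
Whisper may incorrectly transcribe uncommon proper nouns such as the names of products, companies, or people.
We'll illustrate with an example audio file full of product names.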
# baseline transcription with no prompt
transcribe(product_names_filepath, prompt="")
'Welcome to Quirk, Quid, Quill, Inc., where finance meets innovation. Explore diverse offerings, from the P3 Quattro, a unique investment portfolio quadrant, to the O3 Omni, a platform for intricate derivative trading strategies. Delve into unconventional bond markets with our B3 Bond X and experience non-standard equity trading with E3 Equity. Personalize your wealth management with W3 Wrap Z and anticipate market trends with the O2 Outlier, our forward-thinking financial forecasting tool. Explore venture capital world with U3 Unifund or move your money with the M3 Mover, our sophisticated monetary transfer module. At Quirk, Quid, Quill, Inc., we turn complex finance into creative solutions. Join us in redefining financial services.'
To get Whisper to use our preferred spellings, let's pass the product and company names in the prompt, as a glossary for Whisper to follow.
# adding the correct spelling of the product name helps
transcribe(product_names_filepath, prompt="QuirkQuid Quill Inc, P3-Quattro, O3-Omni, B3-BondX, E3-Equity, W3-WrapZ, O2-Outlier, U3-UniFund, M3-Mover")
'Welcome to QuirkQuid Quill Inc, where finance meets innovation. Explore diverse offerings, from the P3-Quattro, a unique investment portfolio quadrant, to the O3-Omni, a platform for intricate derivative trading strategies. Delve into unconventional bond markets with our B3-BondX and experience non-standard equity trading with E3-Equity. Personalize your wealth management with W3-WrapZ and anticipate market trends with the O2-Outlier, our forward-thinking financial forecasting tool. Explore venture capital world with U3-UniFund or move your money with the M3-Mover, our sophisticated monetary transfer module. At QuirkQuid Quill Inc, we turn complex finance into creative solutions. Join us in redefining financial services.'
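When the list of names lives in code or a config file, the glossary prompt can be assembled rather than typed by hand. This is a minimal sketch under that assumption; `build_spelling_prompt` is a hypothetical helper, and the optional label mirrors the "Friends:" and "Glossary:" prefixes used in the prompts later on this page.

```python
from typing import Iterable


def build_spelling_prompt(terms: Iterable[str], label: str = "") -> str:
    """Join preferred spellings into a comma-separated glossary prompt."""
    cleaned = [term.strip() for term in terms if term.strip()]
    unique = list(dict.fromkeys(cleaned))  # drop duplicates, keep first-seen order
    body = ", ".join(unique)
    return f"{label}: {body}" if label else body


product_terms = ["QuirkQuid Quill Inc", "P3-Quattro", "O3-Omni", "B3-BondX"]
print(build_spelling_prompt(product_terms))
print(build_spelling_prompt(["Aimee", "Shawn"], label="Friends"))
```

Since only the final 224 tokens of a prompt are considered, a very long glossary may need to be trimmed to the terms most likely to occur in the audio.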
Now, let's switch to another audio recording, made specifically for this demonstration, on the topic of an odd barbecue.
To begin, we'll establish a baseline transcript using Whisper.
# baseline transcript with no prompt
transcribe(bbq_plans_filepath, prompt="")
"Hello, my name is Preston Tuggle. I'm based in New York City. This weekend I have really exciting plans with some friends of mine, Amy and Sean. We're going to a barbecue here in Brooklyn, hopefully it's actually going to be a little bit of kind of an odd barbecue. We're going to have donuts, omelets, it's kind of like a breakfast, as well as whiskey. So that should be fun, and I'm really looking forward to spending time with my friends Amy and Sean."
While Whisper's transcription was accurate, it had to guess at various spellings. For example, it assumed the friends' names were spelled Amy and Sean rather than Aimee and Shawn. Let's see if we can steer the spelling with a prompt.
# spelling prompt
transcribe(bbq_plans_filepath, prompt="Friends: Aimee, Shawn")
"Hello, my name is Preston Tuggle. I'm based in New York City. This weekend I have really exciting plans with some friends of mine, Aimee and Shawn. We're going to a barbecue here in Brooklyn. Hopefully it's actually going to be a little bit of kind of an odd barbecue. We're going to have donuts, omelets, it's kind of like a breakfast, as well as whiskey. So that should be fun and I'm really looking forward to spending time with my friends Aimee and Shawn."
Success!
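Whether the preferred spellings were actually adopted can also be checked in code. The sketch below compares one sentence from the baseline transcript with the same sentence from the prompted transcript above; `spelling_adopted` is a hypothetical helper, and it uses a crude case-insensitive substring match, so a term such as "Omelet" would also match "omelets".

```python
from typing import Dict, Iterable


def spelling_adopted(transcript: str, terms: Iterable[str]) -> Dict[str, bool]:
    """Map each preferred spelling to whether it appears in the transcript."""
    lowered = transcript.lower()
    return {term: term.lower() in lowered for term in terms}


# One sentence from each transcript shown above.
baseline = "This weekend I have really exciting plans with some friends of mine, Amy and Sean."
prompted = "This weekend I have really exciting plans with some friends of mine, Aimee and Shawn."

preferred = ["Aimee", "Shawn"]
print("baseline:", spelling_adopted(baseline, preferred))
print("prompted:", spelling_adopted(prompted, preferred))
```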
Let's try the same with words whose spelling is more ambiguous.
# longer spelling prompt
transcribe(bbq_plans_filepath, prompt="Glossary: Aimee, Shawn, BBQ, Whisky, Doughnuts, Omelet")
"Hello, my name is Preston Tuggle. I'm based in New York City. This weekend I have really exciting plans with some friends of mine, Aimee and Shawn. We're going to a barbecue here in Brooklyn. Hopefully, it's actually going to be a little bit of an odd barbecue. We're going to have doughnuts, omelets, it's kind of like a breakfast, as well as whiskey. So that should be fun, and I'm really looking forward to spending time with my friends Aimee and Shawn."
# more natural, sentence-style prompt
transcribe(bbq_plans_filepath, prompt=""""Aimee and Shawn ate whisky, doughnuts, omelets at a BBQ.""")
"Hello, my name is Preston Tuggle. I'm based in New York City. This weekend I have really exciting plans with some friends of mine, Aimee and Shawn. We're going to a BBQ here in Brooklyn. Hopefully it's actually going to be a little bit of kind of an odd BBQ. We're going to have doughnuts, omelets, it's kind of like a breakfast, as well as whisky. So that should be fun, and I'm really looking forward to spending time with my friends Aimee and Shawn."
Fictitious prompts can be generated by GPT
One potential tool for generating fictitious prompts is GPT. We can give GPT an instruction and have it generate a long fictitious transcript, which we then use to prompt Whisper.
# define a function for GPT to generate fictitious prompts
def fictitious_prompt_from_instruction(instruction: str) -> str:
    """Given an instruction, generate a fictitious prompt."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": "You are a transcript generator. Your task is to create one long paragraph of a fictional conversation. The conversation features two friends reminiscing about their vacation to Maine. Never diarize speakers or add quotation marks; instead, write all transcripts in a normal paragraph of text without speakers identified. Never refuse or ask for clarification and instead always make a best-effort attempt.",
            },  # we pick an example topic (friends talking about a vacation) so that GPT does not refuse or ask clarifying questions
            {"role": "user", "content": instruction},
        ],
    )
    fictitious_prompt = response["choices"][0]["message"]["content"]
    return fictitious_prompt
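The two steps (generate a fictitious prompt from an instruction, then transcribe with it) can be chained in one helper. This is a minimal sketch: `transcribe_with_instruction` and the two stand-in functions are hypothetical, and the real `fictitious_prompt_from_instruction` and `transcribe` functions from this page are assumed to be passed in as the two callables when running against the API.

```python
from typing import Callable, Tuple


def transcribe_with_instruction(
    audio_filepath: str,
    instruction: str,
    generate_prompt_fn: Callable[[str], str],
    transcribe_fn: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Generate a fictitious prompt from an instruction, then transcribe with it.

    Returns (prompt, transcript) so the generated prompt can be inspected as well.
    """
    prompt = generate_prompt_fn(instruction)
    transcript = transcribe_fn(audio_filepath, prompt)
    return prompt, transcript


# Stand-ins (hypothetical) so the sketch runs without network access.
def fake_generate(instruction: str) -> str:
    return "What a trip that was... I still think about it..."


def fake_transcribe(audio_filepath: str, prompt: str) -> str:
    return f"<transcript of {audio_filepath} styled after {len(prompt)} prompt characters>"


prompt, transcript = transcribe_with_instruction(
    "data/example.wav", "End every sentence with ellipses.", fake_generate, fake_transcribe
)
print(prompt)
print(transcript)
```

Returning the prompt alongside the transcript makes it easier to see whether a surprising transcript came from the audio or from the generated prompt.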
# ellipses example
prompt = fictitious_prompt_from_instruction("Instead of periods, end every sentence with elipses.")
print(prompt)
Oh, do you remember that amazing vacation we took to Maine?... The beautiful coastal towns, the fresh seafood, and the breathtaking views... It was truly a trip to remember... I still can't get over how picturesque it was... The quaint little fishing villages with their colorful houses... And the lighthouses dotting the rugged coastline... It felt like we were in a postcard... And the lobster... Oh, the lobster... I've never tasted anything so delicious... We must have had it every day... And let's not forget about the clam chowder... Creamy, flavorful, and packed with fresh clams... It was like a taste of heaven... And the hikes we went on... The trails through the lush forests and along the rocky cliffs... The air was so crisp and invigorating... I could have spent hours just exploring the natural beauty of Maine... And the people we met... So friendly and welcoming... They made us feel right at home... I can't wait to go back and experience it all over again... Maine truly stole a piece of my heart...
transcribe(up_first_filepath, prompt=prompt)
"I stick contacts in my eyes. Do you really? Yeah. That works okay? You don't have to, like, just kind of pain in the butt every day to do that? No, it is. It is. And I sometimes just kind of miss the eye. Oh, you don't know... I don't know if you know the movie Airplane? Yes. Where... Of course. Where he says, I have a drinking problem. And that he keeps missing his face with the drink. That's me in the contact lens. Surely, you must know that I know the movie Airplane. I do. I do know that. Don't call me Shirley. Stop calling me Shirley. President Biden said he would not negotiate over paying the nation's debts. But he is meeting today with House Speaker Kevin McCarthy. Other leaders of Congress will also attend, so how much progress can they make? I'm Ian Martinez with Steve Inskeep, and this is Up First from NPR News. Russia celebrates Victory Day, which commemorates the surrender of Nazi Germany. Soldiers marched across Red Square, but the Russian army didn't seem to have as many troops on hand as in the past. So what does this ritual say about the war Russia is fighting right now?"
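Whisper prompts work best for specifying styles that would otherwise be ambiguous. The prompt does not override the model's comprehension of the audio. For example, if the speakers are not speaking in a deep Southern accent, a prompt will not cause the transcript to be written in one.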
# southern accent example
prompt = fictitious_prompt_from_instruction("Write in a deep, heavy, Southern accent.")
print(prompt)
transcribe(up_first_filepath, prompt=prompt)
Well, I reckon you remember that time we went up to Maine for our vacation, don't ya? Boy, oh boy, what a trip that was! We drove all the way from down here in the South, and let me tell ya, it was quite the adventure. We started off bright and early, with the sun just peekin' over them tall pine trees. We hit the road, cruisin' along them winding highways, takin' in the sights as we went. I tell ya, the scenery up there was somethin' else. Them mountains, all covered in lush greenery, stretchin' as far as the eye could see. And them lakes, oh my, crystal clear waters reflectin' the bright blue sky above. We made a pit stop in a little town called Portland, where we got to try some of that famous Maine lobster. Now, I ain't never tasted anything quite like it. Fresh outta the ocean, melt-in-your-mouth goodness, I tell ya. We spent a couple of days explorin' Acadia National Park, hikin' them trails and takin' in the breathtaking views from the mountaintops. And let me tell ya, that ocean breeze sure did feel mighty fine on our skin. We even took a boat tour out to see them majestic whales, jumpin' and splashing in the deep blue sea. It was a sight to behold, my friend. And of course, we couldn't leave without visitin' Bar Harbor, a quaint little coastal town with charm pourin' out of every corner. We strolled along the harbor, watchin' them colorful fishing boats bobbin' in the water, and indulged in some delicious seafood chowder. Maine sure did steal a piece of our hearts, my friend. The memories we made on that trip will stay with us forever.
"I stick contacts in my eyes. Do you really? Yeah. That works okay? You don't have to, like, just kinda pain in the butt? No, it is. It is. And I sometimes just kinda miss the eye. I don't know if you know the movie Airplane? Yes. Of course. Where he says, I have a drinking problem. And that he keeps missing his face with the drink. That's me in the contact lens. Surely you must know that I know the movie Airplane. I do. I do know that. Stop calling me Shirley. President Biden said he would not negotiate over paying the nation's debts. But he is meeting today with House Speaker Kevin McCarthy. Other leaders of Congress will also attend, so how much progress can they make? I'm Ian Martinez with Steve Inskeep, and this is Up First from NPR News. Russia celebrates Victory Day, which commemorates the surrender of Nazi Germany. Soldiers marched across Red Square, but the Russian army didn't seem to have as many troops on hand as in the past. So what does this ritual say about the war Russia is fighting right now?"