Commit 8b55a84

Enhance Realtime Out-of-Band Transcription notebook by adding detailed trade-offs for the Realtime model, including cost profiles and implementation complexity. Update the image for better visual representation and add unique IDs for markdown and code cells to improve organization.
1 parent 47bd2ad commit 8b55a84

1 file changed: +20 -1 lines changed

examples/Realtime_out_of_band_transcription.ipynb

Lines changed: 20 additions & 1 deletion
@@ -58,9 +58,14 @@
  "- **Greater Steerability**: The Realtime model is more steerable, can better follow custom instructions for higher transcription quality, and is not limited by a 1024-token input maximum.\n",
  "- **Session Context Awareness**: The model has access to the full session context, so, for example, if you mention your name multiple times, it will transcribe it correctly.\n",
  "\n",
+ "In terms of **trade-offs**:\n",
+ "- Different cost profile: when used for transcription, the Realtime model takes audio in and produces text out. Audio input is priced at $32.00 per 1M tokens and cached input at $0.40 per 1M tokens. Note that the whole session context is passed in with every transcription request; however, it will be cached and priced at the $0.40 cached-token rate. Output text tokens are priced at $16.00 per 1M tokens.\n",
+ "- By comparison, gpt-4o-transcribe is priced at $2.50 for text input, $10.00 for text output, and $6.00 for audio input, all per 1M tokens.\n",
+ "- More complex implementation: this approach is slightly more complex to implement than simply using the Realtime API's built-in transcription option with a transcription model.\n",
+ "\n",
  "> Note: Out-of-band responses using the Realtime model can be used for other use cases beyond user turn transcription. Examples include generating structured summaries, triggering background actions, or performing validation tasks without affecting the main conversation.\n",
  "\n",
- "<img src=\"../images/oob_transcription_2.png\" alt=\"drawing\" width=\"2000\"/>\n"
+ "<img src=\"../images/oob_transcription.png\" alt=\"drawing\" width=\"2000\"/>\n"
  ]
  },
  {
{
@@ -106,6 +111,7 @@
  },
  {
  "cell_type": "markdown",
+ "id": "d7d60089",
  "metadata": {},
  "source": [
  "## 3. Prompts\n",
@@ -119,6 +125,7 @@
  {
  "cell_type": "code",
  "execution_count": null,
+ "id": "ac3afaab",
  "metadata": {},
  "outputs": [],
  "source": [
@@ -164,6 +171,7 @@
  },
  {
  "cell_type": "markdown",
+ "id": "4ddbd683",
  "metadata": {},
  "source": [
  "## 4. Core configuration\n",
@@ -178,6 +186,7 @@
  {
  "cell_type": "code",
  "execution_count": 4,
+ "id": "4b952a29",
  "metadata": {},
  "outputs": [
  {
@@ -252,6 +261,7 @@
  },
  {
  "cell_type": "markdown",
+ "id": "a905ec16",
  "metadata": {},
  "source": [
  "## 5. Building the Realtime session & the out‑of‑band request\n",
@@ -274,6 +284,7 @@
  {
  "cell_type": "code",
  "execution_count": 6,
+ "id": "4baf1870",
  "metadata": {},
  "outputs": [],
  "source": [
@@ -353,6 +364,7 @@
  },
  {
  "cell_type": "markdown",
+ "id": "9afe7911",
  "metadata": {},
  "source": [
  "## 6. Audio streaming: mic → Realtime → speakers\n",
@@ -368,6 +380,7 @@
  {
  "cell_type": "code",
  "execution_count": 7,
+ "id": "11218bbb",
  "metadata": {},
  "outputs": [],
  "source": [
@@ -463,6 +476,7 @@
  },
  {
  "cell_type": "markdown",
+ "id": "d02cc1bd",
  "metadata": {},
  "source": [
  "## 7. Extracting and comparing transcripts\n",
@@ -485,6 +499,7 @@
  {
  "cell_type": "code",
  "execution_count": 8,
+ "id": "cb6acbf0",
  "metadata": {},
  "outputs": [],
  "source": [
@@ -511,6 +526,7 @@
  },
  {
  "cell_type": "markdown",
+ "id": "6025bbf6",
  "metadata": {},
  "source": [
  "## 8. Listening for Realtime events\n",
@@ -526,6 +542,7 @@
  {
  "cell_type": "code",
  "execution_count": 9,
+ "id": "d099babd",
  "metadata": {},
  "outputs": [],
  "source": [
@@ -692,6 +709,7 @@
  },
  {
  "cell_type": "markdown",
+ "id": "10c69ded",
  "metadata": {},
  "source": [
  "## 9. Run Script\n",
@@ -728,6 +746,7 @@
  {
  "cell_type": "code",
  "execution_count": null,
+ "id": "35c4d7b5",
  "metadata": {},
  "outputs": [],
  "source": [
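
As a rough illustration of the cost trade-off described in the added notebook text, the per-turn arithmetic can be sketched in Python. This is a minimal sketch and not part of the commit: the function names and any token counts passed to them are hypothetical, the rates are the per-1M-token prices quoted in the diff, and it assumes the entire prior session context is served from cache at the cached rate.

```python
PER_MILLION = 1_000_000

# Per-1M-token prices as quoted in the notebook text (USD).
REALTIME_AUDIO_IN = 32.00
REALTIME_CACHED_IN = 0.40
REALTIME_TEXT_OUT = 16.00

TRANSCRIBE_TEXT_IN = 2.50
TRANSCRIBE_AUDIO_IN = 6.00
TRANSCRIBE_TEXT_OUT = 10.00


def realtime_oob_turn_cost(new_audio_tokens, cached_context_tokens, output_text_tokens):
    """Estimate one out-of-band transcription turn on the Realtime model.

    Assumption: all previously seen session context is a cache hit, so only
    the new audio for this turn is billed at the full audio input rate.
    """
    total = (
        new_audio_tokens * REALTIME_AUDIO_IN
        + cached_context_tokens * REALTIME_CACHED_IN
        + output_text_tokens * REALTIME_TEXT_OUT
    )
    return total / PER_MILLION


def transcription_model_turn_cost(audio_tokens, output_text_tokens, prompt_text_tokens=0):
    """Estimate one turn on the dedicated transcription model."""
    total = (
        audio_tokens * TRANSCRIBE_AUDIO_IN
        + prompt_text_tokens * TRANSCRIBE_TEXT_IN
        + output_text_tokens * TRANSCRIBE_TEXT_OUT
    )
    return total / PER_MILLION
```

Calling `realtime_oob_turn_cost(500, 20_000, 100)` and `transcription_model_turn_cost(500, 100)` compares a single turn under both pricing schemes; those token counts are made up for illustration. Because the cached context term grows with session length, the gap between the two estimates widens in longer sessions.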
