|
38 | 38 | }, |
39 | 39 | { |
40 | 40 | "cell_type": "markdown", |
| 41 | + "id": "1c0f46ad", |
41 | 42 | "metadata": {}, |
42 | 43 | "source": [ |
43 | | - "## 1. Why use out-of-band transcription?\n", |
| 44 | + "# 1. Why use out-of-band transcription?\n", |
44 | 45 | "\n", |
45 | 46 | "The Realtime API offers built-in user input transcription, but this relies on a **separate ASR model** (e.g., gpt-4o-transcribe). Using different models for transcription and response generation can lead to discrepancies. For example:\n", |
46 | 47 | "\n", |
|
100 | 101 | }, |
101 | 102 | { |
102 | 103 | "cell_type": "markdown", |
| 104 | + "id": "63ccae3d", |
103 | 105 | "metadata": {}, |
104 | 106 | "source": [ |
105 | | - "## 2. Requirements & Setup\n", |
| 107 | + "# 2. Requirements & Setup\n", |
106 | 108 | "\n", |
107 | 109 | "Ensure your environment meets these requirements:\n", |
108 | 110 | "\n", |
|
144 | 146 | "id": "d7d60089", |
145 | 147 | "metadata": {}, |
146 | 148 | "source": [ |
147 | | - "## 3. Prompts\n", |
| 149 | + "# 3. Prompts\n", |
148 | 150 | "\n", |
149 | 151 | "We use **two distinct prompts**:\n", |
150 | 152 | "\n", |
|
201 | 203 | "id": "4ddbd683", |
202 | 204 | "metadata": {}, |
203 | 205 | "source": [ |
204 | | - "## 4. Core configuration\n", |
| 206 | + "# 4. Core configuration\n", |
205 | 207 | "\n", |
206 | 208 | "We define:\n", |
207 | 209 | "\n", |
|
291 | 293 | "id": "a905ec16", |
292 | 294 | "metadata": {}, |
293 | 295 | "source": [ |
294 | | - "## 5. Building the Realtime session & the out‑of‑band request\n", |
| 296 | + "# 5. Building the Realtime session & the out‑of‑band request\n", |
295 | 297 | "\n", |
296 | 298 | "The Realtime session (`session.update`) configures:\n", |
297 | 299 | "\n", |
|
394 | 396 | "id": "9afe7911", |
395 | 397 | "metadata": {}, |
396 | 398 | "source": [ |
397 | | - "## 6. Audio streaming: mic → Realtime → speakers\n", |
| 399 | + "# 6. Audio streaming: mic → Realtime → speakers\n", |
398 | 400 | "\n", |
399 | 401 | "We now define:\n", |
400 | 402 | "\n", |
|
506 | 508 | "id": "d02cc1bd", |
507 | 509 | "metadata": {}, |
508 | 510 | "source": [ |
509 | | - "## 7. Extracting and comparing transcripts\n", |
| 511 | + "# 7. Extracting and comparing transcripts\n", |
510 | 512 | "\n", |
511 | 513 | "The function below enables us to generate **two transcripts** for each user turn:\n", |
512 | 514 | "\n", |
|
556 | 558 | "id": "6025bbf6", |
557 | 559 | "metadata": {}, |
558 | 560 | "source": [ |
559 | | - "## 8. Listening for Realtime events\n", |
| 561 | + "# 8. Listening for Realtime events\n", |
560 | 562 | "\n", |
561 | 563 | "`listen_for_events` drives the session:\n", |
562 | 564 | "\n", |
|
739 | 741 | "id": "10c69ded", |
740 | 742 | "metadata": {}, |
741 | 743 | "source": [ |
742 | | - "## 9. Run Script\n", |
| 744 | + "# 9. Run Script\n", |
743 | 745 | "\n", |
744 | 746 | "In this step, we run the code, which lets us compare the Realtime model's transcription with the transcription model's output. The code does the following:\n", |
745 | 747 | "\n", |
|