diff --git a/examples/APIAgent-Py/APIAgent.ipynb b/examples/APIAgent-Py/APIAgent.ipynb index adffce2..d31a0a8 100644 --- a/examples/APIAgent-Py/APIAgent.ipynb +++ b/examples/APIAgent-Py/APIAgent.ipynb @@ -826,7 +826,7 @@ "source": [ "Awesome! The logs now have a `no_hallucination` score which we can use to filter down hallucinations.\n", "\n", - "![Hallucination logs](./assets/logs-with-score.gif)\n" + "![Hallucination logs](./assets/logs-with-score.mp4)\n" ] }, { @@ -839,7 +839,7 @@ "non-hallucinations are correct, but in a real-world scenario, you could [collect user feedback](https://www.braintrust.dev/docs/guides/logging#user-feedback)\n", "and treat positively rated feedback as ground truth.\n", "\n", - "![Dataset setup](./assets/dataset-setup.gif)\n", + "![Dataset setup](./assets/dataset-setup.mp4)\n", "\n", "## Running evals\n", "\n", @@ -1020,7 +1020,7 @@ "\n", "To understand why, we can filter down to this regression, and take a look at a side-by-side diff.\n", "\n", - "![Regression diff](./assets/regression-diff.gif)\n", + "![Regression diff](./assets/regression-diff.mp4)\n", "\n", "Does it matter whether or not the model generates these fields? That's a good question and something you can work on as a next step.\n", "Maybe you should tweak how Factuality works, or change the prompt to always return a consistent set of fields.\n", diff --git a/examples/APIAgent-Py/assets/dataset-setup.gif b/examples/APIAgent-Py/assets/dataset-setup.gif deleted file mode 100644 index 144e53b..0000000 Binary files a/examples/APIAgent-Py/assets/dataset-setup.gif and /dev/null differ diff --git a/examples/APIAgent-Py/assets/dataset-setup.mp4 b/examples/APIAgent-Py/assets/dataset-setup.mp4 new file mode 100644 index 0000000..3cc5be3 Binary files /dev/null and b/examples/APIAgent-Py/assets/dataset-setup.mp4 differ diff --git a/examples/APIAgent-Py/assets/logs-with-score.gif b/examples/APIAgent-Py/assets/logs-with-score.gif deleted file mode 100644 index db34bc8..0000000 Binary files a/examples/APIAgent-Py/assets/logs-with-score.gif and /dev/null differ diff --git a/examples/APIAgent-Py/assets/logs-with-score.mp4 b/examples/APIAgent-Py/assets/logs-with-score.mp4 new file mode 100644 index 0000000..c88cf40 Binary files /dev/null and b/examples/APIAgent-Py/assets/logs-with-score.mp4 differ diff --git a/examples/APIAgent-Py/assets/regression-diff.gif b/examples/APIAgent-Py/assets/regression-diff.gif deleted file mode 100644 index e3676c2..0000000 Binary files a/examples/APIAgent-Py/assets/regression-diff.gif and /dev/null differ diff --git a/examples/APIAgent-Py/assets/regression-diff.mp4 b/examples/APIAgent-Py/assets/regression-diff.mp4 new file mode 100644 index 0000000..39ad362 Binary files /dev/null and b/examples/APIAgent-Py/assets/regression-diff.mp4 differ diff --git a/examples/ClassifyingNewsArticles/ClassifyingNewsArticles.ipynb b/examples/ClassifyingNewsArticles/ClassifyingNewsArticles.ipynb index 6685083..800171c 100644 --- a/examples/ClassifyingNewsArticles/ClassifyingNewsArticles.ipynb +++ b/examples/ClassifyingNewsArticles/ClassifyingNewsArticles.ipynb @@ -423,7 +423,7 @@ "- You should see the eval scores increase and you can see which test cases improved.\n", "- You can also filter the test cases by improvements to know exactly why the scores changed.\n", "\n", - "![Compare](assets/inspect.gif)\n", + "![Compare](assets/inspect.mp4)\n", "\n" ] }, diff --git a/examples/ClassifyingNewsArticles/assets/inspect.gif b/examples/ClassifyingNewsArticles/assets/inspect.gif deleted file mode 100644 
index 87ab876..0000000 Binary files a/examples/ClassifyingNewsArticles/assets/inspect.gif and /dev/null differ diff --git a/examples/ClassifyingNewsArticles/assets/inspect.mp4 b/examples/ClassifyingNewsArticles/assets/inspect.mp4 new file mode 100644 index 0000000..811d8fd Binary files /dev/null and b/examples/ClassifyingNewsArticles/assets/inspect.mp4 differ diff --git a/examples/Github-Issues/Github-Issues.ipynb b/examples/Github-Issues/Github-Issues.ipynb index 7de01da..c5a9471 100644 --- a/examples/Github-Issues/Github-Issues.ipynb +++ b/examples/Github-Issues/Github-Issues.ipynb @@ -482,7 +482,7 @@ "\n", "Happy evaluating!\n", "\n", - "![improvements](./assets/improvements.gif)\n" + "![improvements](./assets/improvements.mp4)\n" ] } ], diff --git a/examples/Github-Issues/assets/improvements.gif b/examples/Github-Issues/assets/improvements.gif deleted file mode 100644 index 1a86c58..0000000 Binary files a/examples/Github-Issues/assets/improvements.gif and /dev/null differ diff --git a/examples/Github-Issues/assets/improvements.mp4 b/examples/Github-Issues/assets/improvements.mp4 new file mode 100644 index 0000000..4e90f58 Binary files /dev/null and b/examples/Github-Issues/assets/improvements.mp4 differ diff --git a/examples/LLaMa-3_1-Tools/LLaMa-3_1-Tools.ipynb b/examples/LLaMa-3_1-Tools/LLaMa-3_1-Tools.ipynb index c034dae..f34369b 100644 --- a/examples/LLaMa-3_1-Tools/LLaMa-3_1-Tools.ipynb +++ b/examples/LLaMa-3_1-Tools/LLaMa-3_1-Tools.ipynb @@ -756,7 +756,7 @@ "\n", "Although it's a fraction of the cost, it's both slower (likely due to rate limits) and worse performing than GPT-4o. 12 of the 60 cases failed to parse. Let's take a look at one of those in depth.\n", "\n", - "![parsing-failure](./assets/parsing-failure.gif)\n", + "![parsing-failure](./assets/parsing-failure.mp4)\n", "\n", "That definitely looks like an invalid tool call. 
Maybe we can experiment with tweaking the prompt to get better results.\n", "\n", diff --git a/examples/LLaMa-3_1-Tools/assets/parsing-failure.gif b/examples/LLaMa-3_1-Tools/assets/parsing-failure.gif deleted file mode 100644 index a9148b0..0000000 Binary files a/examples/LLaMa-3_1-Tools/assets/parsing-failure.gif and /dev/null differ diff --git a/examples/LLaMa-3_1-Tools/assets/parsing-failure.mp4 b/examples/LLaMa-3_1-Tools/assets/parsing-failure.mp4 new file mode 100644 index 0000000..f22236e Binary files /dev/null and b/examples/LLaMa-3_1-Tools/assets/parsing-failure.mp4 differ diff --git a/examples/OTEL-logging/assets/add-post-filter.gif b/examples/OTEL-logging/assets/add-post-filter.gif deleted file mode 100644 index 8dfbcbf..0000000 Binary files a/examples/OTEL-logging/assets/add-post-filter.gif and /dev/null differ diff --git a/examples/OTEL-logging/assets/add-post-filter.mp4 b/examples/OTEL-logging/assets/add-post-filter.mp4 new file mode 100644 index 0000000..bbdbf0a Binary files /dev/null and b/examples/OTEL-logging/assets/add-post-filter.mp4 differ diff --git a/examples/OTEL-logging/assets/otel-demo.gif b/examples/OTEL-logging/assets/otel-demo.gif deleted file mode 100644 index 17f8395..0000000 Binary files a/examples/OTEL-logging/assets/otel-demo.gif and /dev/null differ diff --git a/examples/OTEL-logging/assets/otel-demo.mp4 b/examples/OTEL-logging/assets/otel-demo.mp4 new file mode 100644 index 0000000..7bcb2f4 Binary files /dev/null and b/examples/OTEL-logging/assets/otel-demo.mp4 differ diff --git a/examples/OTEL-logging/assets/spans.gif b/examples/OTEL-logging/assets/spans.gif deleted file mode 100644 index 27afc0c..0000000 Binary files a/examples/OTEL-logging/assets/spans.gif and /dev/null differ diff --git a/examples/OTEL-logging/assets/spans.mp4 b/examples/OTEL-logging/assets/spans.mp4 new file mode 100644 index 0000000..1fd1e53 Binary files /dev/null and b/examples/OTEL-logging/assets/spans.mp4 differ diff --git a/examples/OTEL-logging/otel-logging.mdx b/examples/OTEL-logging/otel-logging.mdx index 98ceaed..7865d20 100644 --- a/examples/OTEL-logging/otel-logging.mdx +++ b/examples/OTEL-logging/otel-logging.mdx @@ -141,7 +141,7 @@ Run `npm install` to install the required dependencies, then `npm run dev` to la Open your Braintrust project to the **Logs** page, and select **What orders have shipped?** in your applications. You should be able to watch the logs filter in as your application makes HTTP requests and LLM calls. -![LLM calls and logs side by side](assets/otel-demo.gif) +![LLM calls and logs side by side](assets/otel-demo.mp4) Because this application is using multi-step streaming and tool calls, the logs are especially interesting. In Braintrust, logs consist of [traces](/docs/guides/traces), which roughly correspond to a single request or interaction in your application. Traces consist of one or more spans, each of which corresponds to a unit of work in your application. In this example, each step and tool call is logged inside of its own span. This level of granularity makes it easier to debug issues, track user behavior, and collect data into datasets. @@ -149,11 +149,11 @@ Because this application is using multi-step streaming and tool calls, the logs Run a couple more queries in the app and notice the logs that are generated. Our app is logging both `GET` and `POST` requests, but we’re most interested in the `POST` requests since they contain our LLM calls. 
We can apply a filter using the [BTQL](/docs/reference/btql) query `Name LIKE 'POST%'` so that we only see the traces we care about: -![Filter using BTQL](assets/add-post-filter.gif) +![Filter using BTQL](assets/add-post-filter.mp4) You should now have a list of traces for all the `POST` requests your app has made. Each contains the inputs and outputs of each LLM call in a span called `ai.streamText`. If you go further into the trace, you’ll also notice a span for each tool call. -![Expanding tool call and stream spans](assets/spans.gif) +![Expanding tool call and stream spans](assets/spans.mp4) This is valuable data that can be used to evaluate the quality and accuracy of your application in Braintrust. diff --git a/examples/PDFPlayground/PDFPlayground.mdx b/examples/PDFPlayground/PDFPlayground.mdx index 9eea335..43e3cb9 100644 --- a/examples/PDFPlayground/PDFPlayground.mdx +++ b/examples/PDFPlayground/PDFPlayground.mdx @@ -348,7 +348,7 @@ Once your traces have been logged, you can use the Braintrust UI to manage your You can store the user spans from your PDF traces into a dataset. Select the span, and then select **Add span to dataset**, or use the hotkey `D` to speed this up. -![add span to dataset](./assets/add-span-to-dataset.gif) +![add span to dataset](./assets/add-span-to-dataset.mp4) ### Trying system prompts in a playground Select a system prompt span, and then select **Try prompt** to: 1. Save the prompt (for example, "system1") to your library by selecting **Save as custom prompt** 2. Launch a playground using the saved prompt by selecting **Create playground with prompt** -![try prompt from span](./assets/try-prompt.gif) +![try prompt from span](./assets/try-prompt.mp4) ### File attachment methods There are two ways to attach PDF files in playgrounds: using the paperclip butto - To upload files directly from your local machine, start by selecting **+ Message** to add a user prompt. Then, select **+ Message Part** > **File**. This will display a paperclip icon on the right side. Select it to upload a file from your local machine. -![paperclip UI method](./assets/paperclip.gif) +![paperclip UI method](./assets/paperclip.mp4) This method is particularly useful when you're working with local files that aren't accessible via public URL. - To use the public URL method, paste the URL directly into the file message input field. You can also use mustache syntax to extract the URL from metadata. -![public url method](./assets/url.gif) +![public url method](./assets/url.mp4) This method streamlines the process when you're working with publicly available PDFs, like the earnings call transcripts we're using in this cookbook.
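To make the mustache approach above concrete, here is a minimal sketch of seeding a dataset whose rows carry a public PDF link that a playground file message part can reference; the project name, dataset name, and `metadata.url` field are illustrative assumptions, not names prescribed by the cookbook:

```typescript
import { initDataset } from "braintrust";

async function seed() {
  // Hypothetical project and dataset names for illustration.
  const dataset = initDataset("PDF Playground", { dataset: "earnings-calls" });

  dataset.insert({
    input: "Summarize the key risks discussed on this earnings call.",
    // In the playground's file message part, reference this link with
    // mustache syntax: {{metadata.url}}
    metadata: { url: "https://example.com/acme-q3-earnings-transcript.pdf" },
  });

  // Ensure the queued row is written before the process exits.
  await dataset.flush();
}

seed();
```

With rows shaped like this, the same prompt can resolve `{{metadata.url}}` to a different PDF on every row.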
diff --git a/examples/PDFPlayground/assets/add-span-to-dataset.gif b/examples/PDFPlayground/assets/add-span-to-dataset.gif deleted file mode 100644 index a1ee2a4..0000000 Binary files a/examples/PDFPlayground/assets/add-span-to-dataset.gif and /dev/null differ diff --git a/examples/PDFPlayground/assets/add-span-to-dataset.mp4 b/examples/PDFPlayground/assets/add-span-to-dataset.mp4 new file mode 100644 index 0000000..99afd87 Binary files /dev/null and b/examples/PDFPlayground/assets/add-span-to-dataset.mp4 differ diff --git a/examples/PDFPlayground/assets/paperclip.gif b/examples/PDFPlayground/assets/paperclip.gif deleted file mode 100644 index 8eb8f9b..0000000 Binary files a/examples/PDFPlayground/assets/paperclip.gif and /dev/null differ diff --git a/examples/PDFPlayground/assets/paperclip.mp4 b/examples/PDFPlayground/assets/paperclip.mp4 new file mode 100644 index 0000000..8cc02ba Binary files /dev/null and b/examples/PDFPlayground/assets/paperclip.mp4 differ diff --git a/examples/PDFPlayground/assets/try-prompt.gif b/examples/PDFPlayground/assets/try-prompt.gif deleted file mode 100644 index 32366b6..0000000 Binary files a/examples/PDFPlayground/assets/try-prompt.gif and /dev/null differ diff --git a/examples/PDFPlayground/assets/try-prompt.mp4 b/examples/PDFPlayground/assets/try-prompt.mp4 new file mode 100644 index 0000000..266b869 Binary files /dev/null and b/examples/PDFPlayground/assets/try-prompt.mp4 differ diff --git a/examples/PDFPlayground/assets/url.gif b/examples/PDFPlayground/assets/url.gif deleted file mode 100644 index af3b4af..0000000 Binary files a/examples/PDFPlayground/assets/url.gif and /dev/null differ diff --git a/examples/PDFPlayground/assets/url.mp4 b/examples/PDFPlayground/assets/url.mp4 new file mode 100644 index 0000000..37c231b Binary files /dev/null and b/examples/PDFPlayground/assets/url.mp4 differ diff --git a/examples/ProviderBenchmark/ProviderBenchmark.ipynb b/examples/ProviderBenchmark/ProviderBenchmark.ipynb index 2a90e01..2fe278f 100644 --- a/examples/ProviderBenchmark/ProviderBenchmark.ipynb +++ b/examples/ProviderBenchmark/ProviderBenchmark.ipynb @@ -433,7 +433,7 @@ "\n", "Let's start by looking at the project view. Braintrust makes it easy to morph this into a multi-level grouped analysis where we can see the score vs. 
duration in a scatter plot, and how each provider stacks up in the table.\n", "\n", - "![Setting up the table](./assets/configuring-graph.gif)\n", + "![Setting up the table](./assets/configuring-graph.mp4)\n", "\n", "### Insights\n", "\n", diff --git a/examples/ProviderBenchmark/assets/configuring-graph.gif b/examples/ProviderBenchmark/assets/configuring-graph.gif deleted file mode 100644 index 0088f61..0000000 Binary files a/examples/ProviderBenchmark/assets/configuring-graph.gif and /dev/null differ diff --git a/examples/ProviderBenchmark/assets/configuring-graph.mp4 b/examples/ProviderBenchmark/assets/configuring-graph.mp4 new file mode 100644 index 0000000..a0290d9 Binary files /dev/null and b/examples/ProviderBenchmark/assets/configuring-graph.mp4 differ diff --git a/examples/Realtime/realtime-rag/utils/docs-sample/changelog.mdx b/examples/Realtime/realtime-rag/utils/docs-sample/changelog.mdx index f296a3b..dcb95cb 100644 --- a/examples/Realtime/realtime-rag/utils/docs-sample/changelog.mdx +++ b/examples/Realtime/realtime-rag/utils/docs-sample/changelog.mdx @@ -7,7 +7,7 @@ import { LoomVideo } from "#/ui/docs/loom"; import Link from "fumadocs-core/link"; import { Callout } from "fumadocs-ui/components/callout"; import { Step, Steps } from "fumadocs-ui/components/steps"; -import Image from 'next/image'; +import Image from "next/image"; # Changelog @@ -52,18 +52,18 @@ import Image from 'next/image'; - The Traceloop OTEL integration now uses the input and output attributes to populate the corresponding fields in Braintrust. - The monitor page now supports querying experiment metrics. - Removed the `filters` param from the REST API fetch endpoint. For complex -queries, we recommend using the `/btql` endpoint ([docs](/docs/reference/btql)). + queries, we recommend using the `/btql` endpoint ([docs](/docs/reference/btql)). - New experiment summary layout option, a url-friendly view for experiment summaries that respects all filters. - Add a default limit of 10 to all fetch and `/btql` requests for project_logs. - You can now export your prompts from the playground as code snippets and run them through the [AI proxy](/docs/guides/proxy). - Add a fallback for the "add prompt" dropdown button in the playground, which -will search for prompts within the current project if the cross-org prompts -query fails. + will search for prompts within the current project if the cross-org prompts + query fails. ### SDK (version 0.0.171) - Add a `.data` method to the `Attachment` class, which lets you inspect the -loaded attachment data. + loaded attachment data. ## Week of 2024-11-12 @@ -99,6 +99,7 @@ loaded attachment data. - Create custom columns on dataset, experiment and logs tables from `JSON` values in `input`, `output`, `expected`, or `metadata` fields. ### API (version 0.0.59) + - Fix permissions bug with updating org-scoped env vars ## Week of 2024-10-28 @@ -151,7 +152,7 @@ loaded attachment data. ### SDK (version 0.0.164) - Add `braintrust.permalink` function to create deep links pointing to -particular spans in the Braintrust UI. + particular spans in the Braintrust UI. ## Week of 2024-10-07 @@ -170,7 +171,7 @@ particular spans in the Braintrust UI. ### SDK (version 0.0.161) - Add utility function `spanComponentsToObjectId` for resolving the object ID -from an exported span slug. + from an exported span slug. ## Week of 2024-09-30 @@ -178,19 +179,21 @@ from an exported span slug. - Add support for [Cerebras](https://cerebras.ai/) models in the proxy, playground, and saved prompts. 
- You can now create [span iframe viewers](/docs/guides/tracing#custom-span-iframes) to visualize span data in a custom iframe. In this example, the "Table" section is a custom span iframe. -![Span iframe](./guides/traces/span-iframe.png) + ![Span iframe](./guides/traces/span-iframe.png) - `NOT LIKE`, `NOT ILIKE`, `NOT INCLUDES`, and `NOT CONTAINS` supported in BTQL. - Add "Upload Rows" button to insert rows into an existing dataset from CSV or JSON. - Add "Maximum" aggregate score type. - The experiment table now supports grouping by input (for trials) or by a metadata field. - - The Name and Input columns are now pinned + - The Name and Input columns are now pinned - Gemini models now support multimodal inputs. ## Week of 2024-09-23 - Basic monitor page that shows aggregate values for latency, token count, time to first token, and cost for logs. - Create custom tools to use in your prompts and in the playground. See the [docs](/docs/guides/prompts#calling-external-tools) for more details. -- Set org-wide environment variables to use in these tools +- + Set org-wide environment variables + to use in these tools - Pull your prompts to your codebase using the `braintrust pull` command. - Select and compare multiple experiments in the experiment view using the `compared with` dropdown. - The playground now displays aggregate scores (avg/max/min) for each prompt and supports sorting rows by a score. @@ -220,7 +223,6 @@ from an exported span slug. - The tag picker now includes tags that were added dynamically via API, in addition to the tags configured for your project. - Added a REST API for managing AI secrets. See [docs](/docs/reference/api/AiSecrets). - ### SDK (version 0.0.158) - A dedicated `update` method is now available for datasets. @@ -233,11 +235,11 @@ from an exported span slug. - You can now create server-side online evaluations for your logs. Online evals support both [autoevals](/docs/reference/autoevals) and [custom scorers](/docs/guides/playground) you define as LLM-as-a-judge, TypeScript, or Python functions. See [docs](/docs/guides/evals/write#online-evaluation) for more details. - + - New member invitations now support being added to multiple permission groups. - Move datasets and prompts to a new Library navigation tab, and include a list of custom scorers. - Clean up tree view by truncating the root preview and showing a preview of a node only if collapsed. -![Truncated tree view](./reference/release-notes/truncated-tree-view.png) + ![Truncated tree view](./reference/release-notes/truncated-tree-view.png) - Automatically save changes to table views. ## Week of 2024-09-02 @@ -294,12 +296,13 @@ npx braintrust eval --bundle ## Week of 2024-08-12 - You can now create custom LLM and code (TypeScript and Python) evaluators in the playground. - + + - Fullscreen trace toggle - Datasets now accept JSON file uploads - When uploading a CSV/JSON file to a dataset, columns/fields named `input`, `expected`, and `metadata` -are now auto-assigned to the corresponding dataset fields + are now auto-assigned to the corresponding dataset fields - Fix bug in logs/dataset viewer when changing the search params. 
### API (version 0.0.53) @@ -315,7 +318,7 @@ - These metrics, along with cost, now exclude LLM calls used in autoevals (as of 0.0.85) - Switching organizations via the header navigates to the same-named project in the selected organization - Added `MarkAsyncWrapper` to the Python SDK to allow explicitly marking -functions which return awaitable objects as async + functions which return awaitable objects as async ### Autoevals (version 0.0.85) @@ -370,37 +373,37 @@ ## Week of 2024-07-22 - Categorical human review scores can now be re-ordered via Drag-n-Drop. -![Reorder categorical score](./reference/release-notes/category-score-reorder.gif) + ![Reorder categorical score](./reference/release-notes/category-score-reorder.mp4) - Human review row selection is now a free text field, enabling a quick jump to a specific row. -![Human review free text](./reference/release-notes/humanreviewfreetext.png) + ![Human review free text](./reference/release-notes/humanreviewfreetext.png) - Added REST endpoint for managing org membership. See [docs](/docs/reference/api/Organizations#modify-organization-membership). ### API (version 0.0.51) -* The proxy is now a first-class citizen in the API service, which simplifies deployment and sets the groundwork for some +- The proxy is now a first-class citizen in the API service, which simplifies deployment and sets the groundwork for some exciting new features. Here is what you need to know: - * The updates are available as of API version 0.0.51. - * The proxy is now accessible at `https://api.braintrust.dev/v1/proxy`. You can use this as a base URL in your OpenAI client, + - The updates are available as of API version 0.0.51. + - The proxy is now accessible at `https://api.braintrust.dev/v1/proxy`. You can use this as a base URL in your OpenAI client, instead of `https://braintrustproxy.com/v1`. [NOTE: The latter is still supported, but will be deprecated in the future.] - * If you are self-hosting, the proxy is now bundled into the API service. That means you no longer need to deploy the proxy as + - If you are self-hosting, the proxy is now bundled into the API service. That means you no longer need to deploy the proxy as a separate service. - * If you have deployed through AWS, after updating the Cloudformation, you'll need to grab the "Universal API URL" from the + - If you have deployed through AWS, after updating the Cloudformation, you'll need to grab the "Universal API URL" from the "Outputs" tab. ![Universal URL Cloudformation](./reference/release-notes/universal-url-cloudformation.png) - * Then, replace that in your settings page settings page +- Then, replace that in your settings page ![Universal API](./reference/release-notes/universal-api.png) - * If you have a Docker-based deployment, you can just update your containers. - * Once you see the "Universal API" indicator, you can remove the proxy URL from your settings page, if you have it set. +- If you have a Docker-based deployment, you can just update your containers. +- Once you see the "Universal API" indicator, you can remove the proxy URL from your settings page, if you have it set. ### SDK (version 0.0.146) -* Add support for `max_concurrency` in the Python SDK -* Hill climbing evals that use a `BaseExperiment` as data will use that as the default base experiment.
+- Add support for `max_concurrency` in the Python SDK +- Hill climbing evals that use a `BaseExperiment` as data will use that as the default base experiment. ## Week of 2024-07-15 @@ -420,14 +423,14 @@ ### Autoevals (version 0.0.77) -* Officially switch the default model to be `gpt-4o`. Our testing showed that it performed on average 10% more accurately than `gpt-3.5-turbo`! -* Support claude models (e.g. claude-3-5-sonnet-20240620). You can use them by simply specifying the `model` param in any LLM based evaluator. - * Under the hood, this will use the proxy, so make sure to configure your Anthropic API keys in your settings. +- Officially switch the default model to be `gpt-4o`. Our testing showed that it performed on average 10% more accurately than `gpt-3.5-turbo`! +- Support Claude models (e.g. claude-3-5-sonnet-20240620). You can use them by simply specifying the `model` param in any LLM-based evaluator. + - Under the hood, this will use the proxy, so make sure to configure your Anthropic API keys in your settings. ## Week of 2024-07-08 - Human review scores are now sortable from the project configuration page. -![Reorder scores](./reference/release-notes/reorder-human-review-scores.gif) + ![Reorder scores](./reference/release-notes/reorder-human-review-scores.mp4) - Streaming support for tool calls in Anthropic models through the proxy and playground. - The playground now supports different "parsing" modes: - `auto`: (same as before) the completion text and the first tool call arguments, if any @@ -437,7 +440,6 @@ - Cleaned up environment variables in the public [docker deployment](https://github.com/braintrustdata/braintrust-deployment/tree/main/docker). Functionally, nothing has changed. - ### Autoevals (version 0.0.76) - New `.partial(...)` syntax to initialize a scorer with partial arguments like `criteria` in `ClosedQA`. @@ -447,7 +449,7 @@ - Table views [can now be saved](/docs/reference/views), persisting the BTQL filters, sorts, and column state. - Add support for the new `window.ai` model into the playground. -![window.ai](./reference/release-notes/window-ai.gif) + ![window.ai](./reference/release-notes/window-ai.mp4) - Use push history when navigating table rows to allow for back button navigation. - In the experiments list, grouping by a metadata field will group rows in the table as well. - Allow the trace tree panel to be resized. @@ -471,8 +473,8 @@ const foo = wrapTraced(async function foo(input) { }); ``` - ### SDK (version 0.0.138) + - The TypeScript SDK's `Eval()` function now takes a `maxConcurrency` parameter, which bounds the number of concurrent tasks that run. - `braintrust install api` now sets up your API and Proxy URL in your environment. @@ -512,7 +514,7 @@ ## Week of 2024-06-03 - You can now collapse the trace tree. It's auto collapsed if you have a single span. -![Collapsible trace tree](./reference/release-notes/trace-tree.png) + ![Collapsible trace tree](./reference/release-notes/trace-tree.png) - Improvements to the experiment chart including greyed out lines for inactive scores and improved legend. - Show diffs when you save a new prompt version.
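To illustrate the `maxConcurrency` parameter mentioned in the SDK 0.0.138 notes above, here is a minimal sketch of a TypeScript eval that caps parallelism; the project name, data, and trivial identity task are placeholders:

```typescript
import { Eval } from "braintrust";
import { Levenshtein } from "autoevals";

Eval("My Project", {
  data: () => [
    { input: "hello", expected: "hello" },
    { input: "world", expected: "world" },
  ],
  // A trivial identity task, just to have something to run.
  task: async (input: string) => input,
  scores: [Levenshtein],
  // Bound how many test cases run at once.
  maxConcurrency: 5,
});
```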
@@ -819,7 +821,7 @@ server with `curl [api-url]/version`, where the API URL can be found on the ![Experiment search and filtering on project - view](/docs/release-notes/ReleaseNotes11-27-search.gif) + view](/docs/release-notes/ReleaseNotes11-27-search.mp4) - Upgraded AI Proxy to support [tracking Prometheus metrics](https://github.com/braintrustdata/braintrust-proxy/blob/a31a82e6d46ff442a3c478773e6eec21f3d0ba69/apis/cloudflare/wrangler-template.toml#L19C1-L19C1) - Modified Autoevals library to use the [AI proxy](/docs/guides/proxy) @@ -1176,7 +1178,7 @@ Eval([eval_name], { - Fixed our libraries including Autoevals to work with OpenAI’s new libraries
![Added OpenAI function calling in the prompt - playground](/docs/release-notes/ReleaseNotes-2023-11-functions.gif) + playground](/docs/release-notes/ReleaseNotes-2023-11-functions.mp4)
- Added support for function calling and tools in our prompt playground - Added tabs on a project page for datasets, experiments, etc. @@ -1283,7 +1285,7 @@ Eval( - The prompt playground is now live! We're excited to get your feedback as we continue to build this feature out. See [the docs](/docs/guides/playground) for more information. -![Sync Playground](/docs/release-notes/ReleaseNotes-2023-08-Playground.gif) +![Sync Playground](/docs/release-notes/ReleaseNotes-2023-08-Playground.mp4) ## Week of 2023-08-21 @@ -1295,7 +1297,7 @@ Eval( changes to your code. - You can now edit datasets in the UI. -![Edit Dataset](/docs/release-notes/ReleaseNotes-2023-08-EditDataset.gif) +![Edit Dataset](/docs/release-notes/ReleaseNotes-2023-08-EditDataset.mp4) ## Week of 2023-08-14 @@ -1399,11 +1401,11 @@ braintrust install api --update-template - You can now swap the primary and comparison experiment with a single click. -![Swap experiments](/docs/release-notes/ReleaseNotes-2023-07-Swap.gif) +![Swap experiments](/docs/release-notes/ReleaseNotes-2023-07-Swap.mp4) - You can now compare `output` vs. `expected` within an experiment. -![Diff output and expected](/docs/release-notes/ReleaseNotes-2023-07-Diff.gif) +![Diff output and expected](/docs/release-notes/ReleaseNotes-2023-07-Diff.mp4) - Version 0.0.19 is out for the SDK. It is an important update that throws an error if your payload is larger than 64KB in size. @@ -1435,7 +1437,7 @@ braintrust install api --update-template - New scatter plot and histogram insights to quickly analyze scores and filter down examples. - ![Scatter Plot](/docs/release-notes/ReleaseNotes-2023-06-Scatter.gif) + ![Scatter Plot](/docs/release-notes/ReleaseNotes-2023-06-Scatter.mp4) - API keys that can be set in the SDK (explicitly or through an environment variable) and do not require user login. Visit the settings page to create an API key. diff --git a/examples/Realtime/realtime-rag/utils/docs-sample/human-review.mdx b/examples/Realtime/realtime-rag/utils/docs-sample/human-review.mdx index 8059234..1aa25bb 100644 --- a/examples/Realtime/realtime-rag/utils/docs-sample/human-review.mdx +++ b/examples/Realtime/realtime-rag/utils/docs-sample/human-review.mdx @@ -13,7 +13,7 @@ feedback from end users, subject matter experts, and product teams in one place. use human review to evaluate/compare experiments, assess the efficacy of your automated scoring methods, and curate log events to use in your evals. -![Human review label](./human-review/label.gif) +![Human review label](./human-review/label.mp4) ## Configuring human review @@ -32,7 +32,6 @@ options and their scores. Once you create a score, it will automatically appear in the "Scores" section in each experiment and log event throughout the project. - ### Writing to expected fields You may choose to write categorical scores to the `expected` field of a span instead of a score. @@ -40,9 +39,9 @@ To enable this, simply check the "Write to expected field instead of score" opti an option to select multiple values when writing to the expected field. - A numeric score will not be assigned to the categorical options when writing to the expected - field. If there is an existing object in the expected field, the categorical value will be - appended to the object. + A numeric score will not be assigned to the categorical options when writing + to the expected field. If there is an existing object in the expected field, + the categorical value will be appended to the object. 
![Write to expected](./human-review/write-to-expected.webp) @@ -54,7 +53,7 @@ In addition to categorical scores, you can always directly edit the structured o To manually review results in your logs or an experiment, simply click on a row, and you'll see the human review scores you configured in the expanded trace view. -![Set score](./human-review/in-experiment.gif) +![Set score](./human-review/in-experiment.mp4) As you set scores, they will be automatically saved and reflected in the summary metrics. The exact same mechanism works whether you're reviewing logs or experiments. @@ -64,7 +63,7 @@ mechanism works whether you're reviewing logs or experiments. In addition to setting scores, you can also add comments to spans and update their `expected` values. These updates are tracked alongside score updates to form an audit trail of edits to a span. -![Save comment](./human-review/comment.gif) +![Save comment](./human-review/comment.mp4) ## Rapid review mode @@ -72,7 +71,7 @@ If you or a subject matter expert is reviewing a large number of logs, you can u a UI that's optimized specifically for review. To enter review mode, hit the "r" key or the expand () icon next to the "Human review" header. -![Review mode](./human-review/review-mode.gif) +![Review mode](./human-review/review-mode.mp4) In review mode, you can set scores, leave comments, and edit expected values. Review mode is optimized for keyboard navigation, so you can quickly move between scores and rows with keyboard shortcuts. You can also share a link to the diff --git a/examples/Realtime/realtime-rag/utils/docs-sample/playground.mdx b/examples/Realtime/realtime-rag/utils/docs-sample/playground.mdx index 1963380..cd33703 100644 --- a/examples/Realtime/realtime-rag/utils/docs-sample/playground.mdx +++ b/examples/Realtime/realtime-rag/utils/docs-sample/playground.mdx @@ -31,7 +31,7 @@ that includes one or more prompts and is linked to a dataset. Playgrounds are designed for collaboration and automatically synchronize in real-time. -![Sync Playground](/docs/guides/playground/sync-playground.gif) +![Sync Playground](/docs/guides/playground/sync-playground.mp4) To share a playground, simply copy the URL and send it to your collaborators. Your collaborators must be members of your organization to see the session. You can invite users from the settings page. diff --git a/examples/ReceiptExtraction/ReceiptExtraction.ipynb b/examples/ReceiptExtraction/ReceiptExtraction.ipynb index d51589f..278727e 100644 --- a/examples/ReceiptExtraction/ReceiptExtraction.ipynb +++ b/examples/ReceiptExtraction/ReceiptExtraction.ipynb @@ -403,7 +403,7 @@ "\n", "If you click into the gpt-4o experiment and compare it to gpt-4o-mini, you can drill down into the individual improvements and regressions.\n", "\n", - "![Regressions](./assets/GPT-4o-vs-4o-mini.gif)\n", + "![Regressions](./assets/GPT-4o-vs-4o-mini.mp4)\n", "\n", "There are several different types of regressions, one of which appears to be that `gpt-4o` returns information in a different case than `gpt-4o-mini`. 
That may or\nmay not be important for this use case, but if not, we could adjust our scoring functions to lowercase everything before comparing.\n", diff --git a/examples/ReceiptExtraction/assets/GPT-4o-vs-4o-mini.gif b/examples/ReceiptExtraction/assets/GPT-4o-vs-4o-mini.gif deleted file mode 100644 index b31998d..0000000 Binary files a/examples/ReceiptExtraction/assets/GPT-4o-vs-4o-mini.gif and /dev/null differ diff --git a/examples/ReceiptExtraction/assets/GPT-4o-vs-4o-mini.mp4 b/examples/ReceiptExtraction/assets/GPT-4o-vs-4o-mini.mp4 new file mode 100644 index 0000000..f99a6f0 Binary files /dev/null and b/examples/ReceiptExtraction/assets/GPT-4o-vs-4o-mini.mp4 differ diff --git a/examples/SimpleRagas/SimpleRagas.ipynb b/examples/SimpleRagas/SimpleRagas.ipynb index b60c723..8f3f9bb 100644 --- a/examples/SimpleRagas/SimpleRagas.ipynb +++ b/examples/SimpleRagas/SimpleRagas.ipynb @@ -449,7 +449,7 @@ "results, and maybe we should try using `gpt-4` instead. Braintrust lets us test the effect of this quickly, directly in the UI, before we run\n", "a full experiment:\n", "\n", - "![try gpt-4](./assets/try-gpt-4.gif)\n", + "![try gpt-4](./assets/try-gpt-4.mp4)\n", "\n", "Looks better. Let's update our scoring function to use it and re-run the experiment.\n" ] @@ -575,7 +575,7 @@ "We can drill down on individual examples of each regression type to better understand it. The side-by-side diffs built into Braintrust make\n", "it easy to deeply understand every step of the pipeline, for example, which documents were missing, and why.\n", "\n", - "![missing docs](./assets/missing-docs.gif)\n", + "![missing docs](./assets/missing-docs.mp4)\n", "\n", "And there you have it! Ragas is a powerful technique that, with the right tools and iteration, can lead to really high-quality RAG applications. Happy evaling!\n" ] diff --git a/examples/SimpleRagas/assets/missing-docs.gif b/examples/SimpleRagas/assets/missing-docs.gif deleted file mode 100644 index 15e36ef..0000000 Binary files a/examples/SimpleRagas/assets/missing-docs.gif and /dev/null differ diff --git a/examples/SimpleRagas/assets/missing-docs.mp4 b/examples/SimpleRagas/assets/missing-docs.mp4 new file mode 100644 index 0000000..f5bc15e Binary files /dev/null and b/examples/SimpleRagas/assets/missing-docs.mp4 differ diff --git a/examples/SimpleRagas/assets/try-gpt-4.gif b/examples/SimpleRagas/assets/try-gpt-4.gif deleted file mode 100644 index 78a398a..0000000 Binary files a/examples/SimpleRagas/assets/try-gpt-4.gif and /dev/null differ diff --git a/examples/SimpleRagas/assets/try-gpt-4.mp4 b/examples/SimpleRagas/assets/try-gpt-4.mp4 new file mode 100644 index 0000000..b4b6ea0 Binary files /dev/null and b/examples/SimpleRagas/assets/try-gpt-4.mp4 differ diff --git a/examples/Text2SQL-Data/Text2SQL-Data.ipynb b/examples/Text2SQL-Data/Text2SQL-Data.ipynb index a523128..9b842f6 100644 --- a/examples/Text2SQL-Data/Text2SQL-Data.ipynb +++ b/examples/Text2SQL-Data/Text2SQL-Data.ipynb @@ -400,7 +400,7 @@ "1. Let's capture the good data into a dataset. Since our eval pipeline did the hard work of generating a reference query and results, we can\n", " now save these, and make sure that future changes we make do not _regress_ the results.\n", "\n", - "![add to dataset](./assets/add-to-dataset.gif)\n", + "![add to dataset](./assets/add-to-dataset.mp4)\n", "\n", "- The incorrect query didn't seem to get the date format correct.
That would probably be improved by showing a sample of the data to the model.\n", "\n", @@ -986,7 +986,7 @@ "\n", "Braintrust makes it easy to filter down to the regressions, and view a side-by-side diff:\n", "\n", - "![diff](./assets/analyze-regressions.gif)\n", + "![diff](./assets/analyze-regressions.mp4)\n", "\n", "## Conclusion\n", "\n", diff --git a/examples/Text2SQL-Data/assets/add-to-dataset.gif b/examples/Text2SQL-Data/assets/add-to-dataset.gif deleted file mode 100644 index 17812b2..0000000 Binary files a/examples/Text2SQL-Data/assets/add-to-dataset.gif and /dev/null differ diff --git a/examples/Text2SQL-Data/assets/add-to-dataset.mp4 b/examples/Text2SQL-Data/assets/add-to-dataset.mp4 new file mode 100644 index 0000000..ec985c9 Binary files /dev/null and b/examples/Text2SQL-Data/assets/add-to-dataset.mp4 differ diff --git a/examples/Text2SQL-Data/assets/analyze-regressions.gif b/examples/Text2SQL-Data/assets/analyze-regressions.gif deleted file mode 100644 index a0c2b72..0000000 Binary files a/examples/Text2SQL-Data/assets/analyze-regressions.gif and /dev/null differ diff --git a/examples/Text2SQL-Data/assets/analyze-regressions.mp4 b/examples/Text2SQL-Data/assets/analyze-regressions.mp4 new file mode 100644 index 0000000..77204be Binary files /dev/null and b/examples/Text2SQL-Data/assets/analyze-regressions.mp4 differ diff --git a/examples/ToolOCR/ToolOCR.mdx b/examples/ToolOCR/ToolOCR.mdx index 7ac7132..8970bff 100644 --- a/examples/ToolOCR/ToolOCR.mdx +++ b/examples/ToolOCR/ToolOCR.mdx @@ -95,7 +95,7 @@ braintrust push ocr.py --requirements requirements.txt To try out the tool, visit the **toolOCR** project in Braintrust, and navigate to **Tools**. Here, you can test different images and see what kinds of outputs you're getting from the tool. -![Try gif](assets/try-tool.gif) +![Try tool](assets/try-tool.mp4) This is helpful information for deciding if you'd like to do any additional post-processing on the text output. For example, you may notice that your output contains `\n` to indicate new lines in the parsed text. You could include additional processing in your tool to handle these. If you change your code, just run `braintrust push ocr.py --requirements requirements.txt` again to sync the tool with Braintrust. @@ -117,7 +117,7 @@ prompt = project.prompts.create( Just like the tool, you can run it in the UI and even try it out on some examples: -![Try prompt](assets/try-prompt.gif) +![Try prompt](assets/try-prompt.mp4) If you visit the **Logs** tab, you can check out detailed logs for each call: @@ -142,7 +142,7 @@ Then, navigate to **Dataset** in your playground and select the **Recipes** data Your playground is now set up with a prompt, model choice, dataset, and the tool we created. Hit **Run** to run the prompt and tool on the images in the dataset. -![Run playground](assets/run-playground.gif) +![Run playground](assets/run-playground.mp4) ## Iterating on the prompt Now that we have an interactive environment to test out our prompt and tool call Hit the copy icon to duplicate your prompt and start tweaking. You can also tweak the original prompt and save your changes there if you'd like. For example, you can try instructing the model to always list the quantity of each ingredient you need to purchase. -![Tweak prompt](assets/tweak-prompt.gif) +![Tweak prompt](assets/tweak-prompt.mp4) Once you're satisfied with the prompt, hit **Update** to save the changes. Each time you save the prompt, you create a new version.
To learn more about how to use a prompt in your code, check out the diff --git a/examples/ToolOCR/assets/run-playground.gif b/examples/ToolOCR/assets/run-playground.gif deleted file mode 100644 index aa30536..0000000 Binary files a/examples/ToolOCR/assets/run-playground.gif and /dev/null differ diff --git a/examples/ToolOCR/assets/run-playground.mp4 b/examples/ToolOCR/assets/run-playground.mp4 new file mode 100644 index 0000000..24d9e0a Binary files /dev/null and b/examples/ToolOCR/assets/run-playground.mp4 differ diff --git a/examples/ToolOCR/assets/try-prompt.gif b/examples/ToolOCR/assets/try-prompt.gif deleted file mode 100644 index 864ac57..0000000 Binary files a/examples/ToolOCR/assets/try-prompt.gif and /dev/null differ diff --git a/examples/ToolOCR/assets/try-prompt.mp4 b/examples/ToolOCR/assets/try-prompt.mp4 new file mode 100644 index 0000000..9fccb54 Binary files /dev/null and b/examples/ToolOCR/assets/try-prompt.mp4 differ diff --git a/examples/ToolOCR/assets/try-tool.gif b/examples/ToolOCR/assets/try-tool.gif deleted file mode 100644 index bef1402..0000000 Binary files a/examples/ToolOCR/assets/try-tool.gif and /dev/null differ diff --git a/examples/ToolOCR/assets/try-tool.mp4 b/examples/ToolOCR/assets/try-tool.mp4 new file mode 100644 index 0000000..03dd298 Binary files /dev/null and b/examples/ToolOCR/assets/try-tool.mp4 differ diff --git a/examples/ToolOCR/assets/tweak-prompt.gif b/examples/ToolOCR/assets/tweak-prompt.gif deleted file mode 100644 index 55ba4cf..0000000 Binary files a/examples/ToolOCR/assets/tweak-prompt.gif and /dev/null differ diff --git a/examples/ToolOCR/assets/tweak-prompt.mp4 b/examples/ToolOCR/assets/tweak-prompt.mp4 new file mode 100644 index 0000000..2c68b78 Binary files /dev/null and b/examples/ToolOCR/assets/tweak-prompt.mp4 differ diff --git a/examples/ToolRAG/ToolRAG.mdx b/examples/ToolRAG/ToolRAG.mdx index 1731fab..6628849 100644 --- a/examples/ToolRAG/ToolRAG.mdx +++ b/examples/ToolRAG/ToolRAG.mdx @@ -7,7 +7,7 @@ to compare multiple versions side-by-side, you'd have to deploy each version sep Using Braintrust, you can experiment with different prompts together with retrieval logic, side-by-side, all within the playground UI. In this cookbook, we'll walk through exactly how. -![Side-by-side](./assets/Side-by-side.gif) +![Side-by-side](./assets/Side-by-side.mp4) ## Architecture @@ -117,7 +117,7 @@ The output should be: To try out the tool, visit the project in Braintrust, and navigate to **Tools**. -![Test tool](./assets/Test-tool.gif) +![Test tool](./assets/Test-tool.mp4) Here, you can test different searches and refine the logic. For example, you could try playing with various `top_k` values, or adding a prefix to the query to guide the results. If you change the code, run @@ -150,7 +150,7 @@ npx braintrust push prompt.ts Once the prompt uploads, you can run it in the UI and even try it out on some examples: -![Test prompt](./assets/Test-prompt.gif) +![Test prompt](./assets/Test-prompt.mp4) If you visit the **Logs** tab, you can check out detailed logs for each call: @@ -181,12 +181,12 @@ Once you create it, if you visit the **Datasets** tab, you'll be able to explore To try out the prompt together with the dataset, we'll create a playground. -![Create playground](./assets/Create-playground.gif) +![Create playground](./assets/Create-playground.mp4) Once you create the playground, hit **Run** to run the prompt and tool on the questions in the dataset. 
-![Run playground](./assets/Run-playground.gif) +![Run playground](./assets/Run-playground.mp4) ### Define a scorer @@ -228,7 +228,7 @@ Once you define the scorer, hit **Run** to run it on the questions in the datase Now, let's tweak the prompt to see if we can improve the results. Hit the copy icon to duplicate your prompt and start tweaking. You can also tweak the original prompt and save your changes there if you'd like. For example, you can try instructing the model to always include a Python and TypeScript code snippet. -![Tweak prompt](./assets/Tweak-prompt.gif) +![Tweak prompt](./assets/Tweak-prompt.mp4) Once you're satisfied with the prompt, hit **Update** to save the changes. Each time you save the prompt, you create a new version. To learn more about how to use a prompt in your code, check out the diff --git a/examples/ToolRAG/assets/Create-playground.gif b/examples/ToolRAG/assets/Create-playground.gif deleted file mode 100644 index c12c6b1..0000000 Binary files a/examples/ToolRAG/assets/Create-playground.gif and /dev/null differ diff --git a/examples/ToolRAG/assets/Create-playground.mp4 b/examples/ToolRAG/assets/Create-playground.mp4 new file mode 100644 index 0000000..43d4af0 Binary files /dev/null and b/examples/ToolRAG/assets/Create-playground.mp4 differ diff --git a/examples/ToolRAG/assets/Run-playground.gif b/examples/ToolRAG/assets/Run-playground.gif deleted file mode 100644 index 494ee17..0000000 Binary files a/examples/ToolRAG/assets/Run-playground.gif and /dev/null differ diff --git a/examples/ToolRAG/assets/Run-playground.mp4 b/examples/ToolRAG/assets/Run-playground.mp4 new file mode 100644 index 0000000..b1d3280 Binary files /dev/null and b/examples/ToolRAG/assets/Run-playground.mp4 differ diff --git a/examples/ToolRAG/assets/Side-by-side.gif b/examples/ToolRAG/assets/Side-by-side.gif deleted file mode 100644 index 5c333a2..0000000 Binary files a/examples/ToolRAG/assets/Side-by-side.gif and /dev/null differ diff --git a/examples/ToolRAG/assets/Side-by-side.mp4 b/examples/ToolRAG/assets/Side-by-side.mp4 new file mode 100644 index 0000000..bc732d6 Binary files /dev/null and b/examples/ToolRAG/assets/Side-by-side.mp4 differ diff --git a/examples/ToolRAG/assets/Test-prompt.gif b/examples/ToolRAG/assets/Test-prompt.gif deleted file mode 100644 index 7ed7f95..0000000 Binary files a/examples/ToolRAG/assets/Test-prompt.gif and /dev/null differ diff --git a/examples/ToolRAG/assets/Test-prompt.mp4 b/examples/ToolRAG/assets/Test-prompt.mp4 new file mode 100644 index 0000000..53f26e2 Binary files /dev/null and b/examples/ToolRAG/assets/Test-prompt.mp4 differ diff --git a/examples/ToolRAG/assets/Test-tool.gif b/examples/ToolRAG/assets/Test-tool.gif deleted file mode 100644 index 70b01bb..0000000 Binary files a/examples/ToolRAG/assets/Test-tool.gif and /dev/null differ diff --git a/examples/ToolRAG/assets/Test-tool.mp4 b/examples/ToolRAG/assets/Test-tool.mp4 new file mode 100644 index 0000000..387f22c Binary files /dev/null and b/examples/ToolRAG/assets/Test-tool.mp4 differ diff --git a/examples/ToolRAG/assets/Tweak-prompt.gif b/examples/ToolRAG/assets/Tweak-prompt.gif deleted file mode 100644 index fe25c28..0000000 Binary files a/examples/ToolRAG/assets/Tweak-prompt.gif and /dev/null differ diff --git a/examples/ToolRAG/assets/Tweak-prompt.mp4 b/examples/ToolRAG/assets/Tweak-prompt.mp4 new file mode 100644 index 0000000..1b1696d Binary files /dev/null and b/examples/ToolRAG/assets/Tweak-prompt.mp4 differ diff --git a/examples/ToolRAG/tool-rag/docs-sample/APIAgent-Py.mdx 
b/examples/ToolRAG/tool-rag/docs-sample/APIAgent-Py.mdx index ec7ad13..1252fca 100644 --- a/examples/ToolRAG/tool-rag/docs-sample/APIAgent-Py.mdx +++ b/examples/ToolRAG/tool-rag/docs-sample/APIAgent-Py.mdx @@ -545,7 +545,7 @@ Question: How do I purchase GPUs through Braintrust? Awesome! The logs now have a `no_hallucination` score which we can use to filter down hallucinations. -![Hallucination logs](./../assets/APIAgent-Py/logs-with-score.gif) +![Hallucination logs](./../assets/APIAgent-Py/logs-with-score.mp4) ### Creating datasets @@ -553,7 +553,7 @@ Let's create two datasets: one for good answers and the other for hallucinations non-hallucinations are correct, but in a real-world scenario, you could [collect user feedback](https://www.braintrust.dev/docs/guides/logging#user-feedback) and treat positively rated feedback as ground truth. -![Dataset setup](./../assets/APIAgent-Py/dataset-setup.gif) +![Dataset setup](./../assets/APIAgent-Py/dataset-setup.mp4) ## Running evals @@ -680,7 +680,7 @@ Awesome! Looks like we were able to solve the hallucinations, although we may ha To understand why, we can filter down to this regression, and take a look at a side-by-side diff. -![Regression diff](./../assets/APIAgent-Py/regression-diff.gif) +![Regression diff](./../assets/APIAgent-Py/regression-diff.mp4) Does it matter whether or not the model generates these fields? That's a good question and something you can work on as a next step. Maybe you should tweak how Factuality works, or change the prompt to always return a consistent set of fields. diff --git a/examples/ToolRAG/tool-rag/docs-sample/Github-Issues.mdx b/examples/ToolRAG/tool-rag/docs-sample/Github-Issues.mdx index 9be6106..11c9ea1 100644 --- a/examples/ToolRAG/tool-rag/docs-sample/Github-Issues.mdx +++ b/examples/ToolRAG/tool-rag/docs-sample/Github-Issues.mdx @@ -390,4 +390,4 @@ them into your evals. Happy evaluating! -![improvements](./../assets/Github-Issues/improvements.gif) +![improvements](./../assets/Github-Issues/improvements.mp4) diff --git a/examples/ToolRAG/tool-rag/docs-sample/LLaMa-3_1-Tools.mdx b/examples/ToolRAG/tool-rag/docs-sample/LLaMa-3_1-Tools.mdx index d430e96..652b905 100644 --- a/examples/ToolRAG/tool-rag/docs-sample/LLaMa-3_1-Tools.mdx +++ b/examples/ToolRAG/tool-rag/docs-sample/LLaMa-3_1-Tools.mdx @@ -605,7 +605,7 @@ Ok, let's dig into the results. To start, we'll look at how LLaMa-3.1-8B compare Although it's a fraction of the cost, it's both slower (likely due to rate limits) and worse performing than GPT-4o. 12 of the 60 cases failed to parse. Let's take a look at one of those in depth. -![parsing-failure](./../assets/LLaMa-3_1-Tools/parsing-failure.gif) +![parsing-failure](./../assets/LLaMa-3_1-Tools/parsing-failure.mp4) That definitely looks like an invalid tool call. Maybe we can experiment with tweaking the prompt to get better results. diff --git a/examples/ToolRAG/tool-rag/docs-sample/ProviderBenchmark.mdx b/examples/ToolRAG/tool-rag/docs-sample/ProviderBenchmark.mdx index 842c7ae..792c920 100644 --- a/examples/ToolRAG/tool-rag/docs-sample/ProviderBenchmark.mdx +++ b/examples/ToolRAG/tool-rag/docs-sample/ProviderBenchmark.mdx @@ -381,7 +381,7 @@ await Promise.all(providers.map(runProviderBenchmark)); Let's start by looking at the project view. Braintrust makes it easy to morph this into a multi-level grouped analysis where we can see the score vs. duration in a scatter plot, and how each provider stacks up in the table. 
-![Setting up the table](./../assets/ProviderBenchmark/configuring-graph.gif) +![Setting up the table](./../assets/ProviderBenchmark/configuring-graph.mp4) ### Insights diff --git a/examples/ToolRAG/tool-rag/docs-sample/ReceiptExtraction.mdx b/examples/ToolRAG/tool-rag/docs-sample/ReceiptExtraction.mdx index 6dfb6b2..9e5a068 100644 --- a/examples/ToolRAG/tool-rag/docs-sample/ReceiptExtraction.mdx +++ b/examples/ToolRAG/tool-rag/docs-sample/ReceiptExtraction.mdx @@ -325,7 +325,7 @@ Let's dig into these individual results in some more depth. If you click into the gpt-4o experiment and compare it to gpt-4o-mini, you can drill down into the individual improvements and regressions. -![Regressions](./../assets/ReceiptExtraction/GPT-4o-vs-4o-mini.gif) +![Regressions](./../assets/ReceiptExtraction/GPT-4o-vs-4o-mini.mp4) There are several different types of regressions, one of which appears to be that `gpt-4o` returns information in a different case than `gpt-4o-mini`. That may or may not be important for this use case, but if not, we could adjust our scoring functions to lowercase everything before comparing. diff --git a/examples/ToolRAG/tool-rag/docs-sample/SimpleRagas.mdx b/examples/ToolRAG/tool-rag/docs-sample/SimpleRagas.mdx index 632dfe9..2aabe67 100644 --- a/examples/ToolRAG/tool-rag/docs-sample/SimpleRagas.mdx +++ b/examples/ToolRAG/tool-rag/docs-sample/SimpleRagas.mdx @@ -339,7 +339,7 @@ By default, Ragas is configured to use `gpt-3.5-turbo-16k`. As we observed, it l results, and maybe we should try using `gpt-4` instead. Braintrust lets us test the effect of this quickly, directly in the UI, before we run a full experiment: -![try gpt-4](./../assets/SimpleRagas/try-gpt-4.gif) +![try gpt-4](./../assets/SimpleRagas/try-gpt-4.mp4) Looks better. Let's update our scoring function to use it and re-run the experiment. @@ -427,6 +427,6 @@ Although not a pure fail, it does seem like in 3 cases we're not retrieving the We can drill down on individual examples of each regression type to better understand it. The side-by-side diffs built into Braintrust make it easy to deeply understand every step of the pipeline, for example, which documents were missing, and why. -![missing docs](./../assets/SimpleRagas/missing-docs.gif) +![missing docs](./../assets/SimpleRagas/missing-docs.mp4) And there you have it! Ragas is a powerful technique that, with the right tools and iteration, can lead to really high-quality RAG applications. Happy evaling! diff --git a/examples/ToolRAG/tool-rag/docs-sample/Text2SQL-Data.mdx b/examples/ToolRAG/tool-rag/docs-sample/Text2SQL-Data.mdx index dcb6089..8d58255 100644 --- a/examples/ToolRAG/tool-rag/docs-sample/Text2SQL-Data.mdx +++ b/examples/ToolRAG/tool-rag/docs-sample/Text2SQL-Data.mdx @@ -290,7 +290,7 @@ To best utilize these results: 1. Let's capture the good data into a dataset. Since our eval pipeline did the hard work of generating a reference query and results, we can now save these, and make sure that future changes we make do not _regress_ the results. -![add to dataset](./../assets/Text2SQL-Data/add-to-dataset.gif) +![add to dataset](./../assets/Text2SQL-Data/add-to-dataset.mp4) - The incorrect query didn't seem to get the date format correct. That would probably be improved by showing a sample of the data to the model. @@ -685,7 +685,7 @@ Interesting. It seems like that was not a slam dunk.
There were a few regression Braintrust makes it easy to filter down to the regressions, and view a side-by-side diff: -![diff](./../assets/Text2SQL-Data/analyze-regressions.gif) +![diff](./../assets/Text2SQL-Data/analyze-regressions.mp4) ## Conclusion diff --git a/examples/ToolRAG/tool-rag/docs-sample/UnreleasedAI.mdx b/examples/ToolRAG/tool-rag/docs-sample/UnreleasedAI.mdx index 8ad4a9a..30e3804 100644 --- a/examples/ToolRAG/tool-rag/docs-sample/UnreleasedAI.mdx +++ b/examples/ToolRAG/tool-rag/docs-sample/UnreleasedAI.mdx @@ -163,7 +163,7 @@ Now, let’s use the comprehensiveness scorer to create a feedback loop that all Go to your Braintrust **Logs** and select one of your logs. In the expanded view on the left-hand side of your screen, select the **generate-changelog** span, then select **Add to dataset**. Create a new dataset called `eval dataset`, and add a couple more logs to the same dataset. We'll use this dataset to run an experiment that evaluates for comprehensiveness to understand where the prompt might need adjustments.
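To close the loop described above, here is a minimal sketch of an eval that runs over the curated `eval dataset`; the project name is assumed to match the cookbook's, and `generateChangelog` and `comprehensiveness` are hypothetical stand-ins for the generate-changelog prompt logic and the comprehensiveness scorer defined earlier, not a fixed API:

```typescript
import { Eval, initDataset } from "braintrust";
// Stand-ins for the cookbook's generate-changelog task and
// comprehensiveness scorer defined earlier.
import { generateChangelog, comprehensiveness } from "./changelog";

Eval("UnreleasedAI", {
  // Pull the rows curated from the logs into the experiment.
  data: initDataset("UnreleasedAI", { dataset: "eval dataset" }),
  task: async (input: string) => generateChangelog(input),
  scores: [comprehensiveness],
});
```

Running this produces an experiment whose comprehensiveness scores point to where the prompt might need adjustments.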