Live Transcriptions Service

A near-real-time speech-to-text application powered by Mistral AI's Voxtral models. This project demonstrates phrase-by-phrase transcription using Next.js, client-side Voice Activity Detection (VAD), and Mistral's transcription API.

sequenceDiagram
    participant U as User
    participant C as Client (useScribe)
    participant S as Next.js Server (/api/transcribe)
    participant M as Mistral API

    U->>C: Speaks ("Hello world")
    C->>C: Buffers Audio & Checks Volume (VAD)
    
    Note over C: User pauses for 600ms
    
    C->>C: VAD Trigger -> Encode WAV
    C->>S: POST /api/transcribe (FormData)
    S->>M: POST /v1/audio/transcriptions (Proxy)
    M-->>S: JSON Response ("Hello world")
    S-->>C: JSON Response
    C->>U: UI Update (Append Text)
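
The "Encode WAV" step in the diagram can be sketched as follows. This is a minimal illustration, not the project's actual encoder: the function name `encodeWav`, the mono 16-bit PCM layout, and the 16 kHz default sample rate are all assumptions.

```typescript
// Encode mono Float32 PCM samples (range -1..1) as a 16-bit PCM WAV file.
// Hypothetical helper: name, mono layout, and default sample rate are assumptions.
function encodeWav(samples: Float32Array, sampleRate: number = 16000): ArrayBuffer {
  const bytesPerSample = 2;
  const headerSize = 44;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(headerSize + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  // RIFF container header
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");

  // "fmt " chunk describing the audio format
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // chunk size for plain PCM
  view.setUint16(20, 1, true); // format 1 = linear PCM
  view.setUint16(22, 1, true); // one channel (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample

  // "data" chunk holding the samples
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  samples.forEach((sample, i) => {
    // Clamp to -1..1, then scale to the signed 16-bit range.
    const clamped = Math.max(-1, Math.min(1, sample));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(headerSize + i * bytesPerSample, Math.round(scaled), true);
  });

  return buffer;
}
```

In the browser, the returned buffer could be wrapped as `new Blob([encodeWav(samples, sampleRate)], { type: "audio/wav" })` and appended to the `FormData` posted to `/api/transcribe`. The form field name the route expects is not stated here, so it is left open.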

Features

Live Transcription: Converts speech to text phrase by phrase, shortly after each pause.
Smart VAD (Voice Activity Detection): Detects when you stop speaking and sends the audio as a complete phrase, which gives the model better context and improves accuracy.
Secure Architecture: The Mistral API key stays on the server; the browser only talks to a proxy route.
Modern UI: Built with Next.js, Tailwind CSS, and Framer Motion for a smooth user experience.

Prerequisites

Node.js 18+
pnpm (recommended) or npm/yarn
A Mistral API Key

Getting Started

Clone the repository (if you haven't already).
Install dependencies:
```
pnpm install
```
Configure Environment: Create a .env.local file in the root directory and add your Mistral API key:
```
MISTRAL_API_KEY=your_actual_api_key_here
```
Run the Development Server:
```
pnpm dev
```
Open the App: Navigate to http://localhost:3000 in your browser.

How It Works

Audio Capture: The app captures microphone input using the Web Audio API.
VAD Processing: It analyzes the audio volume in real time.
- If you pause for 600ms, it treats the phrase as finished and sends it.
- If you talk for 5 seconds continuously, it sends the buffered audio anyway to keep latency low.
WAV Encoding: The raw audio buffer is converted to a WAV file on the client.
Upload: The WAV file is sent to the Next.js API route (/api/transcribe).
Transcription: The server proxies the file to Mistral's API and returns the text.
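
The VAD step above can be sketched as a small state machine. This is an illustration under stated assumptions rather than the project's actual hook: the names `PhraseSegmenter`, `VadConfig`, and `rms` are hypothetical, the RMS threshold of 0.01 is a guess, and the 5-second cap is interpreted as time since the phrase started. Only the 600ms and 5s values come from the description above.

```typescript
// Hypothetical configuration; only silenceMs and maxPhraseMs come from the README.
interface VadConfig {
  rmsThreshold: number; // assumed volume level that counts as speech
  silenceMs: number; // pause length that ends a phrase
  maxPhraseMs: number; // longest phrase before a forced send
}

const DEFAULT_VAD: VadConfig = { rmsThreshold: 0.01, silenceMs: 600, maxPhraseMs: 5000 };

// Root-mean-square volume of one audio frame (samples in -1..1).
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  frame.forEach((s) => {
    sum += s * s;
  });
  return Math.sqrt(sum / frame.length);
}

class PhraseSegmenter {
  private readonly config: VadConfig;
  private frames: Float32Array[] = [];
  private phraseMs = 0; // audio buffered since speech started
  private silentMs = 0; // length of the current trailing pause
  private speaking = false;

  constructor(config: VadConfig = DEFAULT_VAD) {
    this.config = config;
  }

  // Feed one frame; returns the buffered phrase when it should be sent, else null.
  push(frame: Float32Array, frameMs: number): Float32Array | null {
    const loud = rms(frame) >= this.config.rmsThreshold;
    if (!this.speaking) {
      if (!loud) return null; // ignore silence before any speech
      this.speaking = true;
    }
    this.frames.push(frame.slice()); // copy, since callers may reuse the buffer
    this.phraseMs += frameMs;
    this.silentMs = loud ? 0 : this.silentMs + frameMs;

    const paused = this.silentMs >= this.config.silenceMs;
    const tooLong = this.phraseMs >= this.config.maxPhraseMs;
    return paused || tooLong ? this.flush() : null;
  }

  // Concatenate the buffered frames and reset for the next phrase.
  private flush(): Float32Array {
    const total = this.frames.reduce((n, f) => n + f.length, 0);
    const phrase = new Float32Array(total);
    let offset = 0;
    for (const f of this.frames) {
      phrase.set(f, offset);
      offset += f.length;
    }
    this.frames = [];
    this.phraseMs = 0;
    this.silentMs = 0;
    this.speaking = false;
    return phrase;
  }
}
```

In an audio callback, `frameMs` would typically be `(frame.length / sampleRate) * 1000`. Whenever `push` returns a phrase, that buffer is what gets WAV-encoded and uploaded.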

Tech Stack

Framework: Next.js 15 (App Router)
Styling: Tailwind CSS
Animations: Framer Motion
Audio: Web Audio API (ScriptProcessorNode)
AI: Mistral Voxtral
