Live Transcriptions Service

A real-time speech-to-text application powered by Mistral AI's Voxtral models. This project demonstrates how to implement "near real-time" transcription using Next.js, Client-Side VAD (Voice Activity Detection), and Mistral's transcription API.

```mermaid
sequenceDiagram
    participant U as User
    participant C as Client (useScribe)
    participant S as Next.js Server (/api/transcribe)
    participant M as Mistral API

    U->>C: Speaks ("Hello world")
    C->>C: Buffers Audio & Checks Volume (VAD)

    Note over C: User pauses for 600ms

    C->>C: VAD Trigger -> Encode WAV
    C->>S: POST /api/transcribe (FormData)
    S->>M: POST /v1/audio/transcriptions (Proxy)
    M-->>S: JSON Response ("Hello world")
    S-->>C: JSON Response
    C->>U: UI Update (Append Text)
```

Features

  • Live Transcription: Converts speech to text with low latency.
  • Smart VAD (Voice Activity Detection): Automatically detects when you stop speaking to send audio phrases, ensuring better context and accuracy.
  • Secure Architecture: API keys are kept safe on the server side using a proxy route.
  • Modern UI: Built with Next.js, Tailwind CSS, and Framer Motion for a smooth user experience.
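The "Secure Architecture" point above boils down to building the outbound Mistral request on the server, where the key lives. A minimal sketch of that request builder is below; the endpoint path and the `voxtral-mini-latest` model id are assumptions drawn from Mistral's public transcription API, not necessarily what this repo uses.

```typescript
// The browser only ever talks to /api/transcribe; the server attaches the key.
// Endpoint and model id are assumptions — check Mistral's API docs.
const MISTRAL_URL = "https://api.mistral.ai/v1/audio/transcriptions";

function buildMistralRequest(apiKey: string, audio: Blob): Request {
  const form = new FormData();
  form.append("file", audio, "phrase.wav");
  form.append("model", "voxtral-mini-latest"); // assumed model id
  return new Request(MISTRAL_URL, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` }, // key never leaves the server
    body: form,
  });
}
```

Because the `Authorization` header is added server-side, rotating the key is a single `.env.local` change and the client bundle never contains it.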

Prerequisites

  • Node.js 18 or later
  • pnpm
  • A Mistral AI API key

Getting Started

  1. Clone the repository (if you haven't already).

  2. Install dependencies:

     pnpm install

  3. Configure Environment: Create a .env.local file in the root directory and add your Mistral API key:

     MISTRAL_API_KEY=your_actual_api_key_here

  4. Run the Development Server:

     pnpm dev

  5. Open the App: Navigate to http://localhost:3000 in your browser.

How It Works

  1. Audio Capture: The app captures microphone input using the Web Audio API.
  2. VAD Processing: It analyzes the audio volume in real-time.
    • If you pause for 600ms, it detects the end of a phrase.
    • If you talk for 5 seconds continuously, it forces a save to keep latency low.
  3. WAV Encoding: The raw audio buffer is converted to a WAV file on the client.
  4. Upload: The WAV file is sent to the Next.js API route (/api/transcribe).
  5. Transcription: The server proxies the file to Mistral's API and returns the text.
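The pause/flush logic in step 2 can be sketched as a small state machine over incoming audio frames. The thresholds mirror the numbers above (600 ms pause, 5 s cap); the class name and the RMS silence threshold are hypothetical, not this repo's actual identifiers:

```typescript
const SILENCE_THRESHOLD = 0.01; // RMS below this counts as silence (assumed value)
const SILENCE_MS = 600;         // pause length that ends a phrase
const MAX_PHRASE_MS = 5000;     // force a flush to keep latency low

// Root-mean-square volume of one audio frame.
function rms(frame: Float32Array): number {
  let sum = 0;
  for (const s of frame) sum += s * s;
  return Math.sqrt(sum / frame.length);
}

class VadTracker {
  private phraseStart = -1; // timestamp when speech began
  private lastVoice = -1;   // timestamp of the last voiced frame

  /** Returns true when the buffered phrase should be encoded and uploaded. */
  shouldFlush(frame: Float32Array, nowMs: number): boolean {
    const voiced = rms(frame) >= SILENCE_THRESHOLD;
    if (voiced) {
      if (this.phraseStart < 0) this.phraseStart = nowMs;
      this.lastVoice = nowMs;
    }
    if (this.phraseStart < 0) return false; // nothing buffered yet
    const paused = !voiced && nowMs - this.lastVoice >= SILENCE_MS;
    const tooLong = nowMs - this.phraseStart >= MAX_PHRASE_MS;
    if (paused || tooLong) {
      this.phraseStart = -1; // reset for the next phrase
      this.lastVoice = -1;
      return true;
    }
    return false;
  }
}
```

Flushing on phrase boundaries (rather than on a fixed timer) is what gives the model whole utterances, which is why the README calls out better context and accuracy.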

Tech Stack

  • Framework: Next.js 15 (App Router)
  • Styling: Tailwind CSS
  • Animations: Framer Motion
  • Audio: Web Audio API (ScriptProcessorNode)
  • AI: Mistral Voxtral
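Since ScriptProcessorNode hands the app raw Float32 samples, the client-side WAV encoding from step 3 of "How It Works" reduces to writing a 44-byte RIFF header ahead of 16-bit PCM data. A minimal sketch (mono, parameterized sample rate; the repo's actual encoder may differ):

```typescript
// Encode raw Float32 samples as a mono 16-bit PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const buf = new ArrayBuffer(44 + samples.length * 2);
  const view = new DataView(buf);
  const writeStr = (off: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(off + i, s.charCodeAt(i));
  };
  writeStr(0, "RIFF");
  view.setUint32(4, 36 + samples.length * 2, true); // remaining chunk size
  writeStr(8, "WAVE");
  writeStr(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt sub-chunk size
  view.setUint16(20, 1, true);              // audio format: PCM
  view.setUint16(22, 1, true);              // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate
  view.setUint16(32, 2, true);              // block align
  view.setUint16(34, 16, true);             // bits per sample
  writeStr(36, "data");
  view.setUint32(40, samples.length * 2, true);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to [-1, 1]
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buf;
}
```

Encoding on the client keeps the server route a thin proxy: it forwards a ready-made file instead of transcoding audio itself.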
