# Voxtral Transcribe

**Thoughts move fast. Your transcription should keep up.**

Voxtral Transcribe streams text into your notes as you speak. Add structure by voice (headings, bullets, to-dos), or grab the keyboard mid-dictation — the mic waits for you and resumes when you stop typing. Edits happen as you go, not after.

Dictate directly into Markdown using [Mistral's Voxtral](https://mistral.ai/) speech-to-text. Insert headings, lists, to-dos, and other elements by voice, correct text with spoken instructions, and choose between real-time streaming and batch tap-to-send. Transcription and voice commands are available in 13 languages.

## Get going in under a minute

1. Install and paste your [Mistral API key](https://console.mistral.ai/)
2. Press `Ctrl+Space` (desktop) or tap the mic icon (mobile)
3. Start talking and say *"heading 2"*, *"bullet point"*, or *"for the correction: ..."* as you go
4. Like it? [☕ Buy Me a Coffee](https://buymeacoffee.com/maxonamission)

## Why Voxtral?

Voxtral is purpose-built for transcription, not retrofitted from a general audio model. Three things that matter for dictation:

- **Low word error rate on hard audio** — handles background noise, accents, and technical jargon well, including on continuous speech
- **Streaming-first** — designed for low-latency partial results, which is what makes "text appears as you speak" feel real-time instead of stuttery
- **Multilingual by design** — 13+ languages with consistent quality, not English-first with the rest bolted on

If you're choosing between speech-to-text models for dictation specifically (rather than, say, post-hoc transcription of meeting recordings), this is a strong fit.

## Features

- **Real-time streaming** (desktop) — text appears as you speak
- **Batch mode with tap-to-send** (desktop + mobile) — send audio chunks while you keep talking
- **Voice commands** — insert headings, bullet points, to-do items, numbered lists, and more by voice
- **13 languages** — Dutch, English, French, German, Spanish, Portuguese, Italian, Russian, Chinese, Hindi, Arabic, Japanese, and Korean. Voice commands adapt to the selected language, and the English commands always work as a fallback
- **Voice command help panel** — shows available commands and trigger phrases for the active language
- **Auto-correction** — spelling, capitalization, and punctuation are automatically corrected after recording
- **Inline correction instructions** — say "for the correction: ..." and the corrector will follow your instructions
- **Self-correction recognition** — "no not X but Y" is handled automatically
- **Mishearing correction** — common speech recognition errors are fixed automatically per language
- **Microphone selection** — choose which microphone to use
- **Auto-pause on focus loss** — configurable behavior when switching apps on mobile
- **Configurable Enter-to-send** — optionally use Enter as tap-to-send when the mic is live (batch mode)
- **Typing cooldown** — adjustable delay before the mic resumes after typing

Need coffee to process all this? Me too. [☕ Buy Me a Coffee](https://buymeacoffee.com/maxonamission)

## Requirements

- **Obsidian** v1.0.0 or newer
- **Mistral API key** — free to create at [console.mistral.ai](https://console.mistral.ai/)

## Installation

### From Community Plugins (recommended)

1. Open **Settings** > **Community plugins** > **Browse**
2. Search for "Voxtral Transcribe"
3. Click **Install**, then **Enable**
4. Go to **Settings** > **Voxtral Transcribe** and enter your Mistral API key

### Manual installation

1. Download `main.js`, `manifest.json`, and `styles.css` from the [latest release](https://github.com/maxonamission/obsidian-voxtral/releases/latest)
2. Create a folder `.obsidian/plugins/voxtral-transcribe/` in your vault
3. Copy the three files into that folder
4. Restart Obsidian and enable the plugin in **Settings** > **Community plugins**

## Usage

### Desktop (real-time mode)

1. Open a note
2. Click the microphone icon in the ribbon, or press **Ctrl+Space**
3. Start speaking — text appears live in your note
4. Click the microphone again or say **"stop recording"** to stop
5. If auto-correct is enabled, the text is corrected once recording stops

### Mobile (batch mode)

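Batch mode is the only mode available on mobile, because real-time streaming relies on Node.js, which Obsidian's mobile apps do not include. The same steps also work in batch mode on desktop.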

1. Open a note
2. Tap the microphone icon to start recording
3. Tap the **send icon** in the view header to transcribe the current audio chunk — the recording keeps going
4. On desktop, you can press **Enter** instead of tapping the send icon, as long as the mic is live (not paused for typing) and *Enter = tap-to-send* is enabled
5. Keep talking and tap/press send again for the next chunk
6. Tap the microphone to stop — the last chunk is processed automatically
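
The tap-to-send flow above can be pictured as a buffer that hands over whatever audio has accumulated while recording continues. The following is a minimal sketch under stated assumptions, not the plugin's actual code: the class `ChunkedRecording` and its method names are hypothetical, and audio data is represented as plain `Uint8Array` pieces rather than real recorder output.

```typescript
// Hypothetical sketch of the batch flow; names are illustrative only.
class ChunkedRecording {
  private pending: Uint8Array[] = [];
  private recording = false;

  // Start a new recording with an empty buffer.
  start(): void {
    this.recording = true;
    this.pending = [];
  }

  // Called whenever the recorder produces more audio data.
  // Data arriving after stop() is ignored.
  onAudioData(data: Uint8Array): void {
    if (this.recording) {
      this.pending.push(data);
    }
  }

  // Tap-to-send: return the audio captured so far and keep recording.
  tapToSend(): Uint8Array[] {
    const chunk = this.pending;
    this.pending = [];
    return chunk;
  }

  // Stop: end the recording and return the last chunk for processing.
  stop(): Uint8Array[] {
    this.recording = false;
    return this.tapToSend();
  }

  isRecording(): boolean {
    return this.recording;
  }
}
```

Each array returned by `tapToSend()` or `stop()` would then be sent for transcription; how the plugin encodes and uploads it is not covered here.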

### Voice commands

Voice commands are recognized at the end of a sentence. Commands automatically adapt to the language selected in settings — the table below shows examples in English, but equivalent phrases are available in all 13 supported languages. Open the **Voice Commands** help panel (ribbon icon or command palette) to see the exact phrases for your active language.

| Command | Example (English) | Result |
|---|---|---|
| New paragraph | "new paragraph" | Double line break |
| New line | "new line" | Single line break |
| Heading 1–3 | "heading 1" / "heading 2" / "heading 3" | `# ` / `## ` / `### ` |
| Bullet point | "bullet point" | `- ` |
| To-do item | "new todo" | `- [ ] ` |
| Numbered item | "numbered item" | `1. ` (auto-increments) |
| Delete last paragraph | "delete last paragraph" | Removes last paragraph |
| Delete last line | "delete last line" | Removes last sentence |
| Undo | "undo" | Undo last action |
| Stop recording | "stop recording" | Stops the recording |
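
To illustrate how end-of-sentence matching could work, here is a minimal sketch that maps the English phrases from the table to their results. It is an assumption-laden illustration, not the plugin's implementation: `COMMANDS`, `parseSentence`, `numberedPrefix`, and the action names are hypothetical, and the real matcher may handle languages, punctuation, and mishearings differently.

```typescript
// Hypothetical sketch; not the plugin's actual command matcher.
type CommandResult =
  | { kind: "insert"; text: string }   // Markdown inserted into the note
  | { kind: "action"; name: string };  // editor or recorder action

// English trigger phrases from the table, mapped to their results.
const COMMANDS: Record<string, CommandResult> = {
  "new paragraph": { kind: "insert", text: "\n\n" },
  "new line": { kind: "insert", text: "\n" },
  "heading 1": { kind: "insert", text: "# " },
  "heading 2": { kind: "insert", text: "## " },
  "heading 3": { kind: "insert", text: "### " },
  "bullet point": { kind: "insert", text: "- " },
  "new todo": { kind: "insert", text: "- [ ] " },
  "numbered item": { kind: "action", name: "insertNumberedItem" },
  "delete last paragraph": { kind: "action", name: "deleteLastParagraph" },
  "delete last line": { kind: "action", name: "deleteLastLine" },
  "undo": { kind: "action", name: "undo" },
  "stop recording": { kind: "action", name: "stopRecording" },
};

interface ParsedSentence {
  text: string;                  // dictated text with the command removed
  command: CommandResult | null; // null when no command was recognized
}

// Checks whether a transcribed sentence ends with a known trigger phrase.
function parseSentence(sentence: string): ParsedSentence {
  const trimmed = sentence.trim();
  // Drop trailing punctuation that the recognizer may have added.
  const withoutPunctuation = trimmed.replace(/[.!?,]+$/, "");
  const lower = withoutPunctuation.toLowerCase();
  // Try longer phrases first so the most specific phrase wins.
  const phrases = Object.keys(COMMANDS).sort((a, b) => b.length - a.length);
  for (const phrase of phrases) {
    if (lower === phrase || lower.endsWith(" " + phrase)) {
      const text = withoutPunctuation
        .slice(0, withoutPunctuation.length - phrase.length)
        .trim();
      return { text, command: COMMANDS[phrase] ?? null };
    }
  }
  return { text: trimmed, command: null };
}

// Auto-incrementing prefix for "numbered item": pass the previous number
// in the list (0 when starting a new list).
function numberedPrefix(previousNumber: number): string {
  return `${previousNumber + 1}. `;
}
```

For example, `parseSentence("Buy milk new todo.")` yields the text `"Buy milk"` together with the `- [ ] ` insert, while a sentence without a trigger phrase is returned unchanged with `command: null`.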

### Text correction

- **Correct selection**: Select text > Command palette > "Correct selected text"
- **Correct entire note**: Command palette > "Correct entire note"

### Focus loss behavior

When switching apps on mobile, you can configure what happens to an active recording:

- **Pause immediately** (default) — pauses and resumes when you return
- **Pause after delay** — keeps recording for a configurable time (10s–5min), then pauses
- **Keep recording** — continues recording in the background
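
The three options can be read as a single decision: how long to wait after focus is lost before pausing. The sketch below is a hypothetical illustration, not the plugin's code. The type `FocusLossMode` and the function `pauseDelayMs` are made-up names, and clamping the delay to the 10 s to 5 min range is an assumption about how out-of-range values would be handled.

```typescript
// Hypothetical sketch; mode names and clamping behavior are assumptions.
type FocusLossMode = "pause-immediately" | "pause-after-delay" | "keep-recording";

const MIN_DELAY_MS = 10 * 1000;     // 10 seconds
const MAX_DELAY_MS = 5 * 60 * 1000; // 5 minutes

// Returns the number of milliseconds to wait after focus loss before
// pausing, or null when the recording should never be paused.
function pauseDelayMs(mode: FocusLossMode, configuredDelayMs: number): number | null {
  switch (mode) {
    case "pause-immediately":
      return 0;
    case "pause-after-delay":
      // Keep the configured delay inside the documented range.
      return Math.min(MAX_DELAY_MS, Math.max(MIN_DELAY_MS, configuredDelayMs));
    case "keep-recording":
      return null;
  }
}
```

A caller would schedule the pause with the returned delay and cancel it if the app regains focus first.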

## Settings

| Setting | Description |
|---|---|
| Mistral API key | Your API key from console.mistral.ai |
| Microphone | Which microphone to use |
| Mode | Realtime (desktop only) or Batch |
| Enter = tap-to-send | Use Enter to send audio chunks when mic is live (batch mode, default: on) |
| Typing cooldown | Delay before mic resumes after typing (default: 800 ms) |
| On focus loss | Pause immediately / after delay / keep recording |
| Language | Language for transcription and voice commands (13 languages, default: Dutch, shown as "Nederlands") |
| Auto-correct | Enable/disable automatic correction |
| Streaming delay | Latency vs accuracy tradeoff for realtime mode |
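
As a rough picture of how these settings fit together, the sketch below models them as a typed object. All field names, the `"nl"` language code, and the helper functions are assumptions made for illustration; only the defaults stated in this README (Enter = tap-to-send on, 800 ms cooldown, pause immediately, Dutch) are filled in, and the remaining defaults are left out because they are not documented here.

```typescript
// Hypothetical sketch of the settings shape; field names are assumptions.
interface VoxtralSettingsSketch {
  apiKey: string;
  microphoneId: string | null;
  mode: "realtime" | "batch";
  enterToSend: boolean;
  typingCooldownMs: number;
  onFocusLoss: "pause-immediately" | "pause-after-delay" | "keep-recording";
  language: string;
  autoCorrect: boolean;
  streamingDelayMs: number;
}

// Only the defaults that the table and the focus-loss section state.
const DOCUMENTED_DEFAULTS: Pick<
  VoxtralSettingsSketch,
  "enterToSend" | "typingCooldownMs" | "onFocusLoss" | "language"
> = {
  enterToSend: true,
  typingCooldownMs: 800,
  onFocusLoss: "pause-immediately",
  language: "nl", // assumed code for the default language, Dutch
};

// Realtime is desktop only, so mobile always falls back to batch.
function effectiveMode(
  mode: "realtime" | "batch",
  isMobile: boolean
): "realtime" | "batch" {
  return isMobile ? "batch" : mode;
}

// Typing cooldown: the mic may resume once enough time has passed
// since the last keystroke.
function micMayResume(
  lastKeystrokeAtMs: number,
  nowMs: number,
  cooldownMs: number
): boolean {
  return nowMs - lastKeystrokeAtMs >= cooldownMs;
}
```

With the default cooldown, a keystroke at `t = 1000` ms keeps the mic paused until `t = 1800` ms.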

## Development

```bash
npm install
npm run dev    # watch mode
npm run build  # production build
```

## License

[GPL-3.0](LICENSE) — Copyright (c) 2026 Max Kloosterman