About the product

A voice layer for writing, translation, and app-aware output on macOS.

Voxt is a macOS menu bar voice input and translation app. The product is built around a fast hold-to-speak shortcut, live transcription cleanup, translation, rewriting, and app-specific enhancement that can adapt output to the current workspace.

Workflows

What Voxt does in practice.

The core product is intentionally small in surface area but covers a broad range of workflows. Each shortcut changes the result type without requiring a different app or a second capture surface.

Transcription

The default transcription path keeps a live preview while you speak, then applies punctuation, filler-word cleanup, app-specific prompts, and personal dictionary rules before paste.
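
As a rough illustration of the cleanup order described above, the sketch below chains filler-word removal, dictionary replacement, and basic punctuation. It is not Voxt's implementation: the filler list, the function names, and the replacement map are all hypothetical, and app-specific prompts are left out.

```python
import re

# Hypothetical filler list; the list Voxt actually uses is not documented here.
FILLERS = ("um", "uh", "you know")

def remove_fillers(text: str) -> str:
    # Drop each filler word or phrase, plus an optional trailing comma and spaces.
    pattern = r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?\s*"
    return re.sub(pattern, "", text, flags=re.IGNORECASE)

def apply_dictionary(text: str, replacements: dict) -> str:
    # Replace whole-word matches with the user's preferred spelling.
    for wrong, right in replacements.items():
        text = re.sub(r"\b" + re.escape(wrong) + r"\b", right, text, flags=re.IGNORECASE)
    return text

def punctuate(text: str) -> str:
    # Collapse whitespace, capitalize the first letter, add a final period if missing.
    text = " ".join(text.split())
    if not text:
        return text
    text = text[0].upper() + text[1:]
    if text[-1] not in ".!?":
        text += "."
    return text

def clean_transcript(raw: str, replacements: dict) -> str:
    # Assumed stage order: fillers, then dictionary rules, then punctuation.
    return punctuate(apply_dictionary(remove_fillers(raw), replacements))
```

Calling `clean_transcript("um so the vox release is uh ready", {"vox": "Voxt"})` would run all three stages before the text is pasted.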

Translation

Translation can run right after speech transcription or directly on selected text, with its own model choice and terminology guidance for the translation lane.

Rewrite and prompt

Rewrite mode uses voice as the instruction. It can rewrite selected text, generate fresh text, and keep an answer card visible even when no writable field is focused.

App-specific enhancement

App Branch lets Voxt apply different cleanup rules, prompts, dictionaries, and output preferences based on the current app or URL, so chat, mail, docs, and editors can each receive the right tone.
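
One way to picture App Branch is as a lookup from the current app or URL to a set of preferences. The sketch below is a minimal model under stated assumptions: the `Branch` type, the bundle identifiers, the host name, and the rule that URL matches take precedence over app matches are all hypothetical and not taken from Voxt.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

@dataclass(frozen=True)
class Branch:
    name: str
    prompt: str
    bundle_id: Optional[str] = None  # matches the frontmost app (hypothetical ids)
    url_host: Optional[str] = None   # matches the host of the active URL

DEFAULT = Branch(name="default", prompt="Clean up the transcript.")

BRANCHES = [
    Branch("mail-web", "Use a polite email tone.", url_host="mail.example.com"),
    Branch("chat", "Keep it short and casual.", bundle_id="com.example.chat"),
    Branch("editor", "Preserve code identifiers verbatim.", bundle_id="com.example.editor"),
]

def resolve_branch(bundle_id: str, url: Optional[str] = None) -> Branch:
    host = urlparse(url).hostname if url else None
    # Assumption: a URL rule is more specific than an app rule, so check it first.
    for branch in BRANCHES:
        if branch.url_host and branch.url_host == host:
            return branch
    for branch in BRANCHES:
        if branch.bundle_id and branch.bundle_id == bundle_id:
            return branch
    return DEFAULT
```

With these sample rules, a browser showing `https://mail.example.com/inbox` resolves to the `mail-web` branch, while an unknown app falls back to `default`.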

Model architecture

Local ASR

MLX Audio, Whisper, Direct Dictation

Local LLM

Qwen, GLM, Llama, Mistral, Gemma

Remote ASR

OpenAI, Doubao ASR, GLM ASR, Aliyun Bailian ASR

Remote LLM

Anthropic, Gemini, OpenAI, Ollama, OpenRouter, and more

Details

The product behavior is tuned around real desktop writing.

These details explain why Voxt is faster than a generic push-to-talk wrapper.

App-specific behavior

App Branch lets different apps or URLs use different enhancement prompts and cleanup rules, so chat, email, coding, and research can each keep their own voice.

Dictionary-aware output

Personal dictionary support can inject exact terms into prompts and auto-correct high-confidence near matches, which is especially useful for names, products, and bilingual jargon.
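
The near-match correction can be approximated with fuzzy string matching. The sketch below uses Python's standard `difflib` as a stand-in; the matching method Voxt actually uses is not stated here, and the term list, the `correct_terms` name, and the 0.85 cutoff are assumptions chosen for illustration.

```python
import difflib

# Hypothetical personal dictionary entries.
DICTIONARY = ["Voxt", "OpenRouter", "Doubao"]

def correct_terms(text: str, terms=DICTIONARY, cutoff: float = 0.85) -> str:
    # Map lowercase forms back to the exact spelling the user stored.
    canonical = {t.lower(): t for t in terms}
    out = []
    for word in text.split():
        core = word.strip(".,!?;:")  # keep surrounding punctuation intact
        match = difflib.get_close_matches(core.lower(), list(canonical), n=1, cutoff=cutoff)
        if core and match:
            # Only high-similarity words are rewritten; everything else passes through.
            out.append(word.replace(core, canonical[match[0]], 1))
        else:
            out.append(word)
    return " ".join(out)
```

A high cutoff keeps ordinary words untouched: `"about"` is left alone, while `"doubaoo"` is close enough to be corrected to `Doubao`.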

One workflow, multiple engines

Each workflow can stay on the same shortcut-driven surface while still routing transcription, translation, rewriting, and enhancement through different providers when that produces better latency or quality.
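
Per-workflow routing can be thought of as a small table that maps each workflow to an ASR engine and an LLM. The sketch below is illustrative only: the provider names come from the lists on this page, but the specific pairings, the `local:`/`remote:` labels, and the `route_for` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Route:
    asr: str  # speech recognition engine
    llm: str  # model used for cleanup, translation, or rewriting

# Hypothetical defaults; real pairings would come from user settings.
ROUTES = {
    "transcription": Route(asr="local:whisper", llm="local:qwen"),
    "translation": Route(asr="local:whisper", llm="remote:openai"),
    "rewrite": Route(asr="remote:openai", llm="remote:anthropic"),
}

def route_for(workflow: str, overrides: Optional[dict] = None) -> Route:
    if workflow not in ROUTES:
        raise KeyError(f"unknown workflow: {workflow}")
    base = ROUTES[workflow]
    # An override replaces only the fields it names and keeps the rest.
    changes = (overrides or {}).get(workflow, {})
    return Route(asr=changes.get("asr", base.asr), llm=changes.get("llm", base.llm))
```

For example, `route_for("rewrite", {"rewrite": {"llm": "local:gemma"}})` keeps the remote ASR engine but swaps in a local LLM, which is the kind of latency or quality trade-off the paragraph above describes.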

Main window

The desktop app keeps permissions, models, shortcuts, and workflow settings close to the main control surface.

[Image: Voxt main window]