Transcription Taking Too Long

Find out why a recording is slow to process and how to make VoiceInk faster.

Where Time Is Spent

A recording can spend time in several stages:

  1. Loading or warming the transcription model.
  2. Transcribing the audio.
  3. Cleaning and formatting the text.
  4. Running AI enhancement.
  5. Pasting the result or generating an Assistant response.

Use Dashboard -> Model Performance to compare model speed over the last 7 days, last 30 days, this year, or all time. Use History -> Analyze for recording-level timing.
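
As a rough illustration of how to read recording-level timings, the sketch below takes per-stage durations and reports which stage dominates. The stage names and numbers are made up for illustration and are not a VoiceInk export format; substitute the values you see in History -> Analyze.

```python
# Hypothetical per-stage timings, in seconds, for a single recording.
# The keys and values are illustrative only, not a VoiceInk data format.
stage_seconds = {
    "model_load": 0.4,
    "transcription": 1.2,
    "formatting": 0.1,
    "ai_enhancement": 3.5,
    "paste_or_response": 0.2,
}


def slowest_stage(timings):
    """Return (stage, seconds, share_of_total) for the longest stage."""
    total = sum(timings.values())
    stage = max(timings, key=timings.get)
    share = timings[stage] / total if total else 0.0
    return stage, timings[stage], share


stage, seconds, share = slowest_stage(stage_seconds)
print(f"Slowest stage: {stage} ({seconds:.1f}s, {share:.0%} of total)")
```

If one stage accounts for most of the total, start with the section below that covers it.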

Local Models

Local transcription performs best on Apple Silicon Macs. On an Intel Mac, cloud transcription providers are often much faster than local models.

To improve local speed:

  • Use Parakeet V3 or Parakeet V2 for fast local dictation.
  • Try a smaller Whisper model if a large model is slow.
  • Enable Prewarm Model in AI Models -> gear -> Transcription.
  • Keep Voice Activity Detection enabled so silence is skipped more efficiently.

Large local models may take longer the first time they are prepared or loaded.

Cloud Transcription

Cloud transcription speed depends on provider latency, audio length, and your network connection. Groq, Deepgram, Mistral, Gemini, ElevenLabs, Soniox, Speechmatics, AssemblyAI, xAI, and Cartesia are available as cloud transcription options, depending on each provider's capabilities.

See Cloud Providers.

AI Enhancement

AI enhancement adds a second model call after transcription. If the raw transcription is fast but the final output is slow, check the Mode's AI enhancement provider and model.

To reduce enhancement delay:

  • Disable AI enhancement for simple Dictation Modes.
  • Enable Skip short transcriptions in AI Models -> gear -> Enhancement.
  • Lower Timeout Duration to give up on slow enhancement calls sooner, or raise it if you would rather wait for a result.
  • Keep Retry on timeout enabled only if quality matters more than speed, since each retry adds to the wait.
  • Try a faster enhancement provider or model.
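
To see how the timeout and retry settings interact, a back-of-the-envelope bound can help. The sketch below assumes each attempt may run for the full timeout and that a timed-out call is retried a fixed number of times. The retry count and the example values are assumptions for illustration, not documented VoiceInk behavior.

```python
def worst_case_enhancement_wait(timeout_seconds, retries):
    """Upper bound on the wait if every attempt runs until it times out.

    Assumes one initial attempt plus `retries` retries, each limited to
    `timeout_seconds`. Any pause between attempts is ignored.
    """
    if timeout_seconds < 0 or retries < 0:
        raise ValueError("timeout_seconds and retries must be non-negative")
    return timeout_seconds * (1 + retries)


# Illustrative values only.
for timeout, retries in [(10, 0), (10, 1), (30, 1)]:
    wait = worst_case_enhancement_wait(timeout, retries)
    print(f"timeout={timeout}s retries={retries} -> up to {wait}s")
```

Under these assumptions, a single retry doubles the worst-case delay, which is why a long timeout combined with retries can make the final output feel slow even when transcription itself is fast.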

See Model Settings and Mode Settings.

File Transcription

Long audio and video files naturally take longer than live dictation. The Transcribe view shows queue states such as Loading model, Processing audio, Transcribing, Enhancing, Completed, and Failed, so you can see where the delay is happening.

See Transcribe Audio Files.