AI Models for Transcription
Choose and configure the AI model that best fits your transcription needs.
Understanding Transcription Models
VoiceInk lets you choose from a variety of transcription models, each with different trade-offs in speed, accuracy, and cost.
Models are categorized into three main types:
- Local Models: These models run directly on your Mac. They are completely private and do not require an internet connection. They offer a good balance of speed and accuracy.
- Cloud Models: These models are hosted on third-party services (e.g., Groq, ElevenLabs, Deepgram, Parakeet, Gemini, Mistral, Soniox). They typically offer the highest accuracy and fastest transcription speeds but require an active internet connection and may have associated costs.
- Custom Models: You can add your own custom cloud models that are compatible with the OpenAI API format. This is useful for users who have access to specialized or private models.
:::note Apple Speech Native Model
The Apple Speech model uses the native macOS Speech framework and requires macOS 26 or later. This model provides excellent transcription quality but is only available on the latest macOS versions.
:::
Managing Your AI Models
You can manage all your transcription models from the AI Models section in the VoiceInk application.
For a more detailed article on configuring models, see AI Model Configuration.
Selecting Your Default Model
- Navigate to the AI Models tab in the main window.
- At the top of the view, you will see your currently selected Default Model.
- To change the default model, browse through the list of available models.
- Click the Set as Default button on the card of the model you wish to use. This will be your primary model for all transcriptions.
Downloading Local Models
- Local models need to be downloaded before you can use them.
- Find the local model you want to use in the model list.
- Click the Download button. A progress bar will show the download status.
- Once downloaded, you can set it as your default model.
Deleting a Model
- If you no longer need a downloaded local model, you can delete it to free up disk space.
- Click the trash icon on the model card to remove the downloaded files.
Using Cloud & Custom Models
- Cloud models require no download and are available to use right away.
- To use a cloud model, you may need to add an API key for the respective service.
- You can add a Custom Model by selecting the "Custom" filter and clicking the "Add Custom Model" card. You will need to provide a model name, API endpoint, and API key.
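To illustrate what "compatible with the OpenAI API format" usually means for a custom model, here is a minimal sketch of the kind of request such an endpoint is expected to accept. It follows the public OpenAI audio transcription format (a multipart POST with `file` and `model` fields and a Bearer token). The function name, the example URL, and the model name are hypothetical, and how VoiceInk itself builds its requests is not documented here.

```python
import uuid
import urllib.request


def build_transcription_request(endpoint, api_key, model, audio_bytes,
                                filename="audio.wav", language=None):
    """Build (but do not send) an OpenAI-style transcription request.

    Hypothetical helper for illustration only; not VoiceInk's own code.
    """
    boundary = uuid.uuid4().hex
    fields = {"model": model}
    if language:
        fields["language"] = language  # optional language hint

    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    # The audio itself goes in the "file" form field.
    parts.append(
        (f'--{boundary}\r\n'
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         'Content-Type: application/octet-stream\r\n\r\n').encode()
        + audio_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        endpoint,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

The three values the Add Custom Model card asks for map directly onto the `endpoint`, `api_key`, and `model` arguments. If a request like this succeeds against your service, the service is likely a reasonable candidate for a custom model.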
Language Selection
You can specify the language for your transcription to improve accuracy. This setting is in the AI Models tab. Setting a specific language is recommended for non-English dictation — leaving it on Auto-Detect can sometimes cause the model to switch languages unexpectedly.
Model Settings
Click the gear icon in the AI Models tab to access additional transcription settings.
Output Format
For Whisper-based local models, you can provide a custom prompt that shapes the output formatting. This is not an instruction to an LLM — it is a style example that shows the model exactly how you want text formatted. For example, to ensure numbers are always written as digits, enter a sample like: I need 2 apples and 10 oranges.
Output format is set per language, so you can have different formatting for different languages.
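Conceptually, a per-language output format is a lookup from language code to a style example. The sketch below shows that idea only. The dictionary, the function name, and the German sample are hypothetical and do not reflect how VoiceInk stores its settings.

```python
# Hypothetical per-language style examples. Whisper treats the prompt as
# text that came before the audio, so it tends to imitate its formatting.
OUTPUT_FORMAT_PROMPTS = {
    "en": "I need 2 apples and 10 oranges.",
    "de": "Ich brauche 2 Äpfel und 10 Orangen.",
}


def prompt_for(language):
    """Return the style example for a language code, or None if none is set."""
    return OUTPUT_FORMAT_PROMPTS.get(language)
```

Because the prompt is an example rather than an instruction, write it in the target language and in the exact style you want to see in the output.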
Add Space After Paste
When enabled, VoiceInk appends a space after pasting your transcription. This is useful if you often dictate mid-sentence and want to continue typing without manually adding a space.
Automatic Text Formatting
When enabled, VoiceInk applies basic text formatting to the transcription — capitalizing the first word and adding a period at the end if none is present. Disable this if you prefer raw output or if AI enhancement is handling formatting.
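The behavior described above can be sketched in a few lines. This is an illustrative approximation, not VoiceInk's implementation: the function name is hypothetical, and treating `!` and `?` as already-terminated sentences is an assumption.

```python
def format_transcript(text: str) -> str:
    """Capitalize the first character and ensure terminal punctuation."""
    text = text.strip()
    if not text:
        return text
    # Upper-case only the first character; leave the rest untouched.
    text = text[0].upper() + text[1:]
    # Assumption: '.', '!' and '?' all count as an existing sentence ending.
    if text[-1] not in ".!?":
        text += "."
    return text
```

For example, `format_transcript("hello world")` returns `"Hello world."`, while `format_transcript("is it done?")` returns `"Is it done?"` unchanged apart from the capital letter.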
Voice Activity Detection (VAD)
When enabled, VoiceInk automatically detects when you start and stop speaking and can trim silence from the recording. This can improve transcription accuracy and reduce unnecessary processing.
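To give a rough sense of what trimming silence involves, here is a naive energy-threshold sketch. It is a stand-in for illustration only: production VAD is typically model-based, and the function name, frame size, and threshold here are all assumed values rather than anything VoiceInk uses.

```python
def trim_silence(samples, frame_size=160, threshold=0.01):
    """Drop leading and trailing frames whose RMS energy is below threshold.

    `samples` is a list of floats in [-1.0, 1.0]. Naive energy-based
    approach; real VAD is usually far more robust to background noise.
    """
    def rms(frame):
        return (sum(s * s for s in frame) / len(frame)) ** 0.5

    # Split the signal into fixed-size frames.
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples), frame_size)]
    voiced = [i for i, frame in enumerate(frames) if rms(frame) >= threshold]
    if not voiced:
        return []  # nothing above the threshold: treat it all as silence

    start = voiced[0] * frame_size
    end = (voiced[-1] + 1) * frame_size
    return samples[start:end]
```

Silence between the first and last voiced frames is kept, so pauses inside a sentence survive; only the quiet lead-in and tail are removed.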
Prewarm Model (Experimental)
When enabled, VoiceInk loads the transcription model into memory in the background so it is ready to transcribe immediately when you start recording. This reduces the delay on the first transcription after launching the app. The setting is marked experimental because it increases memory usage.
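The general prewarming pattern looks roughly like the sketch below: start loading on a background thread, and fall back to loading on demand if the model is requested before that finishes. The class and the `loader` callable are hypothetical and only illustrate the trade-off (faster first use, memory held from launch).

```python
import threading


class PrewarmedModel:
    """Load an expensive resource in the background, at most once."""

    def __init__(self, loader):
        self._loader = loader          # hypothetical function that loads the model
        self._model = None
        self._lock = threading.Lock()
        self._ready = threading.Event()

    def prewarm(self):
        # Kick off loading without blocking the caller.
        threading.Thread(target=self._load, daemon=True).start()

    def _load(self):
        with self._lock:
            if self._model is None:    # the lock guarantees a single load
                self._model = self._loader()
        self._ready.set()

    def get(self):
        # If prewarming has not finished (or never started), load or wait now.
        if not self._ready.is_set():
            self._load()
        return self._model
```

Without `prewarm()`, the first `get()` pays the full loading cost, which corresponds to the delay on the first transcription described above.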
Filler Words
The gear icon panel also contains the Filler Words settings, where you can toggle filler word removal on or off and manage your custom filler word list.
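As a rough illustration of what filler word removal does to a transcript, here is a minimal regex-based sketch. The function name and the matching rules (whole words only, case-insensitive, optional trailing comma) are assumptions for the example, not a description of VoiceInk's actual algorithm.

```python
import re


def remove_fillers(text, fillers):
    """Remove whole-word filler terms, ignoring case.

    Hypothetical helper for illustration only.
    """
    if not fillers:
        return text
    # Longest first so multi-word fillers win over their shorter prefixes.
    ordered = sorted(fillers, key=len, reverse=True)
    alternatives = "|".join(re.escape(word) for word in ordered)
    # \b keeps "um" from matching inside words such as "umbrella".
    pattern = r"\b(?:" + alternatives + r")\b,?\s*"
    cleaned = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse any doubled spaces left behind.
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```

With the list `["um", "uh"]`, the input `"Um, I think, uh, we should go"` becomes `"I think, we should go"`. Note that this sketch does not re-capitalize the new first word.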