How to Use Voice to Text on Mac Across Every App
If you've searched for how to use voice to text on Mac, you've probably landed on Apple's support page and gotten the basics: press the Microphone key, speak, done. Fine for a quick sentence in Mail. Not enough if you want dictation wired into your actual workflow — Notion, Slack, VS Code, legal documents, whatever you spend real time in.
This guide goes further. We'll cover Apple's built-in dictation, its real limitations, and then show you how to build a cross-app voice workflow using per-app modes, keyboard shortcuts, custom vocabulary, and on-device processing that never sends your audio anywhere.
TL;DR
- Apple's built-in dictation is available on every modern Mac, but it routes audio to Apple's servers unless you're on Apple Silicon, where dictation can run on-device.
- For cross-app workflows, per-app transcription modes and live dictation that types directly into any focused app give you far more control.
- On-device tools like VoicePrivate process everything locally, with no cloud uploads, no account required, and no internet needed after the initial model download.
Step 1: Turn On Voice Dictation on Your Mac
How do I turn on voice to text on a Mac?
Go to System Settings > Keyboard > Dictation and toggle Dictation to On. Once enabled, macOS will ask whether to use the microphone key or a custom shortcut to activate it.
Here's what you're choosing between at this point:
- Standard Dictation — sends audio to Apple's servers, requires an internet connection.
- Enhanced Dictation (Apple Silicon) — processes on-device when available, no server upload.
On Apple Silicon Macs (M1 and later), macOS can handle a good chunk of dictation locally. On Intel Macs, standard dictation still phones home. If that distinction matters for your work — and it should — know this before you dictate your first sensitive sentence.
Step 2: Find the Dictation Key and Set Your Shortcut
Where is the Dictation key on a Mac keyboard?
On most modern Macs, the Dictation key is F5 (or the key with a microphone icon in the function row). On older Macs or external keyboards without that key, macOS defaults to pressing Fn (Function) twice in quick succession.
You can change this. In System Settings > Keyboard > Dictation, look for the "Shortcut" dropdown. Options include:
- Press Fn twice
- Press the Microphone key
- A custom keyboard shortcut you define
How do I activate my voice typing?
Once Dictation is on, click into any text field, then trigger your shortcut. A microphone indicator appears near your cursor. Speak naturally, pausing briefly at the end of sentences. Press your shortcut again, or press Escape, to stop.
For punctuation, say the word: "comma", "period", "question mark", "new line", "new paragraph". For formatting, say "all caps" before a word. For emoji on macOS Ventura and later, say "emoji" followed by the name — "emoji thumbs up", for example.
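Under the hood, spoken punctuation is essentially a substitution pass over the recognized words. Here is a toy sketch of the idea — this is not Apple's implementation, and the command table is illustrative:

```python
# Conceptual sketch: map spoken punctuation commands to the characters
# they name. Illustrative only -- not Apple's actual dictation pipeline.
COMMANDS = {
    "comma": ",",
    "period": ".",
    "question mark": "?",
    "new line": "\n",
    "new paragraph": "\n\n",
}

def apply_spoken_punctuation(tokens: list[str]) -> str:
    """Replace spoken punctuation commands with their characters."""
    out = ""
    i = 0
    while i < len(tokens):
        two = " ".join(tokens[i:i + 2])
        if two in COMMANDS:          # two-word commands first
            out += COMMANDS[two]
            i += 2
        elif tokens[i] in COMMANDS:  # then one-word commands
            out += COMMANDS[tokens[i]]
            i += 1
        else:                        # ordinary word: space-separate it
            if out and not out.endswith("\n"):
                out += " "
            out += tokens[i]
            i += 1
    return out

print(apply_spoken_punctuation(
    "hello comma how are you question mark".split()))
# -> hello, how are you?
```

A real recognizer resolves ambiguity with context (so you can still dictate the word "period"), which this sketch deliberately ignores.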
Step 3: Understand the Cross-App Limitation — and How to Work Around It
Here's the thing: Apple's dictation is a system-level feature that activates wherever your cursor is. That sounds universal. In practice, it works inconsistently across apps.
Some apps intercept keyboard input in ways that interfere with dictation injection. Electron apps — Slack, VS Code, Notion desktop — and certain web-based interfaces have varying levels of reliability. The dictation overlay appears, you speak, and then the text lands in the wrong place, gets duplicated, or doesn't appear at all.
This is the gap most guides don't address.
The more reliable approach is a tool that runs as a system-level overlay, monitors which app is in focus, and injects text at the cursor position using the macOS Accessibility API. VoicePrivate's live dictation mode does exactly this — it types directly into whatever Mac app is active, treating the target app as a passive text receiver rather than relying on that app's own dictation support.
This matters most in:
- Notion (Electron-based, known dictation weirdness)
- Slack (web-rendered text fields)
- VS Code (custom editor with its own input handling)
- Any browser-based SaaS tool
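To make the injection approach concrete, here is a minimal sketch of the general technique using the public Quartz event APIs (available in Python via the pyobjc package). This is not VoicePrivate's actual code; it just shows why synthetic keyboard events land reliably even in Electron apps — the target app receives them as ordinary typing:

```python
# Sketch of system-level text injection on macOS: post synthetic
# keyboard events carrying unicode text to whatever app has focus.
# NOT VoicePrivate's implementation -- a minimal illustration using
# Quartz event APIs (pip install pyobjc). Requires macOS and the
# Accessibility permission in System Settings > Privacy & Security.

def chunk(text: str, size: int = 20) -> list[str]:
    """Split text into small pieces; a single keyboard event carries a
    short unicode payload, so long strings are posted chunk by chunk."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def type_into_focused_app(text: str) -> None:
    """The frontmost app sees these events as ordinary typing, so this
    works even in apps with custom input handling (Slack, VS Code)."""
    from Quartz import (CGEventCreateKeyboardEvent,
                        CGEventKeyboardSetUnicodeString,
                        CGEventPost, kCGHIDEventTap)
    for piece in chunk(text):
        down = CGEventCreateKeyboardEvent(None, 0, True)   # key down
        CGEventKeyboardSetUnicodeString(down, len(piece), piece)
        CGEventPost(kCGHIDEventTap, down)
        up = CGEventCreateKeyboardEvent(None, 0, False)    # key up
        CGEventPost(kCGHIDEventTap, up)
```

Calling `type_into_focused_app("hello")` types "hello" at the cursor of the frontmost app, once the script has been granted Accessibility permission.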
Step 4: Set Up Per-App Transcription Modes
How do I do voice transcription on a Mac?
For one-off transcription — converting an existing audio or video file to text — the fastest path is dragging the file into a dedicated transcription app. VoicePrivate supports drag-and-drop file transcription: drop an audio or video file onto the app, and it processes everything locally using its on-device speech recognition engine.
Live transcription while you work is a different workflow:
- Open VoicePrivate and configure a transcription mode for each app you use regularly.
- Per-app modes let you set vocabulary profile, preferred language, and whether to auto-punctuate.
- Switch between apps normally. The active mode follows focus.
You're not constantly reconfiguring. Set it once per app, and the right mode loads automatically when that app comes into focus. In practice, this is what makes the per-app approach worth it.
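The mode-follows-focus behavior boils down to a lookup keyed on the frontmost app. Here is a hypothetical sketch — the mode fields, defaults, and table are illustrative, not VoicePrivate's actual schema, and the bundle IDs are approximate:

```python
# Hypothetical sketch of per-app mode selection. Fields, defaults, and
# bundle IDs are illustrative -- not a real configuration schema.
from dataclasses import dataclass

@dataclass
class Mode:
    language: str = "en"
    vocabulary: str = "general"
    auto_punctuate: bool = True

MODES = {
    "notion.id": Mode(vocabulary="product-notes"),
    "com.tinyspeck.slackmacgap": Mode(),
    "com.microsoft.VSCode": Mode(auto_punctuate=False),  # code, not prose
}

def mode_for(bundle_id: str) -> Mode:
    """Return the mode for the frontmost app, falling back to defaults."""
    return MODES.get(bundle_id, Mode())

# On macOS, the frontmost app's bundle ID can be read via pyobjc:
#   NSWorkspace.sharedWorkspace().frontmostApplication().bundleIdentifier()
```

Re-running the lookup whenever focus changes is all "the active mode follows focus" requires.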
Step 5: Add Custom Vocabulary for Your Domain
Apple's built-in dictation handles everyday English reasonably well. It struggles with domain-specific terms: medical abbreviations, legal citations, financial instrument names, proprietary product names, people's names spelled unconventionally.
VoicePrivate addresses this two ways:
Custom vocabulary: Add terms the engine should recognize correctly. If you regularly dictate "amortization schedule" or "anterior cruciate ligament" or a client's unusual company name, add it once. It applies globally.
Specialty editions: VoicePrivate ships in five editions — General, Healthcare, Legal, Finance, and Insurance. Each specialty edition comes pre-loaded with domain-specific vocabulary. If you're in healthcare and dictating clinical notes, the Healthcare edition already knows the terminology you use daily. You can review the Healthcare features for a full breakdown.
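One way to picture what custom vocabulary does is as a correction pass that maps known misrecognitions back to the terms you added. The sketch below is illustrative — the correction table is hypothetical, and a real engine also biases decoding itself rather than only fixing text afterward:

```python
# Illustrative sketch of custom-vocabulary correction as a
# post-processing pass. The misheard forms here are hypothetical.
import re

VOCABULARY = {
    # misheard form -> intended term
    "amber tization schedule": "amortization schedule",
    "interior cruciate ligament": "anterior cruciate ligament",
}

def correct(transcript: str) -> str:
    """Replace known misrecognitions, case-insensitively."""
    for wrong, right in VOCABULARY.items():
        transcript = re.sub(re.escape(wrong), right, transcript,
                            flags=re.IGNORECASE)
    return transcript

print(correct("The amber tization schedule is attached."))
# -> The amortization schedule is attached.
```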
Step 6: Use AI Command Mode to Transform Dictated Text
Most voice-to-text tools stop at transcription. You get a raw transcript and edit it yourself. VoicePrivate includes an AI command mode that lets you transform text with instructions — all processed locally.
Practical examples:
- Dictate rough notes, then issue a command like "rewrite as bullet points" or "convert to formal memo tone"
- Dictate a meeting summary, then extract action items with a single command
- Dictate conversationally, then tighten it to match a specific document style
This isn't sending your text to an external API. It runs on-device. No cloud, no account, no data leaving your machine.
For journalists, lawyers, clinicians, or anyone handling sensitive material, this matters more than most guides tell you. You get editing intelligence without the exposure that comes with pasting sensitive content into a browser-based AI tool.
Step 7: Export Transcripts in the Right Format for Your Workflow
Matching export format to destination app
Raw transcription is only the start. Where the text goes next determines which export format you need.
VoicePrivate supports five export formats:
- .txt — plain text, works everywhere, no formatting
- .md — Markdown, ideal for Notion, Obsidian, GitHub, static site generators
- .json — structured data, useful for developers or tools that consume transcript metadata
- .srt — SubRip subtitles, standard format for video caption workflows
- .vtt — WebVTT, used by HTML5 video players and most web video platforms
If you're transcribing interviews for a content workflow, .md drops cleanly into Notion or Obsidian with paragraph structure intact. If you're captioning video, .srt or .vtt exports plug directly into DaVinci Resolve, Final Cut Pro, or your video hosting platform. Pick the format that removes a step, not adds one.
Step 8: Understand the Privacy Architecture Before You Dictate Sensitive Content
Where does your audio actually go?
This question almost never gets answered directly in voice-to-text guides. Here's the answer for each option:
Apple built-in dictation (standard mode): Audio is sent to Apple's servers for processing. Apple's privacy policy governs retention. The audio does leave your device.
Apple dictation on Apple Silicon (Enhanced mode): Processed on-device, with no audio upload and no session time limit. The better option if you're on a newer Mac and privacy matters.
Cloud tools (Otter.ai, Google Docs voice typing, etc.): Audio and transcripts are stored on third-party servers. Convenient — right up until it isn't. Once it's uploaded, you're subject to that service's data practices, breach risk, and terms of service changes.
VoicePrivate: 100% on-device processing. Zero cloud uploads. No account required. No telemetry. Your audio and transcripts never leave your machine. That's the full architecture — no asterisks.
If you handle sensitive client information, confidential business discussions, or personal health information, the on-device architecture is worth evaluating carefully against your own requirements. We describe the technical setup; you draw your own conclusions about what that means for your situation. For a detailed breakdown of the privacy architecture in a clinical context, see VoicePrivate Healthcare Privacy.
Step 9: Troubleshoot Common Dictation Accuracy Problems
Accuracy varies by use case. That's the honest answer. Here's how to push it higher when you hit problems:
Problem: Misheard common words
- Add them to custom vocabulary
- Slow down slightly on words with unusual stress patterns
- Check that your microphone input level isn't clipping (System Settings > Sound > Input)
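If you have access to raw samples, the clipping check in the last bullet can also be done programmatically. A quick sketch, assuming 16-bit PCM audio already read into a list of ints (e.g., via the stdlib `wave` and `struct` modules):

```python
# Quick clipping check for 16-bit PCM samples. Assumes samples are
# already decoded to ints; the ~1% threshold is a rule of thumb.
def clipping_ratio(samples: list[int], bits: int = 16) -> float:
    """Fraction of samples at or within one step of full scale."""
    full = 2 ** (bits - 1) - 1          # 32767 for 16-bit audio
    clipped = sum(1 for s in samples if abs(s) >= full - 1)
    return clipped / len(samples) if samples else 0.0

print(clipping_ratio([0, 100, 32767, -32768, 5000]))  # 2 of 5 clipped
```

A ratio much above about 0.01 suggests the input gain is too hot and will hurt recognition accuracy regardless of which dictation engine you use.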
Problem: Punctuation not appearing
- With Apple Dictation, say punctuation explicitly: "period", "comma", "question mark"
- With VoicePrivate's auto-punctuation enabled, you shouldn't need to say punctuation — but if it's misbehaving, check whether auto-punctuation is toggled on in your active mode
Problem: Language detection failures
- VoicePrivate supports 99 languages. If you switch between languages mid-session, set the correct language before starting — don't expect automatic detection mid-sentence
- Apple Dictation supports language switching but requires you to set the language in System Settings > Keyboard > Dictation before activating
Problem: Text appearing in the wrong place
- Click directly in the target text field before activating dictation
- In apps with multiple panes — VS Code with a terminal and editor split, for example — click to confirm cursor position first
What Two Types of Dictation Does macOS Offer?
A question that comes up often. macOS offers two dictation modes:
- Standard Dictation — sends audio to Apple for processing, requires internet, works on all Macs.
- Enhanced Dictation / on-device Dictation — processes locally on Apple Silicon Macs, no internet required, handles continuous dictation without a time limit.
Bottom line: on-device dictation on M-series Macs responds faster and keeps audio local. Standard dictation depends on your connection speed and Apple's servers.
For power users who need cross-app reliability, longer sessions, domain vocabulary, or guaranteed offline capability, a dedicated on-device tool fills gaps that either built-in mode leaves open. For more on how these options compare, see Voice to Text for Mac: Speed, Accuracy, and Privacy for Power Users.
Key Takeaways
- Turn on macOS Dictation in System Settings > Keyboard > Dictation. The Microphone key is F5 on most modern Mac keyboards, or set a custom shortcut to avoid conflicts with your apps.
- Apple's built-in dictation works for simple use cases but behaves inconsistently in Electron and web-based apps. A system-level tool that injects text via the Accessibility API is more reliable across all apps.
- Per-app transcription modes, custom vocabulary, and specialty editions (Healthcare, Legal, Finance, Insurance) let you match the transcription configuration to the actual work you're doing.
- VoicePrivate processes everything on-device with no cloud uploads, no account required, and no internet needed after the initial setup — a meaningful difference from cloud-based alternatives like Otter.ai or Google Docs voice typing.
- Export formats include .txt, .md, .json, .srt, and .vtt, covering content workflows, developer use cases, and video captioning. Expanded formats are available on paid plans.