AI Medical Scribe HIPAA Risks: What Your Vendor Isn't Telling You
The Short Version
- A BAA is necessary but not sufficient. Signing one distributes liability; it doesn't prevent breaches.
- Subprocessor chains are the hidden risk. Your BAA is with the vendor, not necessarily with AWS or Azure where your data actually lives.
- Training data questions are rarely answered clearly. Read the contract, not the marketing page.
- On-device dictation eliminates the transmission risk entirely. No PHI sent = no cloud breach vector.
Every AI medical scribe vendor will tell you they're "HIPAA compliant." Some of them even are, in the narrow technical sense. But HIPAA compliance is a floor, not a ceiling, and the gap between "we have a BAA" and "your patient data is safe" is wider than most vendors acknowledge.
This page covers the real compliance risks with cloud AI scribes: what BAAs actually do (and don't do), how subprocessor chains create exposure you might not know about, whether your clinical data is training AI models, and what breach scenarios look like in practice. It also covers the alternative: on-device dictation where none of this applies because no PHI leaves your device.
The BAA Problem: What It Actually Does
A Business Associate Agreement is a contract between a covered entity (you) and a business associate (your AI scribe vendor). It spells out how the vendor can use PHI, what security controls they must have, how they notify you in case of a breach, and who's liable for what.
Here's what a BAA does not do:
- It does not prevent a breach from happening.
- It does not give you control over the vendor's infrastructure security.
- It does not guarantee your data will be deleted when you cancel.
- It does not protect you from regulatory investigation if patient data is exposed.
A BAA is a liability distribution mechanism. When something goes wrong, the BAA determines who owes what to whom. But from your patient's perspective, and from the perspective of an HHS Office for Civil Rights (OCR) investigation, the covered entity (you, the physician or practice) bears primary HIPAA responsibility. You can pursue the vendor for damages after the fact. You can't undo the breach notification obligation, the damage to patient trust, or a potential OCR investigation.
This is the thing vendors don't say clearly: "we'll sign a BAA" doesn't mean "you're protected." It means "we've agreed on paper about what happens when something goes wrong."
The Subprocessor Problem
When you sign a BAA with an AI scribe vendor, you're entering into an agreement with that specific company. But your data may not stay with that company.
Look at how a typical cloud AI scribe works:
- Your audio goes to the vendor's API endpoint.
- The vendor's application runs on Amazon Web Services, Google Cloud Platform, or Microsoft Azure.
- The speech recognition may be processed by a third-party model provider.
- The structured note may be generated by a large language model from yet another provider.
- Logs and audit trails may be stored in a separate analytics service.
Each of those downstream services is a subprocessor. Under HIPAA's Omnibus Rule, a subcontractor that handles PHI is itself a business associate, and your vendor is required to have a BAA with it. The vendor should be able to document those agreements. But you need to verify this chain, not assume it.
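Verifying the chain is mostly an inventory exercise. A minimal sketch of that exercise, with entirely hypothetical vendor names, might look like this: list every subprocessor the vendor discloses, record whether it touches PHI and whether a BAA is documented, and flag the gaps.

```python
# Illustrative only: model a disclosed subprocessor chain and flag gaps.
# The company names below are hypothetical, not a real vendor's chain.
subprocessors = [
    {"name": "CloudHost Inc.", "handles_phi": True, "baa_on_file": True},
    {"name": "SpeechModel Co.", "handles_phi": True, "baa_on_file": False},
    {"name": "Analytics LLC", "handles_phi": False, "baa_on_file": False},
]

def phi_gaps(chain):
    """Return subprocessors that touch PHI without a documented BAA."""
    return [s["name"] for s in chain if s["handles_phi"] and not s["baa_on_file"]]

# Any name returned here is an exposure point in your chain.
print(phi_gaps(subprocessors))  # -> ['SpeechModel Co.']
```

Note that an entry like "Analytics LLC" above is fine without a BAA only if it genuinely never touches PHI; that claim is itself something to verify with the vendor.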
When you ask a vendor "who are your subprocessors?" a good answer is a specific list with BAA documentation for each. A bad answer is "we use enterprise-grade cloud infrastructure" or "we're SOC 2 certified." SOC 2 certification says something about security controls; it doesn't verify the specific HIPAA subprocessor chain for your contract.
The practical risk: your vendor has good security, but one of their subprocessors doesn't. Or your vendor's agreement with AWS includes a proper BAA, but their contract with the NLP model provider doesn't. You're exposed through a part of the chain you never knew about.
The Training Data Question
Here's a question most clinicians don't think to ask: is your patient data being used to train the vendor's AI models?
AI scribes get better over time. That improvement comes from training data. Where does that data come from? Often from their paying customers.
Most reputable vendors do anonymize or de-identify data before using it for training. De-identification under HIPAA's Safe Harbor method requires removing 18 specific categories of identifiers. Audio is harder to de-identify than structured data: patient names mentioned in conversation, unique health conditions, rare diagnoses, and the physician's own voice can all be re-identifying in context.
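The gap between structured data and conversational audio is easy to see in miniature. A deliberately naive sketch (the patterns below are toy examples, nowhere near full Safe Harbor coverage) shows why pattern-based redaction works on predictable fields but has nothing to grab onto in free conversation:

```python
import re

# Naive sketch: pattern-based redaction handles predictable identifiers.
PATTERNS = {
    "MRN": re.compile(r"\bMRN[:\s]*\d+\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

def redact(text):
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Seen 03/14/2025, MRN: 48210, callback 555-867-5309."
print(redact(note))  # -> Seen [DATE], [MRN], callback [PHONE].
```

But a transcript line like "my neighbor drove me in after my flare-up of that rare condition" has no pattern to match: names spoken in passing, unusual diagnoses, and the speaker's voiceprint in the underlying audio are re-identifying in ways no regex reaches.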
The honest answer is that the training data practices vary significantly by vendor, and the contracts are not always clear. Some vendors explicitly state no customer data is used for training. Others have vague language about "improving services" that could encompass model training. Some have changed their policies post-launch in ways that aren't proactively communicated to customers.
You need to ask specifically: "Is any audio or text from my clinical encounters used to train or improve your AI models, including through anonymized or de-identified data?" Get the answer in writing, in your contract. Not on a marketing page. Marketing pages can be updated without notice; contracts have legal teeth.
Real Breach Scenarios With Cloud AI Scribes
Healthcare data breaches involving AI and software vendors aren't hypothetical. The healthcare sector accounted for 18% of all reported data breaches in 2024, according to the Identity Theft Resource Center. Software vendors and third-party processors are a significant and growing breach vector.
Here's what a cloud AI scribe breach scenario actually looks like:
Scenario 1: Direct vendor breach. An attacker gains access to the vendor's cloud storage containing audio recordings or transcriptions from clinical encounters. Your patients' PHI is exposed. You receive breach notification from the vendor. You must notify affected patients without unreasonable delay, and no later than 60 days after discovery; breaches affecting 500 or more individuals must also be reported to HHS within that window. OCR may investigate your practice even though the breach was on the vendor's infrastructure. The BAA provides legal recourse against the vendor, but the notification obligation and reputational damage are yours.
Scenario 2: Subprocessor breach. A smaller company in the vendor's processing chain is compromised. You didn't know this company handled your data. The vendor notifies you after discovering the breach. Same obligations as above, with added complexity because the breach chain runs through a subprocessor you had no direct relationship with.
Scenario 3: Insider access. A vendor employee with access to clinical audio files misuses that access. Insider misuse is a persistent breach vector in healthcare. The vendor's BAA should cover this, but detection and notification can lag, and the access may continue in the meantime.
Scenario 4: Vendor acquisition or bankruptcy. Your AI scribe vendor is acquired by a company with different data practices, or files for bankruptcy with PHI in their infrastructure. What happens to your patient data depends on the acquiring entity or bankruptcy proceedings. Your BAA may or may not transfer. This is a genuine risk with early-stage AI scribe startups where financial stability is uncertain.
None of these scenarios apply to on-device dictation. If there's no data on a vendor's server, there's nothing to breach.
What "HIPAA Compliant" Actually Means
HIPAA compliance means meeting the minimum requirements of the Health Insurance Portability and Accountability Act's Privacy Rule, Security Rule, and Breach Notification Rule. It's a regulatory floor, not a security certification.
You can be HIPAA compliant and still have:
- Customer audio stored longer than clinically necessary
- Subprocessors with weaker security than the primary vendor
- Ambiguous training data practices
- No SOC 2 Type II certification (the security audit, separate from HIPAA)
- Offshore data processing in jurisdictions with different legal protections
When a vendor says "we're HIPAA compliant," you're hearing "we've met the minimum legal requirements." That's worth knowing. It's not sufficient for a thorough vendor risk assessment.
The questions to ask instead:
- Where specifically is my data stored? Which data center, which country?
- What's your data retention policy? When is audio deleted?
- Who are your subprocessors and can you provide BAA documentation for each?
- Have you had any security incidents in the past 24 months?
- Do you have SOC 2 Type II certification? Can I see the report?
- Is any customer data used for model training, directly or in de-identified form?
AI Scribe-Specific Risks Beyond Standard Cloud SaaS
AI medical scribes carry a few risks that aren't present in standard cloud software:
Audio is harder to anonymize than text. Most cloud services handle structured data. AI scribes handle voice recordings of clinical encounters. Audio de-identification is a harder technical problem than removing 18 identifiers from a spreadsheet. Vendor claims about "de-identified" audio data deserve scrutiny.
Ambient scribes capture everything. Products like Nuance DAX Copilot record the entire patient-physician conversation, not just what you dictate. This means the vendor's infrastructure holds audio of your patient saying things they may not realize are being captured, including sensitive disclosures that never end up in the official clinical note.
AI hallucinations create a different liability. An AI-generated note draft may contain inaccurate information. If a physician reviews and signs off on a note that was incorrect due to AI hallucination, and harm results, the liability landscape is still being worked out legally. This isn't a HIPAA issue directly, but it's a clinical risk tied to AI scribe adoption.
API keys and tokens can be compromised. Cloud AI scribes use API keys and authentication tokens. A compromised key can give an attacker API-level access to submit or retrieve clinical data. This attack vector doesn't exist for offline software.
The On-Device Alternative: Why No Transmission Equals No Compliance Burden
Here's the clean solution: if PHI never leaves your device, none of the above applies.
No transmission means no subprocessor chain. No subprocessor chain means no subprocessor breach exposure. No cloud storage means no vendor breach scenario. No audio on external servers means no training data concern. No business associate relationship means no BAA required.
VoicePrivate Healthcare Edition processes your voice entirely on your Mac. The speech recognition model runs locally. The medical vocabulary is stored locally. The transcription output goes directly into your Mac's active application. No audio is sent anywhere. No text is sent anywhere. Nothing leaves your device.
That's not a marketing claim with asterisks. It's an architectural fact you can verify. There are no network requests to cloud services. No API keys. No vendor infrastructure to breach.
You can review this in detail on the VoicePrivate HIPAA Architecture page.
A Practical Vendor Evaluation Checklist
If you're evaluating any cloud AI scribe, use this checklist before signing:
- BAA in place before any PHI use: Signed BAA in hand, not "we'll send it after onboarding."
- Subprocessor list reviewed: Specific list of all subprocessors that touch PHI, with BAA documentation for each.
- Data retention policy understood: Know when audio and transcription data are deleted. Get it in the BAA.
- Training data policy explicit: Contract language explicitly addressing whether customer data is used for model training.
- Security incident history: Ask directly. No reputable vendor should refuse to disclose incident history.
- SOC 2 Type II report reviewed: Request the actual report, not just a badge on their website.
- Termination data handling: Understand what happens to your data when you cancel the service.
- Notification timeline: BAA should specify the breach notification timeline (HIPAA requires no more than 60 days; many enterprise BAAs commit to faster notification).
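When reviewing that last item, it helps to be concrete about dates: the 60-day clock runs from discovery, not from the breach itself. A quick illustrative calculation (the 30-day contractual term is a hypothetical example of a faster BAA commitment) shows both deadlines:

```python
from datetime import date, timedelta

def notification_deadlines(discovery: date, contractual_days: int = 30):
    """HIPAA's outer bound is 60 calendar days from discovery of the breach.
    contractual_days models a hypothetical faster BAA commitment."""
    return {
        "hipaa_outer_bound": discovery + timedelta(days=60),
        "contractual": discovery + timedelta(days=contractual_days),
    }

d = notification_deadlines(date(2025, 3, 1))
print(d["hipaa_outer_bound"])  # 2025-04-30
print(d["contractual"])        # 2025-03-31
```

A BAA that ties the vendor's notice to you to a short, fixed window (rather than "without unreasonable delay") leaves you more of that 60-day budget to run your own patient and HHS notifications.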
Skip the Compliance Overhead Entirely
VoicePrivate Healthcare Edition processes everything on your device. No PHI transmitted. No BAA required. No breach vector. 74,000+ medical terms, from $9.99/month.
Learn About VoicePrivate Healthcare
Frequently Asked Questions
Is using AI medical scribes HIPAA compliant?
Cloud-based AI scribes can be used in a technically HIPAA-compliant way with a signed BAA and appropriate security controls. But "HIPAA compliant" is a minimum standard, not a safety guarantee. A BAA distributes liability after a breach; it doesn't prevent one. On-device dictation software that transmits no PHI is a fundamentally different risk profile: there's no compliance overhead to manage because there's no third-party data handling at all.
Can AI scribes use my patient data for training?
It depends on your specific vendor contract. Some AI scribe vendors explicitly prohibit using customer data for model training. Others have vague "service improvement" language that could include training. Ask directly and get specific contract language before signing. With on-device dictation software, the question is moot: no data is transmitted, so model training on your data is architecturally impossible.
What happens if my medical AI scribe gets breached?
If PHI is exposed in a breach on a cloud scribe vendor's infrastructure, you have a HIPAA breach notification obligation to affected patients and to HHS. You may face OCR investigation even though the breach was on the vendor's systems. Your BAA gives you legal recourse against the vendor, but the regulatory and reputational consequences still fall on your practice. Healthcare AI vendors are increasingly targeted because they hold audio recordings from large numbers of clinical encounters.
Do I need a BAA for on-device dictation software?
No. HIPAA's BAA requirement applies when you share PHI with a third party acting as a business associate. On-device dictation software that processes everything locally and never transmits data externally doesn't involve sharing PHI with anyone. There's no business associate relationship, so no BAA is required. This is one of the significant compliance advantages of on-device architecture.