The 90% Problem: Why GenAI Alone Can’t Solve Prior Authorization

GenAI can produce a solid first draft of policy-reasoning extracts, but in prior authorization (PA) the remaining 10% error rate is a patient-safety, compliance, and financial disaster. The fix is GenAI with evidence: a Human-in-the-Loop (HITL) QA Sidecar, so every output is citable, auditable, and measurably correct.


The “90% is not good enough” reality

In prior authorization, a single determination often rests on many specific criteria (diagnosis, labs, time windows, step therapies, exclusions, prescriber specialty, quantities and renewal thresholds). Even if a model is “90% accurate”, the combined probability that all criteria are correct plummets:

  • 10 criteria at 90% each → 0.9¹⁰ ≈ 35% overall correctness.
  • 15 criteria at 90% each → 0.9¹⁵ ≈ 21% overall correctness.

Now flip the target: if you drive per-criterion accuracy to 99.9%, then 10-criterion decisions are 0.999¹⁰ ≈ 99.0% correct.
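
The arithmetic is easy to verify. A minimal sketch in Python (the criterion counts and accuracies are the same figures as above):

```python
# Compound correctness: every criterion must be right for the decision to be right.
def decision_accuracy(per_criterion: float, n_criteria: int) -> float:
    return per_criterion ** n_criteria

for acc, n in [(0.90, 10), (0.90, 15), (0.999, 10)]:
    print(f"{n} criteria at {acc:.1%} each → {decision_accuracy(acc, n):.1%} overall")

# 10 criteria at 90.0% each → 34.9% overall
# 15 criteria at 90.0% each → 20.6% overall
# 10 criteria at 99.9% each → 99.0% overall
```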


Why “pure GenAI” breaks in PA

  • Narrative PDFs, not data. Policies are prose—time-boxed, exception-heavy, and plan-specific.
  • No provenance. Free-text answers often lack line-level citations back to the exact clause.
  • Non-determinism. Small prompt changes yield different outputs, which makes audits riskier. Strict regex parsing breaks too: templates can be misformatted, contain misspellings, or a policy may not be self-contained.
  • Hidden edge cases. Negative conditions (“not for…”, “unless…”) and nested logic (AND/OR) are easy to miss.
  • Compounding error. Even “good” clause recall becomes unacceptable when multiplied across a full determination.

Conclusion: GenAI alone is a great first draft—and an unacceptable final decision system.


The solution: GenAI with Human-grade Verification

We pair GenAI with evidence and guardrails:

  1. Gold Standard dataset (target 99.9% clause accuracy—like Clorox, we can’t claim 100%).
    • Normalize and atomize each policy into clause-level “coverage atoms” (e.g., cnid_lab_a1c_ge_7_90d, cnid_step_any_two_orals_n2, cnid_specialist_endocrinology); a minimal sketch of one atom appears after this list.
    • Provenance by design: every atom links to {plan, uid, pdf_url}.
    • Versioned snapshots & drift: clause and CNID prevalence over time for each plan/drug/indication.
  2. HITL QA Sidecar (the reviewer cockpit)
    • Side-by-side view: model policy on the left; exact cited clauses on the right, with edit permissions.
    Figure 1 — HITL QA Sidecar (Streamlit). Left: model claims and structured fields (plan, drug, indication, CNIDs). Right: the exact cited clauses with {plan, uid, clause_id, pdf_url} and time windows highlighted. Reviewers accept or correct each field; approvals are versioned and hash-logged.
    • Pass/fail rubric: reviewers check scope (plan/drug/indication), logic (AND/OR), thresholds (A1c ≥ 7), auth windows (≤ 90 days), ICD-10 codes, and renewal criteria.
    • Structured outputs only: if a set of criteria doesn’t match the expected schema, it is routed back to an earlier phase of the data pipeline.
    • One-click corrections: reviewers can swap in the correct clause; corrections are logged to improve future runs.
  3. Small deterministic math on top
    • Counts, ladders, prevalence: these can all be computed across plans, not guessed (see the second sketch after this list).
    • Pre-Check bundles: Likely / Borderline / Unlikely CNIDs per plan-drug pairing (a planned feature).
    • Doc Pack: auto-generated appeal packets and flags for unmet criteria.
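
To make the clause-level “coverage atom” concrete, here is a minimal sketch of one atom as a Python dataclass. The provenance fields (plan, uid, clause_id, pdf_url) and the CNID naming come from the list above; the class name, the hashing recipe, and all example values are illustrative assumptions, not the production schema.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class CoverageAtom:
    """One clause-level criterion with full provenance (illustrative schema)."""
    cnid: str        # canonical criterion id, e.g. "cnid_lab_a1c_ge_7_90d"
    plan: str        # plan the clause was extracted from
    uid: str         # policy-document uid
    clause_id: str   # line-level pointer into the policy PDF
    pdf_url: str     # source document
    statement: str   # normalized clause text

    @property
    def statement_hash(self) -> str:
        """Stable hash of the normalized clause text, for audit and drift checks."""
        return hashlib.sha256(self.statement.strip().lower().encode()).hexdigest()[:16]

atom = CoverageAtom(
    cnid="cnid_lab_a1c_ge_7_90d",
    plan="UHC",
    uid="uhc-glp1-2025-03",   # hypothetical snapshot uid
    clause_id="p4-l12",       # hypothetical line-level pointer
    pdf_url="https://example.com/uhc_glp1_policy.pdf",  # placeholder URL
    statement="A1c >= 7% documented within the past 90 days",
)
print(atom.statement_hash)  # identical clause text → identical hash across runs
```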
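
The “small deterministic math” layer is then plain counting over those atoms, with no model in the loop. A second sketch, assuming atoms have already been extracted for a few plans (the plan-to-CNID mapping below is made up for illustration):

```python
from collections import Counter

# Hypothetical extraction results: which CNIDs each plan's policy contains.
atoms_by_plan = {
    "UHC":   {"cnid_lab_a1c_ge_7_90d", "cnid_step_any_two_orals_n2"},
    "Aetna": {"cnid_lab_a1c_ge_7_90d"},
    "Cigna": {"cnid_lab_a1c_ge_7_90d", "cnid_step_any_two_orals_n2",
              "cnid_specialist_endocrinology"},
}

# CNID prevalence across plans: counted, not guessed.
prevalence = Counter(cnid for cnids in atoms_by_plan.values() for cnid in cnids)
for cnid, n in prevalence.most_common():
    print(f"{cnid}: required by {n}/{len(atoms_by_plan)} plans")

# A naive restrictiveness ladder: more criteria → more restrictive plan.
ladder = sorted(atoms_by_plan, key=lambda p: len(atoms_by_plan[p]), reverse=True)
print("Most → least restrictive:", ladder)
```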

Quality you can measure (and promise)

Auditability KPIs

  • Traceability: every published claim retains {plan, uid, clause_id, statement_hash, pdf_url}.
  • Snapshot integrity: criteria and clause hashes match the drug × plan × indication combination at first pass. Each answer is linked to a specific version.
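
A minimal sketch of the snapshot-integrity check, assuming claims and snapshots carry the {plan, uid, clause_id, statement_hash} fields listed above (the data structures themselves are illustrative):

```python
# Pinned snapshot: (plan, uid, clause_id) → recorded clause hash.
snapshot = {
    ("UHC", "uhc-glp1-2025-03", "p4-l12"): {"statement_hash": "ab12cd34ef56ab78"},
}

# A published claim carrying its provenance fields.
claim = {
    "plan": "UHC", "uid": "uhc-glp1-2025-03", "clause_id": "p4-l12",
    "statement_hash": "ab12cd34ef56ab78",
    "pdf_url": "https://example.com/uhc_glp1_policy.pdf",  # placeholder URL
}

def verify_claim(claim: dict, snapshot: dict) -> bool:
    """Pass only if the cited clause exists in the pinned snapshot and its
    recorded hash matches the hash stored with the claim."""
    clause = snapshot.get((claim["plan"], claim["uid"], claim["clause_id"]))
    return clause is not None and clause["statement_hash"] == claim["statement_hash"]

print(verify_claim(claim, snapshot))  # True; any edit to the clause flips this to False
```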

A concrete example (GLP-1s)

Draft claim: “UHC requires two oral agent trials within 90 days and A1c ≥ 7% for initial approval for drug X; renewal requires ≥ 1% A1c reduction.”

Sidecar check items

  • Step therapy atoms present? cnid_step_any_two_orals_n2, window_days=90
  • Lab threshold atom present? cnid_lab_a1c_ge_7_90d
  • Renewal atom present? cnid_renewal_a1c_drop_ge_1
  • Scope correct? (GLP-1, adult T2D)
  • Citations present? Show the exact UHC clause lines; verify no clause contradicts.
  • Decision: Approve or request edit (e.g., step window is 120 days, not 90).

Outcome: The claim ships only if every atom is supported by a cited clause from the current snapshot. Otherwise, it is corrected or rejected.
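
As a sketch, the Sidecar’s check items reduce to a deterministic comparison between the atoms the claim asserts and the atoms extracted from the cited clauses. The required and cited sets below are hypothetical; the 90- versus 120-day discrepancy is the edit case from the example:

```python
# Atoms the draft claim asserts (CNID → expected parameters).
required = {
    "cnid_step_any_two_orals_n2": {"window_days": 90},
    "cnid_lab_a1c_ge_7_90d": {},
    "cnid_renewal_a1c_drop_ge_1": {},
}

# Atoms actually extracted from the cited UHC clauses (hypothetical snapshot).
cited = {
    "cnid_step_any_two_orals_n2": {"window_days": 120},  # policy says 120, not 90
    "cnid_lab_a1c_ge_7_90d": {},
    "cnid_renewal_a1c_drop_ge_1": {},
}

def check_claim(required: dict, cited: dict) -> list[str]:
    """Return human-readable failures; an empty list means approve."""
    failures = []
    for cnid, params in required.items():
        if cnid not in cited:
            failures.append(f"missing atom: {cnid}")
            continue
        for key, want in params.items():
            got = cited[cnid].get(key)
            if got != want:
                failures.append(f"{cnid}: {key} is {got}, claim says {want}")
    return failures

for failure in check_claim(required, cited):
    print("REQUEST EDIT:", failure)
# REQUEST EDIT: cnid_step_any_two_orals_n2: window_days is 120, claim says 90
```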


Why this builds trust with clinicians, payers, and manufacturers

  • No black boxes. Every claim is tied to line-level evidence and normalized into a deterministic schema.
  • Deterministic “math”. Counts, restrictiveness ladders, prevalence, and pre-checks are computed from clinical ground truth, not guessed.
  • Auditable by design. Hashes, timestamps, and snapshot versioning make compliance and root-cause analysis straightforward. Hash-awareness and idempotency are critical facets of provenance.
  • Human-grade standards. The Sidecar enforces a repeatable QA rubric so “90% drafts” become “99%+ decisions.”

Call to action

See it live: Request a 10-minute walkthrough of the HITL QA Sidecar (we’ll review a GLP-1 example end to end).

To request the demo, email pharmularyproject@gmail.com.


Bottom line: GenAI gets you a fast, credible first draft. In prior authorization, safety and compliance demand evidence, grammar, and humans—so accuracy compounds and trust is earned.

© Pharmulary Project

About the Author

Andrew Vargas, PharmD is a Pharmacist practicing watchdoggery and founder of Pharmacist Write. He builds coverage intelligence tools and writes about what pharmacy benefits managers would prefer stayed invisible—turning policy into something patients, consultants, and purchasers can actually use.
