Beyond the PDF: Why We Need a “Compositional Grammar” for Coverage Rules
Meta description (≈155 chars): PDFs are unreadable to machines. A compositional grammar—atoms + logic—plus a HITL QA Sidecar yields citable, 99.9%-grade coverage rules you can compare, query, and automate.
Suggested URL slug: /coverage-rules-compositional-grammar
TL;DR
Part 1 showed why 90% accuracy isn’t good enough in prior auth—the remaining 10% error compounds across many criteria and becomes a safety, compliance, and financial problem. The antidote is GenAI with evidence: a Human‑in‑the‑Loop (HITL) QA Sidecar verifying each criterion against the source. To make that verification scalable, we encode policies as a compositional grammar: atoms (e.g., cnid_bmi_30_ge) combined with Boolean logic. The result is a citable, auditable, and automatable policy‑as‑code layer.
Why the Grammar Matters (and how it fixes the 90% Problem)
In PA, a single determination may hinge on 10–15 specific checks (diagnosis, labs, time windows, step therapies, exclusions, renewal thresholds). Even if a model hits 90% per criterion, overall correctness nosedives—0.9¹⁰ ≈ 35%, 0.9¹⁵ ≈ 21%. By contrast, if we drive per‑criterion accuracy toward 99.9%, a 10‑criterion decision is ~99.0% correct.
The compositional grammar is what lets us measure and achieve that: it breaks prose into verifiable units (“atoms”) that the HITL Sidecar can accept or correct with line‑level evidence. Those verified atoms populate a Gold‑Standard dataset that pushes per‑criterion accuracy from “good drafts” to 99%+ decisions.
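The compounding is easy to sanity-check in a few lines of Python:

# Per-criterion accuracy compounds multiplicatively across n criteria.
for n in (10, 15):
    print(n, round(0.9 ** n, 3), round(0.999 ** n, 4))
# 10 0.349 0.99
# 15 0.206 0.9851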
From Prose to Primitives: Atoms + Logic
Think of a coverage rule as two pieces:
- Atoms — Named, typed predicates that evaluate to true/false for a member, claim, or auth request.
- cnid_bmi_30_ge → BMI ≥ 30
- age_ge_18 → Age ≥ 18
- dx_icd10_e11 → ICD‑10 Type 2 diabetes (E11.*)
- lab_a1c_7_gt_within_90d → HbA1c > 7.0% within last 90 days
- rx_metformin_trial_90d_ge_within_365d → ≥90‑day metformin trial in last year
- doc_lifestyle_program_6mo_ge → Documented lifestyle program ≥ 6 months
Naming pattern (stable + readable):
{namespace}_{concept}_{value?}_{operator}{_time/qualifiers?}
Names are deterministic, units are standardized, and each atom is typed (numeric, code, date, boolean) with mappings to ICD‑10, CPT/HCPCS, LOINC, RxNorm where applicable.
- Logic — How atoms combine, using AND / OR / NOT with explicit grouping.
age_ge_18
AND
( cnid_bmi_30_ge OR (cnid_bmi_27_ge AND (dx_icd10_e11 OR dx_icd10_i10)) )
AND
doc_lifestyle_program_6mo_ge
{
  "all": [
    {"atom": "age_ge_18"},
    {
      "any": [
        {"atom": "cnid_bmi_30_ge"},
        {
          "all": [
            {"atom": "cnid_bmi_27_ge"},
            {"any": [
              {"atom": "dx_icd10_e11"},
              {"atom": "dx_icd10_i10"}
            ]}
          ]
        }
      ]
    },
    {"atom": "doc_lifestyle_program_6mo_ge"}
  ]
}
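This JSON shape is directly executable. A minimal evaluator sketch in Python, assuming atom results arrive as a dict of booleans with None standing in for unknown (the explicit unknown handling previewed in the Implementation Blueprint below):

from typing import Optional

def evaluate(node: dict, facts: dict[str, Optional[bool]]) -> Optional[bool]:
    """Evaluate an all/any/not/atom rule tree against atom results.

    facts maps atom id -> True/False, or None when evidence is missing.
    Three-valued logic: unknown propagates unless the verdict is already
    decided (False short-circuits "all", True short-circuits "any").
    """
    if "atom" in node:
        return facts.get(node["atom"])            # missing fact == unknown
    if "not" in node:
        inner = evaluate(node["not"], facts)
        return None if inner is None else not inner
    if "all" in node:
        results = [evaluate(child, facts) for child in node["all"]]
        if False in results:
            return False
        return None if None in results else True
    if "any" in node:
        results = [evaluate(child, facts) for child in node["any"]]
        if True in results:
            return True
        return None if None in results else False
    raise ValueError(f"unknown node type: {set(node)}")

rule = {"all": [{"atom": "age_ge_18"},
                {"any": [{"atom": "cnid_bmi_30_ge"},
                         {"all": [{"atom": "cnid_bmi_27_ge"},
                                  {"any": [{"atom": "dx_icd10_e11"},
                                           {"atom": "dx_icd10_i10"}]}]}]},
                {"atom": "doc_lifestyle_program_6mo_ge"}]}

facts = {"age_ge_18": True, "cnid_bmi_30_ge": False, "cnid_bmi_27_ge": True,
         "dx_icd10_e11": True, "doc_lifestyle_program_6mo_ge": None}
print(evaluate(rule, facts))  # None: lifestyle documentation still missing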
Provenance by Design: Make Every Atom Citable
To be trustworthy, each atom must be evidence‑backed:
{
  "id": "cnid_bmi_30_ge",
  "type": "numeric",
  "concept": "bmi",
  "operator": "ge",
  "value": 30,
  "unit": "kg/m2",
  "evidence": {
    "plan": "CarrierName Policy XYZ",
    "uid": "policy-xyz-2025-01",
    "pdf_url": "https://…/policy-xyz.pdf",
    "clause_id": "2.1.3"
  },
  "effective": {
    "start": "2025-01-01",
    "end": null
  },
  "version": "1.0.0"
}
Why this matters: the HITL QA Sidecar can show the model’s claim on the left and the exact cited clause on the right. Reviewers either accept or correct the atom; approvals are versioned and hash‑logged. That workflow turns GenAI drafts into a Gold‑Standard dataset with 99.9% clause accuracy (no one says 100%—think “Clorox standard”).
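One way to realize the hash-logging (the exact scheme here is an assumption, not a spec): fingerprint the approved atom together with the verbatim clause text, so any later edit to either invalidates the hash.

import hashlib, json

def statement_hash(atom: dict, clause_text: str) -> str:
    """Fingerprint an approved atom plus the exact clause it cites.

    Canonical JSON (sorted keys, fixed separators) keeps the hash stable
    across serializers; any edit to the atom definition or the clause
    text yields a new hash, so silent drift is detectable.
    """
    canonical = json.dumps(
        {"atom": atom, "clause_text": clause_text},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

atom = {"id": "cnid_bmi_30_ge", "operator": "ge", "value": 30, "unit": "kg/m2"}
# Clause text below is illustrative, not quoted from a real policy.
print(statement_hash(atom, "2.1.3 Members must have a BMI of 30 kg/m2 or greater."))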
How the HITL QA Sidecar Uses the Grammar
Side-by-side verification
- One side: structured model output (plan, drug, indication, atomized criteria).
- Other side: cited passages {plan, uid, clause_id, pdf_url} with the relevant text highlighted.

Pass/fail rubric
- Scope (plan/drug/indication), logic shape (AND/OR), thresholds (e.g., A1c ≥ 7), windows (≤ 90 days), ICD-10/CPT correctness, renewal criteria.

One-step corrections
- Swap in the correct clause or threshold. Corrections are logged to improve future runs.

Structured only
- If criteria don’t align, the item is sent back to earlier pipeline stages—no unstructured “maybe” answers escape.

Versioned snapshots & drift
- Track clause prevalence and CNID usage over time for each plan/drug/indication. When a policy changes, the diff appears at the atom level (a sketch follows this list).
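Atom-level drift detection can start as plain set arithmetic over the atom IDs referenced by two rule snapshots. A sketch, reusing the all/any JSON shape from above:

def atoms_in(node: dict) -> set[str]:
    """Collect every atom ID referenced anywhere in a rule tree."""
    if "atom" in node:
        return {node["atom"]}
    if "not" in node:
        return atoms_in(node["not"])
    children = node.get("all") or node.get("any") or []
    return set().union(*(atoms_in(c) for c in children)) if children else set()

def atom_diff(old_rule: dict, new_rule: dict) -> dict:
    old, new = atoms_in(old_rule), atoms_in(new_rule)
    return {"added": sorted(new - old), "removed": sorted(old - new)}

v2024 = {"all": [{"atom": "cnid_bmi_30_ge"}]}
v2025 = {"all": [{"atom": "cnid_bmi_27_ge"}, {"atom": "dx_icd10_e11"}]}
print(atom_diff(v2024, v2025))
# {'added': ['cnid_bmi_27_ge', 'dx_icd10_e11'], 'removed': ['cnid_bmi_30_ge']}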
Why Grammar + Sidecar Unlocks Compare, Query, Automate
- Compare at scale
  - “This year Payer A replaced cnid_bmi_30_ge with (cnid_bmi_27_ge AND dx_icd10_e11).”
  - “Only 3 of 14 plans require doc_lifestyle_program_6mo_ge for first-line therapy.”
- Query the Policy Vault (the “magic”) — sketched after this list
  - “Show all policies using A1c thresholds > 7% or requiring two oral agents within 120 days.”
  - “Find rules where negative conditions (NOT contraindications) drive denials.”
- Automate, but keep it explainable
  - Pre-Check bundles: Likely / Borderline / Unlikely CNIDs per plan-drug-indication pairing.
  - Doc Pack: auto-generated appeal packets and “missing evidence” flags.
  - Transparent outcomes: atom-level pass/fail with the exact policy clause link.
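Queries like these fall out of the typed registry: filter on concept, operator, value, and unit instead of grepping PDFs. A sketch over an in-memory registry (the rows and the plans field are illustrative):

atom_registry = [
    {"id": "cnid_lab_a1c_ge_7_90d", "concept": "a1c", "operator": "ge",
     "value": 7.0, "unit": "%", "plans": ["Plan U", "Plan K"]},
    {"id": "lab_a1c_7_gt_within_90d", "concept": "a1c", "operator": "gt",
     "value": 7.0, "unit": "%", "plans": ["Plan B"]},
    {"id": "cnid_bmi_30_ge", "concept": "bmi", "operator": "ge",
     "value": 30, "unit": "kg/m2", "plans": ["Plan U"]},
]

# "Show all policies using A1c thresholds > 7%" (>= shown here too):
hits = [a for a in atom_registry
        if a["concept"] == "a1c"
        and a["operator"] in ("gt", "ge")
        and a["value"] >= 7.0]
for a in hits:
    print(a["id"], "->", a["plans"])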
Implementation Blueprint
- Controlled vocabulary — Canonical concepts, operators, units, and code systems.
- Atom registry — IDs, definitions, examples, evidence fields, versions, effective dates.
- Authoring & linting — A policy editor that composes atoms; flags missing units, unknown codes, dangling references (a lint sketch follows this list).
- Evaluation engine — Strong typing, unit coercion, explicit handling of unknown.
- Diffs & alerts — Rule- and atom-level change detection with subscriber notifications.
- APIs — GET /atoms (or as I call them, Clinical Node IDs), GET /rules, POST /evaluate returning verdicts and explanations.
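The linting step can start as simple ID checks against the naming pattern. A sketch (the namespace and operator sets here are assumptions, not a published vocabulary):

import re

NAMESPACES = {"cnid", "age", "dx", "lab", "rx", "doc"}
OPERATORS = {"ge", "gt", "le", "lt", "eq", "ne"}

def lint_atom_id(atom_id: str) -> list[str]:
    """Return lint problems; an empty list means the ID looks well-formed."""
    if not re.fullmatch(r"[a-z0-9]+(_[a-z0-9]+)*", atom_id):
        return ["must be lowercase snake_case (a-z, 0-9, underscores)"]
    problems = []
    parts = atom_id.split("_")
    if parts[0] not in NAMESPACES:
        problems.append(f"unknown namespace: {parts[0]}")
    # Code-valued atoms (e.g., dx_icd10_e11) carry no comparison operator;
    # numeric and date atoms should name one.
    if not OPERATORS & set(parts[1:]):
        problems.append("no operator found (acceptable only for code atoms)")
    return problems

print(lint_atom_id("cnid_bmi_30_ge"))           # []
print(lint_atom_id("lab_a1c_7_gt_within_90d"))  # []
print(lint_atom_id("dx_icd10_e11"))             # ['no operator found ...']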
Quality You Can Measure (and Promise)
Auditability KPIs
- Traceability: Every claim retains {plan, uid, clause_id, statement_hash, pdf_url}.
- Snapshot integrity: Answers tie to a specific policy version; clause and rule hashes match the plan × drug × indication at first pass.
- Drift tracking: Alerts when thresholds/time windows change, with impact analysis.
- Reviewer agreement: Inter-rater reliability on Sidecar approvals.
Worked GLP‑1 Example (bridge to Article 3)
Draft claim: “Plan U requires two oral agents within 90 days and A1c ≥ 7% for initial GLP-1 approval; renewal needs ≥ 1% A1c reduction.”

Sidecar check items
- Step therapy atoms present? cnid_step_any_two_orals_n2, window_days=90
- Lab threshold atom present? cnid_lab_a1c_ge_7_90d
- Renewal atom present? cnid_renewal_a1c_drop_ge_1
- Scope correct? (GLP-1, adult T2D)
- Citations present? Show exact clause lines; verify no contradicting text.

Decision: Approve or edit (e.g., step window is 120 days, not 90).

Outcome: Ships only when every atom is backed by a current snapshot citation.
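Expressed in the grammar, the corrected Plan U criteria become a rule the evaluator from earlier can run. A sketch only; the "params" key is an assumed extension for parameterized atoms (the draft wrote "cnid_step_any_two_orals_n2, window_days=90"):

plan_u_glp1 = {
    "initial": {"all": [
        # 120-day window reflects the reviewer's correction above.
        {"atom": "cnid_step_any_two_orals_n2", "params": {"window_days": 120}},
        {"atom": "cnid_lab_a1c_ge_7_90d"},        # A1c >= 7% within 90 days
    ]},
    "renewal": {"all": [
        {"atom": "cnid_renewal_a1c_drop_ge_1"},   # >= 1% A1c reduction
    ]},
}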
In Article 3, we’ll “mint” the first 25 CNIDs for GLP‑1
coverage—proof that the grammar is practical, testable, and reusable.
FAQ
Isn’t this just guidelines in JSON?
No—this is policy‑as‑code with provenance. Drug policy changes
continuously. We need typed atoms, normalized units/codes, explicit operators,
and versioned logic, each tied to the source clause.
How do you handle ambiguity and negatives?
We model them explicitly (e.g., NOT cnid_contraindication_any) and flag ambiguous language for reviewer action. The QA Sidecar enforces a pass/fail rubric applied by licensed clinicians.
Will this replace PDFs?
The PDF remains the legal narrative. The grammar is the operational substrate that makes policies comparable, queryable, and automatable.
Request a 10-Minute Walkthrough
Want to see the HITL QA Sidecar verify a GLP-1 policy end-to-end? We’ll review citations, atoms, diffs, and the evaluation output live.
✉️ Email to Request the Demo