Beyond the PDF: Why We Need a “Compositional Grammar” for Coverage Rules

Meta description (≈155 chars): PDFs are unreadable to machines. A compositional grammar—atoms + logic—plus a HITL QA Sidecar yields citable, 99.9%-grade coverage rules you can compare, query, and automate.

Suggested URL slug: /coverage-rules-compositional-grammar


TL;DR

Part 1 showed why 90% accuracy isn’t good enough in prior auth—the remaining 10% error compounds across many criteria and becomes a safety, compliance, and financial problem. The antidote is GenAI with evidence: a Human‑in‑the‑Loop (HITL) QA Sidecar verifying each criterion against the source. To make that verification scalable, we encode policies as a compositional grammar: atoms (e.g., cnid_bmi_30_ge) combined with Boolean logic. The result is a citable, auditable, and automatable policy‑as‑code layer.


Why the Grammar Matters (and how it fixes the 90% Problem)

In PA, a single determination may hinge on 10–15 specific checks (diagnosis, labs, time windows, step therapies, exclusions, renewal thresholds). Even if a model hits 90% per criterion, overall correctness nosedives—0.9¹⁰ ≈ 35%, 0.9¹⁵ ≈ 21%. By contrast, if we drive per‑criterion accuracy toward 99.9%, a 10‑criterion decision is ~99.0% correct.

The compositional grammar is what lets us measure and achieve that: it breaks prose into verifiable units (“atoms”) that the HITL Sidecar can accept or correct with line‑level evidence. Those verified atoms populate a Gold‑Standard dataset that pushes per‑criterion accuracy from “good drafts” to 99%+ decisions.


From Prose to Primitives: Atoms + Logic

Think of a coverage rule as two pieces:

  1. Atoms — Named, typed predicates that evaluate to true/false for a member, claim, or auth request.
    • cnid_bmi_30_ge → BMI ≥ 30
    • age_ge_18 → Age ≥ 18
    • dx_icd10_e11 → ICD‑10 Type 2 diabetes (E11.*)
    • lab_a1c_7_gt_within_90d → HbA1c > 7.0% within last 90 days
    • rx_metformin_trial_90d_ge_within_365d → ≥90‑day metformin trial in last year
    • doc_lifestyle_program_6mo_ge → Documented lifestyle program ≥ 6 months

Naming pattern (stable + readable):
{namespace}_{concept}_{value?}_{operator}{_time/qualifiers?}
Names are deterministic, units are standardized, and each atom is typed (numeric, code, date, boolean) with mappings to ICD‑10, CPT/HCPCS, LOINC, RxNorm where applicable.
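Because the naming is deterministic, it can be machine-checked. The sketch below is illustrative: the token shape (lowercase alphanumerics joined by underscores) follows the pattern above, but the canonical token vocabulary is an assumption, not something this article defines.

```python
import re

# Sketch of an atom-id lint for the
# {namespace}_{concept}_{value?}_{operator}{_time/qualifiers?} pattern.
# Assumed token shape: lowercase alphanumerics separated by "_".
TOKEN = re.compile(r"[a-z][a-z0-9]*|[0-9]+")

def lint_atom_id(atom_id: str) -> list[str]:
    """Return a list of problems; an empty list means the id looks well-formed."""
    tokens = atom_id.split("_")
    problems = []
    if len(tokens) < 2:
        problems.append("need at least a namespace and a concept token")
    problems.extend(f"bad token: {t!r}" for t in tokens if not TOKEN.fullmatch(t))
    return problems
```

Here lint_atom_id("cnid_bmi_30_ge") returns an empty list, while an id like "BMI>=30" is flagged.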


  2. Logic — How atoms combine, using AND / OR / NOT with explicit grouping. The same rule is shown below in two equivalent forms: a readable expression, then its JSON encoding.
age_ge_18
AND
( cnid_bmi_30_ge OR (cnid_bmi_27_ge AND (dx_icd10_e11 OR dx_icd10_i10)) )
AND
doc_lifestyle_program_6mo_ge
{
  "all": [
    {"atom": "age_ge_18"},
    {
      "any": [
        {"atom": "cnid_bmi_30_ge"},
        {
          "all": [
            {"atom": "cnid_bmi_27_ge"},
            {"any": [
              {"atom": "dx_icd10_e11"},
              {"atom": "dx_icd10_i10"}
            ]}
          ]
        }
      ]
    },
    {"atom": "doc_lifestyle_program_6mo_ge"}
  ]
}
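A rule tree in this shape is straightforward to evaluate. The sketch below assumes the atoms have already been resolved to booleans upstream; a production engine would add typing, unit coercion, and explicit unknown handling.

```python
def evaluate(node: dict, facts: dict[str, bool]) -> bool:
    """Recursively evaluate an all/any/not/atom rule tree against
    pre-resolved atom truth values."""
    if "atom" in node:
        return facts[node["atom"]]
    if "all" in node:
        return all(evaluate(child, facts) for child in node["all"])
    if "any" in node:
        return any(evaluate(child, facts) for child in node["any"])
    if "not" in node:
        return not evaluate(node["not"], facts)
    raise ValueError(f"unexpected node shape: {node}")

# The rule above, checked for an adult with BMI 28, T2D, and lifestyle docs:
rule = {
    "all": [
        {"atom": "age_ge_18"},
        {"any": [
            {"atom": "cnid_bmi_30_ge"},
            {"all": [
                {"atom": "cnid_bmi_27_ge"},
                {"any": [{"atom": "dx_icd10_e11"}, {"atom": "dx_icd10_i10"}]},
            ]},
        ]},
        {"atom": "doc_lifestyle_program_6mo_ge"},
    ]
}
facts = {
    "age_ge_18": True,
    "cnid_bmi_30_ge": False,
    "cnid_bmi_27_ge": True,
    "dx_icd10_e11": True,
    "dx_icd10_i10": False,
    "doc_lifestyle_program_6mo_ge": True,
}
```

With these facts the rule evaluates to True via the BMI ≥ 27 + diabetes branch; flip dx_icd10_e11 to False and the whole rule fails.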

Provenance by Design: Make Every Atom Citable


To be trustworthy, each atom must be evidence‑backed:

{
  "id": "cnid_bmi_30_ge",
  "type": "numeric",
  "concept": "bmi",
  "operator": "ge",
  "value": 30,
  "unit": "kg/m2",
  "evidence": {
    "plan": "CarrierName Policy XYZ",
    "uid": "policy-xyz-2025-01",
    "pdf_url": "https://…/policy-xyz.pdf",
    "clause_id": "2.1.3"
  },
  "effective": {
    "start": "2025-01-01",
    "end": null
  },
  "version": "1.0.0"
}
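Because the record carries a type, operator, value, and unit, evaluating a numeric atom against member data is mechanical. A minimal sketch (the "ge" operator comes from the record above; the other operator tokens and the member-data shape are assumptions):

```python
import operator

# Map operator tokens to comparisons. "ge" appears in the record above;
# the rest are assumed analogues from the same vocabulary.
OPS = {"ge": operator.ge, "gt": operator.gt, "le": operator.le,
       "lt": operator.lt, "eq": operator.eq}

def eval_numeric_atom(atom: dict, observed: float) -> bool:
    """True when the observed value (already in the atom's unit) satisfies it."""
    return OPS[atom["operator"]](observed, atom["value"])

# The cnid_bmi_30_ge atom, reduced to the fields evaluation needs:
bmi_atom = {"operator": "ge", "value": 30, "unit": "kg/m2"}
```

eval_numeric_atom(bmi_atom, 31.2) passes; 28.0 fails. Unit coercion would happen before this call.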

Why this matters: the HITL QA Sidecar can show the model’s claim on the left and the exact cited clause on the right. Reviewers either accept or correct the atom; approvals are versioned and hash‑logged. That workflow turns GenAI drafts into a Gold‑Standard dataset with 99.9% clause accuracy (no one says 100%; think “Clorox standard”).
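The “hash‑logged” approval can be sketched as a deterministic digest over the atom definition plus the clause text it cites; if either changes after sign-off, the logged hash no longer matches. The payload shape here is illustrative (the statement_hash field name appears in the KPI list later in this article).

```python
import hashlib
import json

def statement_hash(atom: dict, clause_text: str) -> str:
    """Deterministic digest over an atom and its cited clause; any later
    change to either invalidates the logged approval."""
    payload = json.dumps({"atom": atom, "clause": clause_text}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

sort_keys=True makes the digest stable regardless of how the atom dict was assembled.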


How the HITL QA Sidecar Uses the Grammar

Side‑by‑side verification

  • One side: structured model output (plan, drug, indication, atomized criteria).
  • Other side: cited passages {plan, uid, clause_id, pdf_url} with the relevant text highlighted.

Pass/fail rubric

  • Scope (plan/drug/indication), logic shape (AND/OR), thresholds (e.g., A1c ≥ 7%), windows (≤ 90 days), ICD‑10/CPT correctness, renewal criteria.

One‑step corrections

  • Swap in the correct clause or threshold. Corrections are logged to improve future runs.

Structured outputs only

  • If criteria don’t align, the item is sent back to earlier pipeline stages; no unstructured “maybe” answers escape.

Versioned snapshots & drift

  • Track clause prevalence and CNID usage over time for each plan/drug/indication. When a policy changes, the diff appears at the atom level.

Why Grammar + Sidecar Unlocks Compare, Query, Automate

  1. Compare at scale

    • “This year Payer A replaced cnid_bmi_30_ge with (cnid_bmi_27_ge AND dx_icd10_e11).”
    • “Only 3 of 14 plans require doc_lifestyle_program_6mo_ge for first‑line therapy.”
  2. Query the Policy Vault (the “magic”)

    • “Show all policies using A1c thresholds > 7% or requiring two oral agents within 120 days.”
    • “Find rules where negative conditions (NOT contraindications) drive denials.”
  3. Automate, but keep it explainable

    • Pre‑Check bundles: Likely / Borderline / Unlikely CNIDs per plan‑drug‑indication pairing.
    • Doc Pack: auto‑generated appeal packets and “missing evidence” flags.
    • Transparent outcomes: atom‑level pass/fail with the exact policy clause link.

Implementation Blueprint

  1. Controlled vocabulary — canonical concepts, operators, units, and code systems.
  2. Atom registry — IDs, definitions, examples, evidence fields, versions, effective dates.
  3. Authoring & linting — a policy editor that composes atoms and flags missing units, unknown codes, and dangling references.
  4. Evaluation engine — strong typing, unit coercion, explicit handling of unknowns.
  5. Diffs & alerts — rule‑ and atom‑level change detection with subscriber notifications.
  6. APIs — GET /atoms (or as I call them, Clinical Node IDs), GET /rules, POST /evaluate returning verdicts and explanations.

Quality You Can Measure (and Promise)

Auditability KPIs

  • Traceability: every claim retains {plan, uid, clause_id, statement_hash, pdf_url}.
  • Snapshot integrity: answers tie to a specific policy version; clause and rule hashes match the plan × drug × indication at first pass.
  • Drift tracking: alerts when thresholds/time windows change, with impact analysis.
  • Reviewer agreement: inter‑rater reliability on Sidecar approvals.

Worked GLP‑1 Example (bridge to Article 3)

Draft claim: “Plan U requires two oral agents within 90 days and A1c ≥ 7% for initial GLP‑1 approval; renewal needs ≥ 1% A1c reduction.”

Sidecar check items

  • Step therapy atoms present? cnid_step_any_two_orals_n2, window_days=90
  • Lab threshold atom present? cnid_lab_a1c_ge_7_90d
  • Renewal atom present? cnid_renewal_a1c_drop_ge_1
  • Scope correct? (GLP‑1, adult T2D)
  • Citations present? Show exact clause lines; verify no contradicting text.

Decision: Approve or edit (e.g., step window is 120 days, not 90).
Outcome: Ships only when every atom is backed by a current snapshot citation.
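Expressed in the grammar, the draft claim becomes two small rule trees. The atom ids come from the checklist above; the encoding itself is a sketch, not Plan U’s actual policy.

```python
# Initial approval: step therapy AND lab threshold (scope checks omitted).
glp1_initial = {
    "all": [
        {"atom": "cnid_step_any_two_orals_n2"},  # two oral agents, window_days=90
        {"atom": "cnid_lab_a1c_ge_7_90d"},       # A1c >= 7% within last 90 days
    ]
}

# Renewal: documented A1c reduction of at least 1 percentage point.
glp1_renewal = {"all": [{"atom": "cnid_renewal_a1c_drop_ge_1"}]}
```

If the Sidecar correction lands (120 days, not 90), only the step-therapy atom’s window changes; the tree shape stays put.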

In Article 3, we’ll “mint” the first 25 CNIDs for GLP‑1
coverage—proof that the grammar is practical, testable, and reusable.


FAQ

Isn’t this just guidelines in JSON?
No—this is policy‑as‑code with provenance. Drug policy changes
continuously. We need typed atoms, normalized units/codes, explicit operators,
and versioned logic, each tied to the source clause.

How do you handle ambiguity and negatives?
We model them explicitly (e.g., NOT cnid_contraindication_any) and flag
ambiguous language for reviewer action. The QA Sidecar enforces a pass/fail rubric applied by licensed clinicians.

Will this replace PDFs?
The PDF remains the legal narrative. The grammar is the operational substrate that makes policies comparable, queryable, and automatable.

Request a 10-Minute Walkthrough

Want to see the HITL QA Sidecar verify a GLP-1 policy end-to-end? We’ll review citations, atoms, diffs, and the evaluation output live.

✉️ Email to Request the Demo
Andrew Vargas, PharmD

About the Author

Andrew Vargas, PharmD is a Clinical Coding Pharmacist and founder of Pharmacist Write, where he translates managed-care and GLP-1 policy into practical insights for patients and professionals.

🧠 Read full bio · View all articles