Document · Methodology · DraftBench v2026.1 · 2026-04-29

DraftBench.

Pre-filing patent drafting benchmark. Dual-track scoring: Track A ground truth from prosecution outcomes, Track B from a registered practitioner panel. Hard floor on cited-reference fabrication under the Therasense Standard.

Draft methodology. This document is the OIPC Methodology Council’s working draft of the DraftBench methodology, modeled on the PatentBench template. It is published for public comment alongside Charter v0.1. Specific test counts, baseline scores, and scoring weights are CBlindspot’s to set; placeholders below are provisional and clearly marked. Final values land in v1.0 alongside CBlindspot’s public methodology repository.

§ 1 The Problem

Drafting is the highest-leverage step in the patent lifecycle. A weak specification can’t be fixed by a strong prosecution. AI tools that draft specifications and claim sets are entering the market without a reproducible measure of whether their output survives prosecution.

Vendors today benchmark drafting on internal datasets with internal rubrics. There is no neutral way for an in-house IP team or a law-firm partner to compare how DeepIP, Solve Intelligence, IP Author, Rowan, or a frontier LLM perform on the drafting task as it actually plays out in front of a USPTO examiner.

DraftBench fills that gap. It scores AI-drafted specifications and claim sets against (a) the prosecution outcome of the same or analog application and (b) a panel of registered practitioners.

§ 2 Domains

DraftBench evaluates four drafting domains. Domain weights apply to the merged composite; per-track scores publish separately per § 4.

| Domain | What it measures | Weight |
|---|---|---|
| Specification quality | Enablement, written description, and best-mode integrity: whether the spec carries the claims through § 112 challenges. | 35% |
| Claim-set coherence | Independent/dependent structure, antecedent basis, scope ladder, breadth-vs-novelty trade-off. | 30% |
| Written-description support | Every claim limitation traces to a specification disclosure that supports it sufficiently, either ipsis verbis or constructively. | 25% |
| Citation integrity | Cited prior art (in the IDS, the spec background, or claim-construction support) verifies against the authoritative source. Therasense floor applies. | 10% |

§ 3 Difficulty Tiers

Five tiers calibrated to the drafting complexity human practitioners encounter. Tier counts in v2026.1 are provisional pending CBlindspot’s test-set release.

| Tier | Practitioner level | What it tests | Human baseline |
|---|---|---|---|
| Tier 1 | Deterministic | Mechanical drafting tasks: claim numbering, dependency formatting, antecedent-basis fixes. | 100% |
| Tier 2 | Paralegal | Spec section assembly from invention disclosures, dependent-claim generation from a single independent claim. | ~92% |
| Tier 3 | Junior associate | Independent-claim drafting from a structured invention, basic written-description integration. | ~78% |
| Tier 4 | Senior associate | Multi-embodiment specs, claim ladders that anticipate restriction practice, § 112(f) means-plus-function strategy. | ~70% |
| Tier 5 | Partner / strategist | Cross-jurisdictional drafting, claim-construction-aware drafting, freedom-to-operate-aware claim narrowing. | ~55% |

Human-baseline figures are calibration estimates from CBlindspot’s practitioner panel; final figures publish with v1.0.

§ 4 Four-Layer Scoring

DraftBench combines four scoring layers. Composite score equals the weighted sum, except where a hard floor applies (§ 7).

Layer 1

Deterministic checks

Objectively verifiable checks: claim count compliance, dependency well-formedness, antecedent-basis pass, abstract length (150 words or fewer, per 37 CFR 1.72(b) and MPEP § 608.01(b)), drawing-reference numerical consistency.

Weight: 15% · Status: Live
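The sketch below shows three of the Layer 1 checks. It is a minimal illustration that assumes a hypothetical `Claim` record holding a claim number and the claim it depends on. The thresholds are also assumptions. The defaults of 20 total and 3 independent claims mirror the USPTO excess-claims fee thresholds, and the 150-word abstract limit follows 37 CFR 1.72(b). The harness's actual checks and limits are CBlindspot's to set.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Claim:
    """Hypothetical claim record: its number and the claim it depends on."""
    number: int
    depends_on: Optional[int] = None  # None marks an independent claim


def dependency_errors(claims: List[Claim]) -> List[str]:
    """Flag out-of-sequence numbering and dependencies that do not point
    back to an earlier claim."""
    errors = []
    for expected, claim in enumerate(claims, start=1):
        if claim.number != expected:
            errors.append(f"claim {claim.number}: expected number {expected}")
        if claim.depends_on is not None and not (1 <= claim.depends_on < claim.number):
            errors.append(
                f"claim {claim.number}: depends on claim {claim.depends_on}, "
                "which does not precede it"
            )
    return errors


def excess_claims(claims: List[Claim], free_total: int = 20,
                  free_independent: int = 3) -> Tuple[int, int]:
    """Count claims beyond the assumed total and independent-claim thresholds."""
    independent = sum(1 for c in claims if c.depends_on is None)
    return (max(0, len(claims) - free_total),
            max(0, independent - free_independent))


def abstract_within_limit(abstract: str, max_words: int = 150) -> bool:
    """Compare a whitespace-delimited word count against the limit."""
    return len(abstract.split()) <= max_words


claims = [Claim(1), Claim(2, depends_on=1), Claim(3, depends_on=4), Claim(4, depends_on=1)]
print(dependency_errors(claims))
# ['claim 3: depends on claim 4, which does not precede it']
```

Each check returns a pass/fail signal or a list of problems. The harness, not these functions, would decide how those results map onto the 15% layer score.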

Layer 2

Track A — Prosecution outcome

Ground-truth signal. Where the input invention disclosure has a real-world counterpart in the USPTO record, the AI-drafted spec/claims are scored against the actual prosecution outcome of that or an analog application: allowance vs. rejection, § 112 rejections issued, claim amendments forced, final allowed scope.

Weight: 40% · Status: In progress

Layer 3

Track B — Practitioner panel

Expert signal. A panel of USPTO-registered practitioners (5+ years of drafting experience) scores AI-drafted output against the rubrics in § 5. Inter-rater reliability is published per release; outlier scores are reviewed and re-rated.

Weight: 35% · Status: Recruiting
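The methodology does not say which inter-rater reliability statistic will be published. One possible choice is the one-way random-effects intraclass correlation, ICC(1,1), which the sketch below computes. The sketch also flags a rating as an outlier when it sits more than a hypothetical `max_gap` from that draft's panel median. Both the statistic and the threshold are illustrative assumptions, not the panel's stated procedure.

```python
from statistics import mean, median
from typing import List, Tuple


def icc_1_1(ratings: List[List[float]]) -> float:
    """One-way random-effects ICC(1,1). Rows are drafts, columns are raters;
    every draft must be scored by the same number of raters."""
    n = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 drafts, each scored by the same >= 2 raters")
    row_means = [mean(row) for row in ratings]
    grand_mean = mean(x for row in ratings for x in row)
    # Between-draft and within-draft mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for row, m in zip(ratings, row_means)
                    for x in row) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when every rating is identical")
    return (ms_between - ms_within) / denominator


def flag_outliers(ratings: List[List[float]],
                  max_gap: float = 1.5) -> List[Tuple[int, int]]:
    """Return (draft, rater) index pairs whose score is more than max_gap
    from that draft's panel median (max_gap is a hypothetical threshold)."""
    flagged = []
    for i, row in enumerate(ratings):
        panel_median = median(row)
        for j, score in enumerate(row):
            if abs(score - panel_median) > max_gap:
                flagged.append((i, j))
    return flagged


panel = [[4, 4, 1], [3, 3, 3], [5, 4, 5]]
print(round(icc_1_1(panel), 3))
print(flag_outliers(panel))  # [(0, 2)]
```

ICC(1,1) reaches 1.0 when raters agree exactly on every draft (zero within-draft variance). Flagged pairs would be the candidates for the review and re-rating step described above.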

Layer 4

Citation integrity (Therasense floor)

Any cited prior art — in the IDS, in the background section, or in claim-construction-support arguments — is verified against USPTO PatentsView or the Patent Public Search API. Verification failure triggers a hard floor: composite equals 0.0 for that draft. Not a deduction. See § 7.

Weight: 10% (or floor) · Status: Live
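The composite rule described at the start of this section can be sketched as follows. The sketch assumes each layer score is already normalized to [0, 1] and uses the provisional Layer weights listed above. The dictionary keys are hypothetical names, and the separate domain weighting from § 2 is not modeled here.

```python
from typing import Dict

# Provisional Layer weights from § 4 (keys are hypothetical names).
LAYER_WEIGHTS: Dict[str, float] = {
    "deterministic": 0.15,
    "track_a": 0.40,
    "track_b": 0.35,
    "citation_integrity": 0.10,
}


def composite_score(layer_scores: Dict[str, float], citations_verified: bool) -> float:
    """Weighted sum of layer scores in [0, 1], with the Therasense hard floor."""
    if not citations_verified:
        return 0.0  # hard floor: the draft scores zero, not a reduced score
    missing = LAYER_WEIGHTS.keys() - layer_scores.keys()
    if missing:
        raise ValueError(f"missing layer scores: {sorted(missing)}")
    return sum(weight * layer_scores[name] for name, weight in LAYER_WEIGHTS.items())


scores = {"deterministic": 1.0, "track_a": 0.5, "track_b": 0.8, "citation_integrity": 1.0}
print(round(composite_score(scores, citations_verified=True), 2))  # 0.73
print(composite_score(scores, citations_verified=False))           # 0.0
```

The floor is checked before any weighting. As a result, no combination of layer scores can lift a draft with an unverified citation above 0.0.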

§ 5 Evaluation Rubrics

Two rubrics govern Track B scoring. Both publish with v1.0; weights below are provisional.

Specification Quality Rubric

  • Enablement — can a person of ordinary skill make and use the invention from the spec? (1.5× weight)
  • Written description — does the spec demonstrate possession of the claimed subject matter? (1.5×)
  • Best mode — is the inventor’s preferred embodiment disclosed? (1.0×)
  • Definiteness foundation — does the spec define terms used in the claims with sufficient precision? (1.0×)

Claim Quality Rubric

  • Scope ladder — does the dependent-claim structure provide a defensible fall-back position? (2.0×)
  • Breadth vs. novelty — are the claims as broad as the disclosure permits without reading on the prior art? (1.5×)
  • Antecedent basis — is every term properly introduced? (1.0×)
  • Restriction resilience — do the claim sets anticipate likely restriction practice? (1.5×)
  • Professional quality — readability, consistency, style. (0.5×)
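The sketch below shows one way the provisional multipliers could combine into a single rubric score. It assumes each criterion is rated on a hypothetical 0 to 5 scale and that the result is a weighted mean normalized to [0, 1]. The criterion keys are illustrative, and the weights are the provisional values listed above.

```python
from typing import Dict

# Provisional multipliers from § 5; criterion keys are hypothetical.
SPEC_RUBRIC: Dict[str, float] = {
    "enablement": 1.5,
    "written_description": 1.5,
    "best_mode": 1.0,
    "definiteness_foundation": 1.0,
}
CLAIM_RUBRIC: Dict[str, float] = {
    "scope_ladder": 2.0,
    "breadth_vs_novelty": 1.5,
    "antecedent_basis": 1.0,
    "restriction_resilience": 1.5,
    "professional_quality": 0.5,
}


def rubric_score(weights: Dict[str, float], ratings: Dict[str, float],
                 scale_max: float = 5.0) -> float:
    """Weighted mean of per-criterion ratings (assumed 0..scale_max), scaled to [0, 1]."""
    if set(ratings) != set(weights):
        raise ValueError("ratings must cover exactly the rubric's criteria")
    weighted_sum = sum(weights[c] * ratings[c] for c in weights)
    return weighted_sum / (sum(weights.values()) * scale_max)


claim_ratings = {"scope_ladder": 4, "breadth_vs_novelty": 4, "antecedent_basis": 5,
                 "restriction_resilience": 2, "professional_quality": 4}
print(round(rubric_score(CLAIM_RUBRIC, claim_ratings), 3))  # 0.738
```

Under these weights, the scope ladder alone accounts for 2.0 / 6.5, roughly 31%, of the claim-quality score.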

§ 6 Reproducibility (Glass Box)

Per Charter § 8.3, every published DraftBench run must be reproducible by third parties within a 5% scoring delta. Five pillars apply:

  1. Test set publication. The full test set publishes one release cycle after initial scoring (to prevent training contamination). Held-out sets are documented.
  2. Rubric transparency. Both rubrics are public, versioned, and annotated with calibration examples.
  3. Output availability. Vendor outputs (with consent) and panel scores publish under Apache-2.0. Successes and failures both published.
  4. Failure-mode analysis. Each scored run includes a failure-mode taxonomy (fabricated citation, missing antecedent basis, overbroad claim, missing best mode, etc.).
  5. Continuous reporting. Public leaderboard updates monthly. Methodology revisions go through Methodology Council review per Charter § 7.2.
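The 5% reproducibility tolerance can be read two ways: five percentage points on a 0 to 100 scale, or five percent of the published score. The sketch below supports both readings through a `relative` flag. It returns the drafts whose third-party rerun falls outside tolerance. Two further choices are assumptions, not stated requirements: scores are compared per draft, and a missing rerun counts as a failure.

```python
from typing import Dict, List


def reproducibility_failures(published: Dict[str, float], reproduced: Dict[str, float],
                             tolerance: float = 5.0, relative: bool = False) -> List[str]:
    """Return ids of drafts whose rerun score falls outside tolerance.

    relative=False: tolerance is in percentage points on a 0-100 scale.
    relative=True:  tolerance is a percentage of the published score.
    A draft missing from the rerun counts as a failure (an assumption).
    """
    failures = []
    for draft_id, pub in published.items():
        rep = reproduced.get(draft_id)
        if rep is None:
            failures.append(draft_id)
            continue
        limit = tolerance / 100 * abs(pub) if relative else tolerance
        if abs(pub - rep) > limit:
            failures.append(draft_id)
    return sorted(failures)


published = {"d1": 80.0, "d2": 60.0, "d3": 70.0}
rerun = {"d1": 83.0, "d2": 66.0}
print(reproducibility_failures(published, rerun))  # ['d2', 'd3']
```

The two readings diverge for low scores. A published 40 reproduced as 43 passes on the absolute reading but fails on the relative one.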

§ 7 Hallucination Floor (Therasense Standard)

Doctrinal basis

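Therasense, Inc. v. Becton, Dickinson & Co., 649 F.3d 1276 (Fed. Cir. 2011) (en banc). Therasense tightened the inequitable conduct doctrine, requiring but-for materiality and specific intent to deceive. It also recognized an exception: affirmative acts of egregious misconduct, such as filing an unmistakably false affidavit, are material per se. DraftBench treats citation of nonexistent prior art as misconduct of that kind.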

Any patent or publication number cited in an AI-drafted output (IDS, spec background, or claim-construction-support argument) must verify against an authoritative source: the USPTO PatentsView API, USPTO Patent Public Search, or the relevant jurisdictional equivalent (EPO Open Patent Services, J-PlatPat, CNIPA Patent Search). Citations that fail verification trigger a hard floor: composite equals 0.0.

This is not a deduction. A draft with one fabricated citation does not earn 95%; it earns 0%. Catastrophic conduct is not averaged into a passing grade.
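The sketch below illustrates the verification step for US numbers and rests on stated assumptions. `normalize_us_number` handles only plain utility patent numbers (up to eight digits) and 11-digit pre-grant publication numbers with common kind codes. Reissues, design patents, and foreign numbers return `None`, so this sketch fails them closed; a real harness would need broader parsing. The `exists` callback is a hypothetical hook where a lookup against PatentsView or Patent Public Search would go. No particular API call is implied.

```python
import re
from typing import Callable, Iterable, List, Optional

# Common US kind codes for pre-grant publications (A1, A2, A9) and grants (B1, B2).
_KIND_CODE = re.compile(r"(?:A1|A2|A9|B1|B2)$")


def normalize_us_number(raw: str) -> Optional[str]:
    """Reduce e.g. 'US 7,654,321 B2' or 'US 2021/0123456 A1' to bare digits.

    Handles only utility patent numbers (1-8 digits) and 11-digit pre-grant
    publication numbers; anything else returns None and so fails verification.
    """
    s = re.sub(r"[\s,/]", "", raw.upper())
    if s.startswith("US"):
        s = s[2:]
    s = _KIND_CODE.sub("", s)
    if re.fullmatch(r"\d{1,8}|\d{11}", s):
        return s
    return None


def verify_citations(citations: Iterable[str], exists: Callable[[str], bool]) -> List[str]:
    """Return the citations that are malformed or not found by `exists`.

    `exists` is a hypothetical hook for an authoritative lookup; it is not
    implemented here.
    """
    failures = []
    for raw in citations:
        number = normalize_us_number(raw)
        if number is None or not exists(number):
            failures.append(raw)
    return failures


# Stand-in for an authoritative source: a fixed set of known numbers.
known = {"7654321", "20210123456"}
cited = ["US 7,654,321 B2", "US 2021/0123456 A1", "US 9,999,999 B1"]
print(verify_citations(cited, known.__contains__))  # ['US 9,999,999 B1']
```

Any non-empty result from `verify_citations` means the draft's composite is 0.0, however the other layers scored.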

The Therasense Standard is non-negotiable for OIPC accreditation. Maintainers may not relax it.

§ 8 Data Repository

The DraftBench repository will be published at github.com/cblindspot/draftbench upon OIPC accreditation. Planned contents:

  • METHODOLOGY.md — this document, versioned.
  • test_set.jsonl — invention disclosures (input) and target outcomes (Track A ground truth).
  • rubrics/spec_quality.json, rubrics/claim_quality.json — full rubric specifications.
  • panel_scores/ — per-release panel ratings (anonymized practitioner ID, rubric breakdown, free-text rationale).
  • vendor_outputs/ — vendor-submitted outputs with their consent. Apache-2.0.
  • harness/ — the evaluation harness (Track A scoring against USPTO outcomes, citation-verification client, panel-scoring intake).
  • leaderboard.md — auto-generated rolling results.
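Because `test_set.jsonl` has not yet been published, its schema is unknown. The sketch below assumes a hypothetical record shape with three fields: `id`, `disclosure`, and an optional `target_outcome` object holding the Track A ground truth. It reads one JSON object per non-blank line.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class DraftBenchCase:
    """One hypothetical test-set record."""
    case_id: str
    disclosure: str                 # invention disclosure given to the tool
    target_outcome: Optional[dict]  # Track A ground truth; None if no USPTO counterpart


def read_test_set(lines: Iterable[str]) -> Iterator[DraftBenchCase]:
    """Parse JSON Lines: one object per non-blank line."""
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        record = json.loads(line)
        try:
            case = DraftBenchCase(record["id"], record["disclosure"],
                                  record.get("target_outcome"))
        except KeyError as exc:
            raise ValueError(f"line {line_no}: missing field {exc}") from exc
        yield case


sample = [
    '{"id": "c1", "disclosure": "A sensor housing...", "target_outcome": {"allowed": true}}',
    "",
    '{"id": "c2", "disclosure": "A battery clip..."}',
]
for case in read_test_set(sample):
    print(case.case_id, case.target_outcome)
```

With a file on disk, pass the open file handle instead: `with open("test_set.jsonl", encoding="utf-8") as f: cases = list(read_test_set(f))`.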

§ 9 Governance

DraftBench is owned by CBlindspot. Per Charter § 2 principle 2, the Open IP Council does not own this benchmark; it accredits it. Methodology decisions sit with CBlindspot and any future co-maintainers; methodology disputes between maintainers go to the OIPC Methodology Council per Charter § 7.2.

No single AI vendor, law firm, or platform holds majority influence over DraftBench’s test set, rubrics, or panel composition. CBlindspot publishes its conflict-of-interest disclosures per Charter § 8.2.

Annual reaccreditation per Charter § 6: CBlindspot publishes a conformance report each calendar year. Failure to publish triggers a 90-day cure period before de-accreditation.

§ 10 Get Involved

  • Vendors — submit your tool for evaluation. Outputs publish under Apache-2.0 with your written consent.
  • Practitioners — apply to the Track B panel. USPTO registration and 5+ years drafting experience required.
  • Buyers — cite DraftBench accreditation as a procurement requirement in your AI-tool RFPs.
  • Academics — the Methodology Council welcomes peer-review papers analyzing DraftBench results, calibration data, and rubric construct validity.
  • Public comment — methodology objections via the OIPC public-comment process. Subject line should include DRAFTBENCH-METHODOLOGY-v0.1.
OIPC-DRAFTBENCH-METHODOLOGY-v0.1 · Last reviewed 2026-04-29 · Charter v0.1