DraftBench — Methodology — Open IP Council

§ 1 The Problem

Drafting is the highest-leverage step in the patent lifecycle. A weak specification can’t be fixed by a strong prosecution. AI tools that draft specifications and claim sets are entering the market without a reproducible measure of whether their output survives prosecution.

Vendors today benchmark drafting on internal datasets with internal rubrics. An in-house IP team or a law-firm partner has no neutral way to compare how DeepIP, Solve Intelligence, IP Author, Rowan, or a frontier LLM performs on the drafting task as it actually plays out in front of a USPTO examiner.

DraftBench fills that gap. It scores AI-drafted specifications and claim sets against (a) the prosecution outcome of the same application or an analogous one and (b) a panel of registered practitioners.

§ 2 Domains

DraftBench evaluates four drafting domains. Domain weights apply to the merged composite; per-track scores publish separately per § 4.

Domain	What it measures	Weight
Specification quality	Enablement, written-description, best-mode integrity. Whether the spec carries the claim through § 112 challenges.	35%
Claim-set coherence	Independent / dependent structure, antecedent basis, scope ladder, breadth-vs-novelty trade-off.	30%
Written-description support	Every claim limitation traces to a specification disclosure with sufficient ipsis verbis or constructive support.	25%
Citation integrity	Cited prior art (in IDS, in spec background, in claim-construction support) verifies against the authoritative source. Therasense floor applies.	10%

§ 3 Difficulty Tiers

Five tiers calibrated to the drafting complexity human practitioners encounter. Tier counts in v2026.1 are provisional pending CBlindspot’s test-set release.

Tier	Practitioner level	What it tests	Human baseline
Tier 1	Deterministic	Mechanical drafting tasks: claim numbering, dependency formatting, antecedent-basis fixes.	100%
Tier 2	Paralegal	Spec section assembly from invention disclosures, dependent claim generation from a single independent.	~92%
Tier 3	Junior associate	Independent-claim drafting from a structured invention, basic written-description integration.	~78%
Tier 4	Senior associate	Multi-embodiment specs, claim ladders that anticipate restriction practice, § 112(f) means-plus-function strategy.	~70%
Tier 5	Partner / strategist	Cross-jurisdictional drafting, claim-construction-aware drafting, freedom-to-operate-aware claim narrowing.	~55%

Human-baseline figures are calibration estimates from CBlindspot’s practitioner panel; final figures publish with v1.0.

§ 4 Four-Layer Scoring

DraftBench combines four scoring layers. The composite score is the weighted sum of the layer scores, except where a hard floor applies (§ 7).

Layer 1

Deterministic checks

Objectively verifiable answers: claim count compliance, dependency well-formedness, antecedent-basis pass, MPEP-formatted abstract length, drawing-reference numerical consistency.

Weight: 15% · Status: Live
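As one illustration of a Layer 1 check, the sketch below tests dependency well-formedness: claims numbered consecutively from 1, and every claim reference pointing to an earlier claim that exists. The function name `dependency_errors`, the `CLAIM_REF` pattern, and the input shape (claim number mapped to claim text) are assumptions made for this example, not the DraftBench harness.

```python
import re

# Illustrative pattern (an assumption): captures the number in "claim 3" or "claims 1".
# For "claims 1 or 2" it captures only the first number; a real check would need more.
CLAIM_REF = re.compile(r"\bclaims?\s+(\d+)", re.IGNORECASE)


def dependency_errors(claims: dict[int, str]) -> list[str]:
    """Return human-readable problems with claim numbering and dependencies."""
    errors = []
    numbers = sorted(claims)
    # Claims should be numbered consecutively starting at 1.
    if numbers != list(range(1, len(numbers) + 1)):
        errors.append("claims are not numbered consecutively from 1")
    for number in numbers:
        for ref in CLAIM_REF.findall(claims[number]):
            target = int(ref)
            if target not in claims:
                errors.append(f"claim {number} refers to missing claim {target}")
            elif target >= number:
                errors.append(
                    f"claim {number} refers to claim {target}, which does not precede it"
                )
    return errors


example_claims = {
    1: "A widget comprising a frame.",
    2: "The widget of claim 1, wherein the frame is steel.",
    3: "The widget of claim 4, further comprising a hinge.",
}
print(dependency_errors(example_claims))
```

A check of this kind is deterministic: the same claim set always yields the same list of problems, which is what distinguishes Layer 1 from the panel-based layers.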

Layer 2

Track A — Prosecution outcome

Ground-truth signal. Where the input invention disclosure has a real-world counterpart in the USPTO record, the AI-drafted spec/claims are scored against the actual prosecution outcome of that application or an analogous one: allowance vs. rejection, § 112 rejections issued, claim amendments forced, final allowed scope.

Weight: 40% · Status: In progress

Layer 3

Track B — Practitioner panel

Expert signal. A panel of USPTO-registered practitioners (5+ years of drafting experience) scores AI-drafted output against the rubrics in § 5. Inter-rater reliability is published per release; outlier scores are reviewed and re-rated.

Weight: 35% · Status: Recruiting

Layer 4

Citation integrity (Therasense floor)

Any cited prior art — in the IDS, in the background section, or in claim-construction-support arguments — is verified against USPTO PatentsView or the Patent Public Search API. Verification failure triggers a hard floor: composite equals 0.0 for that draft. Not a deduction. See § 7.

Weight: 10% (or floor) · Status: Live
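The composite described in this section can be sketched as a weighted sum with an override for the citation floor. The layer weights come from the text above; the key names, the 0-to-1 score scale, and the `composite_score` function are assumptions made for illustration.

```python
# Layer weights as stated in § 4; the dictionary keys are hypothetical names.
LAYER_WEIGHTS = {
    "deterministic": 0.15,
    "track_a": 0.40,
    "track_b": 0.35,
    "citation_integrity": 0.10,
}


def composite_score(layer_scores: dict[str, float], citations_verified: bool) -> float:
    """Weighted sum of layer scores (assumed 0.0 to 1.0), or 0.0 if the floor applies."""
    if not citations_verified:
        # Hard floor: a failed citation check zeroes the composite; it is not a deduction.
        return 0.0
    return sum(LAYER_WEIGHTS[name] * layer_scores[name] for name in LAYER_WEIGHTS)


scores = {
    "deterministic": 0.9,
    "track_a": 0.8,
    "track_b": 0.7,
    "citation_integrity": 1.0,
}
print(composite_score(scores, citations_verified=True))
print(composite_score(scores, citations_verified=False))
```

With these example scores the weighted sum is 0.135 + 0.32 + 0.245 + 0.10 = 0.80 (up to floating-point rounding), while the same draft with a failed citation check scores 0.0.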

§ 5 Evaluation Rubrics

Two rubrics govern Track B scoring. Both publish with v1.0; weights below are provisional.

Specification Quality Rubric

Enablement — can a person of ordinary skill make and use the invention from the spec? (1.5× weight)
Written description — does the spec demonstrate possession of the claimed subject matter? (1.5×)
Best mode — is the inventor’s preferred embodiment disclosed? (1.0×)
Definiteness foundation — does the spec define terms used in the claims with sufficient precision? (1.0×)

Claim Quality Rubric

Scope ladder — does the dependent-claim structure provide a defensible fall-back position? (2.0×)
Breadth vs. novelty — are the claims as broad as the disclosure permits without reading on the prior art? (1.5×)
Antecedent basis — is every term properly introduced? (1.0×)
Restriction resilience — do the claim sets anticipate likely restriction practice? (1.5×)
Professional quality — readability, consistency, style. (0.5×)
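The rubric multipliers above could be combined in several ways; the document does not fix one. The sketch below assumes the simplest reading, a weight-normalized mean of per-criterion ratings. The criterion keys, the `rubric_score` function, and the 1-to-5 rating scale in the example are all assumptions.

```python
# Provisional multipliers from § 5; the criterion keys are hypothetical names.
SPEC_WEIGHTS = {
    "enablement": 1.5,
    "written_description": 1.5,
    "best_mode": 1.0,
    "definiteness_foundation": 1.0,
}
CLAIM_WEIGHTS = {
    "scope_ladder": 2.0,
    "breadth_vs_novelty": 1.5,
    "antecedent_basis": 1.0,
    "restriction_resilience": 1.5,
    "professional_quality": 0.5,
}


def rubric_score(ratings: dict[str, float], weights: dict[str, float]) -> float:
    """Weight-normalized mean of one rater's per-criterion ratings (an assumed formula)."""
    total_weight = sum(weights.values())
    return sum(weights[name] * ratings[name] for name in weights) / total_weight


# Example: a strong scope ladder (5) with middling ratings (3) elsewhere.
claim_ratings = {name: 3.0 for name in CLAIM_WEIGHTS}
claim_ratings["scope_ladder"] = 5.0
print(rubric_score(claim_ratings, CLAIM_WEIGHTS))
```

Because the scope ladder carries the largest multiplier, a high rating there lifts the result above the unweighted mean of the same ratings.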

§ 6 Reproducibility (Glass Box)

Per Charter § 8.3, every published DraftBench run is reproducible by third parties within a 5% scoring delta. Five pillars apply:

Test set publication. The full test set publishes one release cycle after initial scoring (to prevent training contamination). Held-out sets are documented.
Rubric transparency. Both rubrics are public, versioned, and annotated with calibration examples.
Output availability. Vendor outputs (with consent) and panel scores publish under Apache-2.0. Both successes and failures are published.
Failure-mode analysis. Each scored run includes a failure-mode taxonomy (fabricated citation, missing antecedent basis, overbroad claim, missing best mode, etc.).
Continuous reporting. Public leaderboard updates monthly. Methodology revisions go through Methodology Council review per Charter § 7.2.
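A third party re-running a published evaluation might check the 5% scoring delta roughly as follows. The text does not say whether the delta is absolute or relative; this sketch assumes an absolute difference of 0.05 on a 0-to-1 composite scale, and the function name and draft identifiers are hypothetical.

```python
def out_of_tolerance(
    published: dict[str, float],
    reproduced: dict[str, float],
    tolerance: float = 0.05,
) -> list[str]:
    """Return draft IDs whose reproduced composite is missing or differs by more than tolerance."""
    return sorted(
        draft_id
        for draft_id, score in published.items()
        if draft_id not in reproduced or abs(score - reproduced[draft_id]) > tolerance
    )


published_run = {"draft-001": 0.80, "draft-002": 0.62, "draft-003": 0.00}
reproduced_run = {"draft-001": 0.84, "draft-002": 0.50}
print(out_of_tolerance(published_run, reproduced_run))
```

Here `draft-002` is flagged because the scores differ by 0.12, and `draft-003` is flagged because the reproduction produced no score for it.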

§ 7 Hallucination Floor (Therasense Standard)

Doctrinal basis

Therasense, Inc. v. Becton, Dickinson & Co., 649 F.3d 1276 (Fed. Cir. 2011) (en banc), held that affirmative egregious misconduct is material under the inequitable conduct doctrine without a separate but-for showing. DraftBench treats citation of nonexistent prior art as misconduct of that kind.

Any cited US patent or publication number in an AI-drafted output (IDS, spec background, claim-construction-support argument) must verify against an authoritative source — USPTO PatentsView API, USPTO Patent Public Search, or the relevant jurisdictional equivalent (EPO Open Patent Services, J-PlatPat, CNIPA Patent Search). Citations that fail verification trigger a hard floor: composite equals 0.0.

This is not a deduction. A draft with one fabricated citation does not earn 95%; it earns 0%. Catastrophic conduct is not averaged into a passing grade.

The Therasense Standard is non-negotiable for OIPC accreditation. Maintainers may not relax it.
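The floor described in this section can be sketched as two steps: extract cited US patent numbers from the draft, then zero the composite if any of them fails a lookup. The regular expression, the function names, and the `exists` callable are assumptions for illustration. In a real harness `exists` would wrap a query to an authoritative source such as PatentsView or Patent Public Search; no real API call is shown here.

```python
import re
from typing import Callable

# Illustrative pattern (an assumption): granted US patent numbers written as
# "US 9,876,543" or "US9876543B2". Pre-grant publication numbers are not covered.
US_PATENT = re.compile(r"\bUS\s?(\d{1,2}(?:,\d{3}){2}|\d{7,8})(?:\s?[AB]\d)?\b")


def extract_citations(text: str) -> list[str]:
    """Return cited patent numbers with commas removed, in order of appearance."""
    return [match.group(1).replace(",", "") for match in US_PATENT.finditer(text)]


def apply_citation_floor(
    composite: float, draft_text: str, exists: Callable[[str], bool]
) -> float:
    """Return the composite unchanged if every citation verifies, otherwise 0.0."""
    citations = extract_citations(draft_text)
    if all(exists(number) for number in citations):
        return composite
    return 0.0  # hard floor, not a deduction


# Stand-in for an authoritative lookup: a small in-memory set (hypothetical data).
known_patents = {"9876543"}
clean = "Background: see US 9,876,543 for a prior hinge design."
tainted = "Background: see US 9,876,543 and US 1,234,567."
print(apply_citation_floor(0.95, clean, known_patents.__contains__))
print(apply_citation_floor(0.95, tainted, known_patents.__contains__))
```

The second draft drops from 0.95 to 0.0 because one of its two citations does not resolve, matching the rule that a single fabricated citation zeroes the score.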

§ 8 Data Repository

The DraftBench repository will publish at github.com/cblindspot/draftbench upon OIPC accreditation. Planned contents:

METHODOLOGY.md — this document, versioned.
test_set.jsonl — invention disclosures (input) and target outcomes (Track A ground truth).
rubrics/spec_quality.json, rubrics/claim_quality.json — full rubric specifications.
panel_scores/ — per-release panel ratings (anonymized practitioner ID, rubric breakdown, free-text rationale).
vendor_outputs/ — vendor-submitted outputs with their consent. Apache-2.0.
harness/ — the evaluation harness (Track A scoring against USPTO outcomes, citation-verification client, panel-scoring intake).
leaderboard.md — auto-generated rolling results.

§ 9 Governance

DraftBench is owned by CBlindspot. Per Charter § 2 principle 2, the Open IP Council does not own this benchmark; it accredits it. Methodology decisions sit with CBlindspot and any future co-maintainers; methodology disputes between maintainers go to the OIPC Methodology Council per Charter § 7.2.

No single AI vendor, law firm, or platform holds majority influence over DraftBench’s test set, rubrics, or panel composition. CBlindspot publishes its conflict-of-interest disclosures per Charter § 8.2.

Annual reaccreditation per Charter § 6: CBlindspot publishes a conformance report each calendar year. Failure to publish triggers a 90-day cure period before de-accreditation.

§ 10 Get Involved

Vendors — submit your tool for evaluation. Outputs publish under Apache-2.0 with your written consent.
Practitioners — apply to the Track B panel. USPTO registration and 5+ years drafting experience required.
Buyers — cite DraftBench accreditation as a procurement requirement in your AI-tool RFPs.
Academics — the Methodology Council welcomes peer-review papers analyzing DraftBench results, calibration data, and rubric construct validity.
Public comment — methodology objections via the OIPC public-comment process. Subject line should include DRAFTBENCH-METHODOLOGY-v0.1.


OIPC-DRAFTBENCH-METHODOLOGY-v0.1 · Last reviewed 2026-04-29 · Charter v0.1