§ 1 The Problem
Drafting is the highest-leverage step in the patent lifecycle. A weak specification cannot be rescued by strong prosecution. AI tools that draft specifications and claim sets are entering the market without a reproducible measure of whether their output survives prosecution.
Vendors today benchmark drafting on internal datasets with internal rubrics. There is no neutral way for an in-house IP team or a law-firm partner to compare how DeepIP, Solve Intelligence, IP Author, Rowan, or a frontier LLM perform on the drafting task as it actually plays out in front of a USPTO examiner.
DraftBench fills that gap. It scores AI-drafted specifications and claim sets against (a) the prosecution outcome of the same or an analog application and (b) ratings from a panel of registered practitioners.
§ 2 Domains
DraftBench evaluates four drafting domains. Domain weights apply to the merged composite; per-track scores publish separately per § 4.
| Domain | What it measures | Weight |
|---|---|---|
| Specification quality | Enablement, written description, and best mode: whether the spec carries the claims through § 112 challenges. | 35% |
| Claim-set coherence | Independent / dependent structure, antecedent basis, scope ladder, breadth-vs-novelty trade-off. | 30% |
| Written-description support | Every claim limitation traces to a specification disclosure, with ipsis verbis or sufficient constructive support. | 25% |
| Citation integrity | Cited prior art (in the IDS, the spec background, or claim-construction support) is verified against an authoritative source. The Therasense floor applies (§ 7). | 10% |
§ 3 Difficulty Tiers
Five tiers calibrated to the drafting complexity human practitioners encounter. Tier counts in v2026.1 are provisional pending CBlindspot’s test-set release.
| Tier | Practitioner level | What it tests | Human baseline |
|---|---|---|---|
| Tier 1 | Deterministic | Mechanical drafting tasks: claim numbering, dependency formatting, antecedent-basis fixes. | 100% |
| Tier 2 | Paralegal | Spec section assembly from invention disclosures; dependent-claim generation from a single independent claim. | ~92% |
| Tier 3 | Junior associate | Independent-claim drafting from a structured invention, basic written-description integration. | ~78% |
| Tier 4 | Senior associate | Multi-embodiment specs, claim ladders that anticipate restriction practice, § 112(f) means-plus-function strategy. | ~70% |
| Tier 5 | Partner / strategist | Cross-jurisdictional drafting, claim-construction-aware drafting, freedom-to-operate-aware claim narrowing. | ~55% |
Human-baseline figures are calibration estimates from CBlindspot’s practitioner panel; final figures publish with v1.0.
§ 4 Four-Layer Scoring
DraftBench combines four scoring layers. The composite score is the weighted sum of domain scores (weights per § 2), except where a hard floor applies (§ 7).
Deterministic checks
Objectively verifiable answers: claim count compliance, dependency well-formedness, antecedent-basis pass, MPEP-compliant abstract length (150 words maximum), and drawing-reference numerical consistency.
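As an illustration of the deterministic layer, here is a minimal Python sketch of three of these checks. It assumes claims arrive as a dict mapping claim number to claim text; the function names, the regular expressions, and the single-word antecedent heuristic are hypothetical, not the DraftBench harness. The 150-word abstract ceiling is the limit stated in 37 CFR 1.72(b).

```python
import re

# Hypothetical input format: dict mapping claim number -> claim text.
DEPENDS_ON = re.compile(r"\bclaim (\d+)\b", re.IGNORECASE)
INTRODUCES = re.compile(r"\b(?:a|an)\s+(\w+)", re.IGNORECASE)
REFERS_TO = re.compile(r"\b(?:the|said)\s+(\w+)", re.IGNORECASE)
ABSTRACT_WORD_LIMIT = 150  # 37 CFR 1.72(b)


def dependency_errors(claims: dict[int, str]) -> list[str]:
    """Flag numbering gaps and dependencies on missing or non-preceding claims."""
    errors = []
    numbers = sorted(claims)
    if numbers != list(range(1, len(numbers) + 1)):
        errors.append("claims are not numbered consecutively from 1")
    for n in numbers:
        for ref in DEPENDS_ON.findall(claims[n]):
            parent = int(ref)
            if parent not in claims:
                errors.append(f"claim {n} depends on missing claim {parent}")
            elif parent >= n:
                errors.append(f"claim {n} depends on claim {parent}, which does not precede it")
    return errors


def chain_text(claims: dict[int, str], n: int) -> str:
    """Join claim n with its ancestors (via its first 'claim N' reference), root first."""
    parts = []
    seen = set()
    current = n
    while current is not None and current in claims and current not in seen:
        seen.add(current)
        parts.insert(0, claims[current])
        refs = DEPENDS_ON.findall(claims[current])
        current = int(refs[0]) if refs else None
    return " ".join(parts)


def antecedent_errors(claims: dict[int, str]) -> list[str]:
    """Naive check: each 'the X' / 'said X' needs an earlier 'a X' / 'an X' in the chain."""
    errors = []
    for n in sorted(claims):
        text = chain_text(claims, n)
        own_start = len(text) - len(claims[n])  # claim n's own text comes last
        first_intro: dict[str, int] = {}
        for m in INTRODUCES.finditer(text):
            first_intro.setdefault(m.group(1).lower(), m.start())
        for ref in REFERS_TO.finditer(text):
            if ref.start() < own_start:
                continue  # ancestor claims are checked on their own pass
            intro_at = first_intro.get(ref.group(1).lower())
            if intro_at is None or intro_at > ref.start():
                errors.append(f"claim {n}: '{ref.group(0)}' lacks antecedent basis")
    return errors


def abstract_within_limit(abstract: str) -> bool:
    """True if the abstract is at most 150 whitespace-separated words."""
    return len(abstract.split()) <= ABSTRACT_WORD_LIMIT
```

The antecedent check is deliberately naive: it compares only the word following each article, so "a fastening member" followed by "the member" would be reported as a false positive. A production check would need noun-phrase parsing.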
Track A — Prosecution outcome
Ground-truth signal. Where the input invention disclosure has a real-world counterpart in the USPTO record, the AI-drafted spec/claims are scored against the actual prosecution outcome of that or an analog application: allowance vs. rejection, § 112 rejections issued, claim amendments forced, final allowed scope.
Track B — Practitioner panel
Expert signal. A panel of USPTO-registered practitioners (5+ years of drafting experience) scores AI-drafted output against the rubrics in § 5. Inter-rater reliability is published per release; outlier scores are reviewed and re-rated.
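Outlier review could be automated along these lines. A minimal sketch, assuming each panelist rates a criterion on a numeric scale and that an outlier is any rating farther than a fixed distance from the panel median; the 1.5-point threshold and the function name are hypothetical, since the methodology does not publish an outlier rule.

```python
from statistics import median

# Hypothetical rule: DraftBench does not publish its outlier threshold.
OUTLIER_THRESHOLD = 1.5


def flag_outliers(ratings: dict[str, float], threshold: float = OUTLIER_THRESHOLD) -> list[str]:
    """Return IDs of raters whose score sits more than `threshold` from the panel median."""
    center = median(ratings.values())
    return sorted(rater for rater, score in ratings.items() if abs(score - center) > threshold)
```

For example, `flag_outliers({"P-01": 4.0, "P-02": 4.5, "P-03": 3.5, "P-04": 1.0})` returns `["P-04"]`: the median is 3.75, and only P-04 deviates by more than 1.5 points. Using the median rather than the mean keeps a single extreme rating from shifting the reference point.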
Citation integrity (Therasense floor)
Any cited prior art, whether in the IDS, the background section, or claim-construction-support arguments, is verified against USPTO PatentsView, USPTO Patent Public Search, or the relevant jurisdictional equivalent. Verification failure triggers a hard floor: the composite equals 0.0 for that draft. This is not a deduction. See § 7.
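The floor can be expressed as a guard in front of the weighted sum. A minimal sketch, assuming the composite uses the § 2 domain weights and that each domain score is normalized to [0, 1]; the dictionary keys and function name are hypothetical.

```python
# Domain weights from § 2; assumes each domain score is normalized to [0, 1].
DOMAIN_WEIGHTS = {
    "specification_quality": 0.35,
    "claim_set_coherence": 0.30,
    "written_description_support": 0.25,
    "citation_integrity": 0.10,
}


def composite_score(domain_scores: dict[str, float], citations_verified: bool) -> float:
    """Weighted sum of domain scores, or 0.0 if any citation failed verification."""
    if not citations_verified:
        return 0.0  # hard floor: the composite is replaced, not reduced
    return sum(weight * domain_scores[domain] for domain, weight in DOMAIN_WEIGHTS.items())
```

A draft scoring 0.95 in every domain composites to 0.95 when its citations verify and to 0.0 when any one of them fails.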
§ 5 Evaluation Rubrics
Two rubrics govern Track B scoring. Both publish with v1.0; weights below are provisional.
Specification Quality Rubric
- Enablement — can a person of ordinary skill make and use the invention from the spec? (1.5× weight)
- Written description — does the spec demonstrate possession of the claimed subject matter? (1.5×)
- Best mode — is the inventor’s preferred embodiment disclosed? (1.0×)
- Definiteness foundation — does the spec define terms used in the claims with sufficient precision? (1.0×)
Claim Quality Rubric
- Scope ladder — does the dependent-claim structure provide a defensible fall-back position? (2.0×)
- Breadth vs. novelty — are the claims as broad as the disclosure permits without reading on the prior art? (1.5×)
- Antecedent basis — is every term properly introduced? (1.0×)
- Restriction resilience — do the claim sets anticipate likely restriction practice? (1.5×)
- Professional quality — readability, consistency, style. (0.5×)
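Each rubric reduces to a weighted mean of per-criterion ratings. A minimal sketch, assuming panelists rate every criterion on a common numeric scale (the 1-to-5 ratings in the example are hypothetical) and that the weighted sum is divided by the total weight so results stay on that scale; the criterion keys are illustrative names for the bullets above.

```python
SPEC_QUALITY_WEIGHTS = {
    "enablement": 1.5,
    "written_description": 1.5,
    "best_mode": 1.0,
    "definiteness_foundation": 1.0,
}
CLAIM_QUALITY_WEIGHTS = {
    "scope_ladder": 2.0,
    "breadth_vs_novelty": 1.5,
    "antecedent_basis": 1.0,
    "restriction_resilience": 1.5,
    "professional_quality": 0.5,
}


def rubric_score(ratings: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-criterion ratings; every weighted criterion must be rated."""
    missing = weights.keys() - ratings.keys()
    if missing:
        raise ValueError(f"unrated criteria: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[c] * ratings[c] for c in weights) / total_weight
```

Dividing by the total weight (5.0 for specification quality, 6.5 for claim quality) keeps the two rubrics comparable even though they have different numbers of criteria. For example, spec ratings of 4, 4, 2, and 3 give (6 + 6 + 2 + 3) / 5.0 = 3.4.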
§ 6 Reproducibility (Glass Box)
Per Charter § 8.3, every published DraftBench run is reproducible by third parties within a 5% scoring delta. Five pillars apply:
- Test set publication. The full test set publishes one release cycle after initial scoring (to prevent training contamination). Held-out sets are documented.
- Rubric transparency. Both rubrics are public, versioned, and annotated with calibration examples.
- Output availability. Vendor outputs (with consent) and panel scores publish under Apache-2.0. Both successes and failures are published.
- Failure-mode analysis. Each scored run includes a failure-mode taxonomy (fabricated citation, missing antecedent basis, overbroad claim, missing best mode, etc.).
- Continuous reporting. Public leaderboard updates monthly. Methodology revisions go through Methodology Council review per Charter § 7.2.
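The 5% reproducibility tolerance admits two readings: an absolute difference of 0.05 on a 0-to-1 composite, or a difference relative to the published score. A minimal sketch that supports both, assuming scores are on [0, 1]; the function name and the `relative` flag are hypothetical.

```python
def within_delta(published: float, rerun: float, tolerance: float = 0.05, relative: bool = False) -> bool:
    """Check whether a third-party rerun reproduces a published score."""
    delta = abs(published - rerun)
    if relative:
        if published == 0:
            return rerun == 0  # relative delta is undefined; require an exact match
        delta /= abs(published)
    return delta <= tolerance
```

Under the absolute reading, a published 0.80 is reproduced by any rerun between 0.75 and 0.85. Under the relative reading, the band narrows for low scores: a published 0.40 tolerates only about 0.02 either way.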
§ 7 Hallucination Floor (Therasense Standard)
Therasense, Inc. v. Becton, Dickinson & Co., 649 F.3d 1276 (Fed. Cir. 2011) (en banc), held that affirmative acts of egregious misconduct, such as filing an unmistakably false affidavit, are material per se under the inequitable-conduct doctrine. DraftBench treats citation of nonexistent prior art as misconduct of that kind.
Every patent or publication number cited in an AI-drafted output (IDS, spec background, claim-construction-support argument) must be verified against an authoritative source: the USPTO PatentsView API or USPTO Patent Public Search for US documents, or the relevant jurisdictional equivalent (EPO Open Patent Services, J-PlatPat, CNIPA Patent Search). A citation that fails verification triggers a hard floor: the composite equals 0.0.
This is not a deduction. A draft with one fabricated citation does not earn 95%; it earns 0%. Catastrophic conduct is not averaged into a passing grade.
The Therasense Standard is non-negotiable for OIPC accreditation. Maintainers may not relax it.
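The verification step can be sketched independently of any particular patent-office client. This minimal Python sketch assumes a caller-supplied `exists` function that answers whether a normalized publication number is found in an authoritative source; the normalization rule and function names are hypothetical, and no real PatentsView, Patent Public Search, or OPS calls are shown.

```python
import re
from typing import Callable, Iterable

# Hypothetical normalization: drop spaces, commas, slashes, and hyphens, then
# uppercase, so "US 7,654,321 B2" and "us7654321b2" compare equal.
_SEPARATORS = re.compile(r"[\s,/\-]")


def normalize(number: str) -> str:
    return _SEPARATORS.sub("", number).upper()


def unverified_citations(cited: Iterable[str], exists: Callable[[str], bool]) -> list[str]:
    """Return, in original form, every citation the authoritative lookup cannot confirm."""
    return [c for c in cited if not exists(normalize(c))]
```

Any non-empty result triggers the hard floor for that draft. In testing, `exists` can be a stub such as `{"US7654321B2"}.__contains__`; in practice it would wrap a query to the relevant authoritative source.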
§ 8 Data Repository
The DraftBench repository will be published at github.com/cblindspot/draftbench upon OIPC accreditation. Planned contents:
- `METHODOLOGY.md`: this document, versioned.
- `test_set.jsonl`: invention disclosures (input) and target outcomes (Track A ground truth).
- `rubrics/spec_quality.json`, `rubrics/claim_quality.json`: full rubric specifications.
- `panel_scores/`: per-release panel ratings (anonymized practitioner ID, rubric breakdown, free-text rationale).
- `vendor_outputs/`: vendor-submitted outputs, published with vendor consent under Apache-2.0.
- `harness/`: the evaluation harness (Track A scoring against USPTO outcomes, citation-verification client, panel-scoring intake).
- `leaderboard.md`: auto-generated rolling results.
§ 9 Governance
DraftBench is owned by CBlindspot. Per Charter § 2, Principle 2, the Open IP Council does not own this benchmark; it accredits it. Methodology decisions sit with CBlindspot and any future co-maintainers; methodology disputes between maintainers go to the OIPC Methodology Council per Charter § 7.2.
No single AI vendor, law firm, or platform holds majority influence over DraftBench’s test set, rubrics, or panel composition. CBlindspot publishes its conflict-of-interest disclosures per Charter § 8.2.
Annual reaccreditation per Charter § 6: CBlindspot publishes a conformance report each calendar year. Failure to publish triggers a 90-day cure period before de-accreditation.
§ 10 Get Involved
- Vendors — submit your tool for evaluation. Outputs publish under Apache-2.0 with your written consent.
- Practitioners — apply to the Track B panel. USPTO registration and 5+ years drafting experience required.
- Buyers — cite DraftBench accreditation as a procurement requirement in your AI-tool RFPs.
- Academics — the Methodology Council welcomes peer-review papers analyzing DraftBench results, calibration data, and rubric construct validity.
- Public comment: submit methodology objections through the OIPC public-comment process, with `DRAFTBENCH-METHODOLOGY-v0.1` in the subject line.