§ 1 The Problem
Patent valuation is the workflow most distorted by AI hype and the most consequential to misprice. M&A diligence, ABL collateralization, fund NAV reporting, and post-grant licensing all depend on someone producing a defensible price. Yet vendors selling AI valuation tools currently operate without independent validation against the transaction record.
The risk: a buyer underwrites an acquisition, or a fund marks a portfolio, on a number generated by a tool whose comparables retrieval, weighting logic, and expert calibration are opaque. When the number turns out to be wrong, there is no methodology trail to defend against an SEC inquiry, a Delaware fiduciary-duty claim, or a Daubert challenge in a damages dispute.
ValueBench scores AI valuation tools against (a) the transaction record where it exists and is public, and (b) a panel of practicing licensing executives.
§ 2 Domains
Four valuation domains, each with distinct comparables sources and rubric weights. Domain weights apply only to the merged composite; per-track scores publish separately.
| Domain | What it measures | Weight |
|---|---|---|
| Single-asset valuation | Fair-market price for one patent given its claim set, prosecution history, family extent, and remaining term. | 35% |
| Portfolio valuation | Aggregate value of a portfolio with consideration for clustering, overlap, geographic coverage, and assertion history. | 30% |
| License-rate estimation | Reasonable royalty rate for a defined product/use given comparable license rates and Georgia-Pacific factors. | 25% |
| Disclosure integrity | Mandatory disclosure of comparables used, weighting applied, sources cited. Failure to disclose triggers the hard floor (§ 7). | 10% |
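As an illustration of how the domain weights in the table could combine into the merged composite, here is a minimal sketch. The weights come from the table; the function name, the dictionary keys, and the assumption that each per-domain score lies in [0, 1] are hypothetical and not part of the published harness.

```python
# Domain weights from the table above (they sum to 1.0).
# The key names are illustrative, not an official schema.
DOMAIN_WEIGHTS = {
    "single_asset": 0.35,
    "portfolio": 0.30,
    "license_rate": 0.25,
    "disclosure_integrity": 0.10,
}


def merged_composite(domain_scores: dict[str, float]) -> float:
    """Weighted sum of per-domain scores, each assumed to lie in [0, 1]."""
    missing = set(DOMAIN_WEIGHTS) - set(domain_scores)
    if missing:
        raise ValueError(f"missing domain scores: {sorted(missing)}")
    return sum(DOMAIN_WEIGHTS[d] * domain_scores[d] for d in DOMAIN_WEIGHTS)


# Illustrative scores only; not real benchmark results.
example_scores = {
    "single_asset": 0.80,
    "portfolio": 0.70,
    "license_rate": 0.60,
    "disclosure_integrity": 1.00,
}
print(round(merged_composite(example_scores), 4))  # 0.74
```

This sketch covers only the weighted sum. The hard floor described in § 7 would override the result and is not modeled here.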
§ 3 Difficulty Tiers
Five tiers calibrated to valuation contexts a licensing executive or M&A diligence team encounters.
| Tier | Context | What it tests | Human MAPE |
|---|---|---|---|
| Tier 1 | Direct comparable | The asset has near-identical recent transactions in the same technology area; valuation is interpolation. | ~15% |
| Tier 2 | Adjacent comparable | Comparables exist in adjacent fields requiring documented adjustment factors. | ~25% |
| Tier 3 | Sparse comparables | Few public comparables; expert calibration drives the answer; Track B becomes load-bearing. | ~40% |
| Tier 4 | Litigation-affected | Asset has assertion history, PTAB challenge outcomes, or pending district-court damages exposure. | ~55% |
| Tier 5 | Standards / SEPs | Standard-essential patents with FRAND obligations, declared / undeclared status, ETSI / IEEE / 3GPP comparables. | ~70% |
Human-MAPE figures are calibration estimates from CBlindspot’s licensing-executive panel; final figures publish with v1.0. MAPE = mean absolute percentage error vs. final transaction price.
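The MAPE definition above can be sketched in a few lines. The tier baselines are copied from the table and remain calibration estimates; the helper `beats_human_baseline` and the sample prices are hypothetical illustrations, not part of the published method.

```python
# Human-panel MAPE by tier, copied from the table above. These are
# calibration estimates, not final v1.0 figures.
HUMAN_MAPE = {1: 0.15, 2: 0.25, 3: 0.40, 4: 0.55, 5: 0.70}


def mape(estimates, actuals):
    """Mean absolute percentage error of estimates vs. final transaction prices."""
    if not actuals or len(estimates) != len(actuals):
        raise ValueError("need equal-length, non-empty price lists")
    if any(a == 0 for a in actuals):
        raise ValueError("an actual price of zero makes percentage error undefined")
    return sum(abs(e - a) / abs(a) for e, a in zip(estimates, actuals)) / len(actuals)


def beats_human_baseline(tier, tool_mape):
    """Hypothetical helper: True when the tool's MAPE is below the panel's."""
    return tool_mape < HUMAN_MAPE[tier]


# Made-up prices for illustration: one estimate 20% high, one 20% low.
estimated = [1_200_000.0, 800_000.0]
actual = [1_000_000.0, 1_000_000.0]
tool_mape = mape(estimated, actual)
print(f"{tool_mape:.0%}")                   # 20%
print(beats_human_baseline(2, tool_mape))   # True
```

Because MAPE takes absolute errors, an overestimate and an underestimate of the same size do not cancel out.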
§ 4 Four-Layer Scoring
ValueBench combines four scoring layers. The composite score equals the weighted sum of the layers, except where a hard floor applies (§ 7).
Comparables retrieval
Did the tool retrieve genuinely relevant comparables from the transaction record? Scored by overlap with the panel-curated comparable set per case, weighted by transaction recency and technology proximity.
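One way to read "overlap weighted by recency and proximity" is as a weighted recall over the panel-curated set. The sketch below takes that reading. The exponential recency decay, the five-year half-life, the [0, 1] proximity scale, and all names are assumptions made for illustration; the source does not specify the weighting function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PanelComparable:
    """One entry in a panel-curated comparable set (hypothetical schema)."""
    transaction_id: str
    age_years: float        # time since the transaction closed
    tech_proximity: float   # assumed scale: 0 (unrelated) to 1 (same technology)


def comparable_weight(c: PanelComparable, half_life_years: float = 5.0) -> float:
    """Assumed weighting: recency halves every `half_life_years`, times proximity."""
    return 0.5 ** (c.age_years / half_life_years) * c.tech_proximity


def retrieval_score(retrieved_ids: set[str], panel_set: list[PanelComparable]) -> float:
    """Share of the panel set's total weight that the tool actually retrieved."""
    total = sum(comparable_weight(c) for c in panel_set)
    if total == 0:
        return 0.0
    hit = sum(comparable_weight(c) for c in panel_set
              if c.transaction_id in retrieved_ids)
    return hit / total


panel = [
    PanelComparable("example-A", age_years=0.0, tech_proximity=1.0),  # weight 1.0
    PanelComparable("example-B", age_years=5.0, tech_proximity=1.0),  # weight 0.5
    PanelComparable("example-C", age_years=5.0, tech_proximity=0.5),  # weight 0.25
]
# The tool found A and C, missed B, and also returned an off-panel item X.
print(round(retrieval_score({"example-A", "example-C", "example-X"}, panel), 4))
```

This is recall-style scoring, so the off-panel item `example-X` is neither rewarded nor penalized. A real rubric might also penalize irrelevant retrievals.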
Track A — Transaction-record price
Ground-truth signal. For cases with public transaction prices (USPTO Patent Assignment recordings with consideration disclosed, public M&A purchase price allocations, court-ordered sales), the AI-estimated price is scored against the actual price. MAPE is the headline metric.
Track B — Licensing-executive panel
Expert signal. A panel of practicing licensing executives scores the AI’s valuation logic against the rubrics in § 5. Inter-rater reliability publishes per release.
Disclosure integrity
Mandatory disclosure: every comparable used, the weight applied, the data source. Citations of nonexistent transactions, comparables, or licensing rates trigger a hard floor of 0.0. See § 7.
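A completeness check for the three mandatory disclosure items (comparable, weight, source) might look like the sketch below. The field names and record layout are hypothetical. The check covers only whether each item is present; whether a cited transaction actually exists is a separate verification step under § 7.

```python
# Assumed field names for one disclosed comparable; not an official schema.
REQUIRED_FIELDS = ("comparable_id", "weight", "source")


def missing_disclosures(comparables: list[dict]) -> list[tuple[int, str]]:
    """Return (index, field) pairs for every absent or empty required field."""
    gaps = []
    for i, comp in enumerate(comparables):
        for field in REQUIRED_FIELDS:
            value = comp.get(field)
            # A weight of 0 is a valid disclosure; only None or "" counts as missing.
            if value is None or value == "":
                gaps.append((i, field))
    return gaps


disclosed = [
    {"comparable_id": "example-001", "weight": 0.6, "source": "USPTO assignment record"},
    {"comparable_id": "example-002", "weight": 0.4, "source": ""},
]
print(missing_disclosures(disclosed))  # [(1, 'source')]
```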
§ 5 Evaluation Rubrics
Valuation Logic Rubric
- Comparable selection — were the comparables genuinely relevant and recent? (2.0× weight)
- Adjustment factors — were industry-standard adjustments applied (size, time, technology proximity, geography)? (1.5×)
- Georgia-Pacific awareness — for license-rate estimation, are the 15 GP factors addressed where relevant? (1.5×)
- Sensitivity analysis — does the tool show how the answer changes with assumption shifts? (1.0×)
- Defensibility — would this valuation survive a Daubert / SEC / fiduciary review? (2.0×)
Reporting Quality Rubric
- Source disclosure — every comparable cited, source identified, version/date stamped. (2.0×)
- Methodology transparency — weighting rationale articulated. (1.5×)
- Uncertainty quantification — output includes a range, not just a point estimate. (1.5×)
- Reproducibility — another analyst with the same inputs and method reaches a value within the disclosed range. (1.0×)
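The multipliers in the two rubrics above can be turned into a normalized score as sketched here. The weights are taken from the lists. The 0-to-1 rating scale, the key names, and the choice to renormalize when a criterion is marked not applicable (for example, Georgia-Pacific awareness outside license-rate cases) are assumptions rather than documented panel procedure.

```python
# Criterion multipliers from the two rubrics above. Key names are illustrative.
VALUATION_LOGIC = {
    "comparable_selection": 2.0,
    "adjustment_factors": 1.5,
    "georgia_pacific_awareness": 1.5,
    "sensitivity_analysis": 1.0,
    "defensibility": 2.0,
}
REPORTING_QUALITY = {
    "source_disclosure": 2.0,
    "methodology_transparency": 1.5,
    "uncertainty_quantification": 1.5,
    "reproducibility": 1.0,
}


def rubric_score(weights: dict[str, float], ratings: dict) -> float:
    """Weighted mean of ratings in [0, 1]; criteria rated None are skipped."""
    applicable = {k: w for k, w in weights.items() if ratings.get(k) is not None}
    if not applicable:
        raise ValueError("no applicable criteria were rated")
    weighted = sum(w * ratings[k] for k, w in applicable.items())
    return weighted / sum(applicable.values())


# Illustrative ratings for a single-asset case where the GP factors do not apply.
ratings = {
    "comparable_selection": 1.0,
    "adjustment_factors": 0.5,
    "georgia_pacific_awareness": None,  # assumed "not applicable" marker
    "sensitivity_analysis": 0.0,
    "defensibility": 1.0,
}
print(round(rubric_score(VALUATION_LOGIC, ratings), 4))
```

Renormalizing over the applicable criteria keeps a single-asset case from being penalized for a factor that only matters to license-rate estimation.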
§ 6 Reproducibility (Glass Box)
Per Charter § 8.3, every published ValueBench run is reproducible by third parties within a 5% scoring delta. Same five pillars as DraftBench:
- Test set publication. Cases publish one release cycle after initial scoring. Held-out and confidential-comparable sets are documented; redaction protocols disclosed.
- Rubric transparency. Both rubrics are public, versioned, and annotated with calibration examples.
- Output availability. Vendor outputs (with consent) and panel scores publish under Apache-2.0. Both successful and failed valuations are published.
- Failure-mode analysis. Each scored run includes a failure-mode taxonomy (fabricated comparable, wrong industry adjustment, missing GP factor analysis, no uncertainty range, etc.).
- Continuous reporting. Public leaderboard updates monthly. Methodology revisions go through Methodology Council review per Charter § 7.2.
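A third party checking the 5% scoring delta could use something like the sketch below. It treats the delta as relative to the published score, which is an assumption: the text does not say whether the 5% is relative or absolute. The function name is hypothetical.

```python
def within_reproducibility_delta(published: float, reproduced: float,
                                 tolerance: float = 0.05) -> bool:
    """True when the reproduced score is within `tolerance` of the published one.

    Assumes a relative delta measured against the published score.
    """
    if published == 0:
        # A floored run (composite 0.0) only reproduces if it is floored again.
        return reproduced == 0
    return abs(reproduced - published) / abs(published) <= tolerance


# Illustrative composites, not real leaderboard values.
print(within_reproducibility_delta(0.74, 0.72))  # True  (about 2.7% off)
print(within_reproducibility_delta(0.74, 0.69))  # False (about 6.8% off)
```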
§ 7 Hallucination Floor (Therasense Standard)
Therasense, Inc. v. Becton, Dickinson & Co., 649 F.3d 1276 (Fed. Cir. 2011) (en banc). For valuation contexts, ValueBench extends the Therasense floor beyond cited prior art to cited comparables and cited license rates: a fabricated comparable transaction or fabricated license-rate citation is materially indistinguishable from a fabricated case citation.
Any cited transaction (USPTO Patent Assignment record, M&A filing, court-ordered sale, public license disclosure) must verify against the authoritative source. Any cited license rate must trace to a public license, an analyst report, or a documented disclosure. Verification failure triggers a hard floor: composite equals 0.0.
This is not a deduction. A valuation built on one fabricated comparable does not earn 80%; it earns 0%. The integrity-floor mechanic is doctrinally analogous to inequitable conduct: catastrophic misrepresentation is not averaged into a passing grade.
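The floor mechanic can be sketched as a gate applied after the weighted composite is computed. The `verify` callable stands in for a lookup against the authoritative source, and the record identifiers are made up; both are hypothetical placeholders, not the harness's comparable-verification client.

```python
from typing import Callable, Iterable


def apply_hallucination_floor(
    weighted_composite: float,
    citations: Iterable[str],
    verify: Callable[[str], bool],
) -> tuple[float, list[str]]:
    """Return (final composite, unverified citations).

    Any single unverified citation floors the composite to 0.0. The result
    is a replacement of the score, not a deduction from it.
    """
    unverified = [c for c in citations if not verify(c)]
    if unverified:
        return 0.0, unverified
    return weighted_composite, []


# Stand-in for an authoritative-source lookup; the identifiers are invented.
known_records = {"assignment:example-1", "license:example-2"}


def is_known(citation: str) -> bool:
    return citation in known_records


print(apply_hallucination_floor(0.80, ["assignment:example-1"], is_known))
# (0.8, [])
print(apply_hallucination_floor(0.80, ["assignment:example-1", "sale:made-up"], is_known))
# (0.0, ['sale:made-up'])
```

Returning the list of unverified citations alongside the score keeps the failure auditable, which fits the failure-mode reporting described in § 6.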
§ 8 Data Repository
The ValueBench repository will publish at github.com/cblindspot/valuebench at provisional accreditation. Planned contents:
- METHODOLOGY.md — this document, versioned.
- cases/public/ — cases with public transaction prices (Track A scoring); each case carries the asset description, comparables set, and final price.
- cases/held-out/ — the held-out set documented but not published until next release cycle.
- rubrics/valuation_logic.json, rubrics/reporting_quality.json — full rubric specifications.
- panel_scores/ — per-release panel ratings (anonymized executive ID, rubric breakdown, free-text rationale).
- vendor_outputs/ — vendor-submitted outputs with their consent. Apache-2.0.
- harness/ — the evaluation harness (transaction-record matching, comparable-verification client, panel-scoring intake).
- leaderboard.md — auto-generated rolling results.
§ 9 Governance
ValueBench is owned by CBlindspot. Per Charter § 2 principle 2, the Open IP Council does not own this benchmark; it accredits it (provisional, pending public comment).
Provisional accreditation status: ValueBench enters the registry as provisional per Charter § 6. After the 60-day public comment period closes (June 2026), the Methodology Council reviews and votes on full accreditation by 2/3 supermajority.
No single AI vendor, law firm, or licensing platform holds majority influence over ValueBench’s test set, rubrics, or panel composition. CBlindspot publishes its conflict-of-interest disclosures per Charter § 8.2.
§ 10 Get Involved
- Vendors — submit your tool for provisional evaluation. Outputs publish under Apache-2.0 with your written consent.
- Licensing executives — apply to the Track B panel. 5+ years of in-house licensing or law-firm transactions experience required.
- Buyers / fund managers / auditors — cite ValueBench accreditation in your valuation procurement / vendor-selection criteria.
- Academics — the Methodology Council welcomes peer-review papers analyzing valuation calibration data and rubric construct validity.
- Public comment — methodology objections via the OIPC public-comment process. Subject line should include VALUEBENCH-METHODOLOGY-v0.1.