Open IP Council
Document · Methodology · ValueBench v0.1 review · 2026-04-29

ValueBench.

Patent valuation benchmark. Comparables-driven price estimation against the transaction record, plus expert-panel calibration. Provisional pending public-comment close in June 2026.

Provisional methodology · public comment opens June 2026. Valuation is the most contested workflow in this benchmark suite. The fair-market price of a patent is structurally underdetermined: comparables are noisy, expert panels disagree, and litigation outcomes can swing valuation by orders of magnitude. For this reason, ValueBench publishes Track A and Track B scores separately. The public comment period runs 60 days, and methodology revisions will be adopted before v1.0.

§ 1 The Problem

Patent valuation is the workflow most distorted by AI hype and the most consequential to misprice. M&A diligence, asset-based lending (ABL) collateralization, fund NAV reporting, and post-grant licensing all depend on someone producing a defensible price. Vendors selling AI valuation tools currently operate without independent validation against the transaction record.

The risk: a buyer underwrites an acquisition, or a fund marks a portfolio, on a number generated by a tool whose comparables retrieval, weighting logic, and expert calibration are opaque. When the number turns out to be wrong, there is no methodology trail to defend against an SEC inquiry, a Delaware fiduciary-duty claim, or a Daubert challenge in a damages dispute.

ValueBench scores AI valuation tools against (a) the transaction record where it exists and is public, and (b) a panel of practicing licensing executives.

§ 2 Domains

Four valuation domains, each with distinct comparables sources and rubric weights. Domain weights apply only to the merged composite; per-track scores publish separately.

| Domain | What it measures | Weight |
| --- | --- | --- |
| Single-asset valuation | Fair-market price for one patent given its claim set, prosecution history, family extent, and remaining term. | 35% |
| Portfolio valuation | Aggregate value of a portfolio with consideration for clustering, overlap, geographic coverage, and assertion history. | 30% |
| License-rate estimation | Reasonable royalty rate for a defined product/use given comparable license rates and Georgia-Pacific factors. | 25% |
| Disclosure integrity | Mandatory disclosure of comparables used, weighting applied, sources cited. Failure to disclose triggers the floor. | 10% |

§ 3 Difficulty Tiers

Five tiers calibrated to valuation contexts a licensing executive or M&A diligence team encounters.

| Tier | Context | What it tests | Human MAPE |
| --- | --- | --- | --- |
| Tier 1 | Direct comparable | The asset has near-identical recent transactions in the same technology area; valuation is interpolation. | ~15% |
| Tier 2 | Adjacent comparable | Comparables exist in adjacent fields requiring documented adjustment factors. | ~25% |
| Tier 3 | Sparse comparables | Few public comparables; expert calibration drives the answer; Track B becomes load-bearing. | ~40% |
| Tier 4 | Litigation-affected | Asset has assertion history, PTAB challenge outcomes, or pending district-court damages exposure. | ~55% |
| Tier 5 | Standards / SEPs | Standard-essential patents with FRAND obligations, declared / undeclared status, ETSI / IEEE / 3GPP comparables. | ~70% |

Human-MAPE figures are calibration estimates from CBlindspot’s licensing-executive panel; final figures publish with v1.0. MAPE = mean absolute percentage error vs. final transaction price.
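The MAPE definition above can be sketched in a few lines. This is a minimal illustration, not the benchmark harness: the function name `mape` and the prices are hypothetical, and the sketch assumes every case has a nonzero final transaction price.

```python
def mape(estimates, actuals):
    """Mean absolute percentage error (in percent) of estimates vs. final prices."""
    if len(estimates) != len(actuals) or not actuals:
        raise ValueError("need two non-empty sequences of equal length")
    if any(act == 0 for act in actuals):
        raise ValueError("final transaction price must be nonzero")
    # Per-case absolute error, expressed as a fraction of the final price.
    errors = [abs(est - act) / abs(act) for est, act in zip(estimates, actuals)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical prices in USD, for illustration only.
estimated = [1_200_000, 450_000, 3_000_000]
final = [1_000_000, 500_000, 4_000_000]
print(f"MAPE: {mape(estimated, final):.1f}%")  # prints "MAPE: 18.3%"
```

Because each error is divided by the final price, a $1M miss on a $4M asset counts the same as a $250k miss on a $1M asset.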

§ 4 Four-Layer Scoring

ValueBench combines four scoring layers. Composite score equals the weighted sum, except where a hard floor applies (§ 7).

Layer 1

Comparables retrieval

Did the tool retrieve genuinely relevant comparables from the transaction record? Scored by overlap with the panel-curated comparable set per case, weighted by transaction recency and technology proximity.

Weight: 25% · Status: In progress

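One way to read the Layer 1 overlap score is as a weighted recall over the panel-curated set. The sketch below is an assumption, not the published scoring rule: the document does not say how recency and proximity weights are derived or combined, so `PanelComparable`, its two weight fields, their multiplication, and the transaction IDs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PanelComparable:
    transaction_id: str
    recency: float    # hypothetical weight in [0, 1]; 1.0 = most recent
    proximity: float  # hypothetical weight in [0, 1]; 1.0 = same technology area


def retrieval_score(retrieved_ids, panel_set):
    """Weighted share of the panel-curated set that the tool retrieved."""
    retrieved = set(retrieved_ids)
    total = sum(c.recency * c.proximity for c in panel_set)
    if total == 0:
        raise ValueError("panel set carries no weight")
    hit = sum(c.recency * c.proximity
              for c in panel_set if c.transaction_id in retrieved)
    return hit / total


panel = [
    PanelComparable("TX-001", recency=1.0, proximity=1.0),
    PanelComparable("TX-002", recency=0.5, proximity=1.0),
    PanelComparable("TX-003", recency=0.5, proximity=0.5),
]
# TX-999 is not in the panel set, so it neither adds to nor subtracts from the score.
print(round(retrieval_score(["TX-001", "TX-003", "TX-999"], panel), 3))
```

This variant rewards finding the panel's comparables but does not penalize irrelevant retrievals; a precision term would be needed for that.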
Layer 2

Track A — Transaction-record price

Ground-truth signal. For cases with public transaction prices (USPTO Patent Assignment recordings with consideration disclosed, public M&A purchase price allocations, court-ordered sales), the AI-estimated price is scored against the actual price. MAPE is the headline metric.

Weight: 30% · Status: Planned

Layer 3

Track B — Licensing-executive panel

Expert signal. A panel of practicing licensing executives scores the AI’s valuation logic against the rubrics in § 5. Inter-rater reliability publishes per release.

Weight: 35% · Status: Recruiting

Layer 4

Disclosure integrity

Mandatory disclosure: every comparable used, the weight applied, the data source. Citations of nonexistent transactions, comparables, or licensing rates trigger a hard floor of 0.0. See § 7.

Weight: 10% (or floor) · Status: Live
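The four layer weights and the hard floor combine roughly as follows. This is a sketch under stated assumptions: each layer score is taken to be already normalized to [0, 1] (the document does not specify how MAPE maps onto that scale), and the key names and the `fabricated_citation` flag are hypothetical.

```python
# Layer weights as listed in the four layer cards above.
LAYER_WEIGHTS = {
    "comparables_retrieval": 0.25,
    "track_a_transaction_price": 0.30,
    "track_b_panel": 0.35,
    "disclosure_integrity": 0.10,
}


def composite_score(layer_scores, fabricated_citation):
    """Weighted sum of layer scores in [0, 1], or 0.0 when the hard floor applies."""
    if fabricated_citation:
        return 0.0  # the floor replaces the composite; it is not a deduction
    missing = LAYER_WEIGHTS.keys() - layer_scores.keys()
    if missing:
        raise ValueError(f"missing layer scores: {sorted(missing)}")
    return sum(LAYER_WEIGHTS[name] * layer_scores[name] for name in LAYER_WEIGHTS)


# Hypothetical normalized layer scores for one run.
scores = {
    "comparables_retrieval": 0.8,
    "track_a_transaction_price": 0.6,
    "track_b_panel": 0.7,
    "disclosure_integrity": 1.0,
}
print(f"{composite_score(scores, fabricated_citation=False):.3f}")  # prints "0.725"
print(composite_score(scores, fabricated_citation=True))            # prints "0.0"
```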

§ 5 Evaluation Rubrics

Valuation Logic Rubric

  • Comparable selection — were the comparables genuinely relevant and recent? (2.0× weight)
  • Adjustment factors — were industry-standard adjustments applied (size, time, technology proximity, geography)? (1.5×)
  • Georgia-Pacific awareness — for license-rate estimation, are the 15 GP factors addressed where relevant? (1.5×)
  • Sensitivity analysis — does the tool show how the answer changes with assumption shifts? (1.0×)
  • Defensibility — would this valuation survive a Daubert / SEC / fiduciary review? (2.0×)

Reporting Quality Rubric

  • Source disclosure — every comparable cited, source identified, version/date stamped. (2.0×)
  • Methodology transparency — weighting rationale articulated. (1.5×)
  • Uncertainty quantification — output includes a range, not just a point estimate. (1.5×)
  • Reproducibility — another analyst with the same inputs and method reaches a value within the disclosed range. (1.0×)
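The rubric multipliers above can be applied as a weighted mean. The sketch below assumes each criterion is rated on a 0 to 1 scale and that criteria a panelist marks as not relevant (for example, Georgia-Pacific awareness outside license-rate cases) are simply left out; the rating scale, the skipping rule, and the identifier names are assumptions, not part of the published rubric.

```python
# Multipliers copied from the two rubrics above.
VALUATION_LOGIC = {
    "comparable_selection": 2.0,
    "adjustment_factors": 1.5,
    "georgia_pacific_awareness": 1.5,
    "sensitivity_analysis": 1.0,
    "defensibility": 2.0,
}
REPORTING_QUALITY = {
    "source_disclosure": 2.0,
    "methodology_transparency": 1.5,
    "uncertainty_quantification": 1.5,
    "reproducibility": 1.0,
}


def rubric_score(ratings, weights):
    """Weighted mean of per-criterion ratings; unrated criteria are skipped."""
    unknown = set(ratings) - set(weights)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    if not ratings:
        raise ValueError("no criteria rated")
    total_weight = sum(weights[name] for name in ratings)
    return sum(weights[name] * r for name, r in ratings.items()) / total_weight


# Hypothetical panelist ratings for a single-asset case (no GP factor rating).
ratings = {
    "comparable_selection": 0.9,
    "adjustment_factors": 0.6,
    "sensitivity_analysis": 0.5,
    "defensibility": 0.8,
}
print(round(rubric_score(ratings, VALUATION_LOGIC), 3))  # prints 0.738
```

Dividing by the weight of the rated criteria only keeps a single-asset case from being penalized for a Georgia-Pacific criterion that does not apply to it.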

§ 6 Reproducibility (Glass Box)

Per Charter § 8.3, every published ValueBench run is reproducible by third parties within a 5% scoring delta. It rests on the same five pillars as DraftBench:

  1. Test set publication. Cases publish one release cycle after initial scoring. Held-out and confidential-comparable sets are documented; redaction protocols disclosed.
  2. Rubric transparency. Both rubrics are public, versioned, and annotated with calibration examples.
  3. Output availability. Vendor outputs (with consent) and panel scores publish under Apache-2.0. Both successful and failed valuations are published.
  4. Failure-mode analysis. Each scored run includes a failure-mode taxonomy (fabricated comparable, wrong industry adjustment, missing GP factor analysis, no uncertainty range, etc.).
  5. Continuous reporting. Public leaderboard updates monthly. Methodology revisions go through Methodology Council review per Charter § 7.2.
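A third party checking the 5% scoring delta might use a comparison like the one below. The document does not say whether the delta is relative to the published score or an absolute number of points; this sketch assumes a relative delta, and the function name and scores are hypothetical.

```python
def within_scoring_delta(published, reproduced, tolerance=0.05):
    """True when the reproduced score is within `tolerance` of the published score.

    Assumes a relative delta; a published score of zero matches only zero.
    """
    if published == 0:
        return reproduced == 0
    return abs(reproduced - published) / abs(published) <= tolerance


# Hypothetical composite scores for one published run and two reproductions.
print(within_scoring_delta(0.725, 0.700))  # prints True  (about 3.4% off)
print(within_scoring_delta(0.725, 0.680))  # prints False (about 6.2% off)
```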

§ 7 Hallucination Floor (Therasense Standard)

Doctrinal basis

Therasense, Inc. v. Becton, Dickinson & Co., 649 F.3d 1276 (Fed. Cir. 2011) (en banc). For valuation contexts, ValueBench extends the Therasense floor beyond cited prior art to cited comparables and cited license rates: a fabricated comparable transaction or fabricated license-rate citation is materially indistinguishable from a fabricated case citation.

Any cited transaction (USPTO Patent Assignment record, M&A filing, court-ordered sale, public license disclosure) must verify against the authoritative source. Any cited license rate must trace to a public license, an analyst report, or a documented disclosure. Verification failure triggers a hard floor: composite equals 0.0.

This is not a deduction. A valuation built on one fabricated comparable does not earn 80%; it earns 0%. The integrity-floor mechanic is doctrinally analogous to inequitable conduct: catastrophic misrepresentation is not averaged into a passing grade.
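The verification step can be sketched as a lookup of every cited identifier against the authoritative record. Here the record is stubbed as an in-memory set; a real client would have to query the actual sources, which this sketch does not attempt. All names and identifiers are hypothetical.

```python
def unverified_citations(cited_ids, authoritative_ids):
    """Return cited identifiers absent from the authoritative record, in cited order."""
    known = set(authoritative_ids)
    return [cid for cid in cited_ids if cid not in known]


def apply_floor(composite, cited_ids, authoritative_ids):
    """Return the composite unchanged, or 0.0 if any citation fails verification."""
    return 0.0 if unverified_citations(cited_ids, authoritative_ids) else composite


# Stand-in for the authoritative record (hypothetical identifiers).
record = {"ASSIGN-0001", "ASSIGN-0002", "LICENSE-0007"}

print(apply_floor(0.80, ["ASSIGN-0001", "LICENSE-0007"], record))  # prints 0.8
print(apply_floor(0.80, ["ASSIGN-0001", "ASSIGN-9999"], record))   # prints 0.0
print(unverified_citations(["ASSIGN-0001", "ASSIGN-9999"], record))
```

A single unverifiable citation is enough to zero the composite, which matches the "not a deduction" rule: the remaining verified citations earn no partial credit.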

§ 8 Data Repository

The ValueBench repository will publish at github.com/cblindspot/valuebench at provisional accreditation. Planned contents:

  • METHODOLOGY.md — this document, versioned.
  • cases/public/ — cases with public transaction prices (Track A scoring); each case carries the asset description, comparables set, and final price.
  • cases/held-out/ — the held-out set, documented but not published until the next release cycle.
  • rubrics/valuation_logic.json, rubrics/reporting_quality.json — full rubric specifications.
  • panel_scores/ — per-release panel ratings (anonymized executive ID, rubric breakdown, free-text rationale).
  • vendor_outputs/ — vendor-submitted outputs, published with vendor consent under Apache-2.0.
  • harness/ — the evaluation harness (transaction-record matching, comparable-verification client, panel-scoring intake).
  • leaderboard.md — auto-generated rolling results.

§ 9 Governance

ValueBench is owned by CBlindspot. Per Charter § 2 principle 2, the Open IP Council does not own this benchmark; it accredits it (provisional, pending public comment).

Provisional accreditation status: ValueBench enters the registry as provisional per Charter § 6. After the 60-day public comment closes (June 2026), the Methodology Council reviews and votes on full accreditation by 2/3 supermajority.

No single AI vendor, law firm, or licensing platform holds majority influence over ValueBench’s test set, rubrics, or panel composition. CBlindspot publishes its conflict-of-interest disclosures per Charter § 8.2.

§ 10 Get Involved

  • Vendors — submit your tool for provisional evaluation. Outputs publish under Apache-2.0 with your written consent.
  • Licensing executives — apply to the Track B panel. 5+ years of in-house licensing or law-firm transactions experience required.
  • Buyers / fund managers / auditors — cite ValueBench accreditation in your valuation procurement / vendor-selection criteria.
  • Academics — the Methodology Council welcomes peer-reviewed papers analyzing valuation calibration data and rubric construct validity.
  • Public comment — submit methodology objections via the OIPC public-comment process. The subject line should include VALUEBENCH-METHODOLOGY-v0.1.

OIPC-VALUEBENCH-METHODOLOGY-v0.1 · Last reviewed 2026-04-29 · Provisional Charter v0.1