ValueBench — Methodology — Open IP Council

§ 1 The Problem

Patent valuation is the workflow most distorted by AI hype and most consequential to misprice. M&A diligence, ABL collateralization, fund NAV reporting, and post-grant licensing all depend on someone producing a defensible price. Vendors selling AI valuation are now operating without independent validation against the transaction record.

The risk: a buyer underwrites an acquisition, or a fund marks a portfolio, on a number generated by a tool whose comparables retrieval, weighting logic, and expert calibration are opaque. When the number turns out to be wrong, there is no methodology trail to defend against an SEC inquiry, a Delaware fiduciary-duty claim, or a Daubert challenge in a damages dispute.

ValueBench scores AI valuation tools against (a) the transaction record where it exists and is public, and (b) a panel of practicing licensing executives.

§ 2 Domains

Four valuation domains, each with distinct comparables sources and rubric weights. Domain weights apply only to the merged composite; per-track scores publish separately.

Domain	What it measures	Weight
Single-asset valuation	Fair-market price for one patent given its claim set, prosecution history, family extent, and remaining term.	35%
Portfolio valuation	Aggregate value of a portfolio with consideration for clustering, overlap, geographic coverage, and assertion history.	30%
License-rate estimation	Reasonable royalty rate for a defined product/use given comparable license rates and Georgia-Pacific factors.	25%
Disclosure integrity	Mandatory disclosure of comparables used, weighting applied, sources cited. Failure to disclose triggers floor.	10%

§ 3 Difficulty Tiers

Five tiers calibrated to valuation contexts a licensing executive or M&A diligence team encounters.

Tier	Context	What it tests	Human MAPE
Tier 1	Direct comparable	The asset has near-identical recent transactions in the same technology area; valuation is interpolation.	~15%
Tier 2	Adjacent comparable	Comparables exist in adjacent fields requiring documented adjustment factors.	~25%
Tier 3	Sparse comparables	Few public comparables; expert calibration drives the answer; Track B becomes load-bearing.	~40%
Tier 4	Litigation-affected	Asset has assertion history, PTAB challenge outcomes, or pending district-court damages exposure.	~55%
Tier 5	Standards / SEPs	Standard-essential patents with FRAND obligations, declared / undeclared status, ETSI / IEEE / 3GPP comparables.	~70%

Human-MAPE figures are calibration estimates from CBlindspot’s licensing-executive panel; final figures publish with v1.0. MAPE = mean absolute percentage error vs. final transaction price.
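As a minimal sketch of the MAPE definition above, the snippet below computes MAPE for a handful of cases and compares it with the human baseline for the same tier. The case figures are hypothetical and the `HUMAN_MAPE` values are the provisional calibration estimates from the tier table; nothing here is the official harness.

```python
from typing import Sequence

# Provisional human-panel calibration estimates from the tier table.
HUMAN_MAPE = {1: 0.15, 2: 0.25, 3: 0.40, 4: 0.55, 5: 0.70}


def mape(estimates: Sequence[float], actuals: Sequence[float]) -> float:
    """Mean absolute percentage error of estimates vs. final transaction prices."""
    if len(estimates) != len(actuals) or not actuals:
        raise ValueError("need equal-length, non-empty sequences")
    if any(a <= 0 for a in actuals):
        raise ValueError("transaction prices must be positive")
    return sum(abs(e - a) / a for e, a in zip(estimates, actuals)) / len(actuals)


# Hypothetical Tier 2 cases: tool estimates vs. final prices (USD).
tool_estimates = [1_200_000, 800_000, 2_500_000]
final_prices = [1_000_000, 1_000_000, 2_000_000]

tool_mape = mape(tool_estimates, final_prices)
beats_human_baseline = tool_mape < HUMAN_MAPE[2]
print(f"tool MAPE: {tool_mape:.1%}, human baseline: {HUMAN_MAPE[2]:.0%}")
```

In this illustration the per-case errors are 20%, 20%, and 25%, so the tool's MAPE of roughly 21.7% falls under the Tier 2 human estimate of about 25%.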

§ 4 Four-Layer Scoring

ValueBench combines four scoring layers. Composite score equals the weighted sum, except where a hard floor applies (§ 7).

Layer 1

Comparables retrieval

Did the tool retrieve genuinely relevant comparables from the transaction record? Scored by overlap with the panel-curated comparable set per case, weighted by transaction recency and technology proximity.

25% · In progress
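One way to read the Layer 1 description is as a weighted recall over the panel-curated set. The sketch below assumes a weighting the methodology does not specify: recency decays exponentially with a five-year half-life and is multiplied by a technology-proximity value between 0 and 1. The class, field names, and identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PanelComparable:
    transaction_id: str    # hypothetical identifier
    age_years: float       # years since the transaction closed
    tech_proximity: float  # 0.0 (unrelated) to 1.0 (same technology area)


def comparable_weight(c: PanelComparable, half_life_years: float = 5.0) -> float:
    # Assumed weighting: exponential recency decay times technology proximity.
    recency = 0.5 ** (c.age_years / half_life_years)
    return recency * c.tech_proximity


def retrieval_score(retrieved_ids: set[str], panel_set: list[PanelComparable]) -> float:
    """Weighted share of the panel-curated set that the tool retrieved (0.0 to 1.0)."""
    total = sum(comparable_weight(c) for c in panel_set)
    if total == 0:
        return 0.0
    hit = sum(comparable_weight(c) for c in panel_set if c.transaction_id in retrieved_ids)
    return hit / total


# Hypothetical case: the tool finds A and C, misses B, and adds an unlisted X.
panel = [
    PanelComparable("A", age_years=0.0, tech_proximity=1.0),
    PanelComparable("B", age_years=5.0, tech_proximity=1.0),
    PanelComparable("C", age_years=5.0, tech_proximity=0.5),
]
score = retrieval_score({"A", "C", "X"}, panel)
print(f"retrieval score: {score:.3f}")
```

This version only rewards coverage of the panel set. Whether retrieved comparables outside that set (like `X` here) should be penalized is a design choice the sketch leaves open.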

Layer 2

Track A — Transaction-record price

Ground-truth signal. For cases with public transaction prices (USPTO Patent Assignment recordings with consideration disclosed, public M&A purchase price allocations, court-ordered sales), the AI-estimated price is scored against the actual price. MAPE is the headline metric.

30% · Planned

Layer 3

Track B — Licensing-executive panel

Expert signal. A panel of practicing licensing executives scores the AI’s valuation logic against the rubrics in § 5. Inter-rater reliability publishes per release.

35% · Recruiting

Layer 4

Disclosure integrity

Mandatory disclosure: every comparable used, the weight applied, the data source. Citations of nonexistent transactions, comparables, or licensing rates trigger a hard floor of 0.0. See § 7.

10% (or floor) · Live
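The four layer weights and the floor rule combine as sketched below. This assumes each layer score has already been normalized to the range 0.0 to 1.0 (how Track A's MAPE maps to such a score is not specified here), and the dictionary keys are illustrative names, not identifiers from the harness.

```python
# Layer weights as listed in this section.
LAYER_WEIGHTS = {
    "comparables_retrieval": 0.25,
    "track_a_transaction_price": 0.30,
    "track_b_panel": 0.35,
    "disclosure_integrity": 0.10,
}


def composite_score(layer_scores: dict[str, float], floor_triggered: bool) -> float:
    """Weighted sum of layer scores, or 0.0 when the hard floor applies."""
    if floor_triggered:
        return 0.0  # the floor replaces the weighted sum; it is not a deduction
    missing = LAYER_WEIGHTS.keys() - layer_scores.keys()
    if missing:
        raise ValueError(f"missing layer scores: {sorted(missing)}")
    return sum(LAYER_WEIGHTS[name] * layer_scores[name] for name in LAYER_WEIGHTS)


# Hypothetical normalized layer scores for one tool.
example_scores = {
    "comparables_retrieval": 0.8,
    "track_a_transaction_price": 0.7,
    "track_b_panel": 0.9,
    "disclosure_integrity": 1.0,
}
clean_run = composite_score(example_scores, floor_triggered=False)
floored_run = composite_score(example_scores, floor_triggered=True)
print(f"clean: {clean_run:.3f}, floored: {floored_run:.1f}")
```

With these illustrative inputs the weighted sum is 0.825; the same layer scores with a floor violation yield 0.0.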

§ 5 Evaluation Rubrics

Valuation Logic Rubric

Comparable selection — were the comparables genuinely relevant and recent? (2.0× weight)
Adjustment factors — were industry-standard adjustments applied (size, time, technology proximity, geography)? (1.5×)
Georgia-Pacific awareness — for license-rate estimation, are the 15 GP factors addressed where relevant? (1.5×)
Sensitivity analysis — does the tool show how the answer changes with assumption shifts? (1.0×)
Defensibility — would this valuation survive a Daubert / SEC / fiduciary review? (2.0×)

Reporting Quality Rubric

Source disclosure — every comparable cited, source identified, version/date stamped. (2.0×)
Methodology transparency — weighting rationale articulated. (1.5×)
Uncertainty quantification — output includes a range, not just a point estimate. (1.5×)
Reproducibility — another analyst with the same inputs and method reaches a value within the disclosed range. (1.0×)
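The multipliers in both rubrics can be applied as a weighted mean. The sketch below assumes each criterion is rated on a 0 to 5 scale and that a criterion left unrated (for example, Georgia-Pacific awareness on a case that is not a license-rate estimate) is treated as not applicable and dropped from the denominator. The rating scale, the not-applicable handling, and the key names are assumptions, not part of the published rubrics.

```python
# Criterion multipliers as listed in the two rubrics above.
VALUATION_LOGIC = {
    "comparable_selection": 2.0,
    "adjustment_factors": 1.5,
    "georgia_pacific_awareness": 1.5,
    "sensitivity_analysis": 1.0,
    "defensibility": 2.0,
}
REPORTING_QUALITY = {
    "source_disclosure": 2.0,
    "methodology_transparency": 1.5,
    "uncertainty_quantification": 1.5,
    "reproducibility": 1.0,
}


def rubric_score(ratings: dict[str, float], weights: dict[str, float],
                 max_rating: float = 5.0) -> float:
    """Weighted mean in [0, 1]; criteria absent from `ratings` count as N/A."""
    unknown = ratings.keys() - weights.keys()
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    if not ratings:
        raise ValueError("at least one criterion must be rated")
    total_weight = sum(weights[name] for name in ratings)
    weighted = sum(weights[name] * ratings[name] for name in ratings)
    return weighted / (total_weight * max_rating)


# Hypothetical panelist ratings on the Reporting Quality Rubric.
reporting = rubric_score(
    {"source_disclosure": 5, "methodology_transparency": 4,
     "uncertainty_quantification": 3, "reproducibility": 4},
    REPORTING_QUALITY,
)
print(f"reporting quality: {reporting:.3f}")
```

Because source disclosure carries a 2.0× multiplier, the perfect rating on that criterion pulls the result above the unweighted mean of the four ratings.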

§ 6 Reproducibility (Glass Box)

Per Charter § 8.3, every published ValueBench run is reproducible by third parties within a 5% scoring delta. Reproducibility rests on the same five pillars as DraftBench:

Test set publication. Cases publish one release cycle after initial scoring. Held-out and confidential-comparable sets are documented; redaction protocols disclosed.
Rubric transparency. Both rubrics are public, versioned, and annotated with calibration examples.
Output availability. Vendor outputs (with consent) and panel scores publish under Apache-2.0. Both successful and failed valuations are published.
Failure-mode analysis. Each scored run includes a failure-mode taxonomy (fabricated comparable, wrong industry adjustment, missing GP factor analysis, no uncertainty range, etc.).
Continuous reporting. Public leaderboard updates monthly. Methodology revisions go through Methodology Council review per Charter § 7.2.
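A third party re-running a published case could check the 5% tolerance along these lines. The sketch assumes the delta is relative to the published score; the text does not say whether 5% is relative or measured in absolute percentage points, so treat that as an open interpretation.

```python
def within_reproduction_tolerance(published: float, reproduced: float,
                                  tolerance: float = 0.05) -> bool:
    """True when the reproduced score is within `tolerance` of the published one.

    Assumption: the delta is relative to the published score.
    """
    if published == 0.0:
        # A floored run (0.0) reproduces only if the rerun is also floored.
        return reproduced == 0.0
    return abs(reproduced - published) / abs(published) <= tolerance


# Hypothetical published vs. independently reproduced composites.
close_rerun = within_reproduction_tolerance(0.825, 0.80)  # about 3.0% off
far_rerun = within_reproduction_tolerance(0.825, 0.75)    # about 9.1% off
print(close_rerun, far_rerun)
```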

§ 7 Hallucination Floor (Therasense Standard)

Doctrinal basis

Therasense, Inc. v. Becton, Dickinson & Co., 649 F.3d 1276 (Fed. Cir. 2011) (en banc). For valuation contexts, ValueBench extends the Therasense floor beyond cited prior art to cited comparables and cited license rates: a fabricated comparable transaction or fabricated license-rate citation is materially indistinguishable from a fabricated case citation.

Any cited transaction (USPTO Patent Assignment record, M&A filing, court-ordered sale, public license disclosure) must verify against the authoritative source. Any cited license rate must trace to a public license, an analyst report, or a documented disclosure. Verification failure triggers a hard floor: composite equals 0.0.

This is not a deduction. A valuation built on one fabricated comparable does not earn 80%; it earns 0%. The integrity-floor mechanic is doctrinally analogous to inequitable conduct: catastrophic misrepresentation is not averaged into a passing grade.
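The floor mechanic can be sketched as a verification pass over every citation in a tool's output. In the snippet below, `KNOWN_RECORDS` and `lookup` are hypothetical stand-ins for a client that queries the authoritative sources, and the citation kinds and reference strings are placeholders rather than real record formats.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Citation:
    kind: str       # e.g. "assignment_record", "ma_filing", "license_rate"
    reference: str  # identifier as cited by the tool (placeholder format)


def unverified_citations(citations: Iterable[Citation],
                         verify: Callable[[Citation], bool]) -> list[Citation]:
    """Return every citation that fails verification against its source."""
    return [c for c in citations if not verify(c)]


def apply_integrity_floor(weighted_composite: float, failures: list[Citation]) -> float:
    # One unverifiable citation is enough: the composite becomes 0.0.
    return 0.0 if failures else weighted_composite


# Hypothetical stand-in for lookups against authoritative sources.
KNOWN_RECORDS = {("assignment_record", "REEL-0001"), ("license_rate", "LIC-0042")}


def lookup(c: Citation) -> bool:
    return (c.kind, c.reference) in KNOWN_RECORDS


cited = [Citation("assignment_record", "REEL-0001"), Citation("license_rate", "LIC-9999")]
failures = unverified_citations(cited, lookup)
final_score = apply_integrity_floor(0.80, failures)
print(f"failures: {len(failures)}, composite: {final_score}")
```

Here one of the two citations cannot be verified, so a run that would otherwise score 0.80 is recorded as 0.0, and the list of failed citations is available for the failure-mode report.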

§ 8 Data Repository

The ValueBench repository will publish at github.com/cblindspot/valuebench at provisional accreditation. Planned contents:

METHODOLOGY.md — this document, versioned.
cases/public/ — cases with public transaction prices (Track A scoring); each case carries the asset description, comparables set, and final price.
cases/held-out/ — the held-out set, documented but not published until the next release cycle.
rubrics/valuation_logic.json, rubrics/reporting_quality.json — full rubric specifications.
panel_scores/ — per-release panel ratings (anonymized executive ID, rubric breakdown, free-text rationale).
vendor_outputs/ — vendor-submitted outputs, published with vendor consent under Apache-2.0.
harness/ — the evaluation harness (transaction-record matching, comparable-verification client, panel-scoring intake).
leaderboard.md — auto-generated rolling results.

§ 9 Governance

ValueBench is owned by CBlindspot. Per Charter § 2 principle 2, the Open IP Council does not own this benchmark; it accredits it (provisional, pending public comment).

Provisional accreditation status: ValueBench enters the registry as provisional per Charter § 6. After the 60-day public comment period closes (June 2026), the Methodology Council reviews and votes on full accreditation by 2/3 supermajority.

No single AI vendor, law firm, or licensing platform holds majority influence over ValueBench’s test set, rubrics, or panel composition. CBlindspot publishes its conflict-of-interest disclosures per Charter § 8.2.

§ 10 Get Involved

Vendors — submit your tool for provisional evaluation. Outputs publish under Apache-2.0 with your written consent.
Licensing executives — apply to the Track B panel. 5+ years of in-house licensing or law-firm transactions experience required.
Buyers / fund managers / auditors — cite ValueBench accreditation in your valuation procurement / vendor-selection criteria.
Academics — the Methodology Council welcomes peer-review papers analyzing valuation calibration data and rubric construct validity.
Public comment — methodology objections via the OIPC public-comment process. Subject line should include VALUEBENCH-METHODOLOGY-v0.1.


OIPC-VALUEBENCH-METHODOLOGY-v0.1 · Last reviewed 2026-04-29 · Provisional Charter v0.1