Accuracy
Three numbers describe how ContentRX scores against a fixed bar. They’re kept separate on purpose. A single “accuracy score” would hide the self-drift ceiling. The reporting format (measured numbers, 95% intervals, pending cells named honestly) follows the Model Cards pattern from Mitchell et al., 2019.
Measured system κ
The engine vs the held-out blind-labeled set
pending
no standards have completed the weekly κ series
Measured self-drift κ
Same panel relabeled blind, weeks apart
0.575
95% CI [0.313, 0.836] · n = 80
Design target κ
A design assumption, not a measurement
0.900
Design assumption · stated separately from measurements
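For readers who want to see how a κ value and its interval can be produced, here is a minimal sketch. It assumes Cohen's κ between two labeling passes and a percentile bootstrap for the 95% interval; the page does not state which κ variant or interval method ContentRX uses, so both are assumptions. The labels are made-up toy data, not the ContentRX panel, and the names cohen_kappa and bootstrap_ci are hypothetical.

```python
import random
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of labels."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    # Observed agreement: share of items given the same label in both passes.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the two passes' marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if p_e == 1.0:
        # Degenerate case (one label everywhere); kappa is undefined,
        # and this sketch chooses to report full agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for kappa, resampling items with replacement."""
    rng = random.Random(seed)
    n = len(a)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(cohen_kappa([a[i] for i in idx], [b[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Toy data (hypothetical): the same 10 items labeled twice, weeks apart.
first_pass = ["P", "P", "P", "P", "P", "F", "F", "F", "F", "F"]
second_pass = ["P", "P", "P", "P", "F", "F", "F", "F", "F", "P"]

kappa = cohen_kappa(first_pass, second_pass)  # 8 of 10 agree, chance 0.5, so about 0.6
low, high = bootstrap_ci(first_pass, second_pass)
print(f"kappa = {kappa:.3f}, 95% CI [{low:.3f}, {high:.3f}], n = {len(first_pass)}")
```

With a panel this small the interval comes out wide, which is why the page reports the interval and n alongside each point estimate rather than the point estimate alone.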
ContentRX evaluates against 49 standards. As a standard collects enough labelled cases it moves up the ladder: every verdict reviewed, then sampled review, then no per-verdict review. The per-standard numbers stay internal; this page reports the aggregate.
Every verdict reviewed · Sampled review · No per-verdict review
0 of 49 standards have enough data for a measured κ.
New entries land roughly weekly. Each one covers the most recent κ movement, drift signals, override counts, and active refinement candidates. The format is templated on purpose. Week-to-week consistency is what makes drift in the writing detectable.
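One way to keep entries templated is to render each one from a fixed record, so every week has the same fields in the same order. The sketch below is illustrative only: the class name WeeklyEntry, its field names, and the output layout are hypothetical and not ContentRX's actual format. Only the four kinds of content come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WeeklyEntry:
    """Hypothetical fixed-shape record for one weekly entry."""
    week: str                              # e.g. an ISO week label (assumed)
    kappa_movement: Optional[float] = None  # None while κ is still pending
    drift_signals: tuple[str, ...] = ()
    override_count: int = 0
    refinement_candidates: tuple[str, ...] = ()

    def render(self) -> str:
        # Same five lines in the same order every week, so a change in
        # wording can only come from the values, not from the layout.
        kappa = "pending" if self.kappa_movement is None else f"{self.kappa_movement:+.3f}"
        lines = [
            f"Week: {self.week}",
            f"κ movement: {kappa}",
            f"Drift signals: {', '.join(self.drift_signals) or 'none'}",
            f"Override count: {self.override_count}",
            f"Refinement candidates: {', '.join(self.refinement_candidates) or 'none'}",
        ]
        return "\n".join(lines)


entry = WeeklyEntry(week="week-01")
print(entry.render())
```

Empty fields print as "pending" or "none" instead of being omitted, which keeps pending cells named rather than hidden.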