Layer 1 — The paper

Outline & argument

Different people use survey scales differently — one person's "7/10" is another's "5/10." This scale-use heterogeneity has hindered economists' adoption of subjective-wellbeing data for decades. The authors propose a framework to model and correct it.

  • The problem: scale-use heterogeneity in self-reported wellbeing. Why cross-person comparisons of survey scores can mislead.
  • The framework: a shifter and a stretcher. The shifter captures where you center your scale; the stretcher, how spread out your scale is.
  • Estimation from calibration questions (CQs). A small number of extra questions identify each person's parameters.
  • Applications: the correction can change results. For example, wellbeing comparisons across groups shift after adjustment.
  • Empirical proof of concept plus forthcoming representative data. The revised version adds an Understanding America Study sample.
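The shifter/stretcher idea and its estimation from CQs can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the authors' estimator: it assumes a linear response model on a 0-10 scale, r = 5 + shifter + stretcher * (v - 5) + noise, CQs whose true value v is the same for every respondent and known to the analyst, and per-person ordinary least squares on the CQ answers. The helper names (report, estimate_scale_use, correct), the centring at the scale midpoint, and all parameter values are hypothetical, and real survey answers would be integers rather than continuous numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

MID = 5.0        # assumed centre of a 0-10 response scale
N_PEOPLE = 500   # hypothetical sample size
N_CQS = 10       # hypothetical number of calibration questions

# Assumed CQ "true values", identical for every respondent.
cq_truth = np.linspace(2.0, 8.0, N_CQS)

# Hypothetical person-level scale use.
shifter = rng.normal(0.0, 1.0, N_PEOPLE)        # where the scale is centred
stretcher = rng.lognormal(0.0, 0.25, N_PEOPLE)  # how spread out it is


def report(latent, a, b, noise_sd=0.5):
    """Assumed linear response model: r = MID + a + b * (latent - MID) + noise."""
    shape = np.broadcast_shapes(np.shape(latent), np.shape(a), np.shape(b))
    return MID + a + b * (latent - MID) + rng.normal(0.0, noise_sd, shape)


def estimate_scale_use(cq_responses, cq_truth, mid=MID):
    """Per-person OLS of CQ answers on the known CQ values.

    cq_responses has shape (people, CQs); returns (shifter_hat, stretcher_hat).
    """
    X = np.column_stack([np.ones_like(cq_truth), cq_truth - mid])
    coef, *_ = np.linalg.lstsq(X, (cq_responses - mid).T, rcond=None)
    return coef[0], coef[1]


def correct(reported, a_hat, b_hat, mid=MID):
    """Invert the assumed response model to put answers on a common scale."""
    return mid + (reported - mid - a_hat) / b_hat


# Each person answers every CQ with their own shifter and stretcher.
cq_responses = report(cq_truth, shifter[:, None], stretcher[:, None])
a_hat, b_hat = estimate_scale_use(cq_responses, cq_truth)

# The same scale use distorts the wellbeing question itself.
latent_wb = rng.normal(6.0, 1.5, N_PEOPLE)  # hypothetical true wellbeing
reported_wb = report(latent_wb, shifter, stretcher)
corrected_wb = correct(reported_wb, a_hat, b_hat)


def rmse(estimate):
    """Root-mean-square distance from the simulated latent wellbeing."""
    return float(np.sqrt(np.mean((estimate - latent_wb) ** 2)))


print(f"raw RMSE {rmse(reported_wb):.2f}, corrected RMSE {rmse(corrected_wb):.2f}")
```

With these settings the corrected scores should track the simulated latent wellbeing more closely than the raw reports. The remaining error comes from response noise on the wellbeing question and from estimating two parameters per person out of a finite number of CQs.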
Layer 2 — Implications

Why it matters

Subjective-wellbeing scores feed the WELLBY measure used in global-priorities cost-effectiveness analysis. If scale-use heterogeneity biases those scores, it biases the cost-effectiveness rankings built on them.
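A small worked example shows how this bias can reach a ranking. One WELLBY is a one-point change in life satisfaction on a 0-10 scale, for one person, for one year. The sketch assumes the same kind of linear model, r = shifter + stretcher * u, so that for a before/after change in the same people a stable shifter cancels but the stretcher does not (the reported change equals the stretcher times the latent change). The two programs, their numbers, and the use of a single population-average stretcher are hypothetical simplifications chosen only to make the arithmetic visible; they do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Program:
    """A hypothetical intervention evaluated with self-reported life satisfaction."""
    name: str
    reported_gain: float  # mean reported change on a 0-10 scale (hypothetical)
    stretcher: float      # assumed average stretcher of the surveyed population
    people: int
    years: float
    cost: float

    def wellbys(self, adjust: bool) -> float:
        # Under r = shifter + stretcher * u, a stable shifter cancels in a
        # before/after difference, but the stretcher does not:
        # delta_r = stretcher * delta_u, so delta_u = delta_r / stretcher.
        gain = self.reported_gain / self.stretcher if adjust else self.reported_gain
        return gain * self.people * self.years

    def wellbys_per_1000(self, adjust: bool) -> float:
        """WELLBYs bought per 1,000 units of cost."""
        return 1000 * self.wellbys(adjust) / self.cost


def ranking(programs, adjust):
    """Program names ordered from most to least cost-effective."""
    ordered = sorted(programs, key=lambda p: p.wellbys_per_1000(adjust), reverse=True)
    return [p.name for p in ordered]


programs = [
    Program("A", reported_gain=0.65, stretcher=1.3, people=1000, years=1.0, cost=100_000),
    Program("B", reported_gain=0.50, stretcher=0.8, people=1000, years=1.0, cost=100_000),
]
print("unadjusted:", ranking(programs, adjust=False))  # ['A', 'B']
print("adjusted:  ", ranking(programs, adjust=True))   # ['B', 'A']
```

Program A looks better on raw reports (6.5 vs 5.0 WELLBYs per 1,000) only because its population stretches the scale more; dividing by the assumed stretcher reverses the order (5.0 vs 6.25).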

This package links directly to The Unjournal's Pivotal Questions project on the WELLBY measure — a deliberately chosen, practice-relevant connection.

Identified claims

The paper makes three main claims; Caspar Kaiser and Alberto Prati each commented on them.

Layer 3 — Evaluation

Per-claim evaluator commentary

This section summarizes what each evaluator (Caspar Kaiser and Alberto Prati) said about the paper's claims. Ratings and full evaluations follow below.

Full evaluations

Alberto Prati — individual evaluation

"This is an extraordinary paper. It is the kind of methodological research one wants to see more often."

Overall evaluation. The paper approaches a fundamental issue in wellbeing measurement, and does so constructively, by suggesting and testing a potential solution. The contribution is strong in both its theoretical and empirical parts. The authors offer a deep reflection on the problem of scale-use heterogeneity, connect it with the social-science literature, give a theoretically informed account of how to think about it, and suggest a sound solution for estimation. The empirical effort is impressive too: the working-paper analysis provides a useful proof of concept, supplemented by additional data from a large representative sample (Understanding America Study) in the forthcoming version.

The model is very well thought out. The use of a shifter and a stretcher parameter makes a lot of sense. Some choices might go unnoticed by an unfamiliar reader, but recentring the shifter, conditioning results on a question's "height", and the distinction between "dimensional scale use" and "general scale use" are actually smart innovations.

Comments on limitations. The paper has limits not because of any fault in methods or reasoning, but because a single study cannot solve every problem of response-scale heterogeneity. Such limits are inherent to a research agenda, and the current paper already provides a substantial leap forward.

1. Adding calibration questions is costly. The evidence is based on a large number of calibration questions. It is not entirely clear how well the correction performs when only two or three CQs are used (the realistic scenario). Even two CQs can be a substantial burden in large surveys given tight space constraints, and could be cognitively demanding. I suspect this is one crucial reason anchoring vignettes have not been implemented at scale in the past 20 years.
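The precision side of this concern can be illustrated with a stylized simulation; it is not taken from the report or the paper. It assumes a hypothetical linear response model on a 0-10 scale, r = 5 + shifter + stretcher * (v - 5) + noise, CQ true values v spread evenly between 2 and 8 and known to the analyst, and per-person least squares. With two CQs the line passes exactly through both answers, so all response noise flows into the two estimated parameters and no residual is left to flag careless answers. The helper names and parameter values are illustrative only.

```python
import numpy as np

MID = 5.0  # assumed centre of a 0-10 response scale


def estimate_scale_use(cq_responses, cq_truth, mid=MID):
    """Hypothetical helper: per-person OLS of CQ answers on known CQ values."""
    X = np.column_stack([np.ones_like(cq_truth), cq_truth - mid])
    coef, *_ = np.linalg.lstsq(X, (cq_responses - mid).T, rcond=None)
    return coef[0], coef[1]


def parameter_rmse(n_cqs, n_people=20_000, noise_sd=0.5, seed=1):
    """RMSE of estimated shifters and stretchers when each person answers n_cqs CQs."""
    rng = np.random.default_rng(seed)
    cq_truth = np.linspace(2.0, 8.0, n_cqs)
    shifter = rng.normal(0.0, 1.0, n_people)
    stretcher = rng.lognormal(0.0, 0.25, n_people)
    responses = (MID + shifter[:, None]
                 + stretcher[:, None] * (cq_truth - MID)
                 + rng.normal(0.0, noise_sd, (n_people, n_cqs)))
    a_hat, b_hat = estimate_scale_use(responses, cq_truth)
    return (float(np.sqrt(np.mean((a_hat - shifter) ** 2))),
            float(np.sqrt(np.mean((b_hat - stretcher) ** 2))))


for k in (2, 4, 10, 20):
    a_err, b_err = parameter_rmse(k)
    print(f"{k:2d} CQs: shifter RMSE {a_err:.3f}, stretcher RMSE {b_err:.3f}")
```

Under these assumptions the shifter error shrinks roughly with the square root of the number of CQs, and the stretcher error depends on how widely the CQ values are spread; how large an error is tolerable for a given application is exactly the open question raised above.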

Caspar Kaiser — individual evaluation

"This is a major methodological innovation in how we can adjust for differences in scale-use."

A major methodological innovation. The framework is elegant and the estimation strategy is sound. The empirical component would especially benefit from more diverse and reliable samples, and from direct comparisons against existing scale-correction methods so readers can judge incremental value. Logic and communication could be tightened in places; Kaiser rated this dimension lower than the others.

Layer 4 — Author response

The authors reply

The length and thoroughness of the evaluations clearly demonstrate the significant time and intellectual effort the evaluators invested. We are grateful for their insightful and constructive comments. Since the revised paper is still forthcoming, we do not provide a detailed point-by-point response at this stage; we find the suggestions very valuable and will carefully consider them — particularly — as we finalize the revision. We welcome this public scientific discourse.

Layer 5 — Process & followups

Transparency & what happens next

Status
Interim evaluation. The authors made clear this is an interim version; updates are forthcoming. We evaluated it anyway because of its prominence and relevance. A follow-up evaluation is planned once the revised paper — with the Understanding America Study data — is released. Prati's report explicitly considered updates presented in recent seminars.
Why this paper
Prominence, relevance to ongoing practice, and a direct link to our Pivotal Questions project on the WELLBY measure.
Evaluators
Selected for complementary methodological and applied expertise. Both were aware of the interim status.
COI
Standard Unjournal disclosure applies.
Manager note
Evaluation manager, 24 Nov 2025.