The Unjournal · Evaluation package

Observational Price Variation in Scanner Data Does Not Reproduce Experimental Price Elasticities

Robert Bray, Robert Evan Sanders & Ioannis Stamatopoulos (2026)

Roemheld: The paper documents a striking divergence, but the experimental estimates face a plausibility challenge: they imply profit opportunities far exceeding observed industry margins. The contribution is valuable, but the headline conclusion requires stronger foundations.

Anonymous evaluator: A nice paper with a rich, novel experimental dataset. The DID shift is not automatically "observational bias"; it may reflect functional-form restrictiveness rather than bias per se. It would be worth quantifying how much of the shift is explained by curvature.

Roemheld: 70 overall · tier 3 / 3.5
Anonymous: 60 overall · tier 4 / 5
Labels: Verbatim quote = the evaluator's or authors' own words · AI-drafted summary = checked and edited by D. Reinstein

The paper

The research

This paper tests whether standard observational demand estimation with rich scanner data recovers the price elasticities revealed by a randomized pricing experiment — and finds they diverge substantially. AI-drafted summary

The canonical wording for this overview is the evaluation manager's summary on PubPub: unjournal.pubpub.org/pub/evalsumbraybray/. The text below was drafted by AI and edited by D. Reinstein.

Using data from a large grocery retailer's pricing experiment, Robert Bray, Robert Evan Sanders, and Ioannis Stamatopoulos ask whether standard observational demand-estimation methods, applied to rich scanner data, recover the "true" price elasticities revealed by a randomized price experiment. They find the answer is no. The observational methods yield own-price elasticities that differ substantially from the experimental benchmark:

  • A difference-in-differences analysis finds that observational own-price elasticity estimates are roughly 2 units more negative (more elastic) than the experimental benchmark.
  • The headline experimental own-price elasticity is approximately −0.34, substantially less elastic than observational methods suggest.
  • The gap persists across nine product categories and at the individual-product level.
  • The pattern holds under multiple robustness specifications addressing salience, stockouts, and aggregation concerns.
  • The paper title was softened from "cannot reproduce" to "does not reproduce" to better reflect the scope of the evidence.
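The comparisons above rest on log-log (constant-elasticity) demand regressions. The minimal sketch below measures a shift in estimated elasticity between two sources of price variation. It assumes a fully interacted specification in which both the intercept and the log-price slope differ by regime, which is not necessarily the paper's exact DID design. The data are synthetic, and the "true" elasticities (−2.0 observational, −0.3 experimental) are hypothetical values chosen for illustration, not the paper's estimates.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x, with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_regime(elasticity, n, rng, price_sd=0.15, noise_sd=0.3):
    """Synthetic constant-elasticity demand: log q = 1 + elasticity * log p + noise."""
    log_p = [rng.gauss(0.0, price_sd) for _ in range(n)]
    log_q = [1.0 + elasticity * lp + rng.gauss(0.0, noise_sd) for lp in log_p]
    return log_p, log_q

rng = random.Random(0)
# Hypothetical "true" elasticities for each regime, for illustration only.
obs_p, obs_q = simulate_regime(-2.0, 2000, rng)
exp_p, exp_q = simulate_regime(-0.3, 2000, rng)

eps_obs = ols_slope(obs_p, obs_q)
eps_exp = ols_slope(exp_p, exp_q)
# In a fully interacted regression (regime-specific intercept and slope), the
# coefficient on log(price) x observational-regime indicator equals this difference.
did_shift = eps_obs - eps_exp
print(f"observational {eps_obs:.2f}, experimental {eps_exp:.2f}, shift {did_shift:.2f}")
```

Fitting each regime separately gives the same point estimates as the fully interacted pooled regression. The pooled form is what one would use to attach a standard error to the shift.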

Why the gap? Leading explanations examined

The paper examines and partially rules out several candidate explanations: temporary-price-reduction salience (93% of observational variation is TPRs), experimental compliance, stockout dynamics, sample selection, basket-level price perception, and functional-form restrictiveness. The full analysis is in the paper; the evaluators focus on which explanations remain live.
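The TPR-salience explanation depends on separating temporary price reductions from regular-price changes. A common heuristic, assumed here rather than taken from the paper, treats a week as a TPR week when its price sits below the modal ("regular") price in a surrounding window. The window length, the toy price series, and the function names below are all hypothetical.

```python
from collections import Counter

def regular_prices(prices, half_window=4):
    """Modal price in a centered window, used as a proxy for the regular price."""
    out = []
    for t in range(len(prices)):
        lo = max(0, t - half_window)
        hi = min(len(prices), t + half_window + 1)
        counts = Counter(prices[lo:hi])
        # Break ties toward the higher price, since regular prices sit above promotions.
        out.append(max(counts, key=lambda p: (counts[p], p)))
    return out

def tpr_share_of_changes(prices, half_window=4):
    """Share of week-to-week price changes that move into or out of a TPR week."""
    reg = regular_prices(prices, half_window)
    is_tpr = [p < r for p, r in zip(prices, reg)]
    changes = tpr_changes = 0
    for t in range(1, len(prices)):
        if prices[t] != prices[t - 1]:
            changes += 1
            if is_tpr[t] or is_tpr[t - 1]:
                tpr_changes += 1
    return tpr_changes / changes if changes else 0.0

# Hypothetical weekly shelf prices: two short promotions and one regular-price increase.
weekly = [3.99] * 6 + [2.99] * 2 + [3.99] * 6 + [4.29] * 6 + [3.49] + [4.29] * 5
print(tpr_share_of_changes(weekly))
```

On this toy series, four of the five price changes enter or leave a promotion week, so the function returns 0.8. The permanent move from 3.99 to 4.29 is the only change classified as a regular-price change.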

What's in the paper

Actual section structure from the openly hosted working-paper version. The evaluated 2026 revision renumbers some sections; the evaluation's references to Sections IV.E, IV.F, and IV.I correspond to later Findings subsections.

  1. Introduction
  2. Related Literature
  3. Data and Experiment: 3.1 Data; 3.2 Price Experiments
  4. Empirical Strategy
  5. Findings: 5.1 Primary Results (≈ −1.97 observational vs −0.34 experimental); 5.2 Zero Sales; 5.3 TPR (promotion) Effects; 5.4 Price-Process Differences; 5.5 Demand Response Times; 5.6 Observational Instruments
  6. Conclusion
  Appendices: A. Additional Exhibits; B. Data Appendix

Canonical record

SSRN 4899765, revised 5 Feb 2026

Published evaluation summary: unjournal.pubpub.org/pub/evalsumbraybray/ · DOI: 10.21428/d28e8e57.6a89cf15

Implications

Why it matters AI-drafted summary

Policy and cause-area relevance

The paper's findings bear directly on whether scanner-data demand estimates can be trusted for policy analysis — including animal-welfare-motivated policies that hinge on meat / plant-based substitution elasticities. If observational methods systematically differ from true experimental estimates, cost-effectiveness conclusions built on scanner-data elasticities could be unreliable.

The evaluators agree on the significance of this contribution while disagreeing on its current strength. Roemheld presses on the plausibility of the experimental benchmark itself. The anonymous evaluator questions the "bias" framing, suggesting the gap may reflect functional-form restrictiveness rather than observational bias.

If the experimental elasticity estimate is correct, the finding has broad implications for empirical industrial organization, consumer-demand modeling, and any policy that uses scanner-data elasticities as inputs.

The evaluations

What the evaluators said

Two evaluators with different focal concerns — their views are presented side by side here.

Roemheld (E1) · Summary (verbatim)

"This paper compares observational and experimental price elasticity estimates from a large grocery retailer. While the paper documents a striking divergence between the two approaches, I argue that the experimental estimates themselves face a plausibility challenge: they imply profit opportunities far exceeding observed industry margins. I offer five diagnostic suggestions that may help resolve this tension, focusing on experimental compliance, promotional salience, and stockout dynamics. The paper's contribution is valuable, but its headline conclusion requires stronger foundations."

Main claim as identified

Main research claim (as read by E1)
"Standard observational approaches to estimating price elasticity of demand fail to approximate true (experimental) elasticity, even with rich scanner data."
E1's central diagnostic concern
The experimental own-price elasticity of approximately −0.34 implies the retailer prices in the inelastic region, which would imply profit opportunities far above observed grocery margins (Kroger-like: 20–25% gross, 2–3% net). This "margin puzzle" is the organizing concern for E1's five suggestions.
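The arithmetic behind the margin puzzle can be sketched with the textbook inverse-elasticity (Lerner) rule, (p − c)/p = −1/ε. This sketch assumes single-product, constant-elasticity demand with no cross-product or competitive effects. It also treats the 25% gross margin (the upper end of the Kroger-like range above) as if it were the price-cost margin over marginal cost. Both are simplifying assumptions, not the evaluator's or the authors' model.

```python
def lerner_markup(elasticity):
    """Inverse-elasticity rule: profit-maximizing (p - c) / p = -1 / elasticity."""
    return -1.0 / elasticity

def profit_ratio(elasticity, gross_margin, price_increase):
    """Profit after vs. before a price increase under constant-elasticity demand.

    Price is normalized to 1, so unit cost is 1 - gross_margin and
    quantity at the new price is (new price) ** elasticity.
    """
    cost = 1.0 - gross_margin
    new_price = 1.0 + price_increase
    old_profit = 1.0 - cost
    new_profit = (new_price - cost) * new_price ** elasticity
    return new_profit / old_profit

eps = -0.34
# A markup above 1 would require negative marginal cost, so no interior optimum exists.
print(f"implied Lerner markup: {lerner_markup(eps):.2f}")
print(f"profit ratio after a 10% price rise: {profit_ratio(eps, 0.25, 0.10):.3f}")
```

With ε = −0.34, the rule implies a markup of about 2.94, which is infeasible. Under these assumptions, a 10% price rise would raise profit by roughly a third (a ratio of about 1.36). That gap between implied and observed margins is what E1 calls the margin puzzle.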

Six diagnostic concerns

Each concern is paired with the authors' reply as a margin note inside the full evaluation; the author-response overview indexes them.

  • E1-01 The margin puzzle. An elasticity near −0.34 implies the retailer prices in the inelastic region, inconsistent with profit maximization under standard markup formulas and observed margins.
  • E1-02 Experimental compliance and price-data integrity. Do transacted prices accurately reflect the experimental assignments?
  • E1-03 Promotional salience and consumer attention. 93% of observational price changes are temporary price reductions — a very different mechanism from the experiment's price changes.
  • E1-04 Out-of-stock dynamics. Stockouts could censor observed demand response, biasing elasticity estimates in either direction.
  • E1-05 Aggregation and sample selection. Test products are higher-selling and non-randomly selected; unweighted averages may not generalize.
  • E1-06 Basket-level perception and competitive dynamics. Mean-zero perturbations to individual products may leave perceived basket prices approximately unchanged, muting response.
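One direction of the stockout bias in E1-04 can be illustrated with censored demand. Suppose shelves run empty mainly when prices are low. Recorded sales then understate demand exactly where demand is highest, which flattens the fitted log-log slope toward zero. The elasticity (−1.5), the price grid, and the fixed inventory cap below are hypothetical, and the sketch is noise-free by construction. Other stockout patterns can bias the estimate differently, as E1-04 notes.

```python
import math

def ols_slope(x, y):
    """Least-squares slope of y on x, with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

TRUE_ELASTICITY = -1.5                          # hypothetical
prices = [0.80 + 0.02 * i for i in range(21)]   # 0.80 to 1.20
demand = [100.0 * p ** TRUE_ELASTICITY for p in prices]
inventory = 110.0                               # hypothetical units on hand per period
sales = [min(d, inventory) for d in demand]     # recorded sales are censored at inventory

log_p = [math.log(p) for p in prices]
uncensored = ols_slope(log_p, [math.log(d) for d in demand])
censored = ols_slope(log_p, [math.log(s) for s in sales])
print(f"uncensored {uncensored:.2f}, censored {censored:.2f}")
```

The uncensored fit recovers −1.5 exactly. The censored fit is less negative, because sales at the lowest prices are capped at the inventory level.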
See the published version on PubPub: Roemheld's evaluation as published on The Unjournal's platform.

Roemheld noted his per-criterion sub-ratings were "rather positive for all, but somewhat less positive for Claims, strength, and characterization of evidence."

Anonymous evaluator (E2) · Summary / abstract (verbatim)

"The paper uses data from a rich price experiment and frames it as a validation of observational demand methods. The headline finding — a DID shift of own-price elasticities toward zero — is not automatically 'observational bias,' because the experiment changes the price support: elasticities during treatment are evaluated at different price levels and thus different points on the demand curve. The most natural takeaway is that constant-elasticity log-linear demand is too restrictive here. A useful extension would quantify how much of the shift is explained by curvature using flexible demand estimates."

Main claim as identified

Main research claim (as read by E2)
Observational methods do not reproduce the experimental elasticity even with rich scanner data; E2's emphasis is that this need not be "bias" but may reflect functional-form restrictiveness.

Key public points

  • Overall positive: "A nice paper that brings in a rich novel experimental dataset to the demand estimation literature… experimental variation across many categories provides a valuable benchmark."
  • Framing concern (E2-02): A DID effect on estimated elasticities is not automatically "observational bias" — the experiment changes the price support, so elasticities are evaluated at different points on the demand curve.
  • Clarification needed (E2-01): What is the main estimator — OLS log–log or 2SLS? Page 9 mentions four models; it was unclear to E2 which is used in Figure 2.
  • Takeaway (E2-03): Log-linear (constant-elasticity) demand is likely too restrictive. The central fact is more about functional-form / model misspecification than "bias" per se.
  • Suggested extension: Quantify how much of the shift is explained by the price-level channel using flexible demand estimates.
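E2's price-support argument can be made concrete with a demand curve whose elasticity varies along it. The sketch assumes a hypothetical linear demand q = 10 − 4p, which is not estimated from the paper's data. Nothing in the data is biased here: the demand curve is known exactly. Even so, constant-elasticity (log-log) fits over two different price ranges return different elasticities, because each range samples a different part of the same curve.

```python
import math

def ols_slope(x, y):
    """Least-squares slope of y on x, with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

A, B = 10.0, 4.0  # hypothetical linear demand q = A - B * p

def quantity(p):
    return A - B * p

def point_elasticity(p):
    """Elasticity of linear demand at price p: (dq/dp) * p / q = -B * p / q."""
    return -B * p / quantity(p)

def loglog_slope(prices):
    """Constant-elasticity (log-log) fit of the same demand curve over a price range."""
    xs = [math.log(p) for p in prices]
    ys = [math.log(quantity(p)) for p in prices]
    return ols_slope(xs, ys)

low_range = [0.90 + 0.01 * i for i in range(21)]   # prices 0.90 to 1.10
high_range = [1.40 + 0.01 * i for i in range(21)]  # prices 1.40 to 1.60
print(point_elasticity(1.0), point_elasticity(1.5))
print(loglog_slope(low_range), loglog_slope(high_range))
```

The point elasticities are −2/3 at p = 1.0 and −1.5 at p = 1.5, and each log-log fit lands close to the value at the centre of its range. Comparing such fits across regimes therefore mixes any genuine bias with this price-level (curvature) channel. This is the part E2 suggests quantifying with flexible demand estimates.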

Published evaluation: unjournal.pubpub.org/pub/evalsumbraybray/

Ratings

Ratings comparison

Overall assessment with 90% credible intervals, 0–100 scale; plus journal-tier assessments. Only these summary ratings are available in this brief.

Note: Full per-criterion ratings (methods, logic, claims, etc.) are on the PubPub evaluation: unjournal.pubpub.org/pub/evalsumbraybray/.

[Chart: overall assessments shown as point estimates with 90% credible-interval whiskers. Journal tier, should-be / predicted: Roemheld 3 / 3.5; Anonymous 4 / 5.]

Tier legend (should-be / predicted placement): 0 little value · 1 somewhat valuable · 2 decent field journal · 3 strong field journal · 4 top field journal · 5 A-journal / top journal.

Ratings: overall assessment with 90% credible intervals
Metric | Roemheld (E1) | E1 90% CI | Anonymous (E2) | E2 90% CI
Overall assessment (0–100) | 70 | 40–90 | 60 | 55–70
Journal tier, should be (0–5) | 3 | | 4 |
Journal tier, predicted (0–5) | 3.5 | | 5 |

The two evaluators differ by 10 points on the overall assessment (70 vs 60). Their 90% credible intervals overlap (40–90 vs 55–70), and Roemheld's is much wider, reflecting greater uncertainty.

Author response

Authors' response — overview

Robert Bray, Robert Evan Sanders, and Ioannis Stamatopoulos provided a detailed point-by-point response to both evaluators. The manuscript was also substantially revised after the evaluated version.

General statement — Robert Bray, Robert Evan Sanders & Ioannis Stamatopoulos

"Thank you all for your interest in our work… the manuscript has been substantially revised since the version originally evaluated."

The authors' point-by-point replies are shown inline, beside the passage each one answers, in the two full evaluations. Authors' replies are labeled "Authors' reply" and evaluation-manager comments "Manager note". The labels below index each passage and its reply.

Roemheld (E1): E1-01 margin puzzle · E1-02 compliance · E1-03 promotional salience · E1-04 stockouts · E1-05 aggregation · E1-06 basket perception · E1-07 substitution

Anonymous (E2): E2-01 estimator · E2-02 DID / price support · E2-03 functional form · E2-04/05/06 pivotal question

Two evaluation-manager (higher-level) notes are anchored beside the E2-02 and E2-03 passages. Concerns are drawn from the published evaluations; replies from the published author response. Full text on PubPub.

Process & status

Transparency & process

Package status
2026 evaluation package. The manuscript was substantially revised after the evaluated version.
Revised manuscript on SSRN · 5 Feb 2026
Evaluated version
Roemheld confirmed he evaluated the 5 Feb 2026 SSRN version (abstract_id=4899765).
Why this paper
Relevance to scanner-data demand estimation and to animal-welfare / food-policy interventions that rely on observational elasticity estimates.
Evaluator identities
E1 Lars Roemheld is identified (signed). E2 is anonymous by choice. Standard Unjournal process.
Conflicts of interest
Standard Unjournal disclosure applies. Evaluators were selected for complementary expertise: E1 brings applied industry pricing experience; E2 brings econometric demand-estimation expertise.
Process and guidelines
unjournal.org — full evaluation guidelines, author response process, and disclosure policy.
Full evaluation on PubPub
unjournal.pubpub.org/pub/evalsumbraybray/
