Job Market Paper
The Need for Equivalence Testing in Economics. Institute for Replication Discussion Paper Series No. 125, 2024. Under submission.
[ Abstract | Draft | Online Appendix | Stata Command | R Package | Shiny App | 30-Minute Presentation | Interview: Economisch Statistische Berichten (in Dutch) ]
Equivalence testing can provide statistically significant evidence that economic relationships are practically negligible. I demonstrate its necessity in a large-scale replication of estimates defending 135 null claims made in 81 articles from top economics journals. Between 36% and 63% of the estimates defending the average null claim fail lenient equivalence tests. Obtaining equivalence testing failure rates that surveyed researchers deem acceptable requires arguing that nearly 75% of published estimates in economics are practically equal to zero, implying that Type II error rates are unacceptably high throughout economics. I provide economists with guidelines and commands in Stata/R for conducting credible equivalence testing and practical significance testing in future research.
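As a rough illustration of the idea, the sketch below runs a two one-sided tests (TOST) equivalence check under a normal approximation. It is not the Stata or R command linked above: the function names, the equivalence bounds of -0.10 and 0.10, and the example estimates are all hypothetical. It shows that two estimates which are both statistically insignificant in the usual sense can differ in whether they are significantly bounded near zero.

```python
from statistics import NormalDist


def tost_p_value(estimate, std_error, lower, upper):
    """TOST p-value for equivalence, assuming a normally distributed estimate."""
    z = NormalDist()
    # H0: true effect <= lower bound. A small p-value places the effect above it.
    p_lower = 1.0 - z.cdf((estimate - lower) / std_error)
    # H0: true effect >= upper bound. A small p-value places the effect below it.
    p_upper = z.cdf((estimate - upper) / std_error)
    # Equivalence is concluded only if both one-sided tests reject.
    return max(p_lower, p_upper)


def ci_within_bounds(estimate, std_error, lower, upper, alpha=0.05):
    """Same decision via the (1 - 2*alpha) confidence interval lying inside the bounds."""
    crit = NormalDist().inv_cdf(1.0 - alpha)
    return lower < estimate - crit * std_error and estimate + crit * std_error < upper


# Hypothetical estimates: same point estimate, different precision.
precise = tost_p_value(0.02, 0.03, -0.10, 0.10)
noisy = tost_p_value(0.02, 0.10, -0.10, 0.10)
print(precise < 0.05, noisy < 0.05)
```

Only the precise estimate passes the test at the 5% level. The noisy estimate is insignificantly different from zero, but it is also compatible with effects outside the bounds.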
Published and Forthcoming Articles
US States That Mandated COVID-19 Vaccination See Higher, Not Lower, Take-Up of COVID-19 Boosters and Flu Vaccines. Proceedings of the National Academy of Sciences 121(41), e2403758121, 2024.
[ Abstract | Article (Open Access) | Draft | Data & Code, Published Replication | Reply | Response to Reply | Data & Code, Response to Reply | Twitter/X Thread ]
Rains & Richards (2024, Proceedings of the National Academy of Sciences) find that, compared to US states that instituted bans on COVID-19 vaccination requirements, states that imposed COVID-19 vaccination mandates exhibit lower adult and child uptake of flu vaccines, and lower uptake of COVID-19 boosters. These differences are generally interpreted causally. However, further inspection reveals that these results are driven by the inclusion of a single bad control variable. When this variable is removed, the data instead show that states that mandated COVID-19 vaccination experience higher COVID-19 booster and flu vaccine take-up than states that banned COVID-19 vaccination requirements.
Is There a Foreign Language Effect on Workplace Bribery Susceptibility? Evidence from a Randomized Controlled Vignette Experiment (with Paul Stroet, Arjen van Witteloostuijn, and Kristina S. Weißmüller). Journal of Business Ethics, 2024.
[ Abstract | Article (Open Access) | Draft | Code ]
Theory and evidence from the behavioral science literature suggest that the widespread and rising use of lingua francas in the workplace may impact the ethical decision-making of individuals who must use foreign languages at work. We test the impact of foreign language usage on individual susceptibility to bribery in workplace settings using a vignette-based randomized controlled trial in a Dutch student sample. Results suggest that there is not even a small foreign language effect on workplace bribery susceptibility. We combine traditional null hypothesis significance testing with equivalence testing methods novel to the business ethics literature that can provide statistically significant evidence of bounded or null relationships between variables. These tests suggest that the foreign language effect on workplace bribery susceptibility is bounded below even small effect sizes. Post hoc analyses provide evidence suggesting fruitful further routes of experimental research into bribery.
Working Papers
Identifying the Impact of Hypothetical Incentives on Experimental Outcomes and Treatment Effects. Tinbergen Institute Discussion Paper Series No. 24/070-I, 2024. Under invited submission, Experimental Economics.
[ Abstract | Code & Data Retrieval Instructions | Draft | Slides ]
Recent studies showing that some outcome variables do not statistically significantly differ between real-stakes and hypothetical-stakes conditions have raised methodological challenges to experimental economics' disciplinary norm that experimental choices should be incentivized with real stakes. I show that the hypothetical bias measures estimated in these studies do not econometrically identify the hypothetical biases that matter in most modern experiments. Specifically, traditional hypothetical bias measures are fully informative in 'elicitation experiments' where the researcher is uninterested in treatment effects (TEs). However, in 'intervention experiments' where TEs are of interest, traditional hypothetical bias measures are uninformative; real stakes matter if and only if TEs differ between stakes conditions. I demonstrate that traditional hypothetical bias measures are often misleading estimates of hypothetical bias for intervention experiments, both econometrically and through re-analyses of three recent hypothetical bias experiments. The fact that a given experimental outcome does not statistically significantly differ on average between stakes conditions does not imply that all TEs on that outcome are unaffected by hypothetical stakes. Therefore, the recent hypothetical bias literature does not justify abandoning real stakes in most modern experiments. Maintaining norms that favor completely or probabilistically providing real stakes for experimental choices is useful for ensuring externally valid TEs in experimental economics.
Imputations, Inverse Hyperbolic Sines, and Impossible Values. 2024. Under invited submission, Nature Human Behaviour.
[ Abstract | Data & Code | Draft ]
Wolfowicz et al. (2023, Nature Human Behaviour) find that more arrests and convictions for terrorism offenses decrease terrorism, more charges increase terrorism, and longer sentences do not deter terrorism in 28 European Union member states from 2006 to 2021. I assess the computational reproducibility of their study and find many data irregularities. The article's primary dependent variable - purportedly an inverse hyperbolic sine transformation of terrorist attack rates - takes on 292 different values when attack rates equal zero, and negatively correlates with attack rates. Many variables exhibit impossible values or undisclosed imputations, often masking a lack of reporting in the article's main data sources. I estimate that the authors have access to 57% fewer observations than claimed. Reproduction attempts produce estimates at least 77.7% smaller than the published estimates. Models reflecting the true degree of missing data produce estimates that are not statistically significantly different from zero for any independent variable of interest.
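To see why these two properties are irregular, the short sketch below applies a plain inverse hyperbolic sine to some hypothetical attack rates. It assumes the transformation is applied directly to the rate with no further adjustment. The helper name and the example rates are illustrative and are not taken from the study's data.

```python
import math


def ihs(x):
    """Inverse hyperbolic sine: asinh(x) = ln(x + sqrt(x**2 + 1))."""
    return math.asinh(x)


# Hypothetical attack rates in ascending order.
rates = [0.0, 0.5, 1.0, 5.0, 20.0]
transformed = [ihs(r) for r in rates]

# A zero rate always maps to exactly zero, so only one value is possible there.
print(ihs(0.0))
# The transform is strictly increasing, so it preserves the ordering of the rates.
print(all(a < b for a, b in zip(transformed, transformed[1:])))
```

Under that assumption, a correctly transformed variable has a single value at a zero attack rate and cannot correlate negatively with the untransformed rate.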
Three-Sided Testing to Establish Practical Significance: A Tutorial (with Peder Isager). PsyArXiv and Tinbergen Institute Discussion Paper Series 2024-077/III, 2024. Under submission.
[ Abstract | Draft | Stata Command | R Package | Shiny App | Slides | Twitter/X Thread ]
Researchers may want to know whether an observed statistical relationship is either meaningfully negative, meaningfully positive, or small enough to be considered practically equivalent to zero. Such a question cannot be addressed with standard null hypothesis significance testing, nor with standard equivalence testing. Three-sided testing (TST) is a procedure to address such questions, by simultaneously testing whether an estimated relationship is significantly below, within, or above predetermined smallest effect sizes of interest. TST is a natural extension of the standard two one-sided tests (TOST) procedure for equivalence testing. TST offers a more comprehensive decision framework than TOST with no penalty to error rates or statistical power. In this paper, we give a non-technical introduction to TST, provide commands for conducting TST in R, Jamovi, and Stata, and provide a Shiny app for easy implementation. Whenever a meaningful smallest effect size of interest can be specified, TST should be combined with null hypothesis significance testing as the default frequentist testing procedure.
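The decision logic described in this abstract can be sketched in a few lines. The example below is a minimal illustration under a normal approximation, not the implementation in the linked R, Jamovi, or Stata commands. The function name, the bounds of -0.10 and 0.10, and the example estimates are hypothetical.

```python
from statistics import NormalDist


def three_sided_test(estimate, std_error, lower, upper, alpha=0.05):
    """Classify an estimate as below, within, or above the bounds, or inconclusive."""
    z = NormalDist()
    z_lower = (estimate - lower) / std_error
    z_upper = (estimate - upper) / std_error
    p_values = {
        # H0: effect >= lower bound; rejection means meaningfully negative.
        "below": z.cdf(z_lower),
        # TOST: both one-sided tests must reject to place the effect within bounds.
        "within": max(1.0 - z.cdf(z_lower), z.cdf(z_upper)),
        # H0: effect <= upper bound; rejection means meaningfully positive.
        "above": 1.0 - z.cdf(z_upper),
    }
    # With lower < upper and alpha < 0.5, at most one region can be significant.
    conclusion = "inconclusive"
    for region, p_value in p_values.items():
        if p_value < alpha:
            conclusion = region
    return conclusion, p_values


# Hypothetical estimates with smallest effect sizes of interest of -0.10 and 0.10.
for est, se in [(0.02, 0.01), (0.30, 0.05), (0.08, 0.05)]:
    print(est, se, three_sided_test(est, se, -0.10, 0.10)[0])
```

Each of the three tests is run at the same significance level in this sketch, and the three conclusions are mutually exclusive. An estimate that supports none of them is reported as inconclusive.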
Revisiting the Impacts of Anti-Discrimination Employment Protections on American Businesses. 2024. Under submission, Management Science.
[ Abstract | Code & Data Retrieval Instructions | Draft ]
Greene & Shenoy (2022, Management Science) - henceforth GS22 - find that the staggered adoption of U.S. state-level protections against racial discrimination in employment decreased both the profitability and leverage of affected businesses. However, these results arise from two-way fixed effects (TWFE) difference-in-differences models. Such models are now known to return inaccurate estimates of average treatment effects on the treated (ATTs) when treatment assignment is staggered, as some firm-year ATTs can enter the TWFE estimator with negative weight. I find that 21-36% of firm-year ATTs in GS22's sample enter the TWFE estimator with negative weight. I then replicate GS22's results using recently developed difference-in-differences estimators that return valid ATT estimates under staggered adoption. None of these new ATT estimates are statistically significantly different from zero.
The Problems with Poor Proxies: Does Innovation Mitigate Agricultural Damage from Climate Change? Institute for Replication Discussion Paper Series No. 158, 2024.
[ Abstract | Draft | Data & Code | Authors’ Response | Twitter/X Thread ]
Moscona & Sastry (2023, Quarterly Journal of Economics) - henceforth MS23 - find that cropland values are significantly less damaged by extreme heat exposure (EHE) when crops are more exposed to technological innovation. Re-analyzing MS23's replication data, I document extensive evidence that this finding is not robust, and that the mitigatory effects of innovation on climate change damage are negligibly small. MS23's 'innovation exposure' variable does not measure innovation, instead proxying innovation using a measure of crops' national heat exposure. This proxy moderates EHE impacts for reasons unrelated to innovation. I show that the proxy is practically identical to local EHE, meaning that MS23's models examining interaction effects between their proxy and local EHE effectively interact local EHE with itself. I demonstrate that MS23's findings on 'innovation exposure' simply reflect nonlinear impacts of local EHE on agricultural land value, and uncover robustness issues for other key findings. I then construct direct measures of innovation exposure from MS23's crop variety and patenting data. Replacing MS23's proxy with these direct innovation measures decreases MS23's moderating effect estimates by at least 99.2% in standardized units; none of these new estimates are statistically significantly different from zero. Similar results arise from an instrumental variables strategy that instruments my direct innovation measures with MS23's heat proxy. These results cast doubt on the general capacity for market innovations to mitigate agricultural damage from climate change.
Manipulation Tests in Regression Discontinuity Design: The Need for Equivalence Testing. Institute for Replication Discussion Paper Series No. 136, 2024. Under submission.
[ Abstract | Draft | R Package | Stata Command | Python Package (created by Leo Stimpfle) | Slides | Twitter/X Thread ]
Researchers utilizing regression discontinuity design (RDD) commonly test for running variable (RV) manipulation around a cutoff, but incorrectly assert that insignificant manipulation test statistics are evidence of negligible manipulation. I introduce simple frequentist equivalence testing procedures that can provide statistically significant evidence that RV manipulation around a cutoff is practically equivalent to zero. I then demonstrate the necessity of these procedures, leveraging replication data from 36 RDD publications to conduct 45 equivalence-based RV manipulation tests. Over 44% of RV density discontinuities at the cutoff cannot be significantly bounded beneath a 50% upward jump. Bounding equivalence-based manipulation test failure rates beneath 5% requires arguing that a 350% upward density jump is practically equivalent to zero. Meta-analytic estimates reveal that average RV manipulation around the cutoff is equivalent to a 26% upward density jump. These results imply that many published RDD estimates may be confounded by discontinuities in potential outcomes due to RV manipulation that remains undetectable by existing tests. I provide research guidelines and commands in Stata and R to help researchers conduct more credible equivalence-based manipulation testing in future RDD research.
Selected Works in Progress
The Impacts of Replications in Economics (with Bob Reed and Tom Coupé).
A Comment on “Resisting Social Pressure in the Household Using Mobile Money: Experimental Evidence on Microenterprise Investment in Uganda” (with Lenka Fiala, Essi Kujansuu, and David Valenta).
Fine Enough or Don’t Fine At All? Experimental Evidence on Optimal Fine Levels for Natural Resource Conservation.
The Nuclear Option: Cross-Country Evidence on the Impacts of Nuclear Power Supply.