Social-behavioral findings can be highly replicable, six-year study by four labs suggests
Roughly a decade ago, a community-wide reckoning emerged concerning the credibility of published literature in the social-behavioral sciences, especially psychology. Several large-scale studies attempted to reproduce previously published findings but either failed to do so or obtained effects of much smaller magnitude. This called into question the credibility of those findings, and of future studies in the social-behavioral sciences.
A handful of top experts in the field, however, set out to show that high replicability is possible when best practices are employed. Over six years, researchers at labs from UC Santa Barbara, UC Berkeley, Stanford University and the University of Virginia discovered and replicated 16 novel findings using what are considered gold-standard best practices, including pre-registration, large sample sizes and replication fidelity. Their findings, published in Nature Human Behaviour, indeed suggest that with best practices, high replicability is achievable.
“It’s an existence proof that we can set out to discover new findings and replicate them at a very high level,” said UC Santa Barbara Distinguished Professor Jonathan Schooler, director of UCSB’s META Lab and the Center for Mindfulness and Human Potential, and senior author of the paper. “The major finding is that when you follow current best practices in conducting and replicating online social-behavioral studies, you can accomplish high and generally stable replication rates.”
On average, the effect sizes in the study's replications were 97% as large as those in the original findings. By comparison, prior replication projects observed replication effect sizes that were roughly 50% of the originals.
The paper’s principal investigators were John Protzko of UCSB’s META Lab and Central Connecticut State University (CCSU), Jon Krosnick of Stanford’s Political Psychology Research Group, Leif Nelson at UC Berkeley’s Haas School of Business and Brian Nosek, who is affiliated with the University of Virginia and is the executive director of the standalone Center for Open Science.
“There have been a lot of concerns over the past few years about the replicability of many sciences, but psychology was among the first fields to start systematically investigating the issue,” said lead author Protzko, a research associate in Schooler’s lab, where he was a postdoctoral scholar during the study. He is now an assistant professor of psychological science at CCSU. “The question was whether past replication failures and declining effect sizes are inherently built into the assorted scientific domains that have observed them. For example, some have speculated that it is an inherent aspect of the scientific enterprise that newly discovered findings can become less replicable or smaller over time.”
The group decided to perform new studies using emerging best practices in open science, and then to replicate them with an innovative design in which the researchers committed to replicating the initial confirmation studies regardless of outcome. Over the course of six years, research teams at each lab developed studies that were then replicated by all of the other labs.
In total, the coalition discovered 16 new phenomena and replicated each of them four times, in work involving a total of 120,000 participants. “If you use best practices of large samples, pre-registration, open materials in the discovery of new science, and you run replications with as best fidelity to the original process as you can, you end up with a very highly replicable science,” Protzko said of the findings.
One key innovation of the study was that all of the participating labs agreed to replicate the initial confirmation studies regardless of their outcome. This removed the scientific community’s customary bias of only publishing and replicating positive outcomes, which may have contributed to inflated initial assessments of effect sizes in the past. Furthermore, this approach enabled the researchers to observe several cases in which study designs that failed to produce significant findings in the original confirmation later attained reliable effects when replicated at other labs.
Across the board, the project revealed extremely high replicability rates for its social-behavioral findings and no statistically significant evidence of decline over repeated replications. The researchers pointed out that, given the studies’ sample sizes and effect sizes, the observed replicability rate of 86%, based on statistical significance, could not have been any higher.
To test the novelty of their discoveries, they ran independent tests of people’s predictions regarding the direction of the new findings and their likelihood of replicating. Several follow-up surveys, in which naïve participants evaluated descriptions of both the new studies and those associated with previous replication projects, found no differences in their respective predictability. Thus, the replication success of these studies was not due to their having discovered obvious results that would necessarily be expected to replicate. Indeed, many of the newly discovered findings have already been independently published in high-quality journals.
“It would not be particularly interesting to discover that it is easy to replicate completely obvious findings,” Schooler said. “But our studies were comparable in their surprise factor to studies that have been difficult to replicate in the past. Untrained judges who were given summaries of the two conditions in each of our studies and a comparable set of two-condition studies from a prior replication effort found it similarly difficult to predict the direction of our findings relative to the earlier ones.”
Because each research lab developed its own studies, they spanned a variety of social, behavioral and psychological research areas, such as marketing, political psychology, prejudice and decision-making. They all involved human subjects and adhered to certain constraints, such as not using deception. “We really built into the process that the individual labs would act independently,” Protzko said. “They would go about their sort of normal topics they were interested in and how they would run their studies.”
Collectively, their meta-scientific investigation provides evidence that low replicability and declining effects are not inevitable. Rigor-enhancing practices can lead to very high replication rates, but identifying exactly which practices work best will take further study. This study’s “kitchen sink” approach, which used multiple rigor-enhancing practices at once, did not isolate the effect of any individual practice.
Additional investigators on the study are Jordan Axt (Department of Psychology, McGill University, Montreal, Canada); Matt Berent (Matt Berent Consulting); Nicholas Buttrick (Department of Psychology, University of Wisconsin-Madison); Matthew DeBell (Institute for Research in Social Sciences, Stanford University); Charles R. Ebersole (Department of Psychology, University of Virginia); Sebastian Lundmark (The SOM Institute, University of Gothenburg, Sweden); Bo MacInnis (Department of Communication, Stanford University); Michael O’Donnell (McDonough School of Business, Georgetown University); Hannah Perfecto (Olin School of Business, Washington University in St. Louis); James E. Pustejovsky (Educational Psychology Department, University of Wisconsin-Madison); Scott S. Roeder (Darla Moore School of Business, University of South Carolina); and Jan Walleczek (Phenoscience Laboratories, Berlin, Germany).