The Challenges and Flaws of Interpreting Results with Statistical Significance

Moving beyond Tradition

Researchers and policymakers need to be aware of the challenges and flaws of null hypothesis significance testing. When writing and interpreting the results of research articles and policy papers, they should be able to critique both the findings and the recommendations drawn from them.

This article is inspired by a talk given by T S Krishnan (IIM Nagpur) at NIT-Calicut and by the discussions at the Academy of Management Research Methods Symposium in Chicago, IL, in August 2018. The authors thank T S Krishnan (IIM Nagpur) for reading this manuscript and providing valuable suggestions, and Althaf S (NIT-C) for providing valuable guidance and direction while writing this article. The authors are also thankful for the reviewer’s suggestions and comments.

Popper's falsification principle holds that a theory should continually withstand attempts at falsification. Thus, in research, we aim to refute a theory rather than prove it. In practice, the test is framed around a null hypothesis of no effect, and rejecting or failing to reject that null hypothesis is the backbone of a research paper. Normally, the null hypothesis states that there is no correlation or influence among the variables, and every researcher aims to reject it to show that a significant correlation or influence exists. Consequently, studies set out to reject the null hypothesis through statistical significance tests. Recently, however, there have been growing calls to retire statistical significance altogether, rather than merely lowering the p-value threshold (Amrhein and Greenland 2018; Amrhein et al 2019). This debate has cast suspicion on public policy papers because of concerns about their reliability and replicability. Therefore, in this article, we argue that researchers and public policy experts should interpret statistical significance with caution.

In the world of research and publication, journals are usually biased against the null hypothesis. This bias is inherent in the falsification principle, and it results in the suppression of papers that fail to reject a null hypothesis of no difference. Yet, like the effect hypothesis, the no-effect hypothesis can also help predict or validate a theory. For example, if a researcher finds that there is no fire in the absence of oxygen, this finding helps substantiate that the presence of oxygen is needed for a fire (Cortina and Folger 1998). Moreover, because of this bias, researchers who are on the verge of publishing their work may find their results disappointing and feel forced to make changes. As a result, they may report a rejected null hypothesis even though their original results did not support rejecting it. The concern is reinforced by the fact that 70% of the articles in the top journals have not disclosed enough data to run independent tests of the replicability of their findings. And, among the studies for which data is available, about one-third had results that were reported as statistically significant but turned out statistically non-significant in the retest (Bergh et al 2017). In the long run, this trend has serious consequences. Science becomes an unethical and cynical career advancement exercise, and better theories get obscured.
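The effect of a bias against the null hypothesis can be illustrated with a small simulation. The sketch below is not drawn from any of the cited studies. It assumes a hypothetical setting in which every study tests a true effect of exactly zero, uses a simple two-sided z-test with a known standard deviation of 1, and treats p < 0.05 as the publication filter. All function names, sample sizes, and the random seed are illustrative choices.

```python
import math
import random


def two_sided_p_value(sample, sigma=1.0):
    """Two-sided z-test p-value for H0: population mean = 0 (sigma assumed known)."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_studies(n_studies=2000, n_per_study=30, true_effect=0.0,
                     alpha=0.05, seed=42):
    """Run many hypothetical studies and keep those that clear the alpha filter."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_studies):
        sample = [rng.gauss(true_effect, 1.0) for _ in range(n_per_study)]
        p_values.append(two_sided_p_value(sample))
    # Assumption: a journal biased against the null "publishes" only p < alpha.
    published = [p for p in p_values if p < alpha]
    return p_values, published


p_values, published = simulate_studies()
share_significant = len(published) / len(p_values)
print(f"Studies run: {len(p_values)}")
print(f"Studies passing the p < 0.05 filter: {len(published)} "
      f"({share_significant:.1%})")
```

Because the true effect is zero in every simulated study, each result that passes the filter is a false positive. A reader who saw only the "published" studies would find a literature of uniformly significant results, which is one way the suppression of null findings can obscure what the full body of evidence shows.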

Published On : 20th Jan, 2024

