Academic research normally goes through a peer review stage before it appears in a journal or book. This usually involves an editor selecting two or more scholars with relevant expertise from the field and asking them to assess the submission. These reviewers comment on various aspects of the submission and give an overall recommendation, such as: accept, ask for minor revisions and then accept, ask for major revisions and then re-evaluate, or reject. The purposes of this exercise include filtering out flawed studies or papers with little value, and helping authors to improve their work by correcting errors or suggesting additional perspectives to consider.
In an ideal world, every paper that passes peer review is error-free, clear and makes a valuable contribution to academic knowledge. In practice, however, there is no absolute truth, so reviewers must make judgements about whether each work is of high enough quality to be published. In theory, reviewers should always agree in their evaluations because they are all experts, but in practice disagreement is common.
This is a problem for the integrity of the academic record because any disagreement between experts suggests that (a) some substandard research gets published because, even though most experts would consider it low quality, the two or three reviewers selected happened to like it, and (b) future authors who evaluate the published research and potentially rely on it for their own studies may, like the reviewers, be unable to evaluate its quality effectively. Thus, reviewer disagreement points to flaws in the academic publishing system, undermining the validity of academic research.
There are some flaws in the above argument, one of which was addressed in our study. This flaw is that academic research often draws on a range of methodological and theoretical expertise. Thus, reviewers may be chosen for non-overlapping knowledge and may base their recommendations on different aspects of a submission. This should be particularly true of complex multi-method research and least likely for narrower studies. In our paper (which itself had to go through peer review), we started from the premise that theoretical physics is a field in which a high degree of reviewer agreement could be expected, because it is mathematical and theoretical and therefore does not require specialist knowledge about equipment, processing methods or practical application contexts. Instead, any theoretical physicist should have a reasonable chance of fully comprehending journal articles on the topic, and should therefore be able to make an overall judgement about submissions. We therefore assessed agreement between reviewers of theoretical physics articles.
We looked beyond the overall agreement rate, into different aspects of research quality. The three core dimensions of research quality are usually agreed to be rigour, originality, and significance (to scholarship and/or society), so we assessed the extent to which reviewers of theoretical physics papers gave the same scores for each of these dimensions. The data came from the online journal SciPost Physics, chosen because it is one of the few journals that publishes reviewers' rigour, originality, and significance scores. The results showed that reviewers agreed 40% to 48% of the time on each of the three facets. When they disagreed, it was usually by one point (39% to 46% of all judgements), so larger disagreements were rare. Nevertheless, the results still represent majority disagreement between reviewers in this near-ideal case, showing that disagreement between reviewers is a fundamental part of science rather than a by-product of different types of expertise (Figure 1).
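To make the agreement figures above concrete, the short Python sketch below computes exact-agreement and one-point-disagreement rates from paired reviewer scores. The score pairs and the 1-6 scale are made up for illustration; this is not the paper's analysis code or the SciPost dataset.

```python
# Illustrative sketch only: hypothetical reviewer score pairs on an assumed 1-6 scale,
# not the actual SciPost data or the analysis code from Thelwall & Holyst (in press).
from collections import Counter

# Each tuple is (first reviewer's score, second reviewer's score) for one submission,
# for a single quality dimension such as rigour.
score_pairs = [(5, 5), (4, 5), (6, 4), (3, 3), (5, 6), (2, 4), (4, 4), (5, 4)]

# Count how far apart each pair of scores is.
gaps = Counter(abs(a - b) for a, b in score_pairs)
n = len(score_pairs)

exact_agreement = gaps[0] / n   # identical scores
one_point_apart = gaps[1] / n   # disagreement by a single point
larger_gaps = 1 - exact_agreement - one_point_apart

print(f"Exact agreement:   {exact_agreement:.0%}")
print(f"One point apart:   {one_point_apart:.0%}")
print(f"Two or more apart: {larger_gaps:.0%}")
```

Repeating this calculation separately for rigour, originality and significance scores would give one agreement rate per dimension, which is the form in which the results above are reported.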
- Prof Mike Thelwall
Professor of Data Science
Janusz A. Hołyst, Center of Physics in Economics and Social Sciences, Warsaw University of Technology
Thelwall, M. & Hołyst, J. A. (in press). Can journal reviewers dependably assess rigour, significance, and originality in theoretical papers? Evidence from physics. Research Evaluation. https://doi.org/10.1093/reseval/rvad018
Figure 1. Model of key factors influencing peer review judgements (Figure 10 in: Thelwall & Holyst, in press).