Research Matters 33

Research Matters 33 - Foreword

Oates, T. (2022). Foreword. Research Matters: A Cambridge University Press & Assessment publication, 33, 4.

In this edition of Research Matters we are seeing significant refinement in both application and thinking associated with Comparative Judgement. Genuinely ground-breaking, the wide-ranging studies and projects examine its limits and processes as well as its relation to existing assessment approaches. There’s one aspect of this edition which I really commend – it not only explores the characteristics of Comparative Judgement through carefully designed empirical work, but also increases our understanding of the processes of human judgement within it.

Research Matters 33 - Editorial - the CJ landscape

Bramley, T. (2022). Editorial. Research Matters: A Cambridge University Press & Assessment publication, 33, 5.

Eleven years ago in Research Matters, Bramley & Oates (2011) described the “state of play” regarding research into Comparative Judgement (CJ). At the time it was still being referred to as a “new” method, at least in terms of its application in educational assessment. (The technique of paired comparisons in psychology has been around since the 19th century!) It is still not a mainstream technique, but much more is now known about its strengths and weaknesses. In this editorial we give an overview of what we see as the current CJ landscape and some of the key research questions and practical issues.

A summary of OCR’s pilots of the use of Comparative Judgement in setting grade boundaries

Benton, T., Gill, T., Hughes, S., & Leech, T. (2022). A summary of OCR’s pilots of the use of Comparative Judgement in setting grade boundaries. Research Matters: A Cambridge University Press & Assessment publication, 33, 10–30.

The rationale for the use of comparative judgement (CJ) to help set grade boundaries is to provide a way of using expert judgement to identify and uphold certain minimum standards of performance rather than relying purely on statistical approaches such as comparable outcomes. This article summarises the results of recent trials of using CJ for this purpose in terms of how much difference it might have made to the positions of grade boundaries, the reported precision of estimates and the amount of time that was required from expert judges.

The results show that estimated grade boundaries from a CJ approach tend to be fairly close to those that were set (using other forms of evidence) in practice. However, occasionally, CJ results displayed small but significant differences from existing boundary locations. This implies that adopting a CJ approach to awarding would have a noticeable impact on awarding decisions but not such a large one as to be implausible. This article also demonstrates that implementing CJ using simplified methods (described by Benton, Cunningham et al., 2020) achieves the same precision as alternative CJ approaches, but in less time. On average, each CJ exercise required roughly 30 judge-hours across all judges.

How do judges in Comparative Judgement exercises make their judgements?

Leech, T. & Chambers, L. (2022). How do judges in Comparative Judgement exercises make their judgements? Research Matters: A Cambridge University Press & Assessment publication, 33, 31–47.

Two of the central issues in comparative judgement (CJ), which are perhaps underexplored compared to questions of the method’s reliability and technical quality, are “what processes do judges use to make their decisions?” and “what features do they focus on when making their decisions?” This article discusses both, in the context of CJ for standard maintaining, by reporting the results of a study into the processes used by judges when making CJ judgements and the outcomes of surveys of judges who have used CJ. First, using insights from observations of judges who were asked to think aloud while they judged, we highlight the variety of processes used when making their decisions, including comparative reference, re-marking and question-by-question evaluation. We then develop a four-dimension model to explore what influences what judges attend to, and explore through survey responses the distinctive ways in which the structure of the question paper, different elements of candidate responses, judges’ own preferences and the CJ task itself affect decision-making. We conclude by discussing, in the light of these factors, whether the judgements made in CJ (or in the judgemental element of current standard maintaining procedures) are meaningfully holistic, and whether judges can properly take into account differences in difficulty between different papers.

Judges’ views on pairwise Comparative Judgement and Rank Ordering as alternatives to analytical essay marking

Walland, E. (2022). Judges’ views on pairwise Comparative Judgement and Rank Ordering as alternatives to analytical essay marking. Research Matters: A Cambridge University Press & Assessment publication, 33, 48–67.

In this article, I report on examiners’ views and experiences of using Pairwise Comparative Judgement (PCJ) and Rank Ordering (RO) as alternatives to traditional analytical marking for GCSE English Language essays. Fifteen GCSE English Language examiners took part in the study. After each had judged 100 pairs of essays using PCJ and eight packs of ten essays using RO, I collected data on their experiences and views of the methods through interviews and questionnaires. I analysed the data using thematic content analysis.

The findings highlight that, if the methods were to be used as alternatives to marking, examiners and other stakeholders would need reassurance that the methods are fair, valid and reliable. Examiners would also need more training and support to help them to judge holistically. The lack of detail about how judgements are made using these methods is a concern worth following up and addressing before implementation.

The concurrent validity of comparative judgement outcomes compared with marks

Gill, T. (2022). The concurrent validity of comparative judgement outcomes compared with marks. Research Matters: A Cambridge University Press & Assessment publication, 33, 68–79.

In Comparative Judgement (CJ) exercises, examiners are asked to look at a selection of candidate scripts (with marks removed) and order them according to which they believe display the best quality. By including scripts from different examination sessions, the results of these exercises can be used to help with maintaining standards.

Results from previous CJ studies have demonstrated that the method appears to be valid and reliable in many contexts. However, it is not entirely clear whether CJ works as well as it does because of the physical and judgemental processes involved (i.e., placing two scripts next to each other and deciding which is better based on an intuitive, holistic, and relative judgement), or because CJ exercises capture a lot of individual paired comparison decisions quickly. This article adds to the research on this question by re-analysing data from previous CJ studies and comparing the concurrent validity of the outcomes of individual CJ paired comparisons with the concurrent validity of outcomes based on the original marks given to scripts.

The results show that for 16 out of the 20 data sets analysed, mark-based decisions had higher concurrent validity than CJ-based decisions. Two possible reasons for this finding are: CJ decisions reward different skills to marks; or individual CJ decisions are of lower quality than individual decisions based on marks. Either way, the implication is that the CJ method works because many individual paired comparison decisions are captured quickly, rather than because of the physical and psychological processes involved in making holistic judgements.

How are standard-maintaining activities based on Comparative Judgement affected by mismarking in the script evidence?

Williamson, J. (2022). How are standard-maintaining activities based on Comparative Judgement affected by mismarking in the script evidence? Research Matters: A Cambridge University Press & Assessment publication, 33, 80–99.

An important application of Comparative Judgement (CJ) methods is to assist in the maintenance of standards from one series to another in high-stakes qualifications, by informing decisions about where to place grade boundaries or cut scores. This article explores the extent to which standard-maintaining activities based on Comparative Judgement would be robust to mismarking in the sample of scripts used for the comparison exercise. While extreme marking errors are unlikely, we know that mismarking can occur in live assessments, and quality of marking can vary. This research investigates how this could affect the outcomes of CJ-based methods, and therefore contributes to a better understanding of the risks associated with using CJ-based methods for standard maintaining. The article focuses on the ‘simplified pairs’ method (Benton et al., 2020), an example of the ‘universal method’ discussed by Benton (this issue).

Moderation of non-exam assessments: is Comparative Judgement a practical alternative?

Vidal Rodeiro, C. L. & Chambers, L. (2022). Moderation of non-exam assessments: is Comparative Judgement a practical alternative? Research Matters: A Cambridge University Press & Assessment publication, 33, 100–119.

Many high-stakes qualifications include non-exam assessments that are marked by teachers. Awarding bodies then apply a moderation process to bring the marking of these assessments to an agreed standard. Comparative Judgement (CJ) is a technique where two (or more) pieces of work are compared at a time, allowing an overall rank order of work to be generated.

This study explored the practical feasibility of using CJ for moderation via an experimental moderation task requiring judgements of pairs of authentic portfolios of work. This included aspects such as whether moderators can view and navigate the portfolios sufficiently to enable them to make the comparative judgements, on what basis they make their decisions, whether moderators can be confident making CJ judgements on large pieces of candidate work (e.g., portfolios), and the time taken to moderate.

Research News

Bowett, L. (2022). Research News. Research Matters: A Cambridge University Press & Assessment publication, 33, 123–125.

A summary of recent conferences, reports, blogs and research articles published since the last issue of Research Matters.
