Jackie Greatorex

Since joining Cambridge University Press and Assessment, I’ve researched a range of assessment topics including construct validation, comparability, reliability, grading, standardisation of assessors’ judgements (in academic and vocational settings), grade descriptors, examiners’ cognition, context in examination questions, and the cognitive demand of examination questions. I have also studied wider education and curriculum themes, such as teaching approaches in A Level Chemistry and how ‘Application of Number’ teaching was organised in schools and colleges.

Prior to joining Cambridge University Press and Assessment, I researched the reliability of medics’ judgements when checking mammograms.

I am a Psychologist and an Associate Fellow of the British Psychological Society. I hold an MEd from the University of Bristol and an MA from the University of Cambridge. For my PhD, which I obtained at the University of Derby, I investigated learning in healthcare degrees. The research drew from psychology, andragogy and curriculum theory.

My team’s research focuses on education and curriculum. The scope of our work is wide-ranging, encompassing all ages, subjects (academic and vocational) and jurisdictions. This builds on the curriculum theory I studied during my PhD and gives me the opportunity to research a variety of key education and curriculum matters.

Publications

2024

Differential effects of subject-based and integrated curriculum approaches on students' learning outcomes: A review of reviews.

Kreijkes, P., & Greatorex, J. (2024). Differential effects of subject-based and integrated curriculum approaches on students' learning outcomes: A review of reviews. Review of Education, 12, e3465.

Comparing curricula from different regions: a common practice revamped by using MAXQDA.

Greatorex, J., & Ireland, J. (2024, 28 Feb–1 Mar). Comparing curricula from different regions: a common practice revamped by using MAXQDA [Poster presentation]. MAXQDA International Conference, Berlin, Germany. https://www.maxqda.com/wp/wp-content/uploads/sites/2/JGJI_Indigenous-Knowledge.pdf

2023

The appliance of science: exploring the use of context in reformed GCSE science examinations
Crisp, V. & Greatorex, J. (2023). The appliance of science: exploring the use of context in reformed GCSE science examinations. Assessment in Education: Principles, Policy & Practice

2022

An analysis of cultural representations of India and the UK in English subject curricula
Cambridge Partnership for Education (2022). An analysis of cultural representations of India and the UK in English subject curricula. British Council India

2021

What is competence? A shared interpretation of competence to support teaching, learning and assessment
Vitello, S., Greatorex, J., & Shaw, S. (2021). What is competence? A shared interpretation of competence to support teaching, learning and assessment. Cambridge University Press & Assessment Research Report. Cambridge, UK: Cambridge University Press & Assessment.
Early policy response to COVID-19 in education—A comparative case study of the UK countries

Mouthaan, M., Johnson, M., Greatorex, J., Coleman, V., and Fitzsimons, S. (2021). Early policy response to COVID-19 in education—A comparative case study of the UK countries. Research Matters: A Cambridge Assessment publication, 31, 51-67.

Inspired by the work of David Raffe and his co-authors, who set out the benefits of comparing the policies of “the UK home nations” in an article published in 1999, researchers in the Education and Curriculum Team launched a project in early 2020 that we called Curriculum Watch. The aim of this project was to collate a database of literature and documents on education and curriculum policies, research and analyses from across the four countries of the United Kingdom (UK).

In this article, we draw on our literature database to make sense of the rapid changes in education policy that occurred in the early stages of the COVID-19 pandemic in the four UK nations of England, Scotland, Wales and Northern Ireland. We analyse some of the key areas of UK policy formation and content (in relation to curriculum, pedagogy and assessment) that we observed during the first six months of the unfolding pandemic. In addition, we reiterate the clear benefits of using comparative research methods in the UK context: our research findings support the idea that closeness of national contexts offers the opportunity for evidence exchange and policy learning in education.

Research Matters 31: Spring 2021
  • Foreword Tim Oates, CBE
  • Editorial Tom Bramley
  • Attitudes to fair assessment in the light of COVID-19 Stuart Shaw, Isabel Nisbet
  • On using generosity to combat unreliability Tom Benton
  • A guide to what happened with Vocational and Technical Qualifications in summer 2020 Sarah Mattey
  • Early policy response to COVID-19 in education—A comparative case study of the UK countries Melissa Mouthaan, Martin Johnson, Jackie Greatorex, Tori Coleman, Sinead Fitzsimons
  • Generation Covid and the impact of lockdown Gill Elliott
  • Disruption to school examinations in our past Gillian Cooke, Gill Elliott
  • Research News Anouk Peigne

2020

A way of using taxonomies to demonstrate that applied qualifications and curricula cover multiple domains of knowledge

Suto, I., Greatorex, J., Vitello, S., and Child, S. (2020). A way of using taxonomies to demonstrate that applied qualifications and curricula cover multiple domains of knowledge. Research Matters: A Cambridge Assessment publication, 30, 26-34.

Educational taxonomies are classification schemes which provide the terminology that educationalists need to describe and work with different areas of knowledge. It is good practice to use taxonomies to formulate and review curricula, learning objectives, and associated assessments. Demonstrating sufficient coverage of an adequate range of knowledge domains is critical for authenticity, for assessment reliability, and for transparency about what students are learning.

In this study we explored whether any educational taxonomies that were designed for general educational contexts (sometimes called ‘academic’ contexts) could be utilised in applied educational contexts (often called ‘vocationally-related’ in England). To do this, we identified nine published taxonomies with sufficient potential, and selected and combined the most appropriate. This process led us to develop a new model of demand. We then applied the selected taxonomies experimentally to existing curricula in a range of applied subjects which are taught at secondary and tertiary level in England. We also used the selected taxonomies to develop a tool for writing educational objectives. This article ends with suggestions for applying the selected taxonomies in other areas of assessment.

Context matters—Adaptation guidance for developing a local curriculum from an international curriculum framework

Fitzsimons, S., Coleman, V., Greatorex, J., Salem, H., and Johnson, M. (2020). Context matters—Adaptation guidance for developing a local curriculum from an international curriculum framework. Research Matters: A Cambridge Assessment publication, 30, 12-18.

The Learning Passport (LP) is a collaborative project between the University of Cambridge, UNICEF and Microsoft, which aims to support the UNICEF goal of providing quality education provision for children and youth whose education has been disrupted by crisis or disaster. A core component of this project is a curriculum framework for Mathematics, Science and Literacy which supports educators working in emergency contexts. This framework provides a broad outline of the essential content progressions that should be incorporated into a curriculum to support quality learning in each subject area, and is intended to act as a blueprint for localised curriculum development across a variety of contexts. To support educators in the development of this localised curriculum an LP Adaptation Guidance document was also created. This document provides guidance on several factors that local curriculum developers should consider before using the LP Curriculum Framework for their own curriculum development process. This article discusses how key areas within the LP Adaptation Guidance have broader relevance beyond education in emergencies, highlighting that the challenges that exist within some of the most deprived educational contexts have applicability in all contexts.

Research Matters 30: Autumn 2020
  • Foreword Tim Oates, CBE
  • Editorial Tom Bramley
  • A New Cambridge Assessment Archive Collection Exploring Cambridge English Exams in Germany and England in JPLO Gillian Cooke
  • Perspectives on curriculum design: comparing the spiral and the network models Jo Ireland, Melissa Mouthaan
  • Context matters—Adaptation guidance for developing a local curriculum from an international curriculum framework Sinead Fitzsimons, Victoria Coleman, Jackie Greatorex, Hiba Salem, Martin Johnson
  • Setting and reviewing questions on-screen: issues and challenges Victoria Crisp, Stuart Shaw
  • A way of using taxonomies to demonstrate that applied qualifications and curricula cover multiple domains of knowledge Irenka Suto, Jackie Greatorex, Sylvia Vitello, Simon Child
  • Research News Anouk Peigne
The Learning Passport: Curriculum Framework (Maths, Science, Literacy).
Cambridge Assessment. (2020). The Learning Passport: Curriculum Framework (Maths, Science, Literacy). Cambridge, UK: Cambridge Assessment.
The Learning Passport Research and Recommendations Report.
Cambridge University Press & Cambridge Assessment. (2019). The Learning Passport Research and Recommendations Report: Summary of Findings. Cambridge, UK: Cambridge University Press & Cambridge Assessment. 

2019

An international review of ways of involving employers in judging learners' performance in assessments

Greatorex, J., and Darlington, E. (2019). An international review of ways of involving employers in judging learners' performance in assessments. Presented at the Ofqual Educational Assessment Seminar, University of Warwick, UK, 4th April 2019.

Towards a method for comparing curricula
Greatorex, J., Rushton, N., Coleman, T., Darlington, E. and Elliott, G. (2019). Towards a method for comparing curricula. Cambridge Assessment Research Report. Cambridge, UK: Cambridge Assessment.

2017

A review of instruments for assessing complex vocational competence

Greatorex, J., Johnson, M. and Coleman, V. (2017). A review of instruments for assessing complex vocational competence. Research Matters: A Cambridge Assessment publication, 23, 35-42.

The aim of the research was to explore the measurement qualities of checklists and Global Rating Scales (GRS) in the context of assessing complex competence. Firstly, we reviewed the literature about the affordances of human judgement and the mechanical combination of human judgements. Secondly, we reviewed examples of checklists and GRS which are used to assess complex competence in highly regarded professions. These examples served to contextualise and elucidate assessment matters. Thirdly, we compiled research evidence from the outcomes of systematic reviews which compared advantages and disadvantages of checklists and GRS. Together the evidence provides a nuanced and firm basis for conclusions. Overall, the literature shows that mechanical combination can outperform the human integration of evidence when assessing complex competence, and that therefore a good use of human judgements is in making decisions about individual traits, which are then mechanically combined. The weight of evidence suggests that GRS generally achieve better reliability and validity than checklists, but that a high quality checklist is better than a poor quality GRS. The review is a reminder that involving assessors in the design of assessment instruments can help to maximise manageability.
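The distinction between holistic judgement and mechanical combination lends itself to a small illustration. Below is a minimal sketch, not drawn from the study, showing how per-trait judgements might be mechanically combined by a fixed formula instead of being integrated in the assessor’s head; the trait names and weights are invented.

```python
# Illustrative sketch (not from the study): assessors rate individual
# traits, and an overall score comes from a fixed formula rather than
# from a holistic human judgement. Trait names and weights are invented.
TRAITS = {"technique": 0.4, "safety": 0.3, "communication": 0.3}

def mechanical_combination(ratings):
    """Combine per-trait ratings (0-10) into one score via fixed weights."""
    return sum(TRAITS[trait] * rating for trait, rating in ratings.items())

# One assessor's trait-level judgements for a single candidate.
candidate = {"technique": 7, "safety": 9, "communication": 6}
print(mechanical_combination(candidate))  # 0.4*7 + 0.3*9 + 0.3*6 = 7.3
```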

2016

Extending educational taxonomies from general to applied education: Can they be used to write and review assessment criteria?
Greatorex, J. and Suto, I. (2016). Paper presented at the 8th Biennial Conference of the European Association for Research in Learning and Instruction (EARLI) SIG 1 - Assessment and Evaluation, Munich, Germany, 24-26 August 2016
Employers' views on assessment design in vocational qualifications: a preliminary study
Vitello, S., Carroll, P., Greatorex, J. and Ireland, J. (2016). Paper presented at the European Conference on Educational Research (ECER), Dublin, Ireland, 23-26 August 2016.
Analysing the Cognitive Demand of Reading, Writing and Listening Tests
Greatorex, J. and Dhawan, V. (2016). Paper presented at the International Education conference, Clute Institute, Venice, Italy, 5-9 June 2016

2015

Piloting a method for comparing examination question paper demands
Chambers, L., Greatorex, J., Constantinou, F. and Ireland, J. (2015). Paper presented at the AEA-Europe annual conference, Glasgow, Scotland, 4-7 November 2015.
Piloting a method for comparing examination question paper demands
Greatorex, J., Chambers, L., Constantinou, F. and Ireland, J. (2015).  Paper presented at the British Educational Research Association (BERA) conference, Belfast, UK, 14-17 September 2015.
Do experts’ views of specification demands correspond with established educational taxonomies?
Greatorex, J., Rushton, N., Mehta, S. and Grayson, R. (2015). Do experts’ views of specification demands correspond with established educational taxonomies? Online Educational Research Journal. (Advance online publication).
Linking instructional verbs from assessment criteria to mode of assessment
Greatorex, J., Ireland, J., Carroll, P. and Vitello, S. (2015) Paper presented at the Journal of Vocational Education and Training (JVET) conference, Oxford, 3-5 July 2015.

2014

Context in Mathematics questions

Greatorex, J. (2014). Context in Mathematics questions. Research Matters: A Cambridge Assessment publication, 17, 18-23.

For at least two decades educationalists have debated whether Mathematics examination questions should be set in context. The aim of this article is to revisit the debate to answer the following questions: 1. What are the advantages and disadvantages of examining Mathematics in context? 2. What are the features of a high quality context? Initially several taxonomies (categories or classification systems) of context are reviewed and the research methods for evaluating the effects of context are considered. Subsequently, the advantages and disadvantages of using context in Mathematics examination questions are explored, focusing on research about public examinations in secondary school Mathematics in England. The literature is used to make recommendations about context in Mathematics questions.

2013

How can major research findings about returns to qualifications illuminate the comparability of qualifications?

Greatorex, J. (2013) Paper presented at the Journal of Vocational Education and Training (JVET) conference, Oxford, 5-7 July 2013 and the British Educational Research Association (BERA) conference, Brighton, 3-5 September 2013

Do the questions from A and AS Level Economics exam papers elicit responses that reflect the intended construct?
Greatorex, J., Shaw, S., Hodson, P. and Ireland, J. (2013) Poster presented at the British Educational Research Association (BERA) conference, Brighton, 3-5 September 2013
Using scales of cognitive demand in a validation study of Cambridge International A and AS level Economics

Greatorex, J., Shaw, S., Hodson, P., Ireland, J. and Werno, M. (2013). Paper presented at the British Education Studies Association conference, Swansea, 27-28 June 2013

Using scales of cognitive demand in a validation study of Cambridge International A and AS level Economics

Greatorex, J., Shaw, S., Hodson, P. and Ireland, J. (2013). Using scales of cognitive demand in a validation study of Cambridge International A and AS level Economics. Research Matters: A Cambridge Assessment publication, 15, 29-37.

The research aims to map the cognitive demand of examination questions in A and AS Level Economics. To this end we used the CRAS (complexity, resources, abstractness, strategy) framework, an established way of analysing the cognitive demand of examination questions. Six subject experts applied the CRAS framework to selected question papers which included multiple choice, essay and data response items. That is, each subject expert rated the level of cognitive demand of each question twice: once without reference to the mark scheme and once with reference to the mark scheme. Ratings without the mark scheme indicate how demanding the questions appear. Ratings with the mark scheme indicate the cognitive demands rewarded by the mark scheme. Analysis showed that the demands elicited by the questions were similar to those rewarded by the mark scheme, which is evidence of validity. The findings are used to explore using CRAS with different types of items (multiple choice, essay and data response).

2012

A method for comparing the demands of specifications
Greatorex, J. and Mehta, S. (2012) Paper presented at the British Educational Research Association Conference, Manchester, 4-6 September 2012 and the European Conference on Educational Research, Cádiz, 18-21 September 2012
The validity of teacher assessed Independent Research Reports contributing to Cambridge Pre-U GPR
Greatorex, J. and Shaw, S. (2012) Paper presented at the British Educational Research Association (BERA) conference, Manchester, 4-6 September 2012
The validity of teacher assessed Independent Research Reports contributing to Cambridge Pre-U Global Perspectives and Research

Greatorex, J. and Shaw, S. (2012). The validity of teacher assessed Independent Research Reports contributing to Cambridge Pre-U Global Perspectives and Research. Research Matters: A Cambridge Assessment publication, 14, 38-41.

This research considered the validity of tutor-assessed, pre-university independent research reports. Evidence of construct relevance in tutors’ interpretations of the levels awarded to the candidates’ research process was investigated. This included designing, planning, managing and conducting their own research project using techniques and methods appropriate to the subject discipline. The research was conducted in the context of the Cambridge International Pre-U Global Perspectives and Independent Research qualification (the GPR), a pre-university qualification for 16-19 year olds which is designed to equip students with the skills required to make a success of their university studies. Tutors’ justifications for the levels they gave candidates were considered. In the first of two studies (Study 1), tutor justifications were qualitatively analysed for specific tutor behaviours that might highlight tutors interpreting levels in a construct-irrelevant way. In the second study (Study 2), external moderators (EMs) rated the justifications according to the extent to which they reflected the intended constructs. Study 1 showed little evidence of construct irrelevance and Study 2 provided strong evidence of construct relevance in tutors’ interpretation of the levels they awarded candidates for the research process.

Piloting a method for comparing the demand of vocational qualifications with general qualifications

Greatorex, J. and Shiell, H. (2012). Piloting a method for comparing the demand of vocational qualifications with general qualifications. Research Matters: A Cambridge Assessment publication, 14, 29-38.

Frequently, researchers are tasked with comparing the demand of vocational and general qualifications, and methods of comparison often rely on human judgement. Therefore, the research aims to develop an instrument to compare vocational and general qualifications, pilot the instrument and explore how experts judge demand. Reading a range of OCR (Oxford Cambridge and Royal Society of Arts Examinations) level 2 specifications illustrated that they included knowledge, skills and understanding from five domains: the affective, cognitive, interpersonal, metacognitive and psychomotor domains. Therefore, these domains were included in the instrument. Four cognate units were included in the study. Four experts participated, each familiar with at least one unit. Each expert read pairs of unit specifications and judged which was more demanding in each domain (affective, cognitive, interpersonal, metacognitive and psychomotor). Subsequently, they completed a questionnaire about their experience. It was found that the demands instrument was suitable for comparing the demand of cognate units from vocational and general qualifications.

2011

Comparing different types of qualifications: an alternative comparator

Greatorex, J. (2011). Comparing different types of qualifications: an alternative comparator. Research Matters: A Cambridge Assessment publication, Special Issue 2, 34-41. 

Returns to qualifications is a statistical measure of how much more is earned on average by people with a particular qualification compared to people with similar demographic characteristics who do not have the qualification. Awarding bodies and the national regulator do not generally use this research method in comparability studies, although such measures are prominent in government reviews of qualifications.

This article considers what returns to qualifications comparability research can offer awarding bodies. This comparator enables researchers to make comparisons which cannot be achieved by other methods, for instance, comparisons between different types of qualifications, occupations, sectors and progression routes. It has the advantage that it is more independent than customary comparators used in many comparability studies.

As with all research approaches, returns to qualifications has strengths and weaknesses, but provides some robust comparability evidence. The strongest comparability evidence is when there is a clear pattern in the results of several studies using different established research methods and independent data sets. Therefore results from returns to qualifications research combined with results from the customary comparators would provide a strong research evidence base.
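As a rough illustration of what the comparator involves, here is a minimal sketch, assuming entirely fabricated data: log earnings are regressed on a qualification indicator plus demographic controls, so the indicator’s coefficient approximates the average percentage earnings premium. The variable names, the simulated 10% premium and the use of statsmodels are all assumptions for the example, not the method of any study cited here.

```python
# Hedged sketch of a "returns to qualifications" estimate on fabricated
# data: the coefficient on has_qual approximates the average earnings
# premium, holding the demographic controls constant.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "has_qual": rng.integers(0, 2, n),   # 1 = holds the qualification
    "age": rng.integers(18, 60, n),
    "female": rng.integers(0, 2, n),
})
# Simulate a 10% earnings premium for qualification holders.
df["log_wage"] = (2.0 + 0.10 * df["has_qual"] + 0.01 * df["age"]
                  + rng.normal(0, 0.3, n))

model = smf.ols("log_wage ~ has_qual + age + female", data=df).fit()
print(model.params["has_qual"])  # recovers roughly 0.10, i.e. a ~10% premium
```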

Comparing specifications in a diverse qualifications system: instrument development
Greatorex, J., Rushton, N., Mehta, S. and Hopkin, R. (2011). Paper presented at the British Educational Research Association annual conference, University of London Institute of Education, September 2011.
Comparing different types of qualifications (e.g. vocational versus academic)
Greatorex, J. (2011) British Educational Research Association (BERA) Annual Conference, London
Comparing specifications from diverse qualifications: instrument development
Greatorex, J., Rushton, N., Mehta, S. and Hopkin, R. (2011).  Paper presented at the Journal of Vocational Education and Training International conference, Oxford, July 2011.
Developing a research tool for comparing qualifications

Greatorex, J., Mehta, S., Rushton, N., Hopkin, R. and Shiell, H. (2011). Developing a research tool for comparing qualifications. Research Matters: A Cambridge Assessment publication, 12, 33-42.

Comparability studies about qualification standards generally use demand or candidates’ performance as comparators. However, these can be unrepresentative for vocational and new qualifications. Consequently, other comparators need to be used. This article details the process of devising and piloting a research instrument to compare the features of cognate units from diverse qualifications and subjects.

First, knowledge was elicited from twelve experts through Kelly’s repertory grid interviews where they were asked to compare different types of qualifications. These data were analysed thematically. Four features and several sub-features were identified. These features were used to categorise the interview data and develop the research instrument. A pilot of the instrument indicated that salient features varied between units. Therefore, the instrument is suitable for use in future comparability studies about features. However, conventions still need to be agreed for how to analyse the data that are collected using the instrument.

Comparing the demand of syllabus content in the context of vocational qualifications: literature, theory and method

Novakovic, N. and Greatorex, J. (2011). Comparing the demand of syllabus content in the context of vocational qualifications: literature, theory and method.  Research Matters: A Cambridge Assessment publication, 11, 25-32.

Our literature review considers the methods used in studies comparing the demands of vocational syllabus content in England. Generally, categories of demands are either derived from subject experts’ views or devised by researchers. Subsequently, subject experts rate each syllabus on each demand category and comparisons can be made. However, problems with the methods include: i) Some studies over-focus on the cognitive domain rather than the affective, interpersonal and psychomotor domains; ii) Experts vary in their interpretations of rating scales. Therefore, we suggest creating a framework of demands which includes all four domains, based on a variety of subject experts’ views of demands. The subject experts might rank each syllabus on each type of demand, thus avoiding the problem(s) of rating scales, and facilitating comparisons between syllabuses.

2010

Is CRAS a suitable tool for comparing specification demands from vocational qualifications?

Greatorex, J. and Rushton, N. (2010). Is CRAS a suitable tool for comparing specification demands from vocational qualifications? Research Matters: A Cambridge Assessment publication, 10, 40-44.

The aim of the research was to ascertain whether a framework of cognitive demands, known as CRAS, is a suitable tool for comparing the demands of vocational qualifications.  CRAS was developed for use with academic examinations and may not tap into the variety of demands which vocational qualifications place on candidates.  Data were taken from a series of comparability studies by awarding bodies and the national regulator.  The data were the frameworks (often questionnaires) used to compare qualifications in these studies.  All frameworks were mapped to CRAS.  It was found that most aspects of the various frameworks mapped to an aspect of CRAS.  However, there were demands which did not map to CRAS; these were mostly affective and interpersonal demands, such as working in a team.  Affective and interpersonal domains are significant in vocational qualifications; therefore, using only CRAS to compare vocational qualifications is likely to omit key demands from the comparison.

How do examiners make judgements about standards? Some insights from a qualitative analysis

Greatorex, J. (2010). How do examiners make judgements about standards? Some insights from a qualitative analysis. Research Matters: A Cambridge Assessment publication, 9, 24-32.

There is a good deal of research about how judgements are made in awarding when A level and GCSE grade boundaries are chosen. There is less research about how judgements are made in Thurstone paired comparisons and rank ordering (popular methods in comparability studies to compare grading standards). Therefore, the research question for the present study is ‘how do Principal Examiners (PEs) make judgements about standards in awarding, Thurstone paired comparisons and rank ordering?’ The present article draws from a wider project in which Principal Examiners thought aloud whilst making judgements about the quality of candidates’ work and grading standards in awarding, Thurstone paired comparisons and rank ordering situations analogous to how these methods are practised. For the present analysis a coding frame was developed to qualitatively analyse the think aloud data. The coding frame constituted codes grounded in the think aloud data and grade descriptors from the qualification specification. It was found that overall the Principal Examiners attended to valid factors such as where marks were gained, responses to key questions and characteristics of candidates’ work that were in the grade descriptors. When the importance of each factor was considered there were some similarities and some differences between the methods. Implications and recommendations are discussed.
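For readers unfamiliar with Thurstone paired comparisons, the sketch below shows the core of Case V scaling on invented judgement counts (not the study’s data): the proportion of times one script is preferred to another is converted to a z-score, and averaging the z-scores places each script on a common quality scale.

```python
# Minimal sketch of Thurstone Case V scaling on invented judgement counts.
import numpy as np
from scipy.stats import norm

# wins[i, j] = number of judges who placed script i above script j
wins = np.array([[0, 8, 9],
                 [2, 0, 7],
                 [1, 3, 0]], dtype=float)
trials = wins + wins.T                     # comparisons made per pair
p = np.divide(wins, trials, out=np.full_like(wins, 0.5), where=trials > 0)
np.fill_diagonal(p, 0.5)                   # a script neither beats nor loses to itself
p = p.clip(0.01, 0.99)                     # keep z-scores finite
z = norm.ppf(p)                            # Case V: z-score of each win proportion
scale = z.mean(axis=1)                     # scale value per script
print(scale)                               # higher value = judged better quality
```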

2009

How are archive scripts used in judgements about maintaining grading standards?
Greatorex, J.  (2009) British Educational Research Association (BERA) Annual Conference
Thinking about making the right mark: Using cognitive strategy research to explore examiner training

Suto, I., Greatorex, J. and Nadas, R. (2009). Thinking about making the right mark: Using cognitive strategy research to explore examiner training. Research Matters: A Cambridge Assessment publication, 8, 23-32.

In this article, we draw together research on examiner training and on the nature of the judgements entailed in the marking process. We report new analyses of data from two recent empirical studies, Greatorex and Bell (2008) and Suto and Nadas (2008a), exploring possible relationships between the efficacy of training and the complexity of the cognitive marking strategies apparently needed to mark the examination questions under consideration. In the first study reported in this article, we considered the benefits of three different training procedures for experienced examiners marking AS-level biology questions. In the second study reported here, we explored the effects of a single training procedure on experienced and inexperienced (graduate) examiners marking GCSE mathematics and physics questions. In both studies, it was found that: (i) marking accuracy was better after training than beforehand; and (ii) the effect of training on change in marking accuracy varied across all individual questions. Our hypothesis that training would be more beneficial for apparently more complex cognitive marking strategy questions than for apparently simple cognitive marking strategy questions was upheld for both subjects in Study 2, but not in Study 1.

How effective is fast and automated feedback to examiners in tackling the size of marking errors?

Sykes, E., Novakovic, N., Greatorex, J., Bell, J., Nadas, R. and Gill, T. (2009). How effective is fast and automated feedback to examiners in tackling the size of marking errors? Research Matters: A Cambridge Assessment publication, 8, 8-15.

Reliability is important in national assessment systems. Therefore, there is a good deal of research about examiners’ marking reliability. However, some questions remain unanswered due to the changing context of e-marking, particularly the opportunity for fast and automated feedback to examiners on their marking. Some of these questions are:

•    will iterative feedback result in greater marking accuracy than only one feedback session?
•    will encouraging examiners to be consistent (rather than more accurate) result in greater marking accuracy?
•    will encouraging examiners to be more accurate (rather than more consistent) result in greater marking accuracy?

Thirty-three examiners were matched into four experimental groups based on the severity of their marking. All examiners marked the same 100 candidate responses, in the same short time scale. Group 1 received one session of feedback about their accuracy. Group 2 received three iterative sessions of feedback about the accuracy of their marking. Group 3 received one session of feedback about their consistency. Group 4 received three iterative sessions of feedback about the consistency of their marking. Absolute differences between examiners’ marking and a reference mark were analysed using a general linear model. The results of the present analysis pointed towards the answer to all the research questions being “no”. The results presented in this article are not intended to be used to evaluate current marking practices. Rather, the article is intended to contribute to answering the research questions, and to developing an evidence base for the principles that should be used to design and improve marking practices.
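The following is a minimal sketch of this kind of general linear model analysis, with fabricated marks and a simplified design; the group labels, examiner numbers and choice of statsmodels are placeholder assumptions, not the study’s actual data or model specification.

```python
# Hedged sketch: model absolute examiner-vs-reference mark differences
# by feedback group with a general linear model, on fabricated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
groups = ["accuracy_x1", "accuracy_x3", "consistency_x1", "consistency_x3"]
rows = []
for group in groups:
    for examiner in range(8):            # placeholder: the study had 33 examiners
        for _ in range(100):             # 100 candidate responses each
            awarded = rng.normal(10, 2)  # examiner's mark for a response
            reference = 10               # agreed reference mark
            rows.append({"group": group, "abs_diff": abs(awarded - reference)})
df = pd.DataFrame(rows)

# Feedback group as a categorical factor; non-significant group effects
# correspond to the "no" answers reported above.
fit = smf.ols("abs_diff ~ C(group)", data=df).fit()
print(fit.summary())
```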

Using ‘thinking aloud’ to investigate judgements about A-level standards: Does verbalising thoughts result in different decisions?

Greatorex, J. and Nádas, R. (2009). Using ‘thinking aloud’ to investigate judgements about A-level standards: Does verbalising thoughts result in different decisions? Research Matters: A Cambridge Assessment publication, 7, 8-16.

The ‘think aloud’ method entails people verbalising their thoughts while they do tasks, resulting in ‘verbal protocols’. The verbal protocols are analysed by researchers to identify cognitive strategies and processes, as well as the factors that affect decision making. Verbal protocols have been widely used to study decisions in educational assessment. The main methodological concern about using verbal protocols is whether thinking aloud compromises ecological validity (the authenticity of the thought processes) and thus the decision outcomes. Researchers have investigated to what extent verbalising affected the thinking processes under investigation in a variety of settings. Currently, the research literature is generally inconclusive; most results show only longer performance times and no change in task outcomes.

Previous research on marking collected decision outcomes from two conditions:
1. marking silently;
2. marking whilst thinking aloud.
The mark to re-mark differences were the same in the two conditions. However, it is important to confirm whether verbalising affects decisions about grading standards. Therefore, our main aim was to compare the outcomes of senior examiners making decisions about grading standards silently as opposed to whilst thinking aloud. Our article draws from a wider project taking three approaches to grading.

In experimental conditions, senior examiners made decisions about A-level grading standards for a science examination both silently and whilst thinking aloud. Three approaches to grading were used in the experiment. All scripts included in the research had achieved a grade A or B in the live examination. The decisions from the silent and verbalising conditions were statistically compared.

Our interim findings suggest that verbalising made little difference to the participants’ decisions; this is in line with previous research in other contexts. The findings reassure us that the verbal protocols are a useful method for research about decision making in both marking and grading.

2008

A Quantitative Analysis of Cognitive Strategy Usage in the Marking of Two GCSE Examinations
Suto, W. M. I. and Greatorex, J. (2008) Assessment in Education: Principles, Policy and Practice, 15, 1, 73-89
What makes AS marking reliable? An experiment with some stages from the standardisation process
Greatorex, J. and Bell J. F. (2008) Research Papers in Education, 23, 3, 333–355
What do GCSE examiners think of ‘thinking aloud’? Findings from an exploratory study
Greatorex, J. and Suto, W.M.I. (2008). What do GCSE examiners think of ‘thinking aloud’? Findings from an exploratory study. Educational Research, 40, 4, 319-331
What attracts judges’ attention? A comparison of three grading methods
Greatorex, J., Novakovic, N. & Suto, I. (2008) International Association for Educational Assessment (IAEA) Conference, Cambridge
Exploring the role of human judgement in examination marking: findings from some empirical studies
Greatorex, J., Suto, I. & Nadas, R. (2008) Association of Language Testers in Europe (ALTE), Cambridge
What goes through an examiner's mind? Using verbal protocols to gain insights into the GCSE marking process
Suto, W. M. I. and Greatorex, J. (2008) British Educational Research Journal, 34, 2, 213-233
Investigating the judgemental marking process: an overview of our recent research

Suto, I., Crisp, V. and Greatorex, J. (2008). Investigating the judgemental marking process: an overview of our recent research. Research Matters: A Cambridge Assessment publication, 5, 6-9.

This article gives an overview of a number of linked studies which explored the process of marking GCSE and A-level examination questions from a number of different angles. Key aims of these studies were to provide insights into how examiner training and marking accuracy could be improved, as well as reasoned justifications for how item types could be assigned to different groups of examiners in the future. The research studies combined several approaches, exploring both the information that people attend to when marking items and the sequences of mental operations involved. Examples include studies that used the think-aloud method to identify the cognitive marking strategies entailed in marking student responses, or to explore the broader socio-cognitive influences on the marking process. Other examples explored the relationship between cognitive marking strategy complexity and marking accuracy.

This article brings together the findings from these various related studies to summarise the influences and processes that have been identified as important to the marking process from the research conducted so far.

Judging Text Presented on Screen: implications for validity
Johnson, M. and Greatorex, J. (2008) E-Learning, 5, 1, 40-50

2007

What strategies do IGCSE examiners use to mark candidates' scripts?
Greatorex, J. (2007) International Schools Journal, 27, 1, 48-55
Assessors’ holistic judgements about borderline performances: some influencing factors
Johnson, M. and Greatorex, J. (2007) British Educational Research Association (BERA) Annual Conference
Exploring how the cognitive strategies used to mark examination questions relate to the efficacy of examiner training
Greatorex, J., Nádas, R., Suto, I. and Bell, J F. (2007) European Conference on Educational Research (ECER) Conference, Ghent, Belgium
Did examiners' marking strategies change as they marked more scripts?

Greatorex, J. (2007). Did examiners' marking strategies change as they marked more scripts? Research Matters: A Cambridge Assessment publication, 4, 6-13.

Prior research used cognitive psychological theories to predict that examiners might begin marking a question using particular cognitive strategies but later in the marking session they might use different cognitive strategies. Specifically, it was predicted that when examiners are familiar with the question paper, mark scheme and candidates’ responses they:

•    use less ‘evaluating’ and ‘scrutinising’
•    use more ‘matching’

This research tests these predictions. All Principal Examiners (n=5), Team Leaders (n=5) and Assistant Examiners (n=59) who marked in the winter 2005 session were sent a questionnaire. The questionnaire asked about different occasions in the marking session. It was found that examiners’ marking strategies sometimes changed as they marked more scripts. When there were considerable changes in cognitive strategies, these were mostly in the predicted direction.

2006

What do GCSE examiners think of 'thinking aloud'? Interesting findings from a preliminary study
Suto, I. and Greatorex, J. (2006) British Educational Research Association (BERA) Annual Conference
Do examiners’ approaches to marking change between when they first begin marking and when they have marked many scripts?
Greatorex, J. (2006) British Educational Research Association (BERA) Annual Conference
Judging learners' work on screen: Issues of validity

Greatorex, J. (2006). Judging learners' work on screen: Issues of validity. Research Matters: A Cambridge Assessment publication, 2, 14-17.

Current developments in Cambridge Assessment and elsewhere include assessors marking digital images of examination scripts on computer, rather than the original scripts on paper, and judges marking and moderating digitally produced coursework on computer, rather than on paper. One question such innovations raise is whether marks from judgements made about the same work presented on computer and on paper are comparable. Generally the literature concerning the on-screen marking of tests and examinations suggests that on-paper and on-screen scores are indeed comparable (e.g., Bennett, 2003; Greatorex, 2004), although Fowles and Adams (2005) report that differences have been found in studies by Whetton and Newton (2002), Sturman and Kispal (2003) and Royal-Dawson (2003). Our concern in this discussion article is that even when double marking studies find high levels of agreement between marks for the same work judged in different modes, issues of validity might be masked. We are thinking of validity in terms of the cognitive processes of the assessor when reaching a judgement, and how well these reflect the judgements that were intended when the assessment was devised.

A cognitive psychological exploration of the GCSE marking process

Suto, I. and Greatorex, J. (2006). A cognitive psychological exploration of the GCSE marking process. Research Matters: A Cambridge Assessment publication, 2, 7-11.

GCSEs play a crucial role in secondary education throughout England and Wales, and the process of marking them, which entails extensive human judgement, is a key determinant in the futures of many sixteen-year-olds. The aims of our study were to investigate the cognitive strategies used when marking GCSEs and to interpret them within the context of psychological theories of human judgement.

Two GCSE examinations were considered: an intermediate tier Mathematics paper, which used a ‘points-based’ marking scheme, and a foundation tier Business Studies paper, which used a ‘levels-based’ scheme. For each subject, a group of six experienced examiners marked four identical script samples each. The first three of these samples were marked silently. Whilst marking the fourth sample, the examiners were asked to ‘think aloud’ concurrently. Using a semi-structured interview schedule, the examiners were later questioned about their marking experiences retrospectively.

A qualitative analysis of the verbal protocol data enabled us to propose a tentative model of marking, which includes five distinct cognitive marking strategies: matching, scanning, evaluating, scrutinising, and no response. These strategies were broadly validated not only in the retrospective interviews with the participating examiners, but also by other senior mathematics and business studies examiners.

An empirical exploration of human judgement in the marking of school examinations
Greatorex, J. & Suto, I. (2006) International Association for Educational Assessment (IAEA) Conference, Singapore

2005

Assessing the evidence: different types of NVQ evidence and their impact on reliability and fairness.
Greatorex, J. (2005) Journal of Vocational Education and Training 57, 2, 149-264
What goes through a marker’s mind? Gaining theoretical insights into the A-level and GCSE marking process
Greatorex, J. and Suto, I. (2005). Paper presented at the Association for Educational Assessment (AEA) - Europe, Dublin, Republic of Ireland, 3 November 2005.
A review of research about writing and using grade descriptors in GCSEs and A levels

Greatorex, J. (2005). A review of research about writing and using grade descriptors in GCSEs and A levels. Research Matters: A Cambridge Assessment publication, 1, 9-11.

This article describes current awarding practice and reviews literature about writing and using grade descriptors for GCSEs and A levels. Grade descriptors are descriptions of the qualities anticipated at various levels of candidates’ performance in an assessment. It is concluded that it is good practice to write grade descriptors based on empirical evidence. Grade descriptors for different domains and types of questions can be written by:

1) identifying questions where there is a statistically significant difference between the performance of students who achieve adjacent grades (e.g. A and B);
2) using Kelly’s Repertory Grid to interview examiners about the qualities which distinguish performance at these grades;
3) including these distinguishing qualities in grade descriptors.

Furthermore, there is little research about how grade descriptors are used, or could be used, in preparing pupils for assessments, and there is room for further research in this area.
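Step 1 of this procedure can be illustrated with a small sketch on fabricated marks; the use of a t-test and the numbers below are assumptions for the example, not necessarily the statistical test used in the reviewed research.

```python
# Hedged sketch of step 1: flag questions where candidates with adjacent
# grades (A vs B) performed significantly differently. Marks are fabricated.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)
n_questions = 5
marks_a = rng.normal(8, 1.5, size=(200, n_questions))  # 200 grade-A candidates
marks_b = rng.normal(7, 1.5, size=(200, n_questions))  # 200 grade-B candidates

for q in range(n_questions):
    t_stat, p_value = ttest_ind(marks_a[:, q], marks_b[:, q])
    if p_value < 0.05:
        # Discriminating questions feed the repertory grid interviews
        # (step 2) and, ultimately, the grade descriptors (step 3).
        print(f"Question {q + 1} distinguishes grades A and B (p = {p_value:.3f})")
```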

What goes through an examiner’s mind? Using verbal protocols to gain insights into the GCSE marking process
Suto, I. and Greatorex, J. (2005) British Educational Research Association (BERA) Annual Conference
Judging learners’ work on screen. How valid and fair are assessment judgements?
Johnson, M. and Greatorex, J. (2005) British Educational Research Association (BERA) Annual Conference
Moderated e-portfolio project evaluation
Greatorex, J. (2005) Moderated e-portfolio project evaluation. Cambridge Assessment Research Report. Cambridge, UK: Cambridge Assessment.

2004

Does the gender of examiners influence their marking?
Greatorex, J. and Bell, J. F. (2004) Research in Education, 71, 25-36
From Paper to Screen: some issues on the way
Raikes, N., Greatorex, J. and Shaw, S. (2004). Presented at the 30th annual conference of the International Association for Educational Assessment (IAEA), Philadelphia, USA, 13-18 June 2004.
What makes marking reliable? Experiments with UK examinations.
Baird, J., Greatorex, J. and Bell, J. F. (2004) Assessment in Education: Principles, Policy and Practice, 11, 3, 331-348

2003

Developing and applying level descriptors
Greatorex, J. (2003) Westminster Studies in Education, 26, 2, 125-133
Examinations and assessment in curriculum 2000.
Greatorex, J. (2003) In: L. Le Versha and G. Nicholls (eds.) Teaching at post-16: Effective teaching in the A level, AS and VCE curriculum. London: Kogan Page
A Comparability Study in GCE A level Chemistry Including the Scottish Advanced Higher Grade
Greatorex, J., Hamnett, L. and Bell J. F. (2003) A review of the examination requirements and a report on the cross moderation exercise. [A study based on the Summer 2002 Examinations and organised by the Research and Evaluation Division, UCLES for OCR on behalf of the Joint Council for General Qualifications].
What happened to limen referencing? An exploration of how the Awarding of public examinations has been and might be conceptualised
Greatorex, J. (2003) British Educational Research Association (BERA) Annual Conference
How can NVQ assessors’ judgements be standardised?
Greatorex, J. and Shannon, M. (2003) British Educational Research Association (BERA) Annual Conference

2002

Writing and using level descriptors
Greatorex, J. (2002) Learning and Skills Research Journal, 6, 1, 36
A Comparability Study in GCE AS Chemistry Including parts of the Scottish Higher Grade Examinations
Greatorex, J., Elliott, G. and Bell, J. F. (2002) A review of the examination requirements and a report on the cross moderation exercise. [A study based on the Summer 2001 Examination and organised by the Research and Evaluation Division, UCLES for OCR on behalf of the Joint Council for General Qualifications].
A fair comparison? The evolution of methods of comparability in national assessment
Elliott, G. and Greatorex, J. (2002) Educational Studies, 28, 3, 253-264
Back to the future: A methodology for comparing old A level and new AS standards
Elliott, G., Greatorex, J., Forster, M., and Bell, J.F. (2002) Educational Studies, 28, 2, 163-180
What makes a senior examiner?
Greatorex, J. and Bell, J F. (2002) British Educational Research Association (BERA) Annual Conference
Two heads are better than one: Standardising the judgements of National Vocational Qualification assessors
Greatorex, J. (2002) British Educational Research Association (BERA) Annual Conference
Does the gender of examiners influence their marking?
Greatorex, J. and Bell, J F. (2002) Learning Communities and Assessment Cultures: Connecting Research with Practice
Tools for the trade: What makes GCSE marking reliable?
Greatorex, J., Baird, J. and Bell, J F. (2002) Learning Communities and Assessment Cultures: Connecting Research with Practice

2001

Making the grade - developing grade profiles for accounting using a discriminator model of performance
Greatorex, J., Johnson, C. and Frame, K. (2001) Westminster Studies in Education, 24, 2, 167-181
Can vocational A levels be meaningfully compared with other qualifications?
Greatorex, J. (2001) British Educational Research Association (BERA) Annual Conference

2000

A Review of Research into Levels, Profiles and Comparability
Bell, J.F. and Greatorex, J. (2000) QCA
An accessible analytical approach for investigating what happens between the rounds of a Delphi study
Greatorex, J. and Dexter, T. (2000) Journal of Advanced Nursing, 32, 4, 1016-1024
Application of Number: an investigation into a theoretical framework for understanding the production and reproduction of pedagogical practices
McAlpine, M. and Greatorex, J. (2000) British Educational Research Association (BERA) Annual Conference
What research could an Awarding Body carry out about NVQs?
Greatorex, J. (2000). British Educational Research Association (BERA) Annual Conference.
Is the glass half full or half empty? What examiners really think of candidates’ achievement
Greatorex, J. (2000) British Educational Research Association (BERA) Annual Conference

1999

Generic Descriptors - a Health Check
Greatorex, J. (1999) Quality in Higher Education, 5, 2, 155-165
The Implementation of Application of Number
McAlpine, M. and Greatorex, J. (1999) British Educational Research Association (BERA) Annual Conference
The Application of Number Experience
McAlpine, M. and Greatorex, J. (1999) Researching Work and Learning, A First International Conference, School of Continuing Education

Research Matters

Research Matters is our free biannual publication which allows us to share our assessment research, in a range of fields, with the wider assessment community.