Tom Bramley

I joined Cambridge University Press and Assessment’s Research Division in 1995, and since that time have worked on projects covering most aspects of the assessment process, such as trying to understand the factors that make exam questions more or less difficult, or the features of mark schemes that make exam questions easier or harder to mark accurately. Much of my work has involved investigating the role that expert judgment and statistical information can play in mapping grading standards from one exam to another.

My current research interests include the application of Comparative Judgment methods to assessment, and trying to exploit item-level data from past exams to help set grade boundaries on new exams.

I hold an MA in Experimental Psychology from the University of Oxford, and an MSc in Operational Research from Lancaster University.

Outside of work I enjoy chess, tennis, gardening, and playing the piano.

Publications

2024

Reporting scale scores at GCSE and A level

Bramley, T., Vidal Rodeiro, C.L., & Wilson, F. (2024). Reporting scale scores at GCSE and A level. Cambridge University Press & Assessment.

Research Matters 37 - Editorial

Bramley, T. (2024). Editorial. Research Matters: A Cambridge University Press & Assessment publication, 37, 5.

Our first article describes the technologies collectively known as "extended reality" and considers opportunities and challenges for using them in teaching and assessing mathematics. The second reports on a study where three undergraduates were asked to use ChatGPT to assist with writing essays and then interviewed about their approach. Our third article considers the difficult issues that arise when comparing curriculum documents with the aim of making claims about comparability of different curricula. Our fourth article explores the extent to which data (specifically whether a response was missing or not) can support inferences about whether students were under time pressure in paper-based GCSE examination components, and whether exams in some subjects were more "speeded" than others. Our final article presents a historical overview of the Centre for Evaluation and Monitoring (CEM), acquired by Cambridge in 2019 but now celebrating more than 40 years since its creation.

Research Matters 37: Spring 2024
  • Foreword Tim Oates
  • Editorial Tom Bramley
  • Extended Reality (XR) in mathematics assessment: A pedagogical vision Xinyue Li
  • Does ChatGPT make the grade? Jude Brady, Martina Kuvalja, Alison Rodrigues, Sarah Hughes
  • How do approaches to curriculum mapping affect comparability claims? An analysis of mathematics curriculum content across two educational jurisdictions Nicky Rushton, Dominika Majewska, Stuart Shaw
  • Exploring speededness in pre-reform GCSEs (2009 to 2016) Emma Walland
  • A Short History of the Centre for Evaluation and Monitoring (CEM) Chris Jellis
  • Research News Lisa Bowett

2023

Research Matters 36 - Editorial

Bramley, T. (2023). Editorial. Research Matters: A Cambridge University Press & Assessment publication, 36, 5.

Our first article traces the record of assessments in natural history and related subjects (e.g., Botany, Zoology, Environmental Science) from the Cambridge University Press & Assessment archives. Our second article explores the effect of the reforms to GCSE Mathematics in England on progression to, and achievement in, post-16 mathematics. Our third article reports on the practical application of work on error in assessment materials in terms of redesigned checklists used in OCR for different professional roles in the question paper production process. Our fourth article looks at the relationship between the Cambridge Checkpoint tests taken at the end of lower secondary (around age 14) in some international schools, and subsequent performance on the Cambridge IGCSE (taken at around age 16). Our fifth article presents an analysis of the challenges of “synchronous hybrid teaching” based on in-depth interviews with 12 teachers from six different European countries.

Research Matters 36: Autumn 2023
  • Foreword Tim Oates
  • Editorial Tom Bramley
  • The prevalence and relevance of Natural History assessments in the school curriculum, 1858–2000: a study of the Assessment Archives Gillian Cooke
  • The impact of GCSE maths reform on progression to mathematics post-16 Carmen Vidal Rodeiro, Joanna Williamson
  • An example of redeveloping checklists to support assessors who check draft exam papers for errors Sylvia Vitello, Victoria Crisp, Jo Ireland
  • An analysis of the relationship between Secondary Checkpoint and IGCSE results Tim Gill
  • Synchronous hybrid teaching: how easy is it for schools to implement? Filio Constantinou
  • Research News Lisa Bowett
Research Matters 35 - Editorial

Bramley, T. (2023). Editorial. Research Matters: A Cambridge University Press & Assessment publication, 35, 5.

The Covid-19 pandemic and its aftermath have prompted a lot of debate about the purpose of education and the role of assessment. All the articles in this issue touch more or less directly on these big themes.

Research Matters 35: Spring 2023
  • Foreword Tim Oates
  • Editorial Tom Bramley
  • Creating Cambridge Learner Profiles: A holistic framework for teacher insights from assessments and evaluations Irenka Suto
  • A conceptual approach to validating competence frameworks Simon Child, Stuart Shaw
  • Teachers’ and students’ views of access arrangements in high stakes examinations Carmen Vidal Rodeiro, Sylwia Macinska
  • Who controls what and how? A comparison of regulation and autonomy in the UK nations’ education systems Pia Kreijkes, Martin Johnson
  • Assessment in England at a crossroads: which way should we go? Tony Leech
  • Research News Lisa Bowett

2022

What’s in a name? Are surnames derived from trades and occupations associated with lower GCSE scores?

Williamson, J., & Bramley, T. (2022). What’s in a name? Are surnames derived from trades and occupations associated with lower GCSE scores? Research Matters: A Cambridge University Press & Assessment publication, 34, 76–97.

In England, there are persistent associations between measures of socio-economic advantage and educational outcomes. Research on the history of names, meanwhile, confirms that surnames in England – like many other countries – were highly socially stratified in their origins. These facts prompted us to wonder whether educational outcomes in England might show variation by surname origin, and specifically, whether surnames with an occupational origin might be associated with slightly lower average GCSE scores than surnames of other origins. Even though surnames do not measure an individual’s socio-economic position, our hypothesis was that in aggregate, the educational outcomes of a group defined in this way might still reflect past social history.

In line with the research hypothesis, the results showed that the mean GCSE scores of candidates with occupational surnames were slightly lower than the mean GCSE scores of candidates with other surnames. The difference in attainment was a similar size to the difference expected between candidates half a year apart in age, and much smaller than the “gap” between male and female candidates. The explanation for the identified effect was beyond the scope of the current research, but surname effect mechanisms proposed in the literature include the psychological (e.g., implicit egotism), sociological and socio-genetic.

Research Matters 34 - Editorial

Bramley, T. (2022). Editorial. Research Matters: A Cambridge University Press & Assessment publication, 34, 5.

The first article in this issue is another contribution to the large amount of research on the impact of the COVID-19 pandemic on education, as perceived by teachers. Our second article is more technical but right at the heart of assessment: how to maintain or link standards from one version of a test or exam to another. Our third article is an interesting exploration of a large data set from Cambridge CEM’s BASE assessment. Our fourth article reflects on the concept of “recovery curricula” developed in response to educational disruption. The final article is a bit of a departure from our usual fare. We investigated whether there are any systematic differences in the exam results of groups of students with different categories of surname and found a small effect in line with our hypothesis: average grades of candidates with “occupational” surnames were slightly lower than those in other categories.

Research Matters 34: Autumn 2022
  • Foreword Tim Oates
  • Editorial Tom Bramley
  • Learning loss in the Covid-19 pandemic: teachers’ views on the nature and extent of loss Matthew Carroll, Filio Constantinou
  • Which assessment is harder? Some limits of statistical linking Tom Benton, Joanna Williamson
  • Progress in the first year at school Chris Jellis
  • What are "recovery curricula" and what do they include? A literature review Martin Johnson
  • What's in a name? Are surnames derived from trades and occupations associated with lower GCSE scores? Joanna Williamson, Tom Bramley
  • Research News Lisa Bowett
Research Matters 33: Spring 2022
  • Foreword Tim Oates
  • Editorial Tom Bramley
  • A summary of OCR’s pilots of the use of Comparative Judgement in setting grade boundaries Tom Benton, Tim Gill, Sarah Hughes, Tony Leech
  • How do judges in Comparative Judgement exercises make their judgements? Tony Leech, Lucy Chambers
  • Judges' views on pairwise Comparative Judgement and Rank Ordering as alternatives to analytical essay marking Emma Walland
  • The concurrent validity of Comparative Judgement outcomes compared with marks Tim Gill
  • How are standard-maintaining activities based on Comparative Judgement affected by mismarking in the script evidence? Joanna Williamson
  • Moderation of non-exam assessments: is Comparative Judgement a practical alternative? Carmen Vidal Rodeiro, Lucy Chambers
  • Research News Lisa Bowett
Research Matters 33 - Editorial - the CJ landscape

Bramley, T. (2022). Editorial. Research Matters: A Cambridge University Press & Assessment publication, 33, 5.

Eleven years ago in Research Matters, Bramley & Oates (2011) described the “state of play” regarding research into Comparative Judgement (CJ). At the time it was still being referred to as a “new” method, at least in terms of its application in educational assessment. (The technique of paired comparisons in psychology has been around since the 19th century!) It is still not a mainstream technique, but much more is now known about its strengths and weaknesses. In this editorial we give an overview of what we see as the current CJ landscape and some of the key research questions and practical issues.

2021

Metaphors and the psychometric paradigm
Bramley, T. (2021, November 3 - 5). Metaphors and the psychometric paradigm. [Paper presentation]. Annual conference of the Association for Educational Assessment – Europe (AEA-Europe), Dublin, Republic of Ireland (online).
Research Matters 32 - Editorial

Bramley, T. (2021). Editorial. Research Matters: A Cambridge University Press & Assessment publication, 32, 5.

Our first article in this issue looks at what is meant by the term social studies. Our second and third articles come from a collaboration with our researchers in Cambridge CEM (part of Cambridge University Press & Assessment since June 2019). They look at the impact of the drastic changes to school life created by the lockdowns imposed to manage the pandemic. Our fourth article looks at definitions of error in other industries and relates them to the perceptions and understanding of error among those with different roles in producing exam papers at Cambridge University Press & Assessment. Our final article shows that it is important to avoid "wishful thinking" when anticipating the benefits to reliability that adaptive testing might bring, in particular if tests made up of the kind of questions currently used in GCSEs and A Levels were to be administered adaptively.

Research Matters 32: Autumn 2021
  • Foreword Tim Oates
  • Editorial Tom Bramley
  • Learning during lockdown: How socially interactive were secondary school students in England? Joanna Williamson, Irenka Suto, John Little, Chris Jellis, Matthew Carroll
  • How well do we understand wellbeing? Teachers’ experiences in an extraordinary educational era Chris Jellis, Joanna Williamson, Irenka Suto
  • What do we mean by question paper error? An analysis of criteria and working definitions Nicky Rushton, Sylvia Vitello, Irenka Suto
  • Item response theory, computer adaptive testing and the risk of self-deception Tom Benton
  • Research News Anouk Peigne
Research Matters 31 - Editorial

Bramley, T. (2021). Editorial. Research Matters: A Cambridge Assessment publication, 31, 5.

Welcome to the new online version of Research Matters. The first issue in our new format is a special issue devoted to research relating to the COVID-19 pandemic. In the UK, exams were cancelled in summer 2020, and in all four UK nations students were eventually awarded the better of a Centre Assessed Grade (CAG) and a grade produced by an “algorithm” which statistically standardised the CAGs with the dual aims of compensating for differences among schools in how harsh or generous their estimated grades were, and minimising overall grade inflation. The experience raised, and continues to raise, fundamental questions about fairness, standards, reliability, the meaning of grades, the purpose of assessment—and others. In this special issue we touch on many of these issues.

Research Matters 31: Spring 2021
  • Foreword Tim Oates, CBE
  • Editorial Tom Bramley
  • Attitudes to fair assessment in the light of COVID-19 Stuart Shaw, Isabel Nisbet
  • On using generosity to combat unreliability Tom Benton
  • A guide to what happened with Vocational and Technical Qualifications in summer 2020 Sarah Mattey
  • Early policy response to COVID-19 in education—A comparative case study of the UK countries Melissa Mouthaan, Martin Johnson, Jackie Greatorex, Tori Coleman, Sinead Fitzsimons
  • Generation Covid and the impact of lockdown Gill Elliott
  • Disruption to school examinations in our past Gillian Cooke, Gill Elliott
  • Research News Anouk Peigne

2020

Research Matters 30 - Editorial

Bramley, T. (2020). Editorial. Research Matters: A Cambridge Assessment publication, 30, 1.

The first article gives a fascinating historical glimpse of exam board activity in 1938 in the months leading up to the start of the Second World War. The second article describes some of the metaphors used to understand curriculum design (spirals, networks, webs), and some of the arguments about which are most useful in different fields of knowledge. Staying with the theme of curricula, the third article describes the high-level principles that should be considered when developing curricula for learners in emergency situations where normal educational provision is disrupted by (for example) war or natural disasters. The fourth article gives an account of a detailed investigation of the experiences of question writers in writing and reviewing questions in an on-screen environment. The final article shows how different taxonomies of skills and knowledge developed for general academic contexts can be evaluated and deployed in more applied contexts.

Research Matters 30: Autumn 2020
  • Foreword Tim Oates, CBE
  • Editorial Tom Bramley
  • A New Cambridge Assessment Archive Collection Exploring Cambridge English Exams in Germany and England in 1938 Gillian Cooke
  • Perspectives on curriculum design: comparing the spiral and the network models Jo Ireland, Melissa Mouthaan
  • Context matters—Adaptation guidance for developing a local curriculum from an international curriculum framework Sinead Fitzsimons, Victoria Coleman, Jackie Greatorex, Hiba Salem, Martin Johnson
  • Setting and reviewing questions on-screen: issues and challenges Victoria Crisp, Stuart Shaw
  • A way of using taxonomies to demonstrate that applied qualifications and curricula cover multiple domains of knowledge Irenka Suto, Jackie Greatorex, Sylvia Vitello, Simon Child
  • Research News Anouk Peigne
Should we be banking on it? Exploring potential issues in the use of ‘item’ banking with structured examination questions
Crisp, V., Shaw, S. and Bramley, T. (2020). Should we be banking on it? Exploring potential issues in the use of ‘item’ banking with structured examination questions. Assessment in Education: Principles, Policy & Practice (ahead of print).
Research Matters 29: Spring 2020
  • Foreword Tim Oates, CBE
  • Editorial Tom Bramley
  • Accessibility in GCSE Science exams – Students' perspectives Victoria Crisp and Sylwia Macinska
  • Using corpus linguistic tools to identify instances of low linguistic accessibility in tests David Beauchamp, Filio Constantinou
  • A framework for describing comparability between alternative assessments Stuart Shaw, Victoria Crisp, Sarah Hughes
  • Comparing small-sample equating with Angoff judgement for linking cut-scores on two tests Tom Bramley
  • How useful is comparative judgement of item difficulty for standard maintaining? Tom Benton
  • Research News Anouk Peigne
Comparing small-sample equating with Angoff judgement for linking cut-scores on two tests

Bramley, T. (2020). Comparing small-sample equating with Angoff judgement for linking cut-scores on two tests. Research Matters: A Cambridge Assessment publication, 29, 23-27.

The aim of this study was to compare, by simulation, the accuracy of mapping a cut-score from one test to another by expert judgement (using the Angoff method) versus the accuracy of a small-sample equating method (chained linear equating). As expected, the standard-setting method resulted in more accurate equating when we assumed a higher correlation between simulated expert judgements of item difficulty and empirical difficulty. For small-sample equating with 90 examinees per test, simple random sampling gave more accurate equating than cluster sampling of the same size. The overall equating error depended on where on the mark scale the cut-score was located. The simulations based on a realistic value for the correlation between judged and empirical difficulty (0.6) produced a similar overall error to small-sample equating with cluster sampling. Simulations of standard-setting based on a very optimistic correlation of 0.9 had the lowest error of all.
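
The contrast between the two approaches can be sketched in a few lines of code. The Python below is a minimal, hypothetical illustration rather than the study's actual simulation: it uses a Rasch-style response model, treats expert judgements as noisy versions of the true item difficulties, and uses simple linear equating between two random samples of 90 examinees as a simplified stand-in for chained linear equating. The cut-score, noise level and distributions are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_examinees = 40, 90

# Rasch-style item difficulties for an old form and a slightly harder new form.
b_old = rng.normal(0.0, 1.0, n_items)
b_new = b_old + 0.2

def expected_score(theta, b):
    """Expected total score for ability theta on a form with difficulties b."""
    return (1.0 / (1.0 + np.exp(-(theta - b)))).sum()

def simulate_totals(b, n):
    """Simulate observed total scores for n examinees on a form."""
    theta = rng.normal(0.0, 1.0, (n, 1))
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    return (rng.random((n, b.size)) < p).sum(axis=1)

# Suppose the cut-score on the old form is 24 raw marks; find the borderline
# ability whose expected old-form score equals that cut-score.
cut_old = 24
grid = np.linspace(-3, 3, 601)
theta_border = grid[np.argmin([abs(expected_score(t, b_old) - cut_old) for t in grid])]
true_cut_new = expected_score(theta_border, b_new)   # benchmark "correct" answer

# Method 1: Angoff-style judgement. Judges estimate the probability that a
# borderline examinee answers each new item correctly; the cut-score is the
# sum of those probabilities. Judgements are noisy versions of the true
# difficulties (the noise level stands in for the judge-empirical correlation).
def angoff_cut(b, judge_noise=0.8):
    judged_b = b + rng.normal(0.0, judge_noise, b.size)
    return expected_score(theta_border, judged_b)

# Method 2: small-sample linear equating. Map the old cut-score onto the new
# form using means and SDs from samples of 90 examinees per form (a simplified
# stand-in for the chained linear equating examined in the study).
def linear_equate(cut, totals_old, totals_new):
    slope = totals_new.std(ddof=1) / totals_old.std(ddof=1)
    return totals_new.mean() + slope * (cut - totals_old.mean())

totals_old = simulate_totals(b_old, n_examinees)
totals_new = simulate_totals(b_new, n_examinees)

print(f"Benchmark cut on new form: {true_cut_new:5.1f}")
print(f"Angoff-style judged cut:   {angoff_cut(b_new):5.1f}")
print(f"Small-sample equated cut:  {linear_equate(cut_old, totals_old, totals_new):5.1f}")
```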

Research Matters 29 - Editorial

Bramley, T. (2020). Editorial. Research Matters: A Cambridge Assessment publication, 29, 1.

Writing good exam questions is a difficult art. The first two articles in this issue are about accessibility. The third article describes a rigorous but practical approach that could help practitioners to investigate the comparability of alternative assessments. The final two articles explore an issue that is of perennial interest to assessment developers – namely the extent to which expert judgement about the difficulty of exam questions can give useful information about the relative difficulty of two exams as experienced by the examinees.

Metaphors and the psychometric paradigm
Bramley, T. (2020). Metaphors and the psychometric paradigm. Assessment in Education: Principles, Policy & Practice, 27(2), 178-191.

2019

Spoilt for choice? Is it a good idea to let students choose which questions they answer in an exam?
Bramley, T., and Crisp, V. (2019). Spoilt for choice? Is it a good idea to let students choose which questions they answer in an exam? Presented at the 20th AEA-Europe Conference, Lisbon, Portugal, 13-16 November 2019.
Research Matters 28: Autumn 2019
  • Foreword Tim Oates, CBE
  • Editorial Tom Bramley
  • Which is better: one experienced marker or many inexperienced markers? Tom Benton
  • "Learning progressions": A historical and theoretical discussion Tom Gallacher, Martin Johnson
  • The impact of A Level subject choice and students' background characteristics on Higher Education participation Carmen Vidal Rodeiro
  • Studying English and Mathematics at Level 2 post-16: issues and challenges Jo Ireland
  • Methods used by teachers to predict final A Level grades for their students Tim Gill
  • Research News David Beauchamp
Research Matters 28 - Editorial

Bramley, T. (2019). Editorial. Research Matters: A Cambridge Assessment publication, 28, 1.

How should we define what is the ‘correct’ mark to give the response to an exam question (and the paper as a whole)? This fundamental question is addressed by Tom Benton in the first article. The second article by Tom Gallacher and Martin Johnson takes a critical look at how some of the recent literature about ‘learning progressions’ fits into the larger picture of academic thinking about teaching, learning and curriculum design. The third article by Carmen Vidal Rodeiro presents some key findings from a larger study exploring which HE courses are taken, and at what kinds of HE institution, by students with different subject choices at A Level. Our fourth article considers the recent debate about the merits or otherwise of making students who do not achieve a grade C or 4 at GCSE in English or Maths continue to study these subjects as part of their post-16 curriculum. In our final article, Tim Gill reports on a survey of a relatively small number of schools in three different subject areas, aimed at finding out how they went about making their predictions of A Level results for individual students.

The art of test construction: Can you make a good Physics exam by selecting questions from a bank?

Bramley, T., Crisp, V. and Shaw, S. (2019). The art of test construction: Can you make a good Physics exam by selecting questions from a bank? Research Matters: A Cambridge Assessment publication, 27, 2-8.

In the traditional approach to constructing a GCSE or A Level examination paper, a single person writes the whole paper. In some other contexts, tests are constructed by selecting questions from a bank of questions. In this research, we asked experts to evaluate the quality of Physics exam papers constructed in the traditional way, constructed by expert selection of items from a bank, and constructed by computer selection of items from a bank. Anecdotal evidence suggested a “compilation” process would be detrimental to the quality of this kind of exam. We wanted to test whether in fact assessment experts could distinguish between tests that had been created in the traditional way, and those that had been compiled by selection from a bank, when they were unaware of the method of construction.

The effect of adaptivity on the reliability coefficient in adaptive comparative judgement.

Bramley, T. and Vitello, S. (2019). The effect of adaptivity on the reliability coefficient in adaptive comparative judgement.  Assessment in Education: Principles, Policy and Practice, 26(1), 43-58.

Spoilt for choice? Issues around the use and comparability of optional exam questions
Bramley, T. and Crisp, V. (2019). Spoilt for choice? Issues around the use and comparability of optional exam questions. Assessment in Education: Principles, Policy and Practice, 26(1), 75-90.

2018

Should we be banking on it? Exploring potential issues in the use of 'item' banking with structured examination questions
Crisp, V., Bramley, T. and Shaw, S. (2018). Should we be banking on it? Exploring potential issues in the use of 'item' banking with structured examination questions. Presented at the 19th annual AEA-Europe conference, Arnhem/Nijmegen, The Netherlands, 7-10 November 2018.
Evaluating the 'similar items method' for standard maintaining
Bramley, T. (2018). Evaluating the 'similar items method' for standard maintaining. Presented at the 19th annual AEA-Europe conference, Arnhem/Nijmegen, The Netherlands, 7-10 November 2018.
When can a case be made for using fixed pass marks?

Bramley, T. (2018). When can a case be made for using fixed pass marks? Research Matters: A Cambridge Assessment publication, 25, 8-13.

Using fixed pass marks (e.g., “To pass you must gain 70% of the available marks”) has many attractions in some assessment contexts (e.g., on-demand testing). The obvious drawback is that fixed pass marks do not allow for the fact that test forms may vary in difficulty despite best efforts to construct or design them to be similar. The aims of the research described in this article were to investigate how serious a problem this might be in practice, and to explore the extent to which it could be alleviated by using expert judgement in the test construction process.

2017

The effect of adaptivity on the reliability coefficient in comparative judgement
Vitello, S. and Bramley, T. (2017). Presented at the annual conference of the Association for Educational Assessment - Europe, Prague, 9-11 November 2017.
Comparing small-sample equating with Angoff judgment for linking cut-scores on two tests
Bramley, T. and Benton, T. (2017). Presented at the 18th annual AEA Europe conference, Prague, 9-11 November 2017.
Some thoughts on the ‘Comparative Progression Analysis’ method for investigating inter-subject comparability
Benton, T. and Bramley, T. (2017). Cambridge Assessment Research Report. Cambridge, UK: Cambridge Assessment.
Some implications of choice of tiering model in GCSE mathematics for inferences about what students know and can do
Bramley, T. (2017). Some implications of choice of tiering model in GCSE mathematics for inferences about what students know and can do. Research in Mathematics Education, 19(2), 163-179.
Handbook of test development - review of Section 2
Bramley, T. (2017). Handbook of test development - review of Section 2.  Assessment in Education: Principles, Policy and Practice, 24(4), 519-522.

2016

Investigating experts' perceptions of examination question demand
Bramley, T. (2016). Paper presented at the AEA-Europe annual conference, Limassol, Cyprus, 3-5 November 2016
The effect of subject choice on the apparent relative difficulty of different subjects

Bramley, T. (2016). The effect of subject choice on the apparent relative difficulty of different subjects. Research Matters: A Cambridge Assessment publication, 22, 23-26.

Periodically there is interest in whether some GCSE and A level subjects are more ‘difficult’ than others.  Because students choose which subjects they take from a large pool of possible subjects, the matrix of data to be analysed contains a large amount of non-random missing data – the grades of students in subjects that they did not take.  This makes the calculation of statistical measures of relative subject difficulty somewhat problematic.  It is also likely to make subjects that measure something different to the majority of other subjects appear easier.  These two claims are illustrated in this article with a simple example using simulated data.
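
The self-selection mechanism can be illustrated with a small simulation in the spirit of the article's example, though not taken from it. In the sketch below, four hypothetical subjects are scored on a common scale with no subject made intrinsically harder; one subject measures a different dimension and is chosen only by students who are strong on that dimension, and a naive relative-difficulty index then makes it look easier.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Two ability dimensions: most subjects measure the "general" dimension;
# Subject D measures something different.
general = rng.normal(0, 1, n)
different = rng.normal(0, 1, n)

# "True" scores on a common scale: no subject is made intrinsically harder.
scores = {
    "A": general + rng.normal(0, 0.5, n),
    "B": general + rng.normal(0, 0.5, n),
    "C": general + rng.normal(0, 0.5, n),
    "D": different + rng.normal(0, 0.5, n),
}

# Subject choice creates non-random missing data: everyone takes A and B,
# C is taken at random, and D only by students strong on the "different"
# dimension (self-selection).
takes = {
    "A": np.ones(n, dtype=bool),
    "B": np.ones(n, dtype=bool),
    "C": rng.random(n) < 0.5,
    "D": different > 0.0,
}

def relative_difficulty(subj):
    """Mean of (score in subj minus the candidate's mean score in their other
    subjects), computed only over candidates who actually took subj."""
    others = [s for s in scores if s != subj]
    other_scores = np.column_stack(
        [np.where(takes[s], scores[s], np.nan) for s in others])
    other_mean = np.nanmean(other_scores, axis=1)
    mask = takes[subj]
    return np.mean(scores[subj][mask] - other_mean[mask])

for s in scores:
    print(s, round(relative_difficulty(s), 2))
# Subject D gets a clearly positive index (its candidates score higher in it
# than in their other subjects), so it looks "easier" purely because of who
# chose to take it, not because it is intrinsically easier.
```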

Maintaining test standards by expert judgement of item difficulty

Bramley, T. and Wilson, F. (2016). Maintaining test standards by expert judgement of item difficulty. Research Matters: A Cambridge Assessment publication, 21, 48-54.

This article describes two methods for using expert judgments about test items to arrive at a cut-score (grade boundary) on a new test where none of the items has been pre-tested.  The first method required experts to estimate the mean score on the new items from examinees at the cut-score, basing their judgments on statistics from items judged to be similar on previous tests.  The second method only required them to identify previous items that they deemed effectively identical in terms of difficulty.  Both methods were applied to an AS Chemistry unit.  Both methods gave results close to the actual cut-scores, but with the first method this may have been fortuitous since there were quite large differences between the judges’ individual results.  The results from the second method were quite stable when the criteria for defining effectively identical items were varied, suggesting this method may be more suitable in practice.

2015

The reliability of Adaptive Comparative Judgment
Bramley, T. and Wheadon, C. (2015) Paper presented at the AEA-Europe annual conference, Glasgow, Scotland, 4-7 November 2015
Maintaining standards by expert judgment of question difficulty
Bramley, T. and Wilson, F. (2015) Paper presented at the AEA-Europe annual conference, Glasgow, Scotland, 4-7 November 2015
Gender differences in GCSE
Bramley, T., Vidal Rodeiro, C.L. and Vitello, S. (2015) Cambridge Assessment Research Report. Cambridge, UK: Cambridge Assessment.
Volatility in exam results
Bramley, T. and Benton, T. (2015) Cambridge Assessment Research Report. Cambridge, UK: Cambridge Assessment.
Investigating the reliability of Adaptive Comparative Judgment
Bramley, T. (2015) Cambridge Assessment Research Report. Cambridge, UK: Cambridge Assessment.
The use of evidence in setting and maintaining standards in GCSEs and A levels

Benton, T. and Bramley, T. (2015). Cambridge Assessment Research Report. Cambridge, UK: Cambridge Assessment.

2014

Evaluating the adjacent levels model for differentiated assessment
Bramley, T. (2014) Paper presented at the AEA-Europe annual conference Tallinn, Estonia, 5-8 November 2014
Multivariate representations of subject difficulty

Bramley, T. (2014) Multivariate representations of subject difficulty. Research Matters: A Cambridge Assessment publication, 18, 42-47.

This study compared the Kelly method, the Rasch partial credit model, and multidimensional scaling (MDS) for representing similarities and differences in the grade distributions of different A level subjects. Although there were differences in the patterns across the MDS representations depending on which index was used to measure similarity among A levels, at a broad level the same findings were observed – that is, STEM subjects, Languages, and Humanities clustered together fairly well in the 2-D and 3-D representations, while Expressive and Applied subjects clustered less well. However, the conclusion was that a 2-D plot of difficulty against fit to the Rasch model is the most informative way of visually representing the different subjects.

On the limits of linking: experiences from England
Bramley, T., Dawson, A., & Newton, P. (2014) Paper presented at the 76th annual meeting of the National Council on Measurement in Education (NCME), Philadelphia, PA, 2-6 April 2014
Using statistical equating for standard maintaining in GCSEs and A levels
Bramley, T. and Vidal Rodeiro, C.L. (2014) Cambridge Assessment Research Report. Cambridge, UK: Cambridge Assessment.

2013

Prediction matrices, choice and grade inflation
Bramley, T. (2013) Cambridge Assessment Research Report. Cambridge, UK: Cambridge Assessment.
Maintaining standards in public examinations: why it is impossible to please everyone
Bramley, T. (2013) Paper presented at the 15th biennial conference of the European Association for Research in Learning and Instruction (EARLI), Munich, Germany, 27-31 August 2013
How accurate are examiners’ holistic judgements of script quality?
Gill, T. and Bramley, T. (2013). How accurate are examiners’ holistic judgements of script quality? Assessment in Education: Principles, Policy & Practice. 20(3), 308-324.
Problems in estimating composite reliability of ‘unitised’ assessments
Bramley, T., & Dhawan, V. (2013) Research Papers in Education. 28(1), 43-56

2012

Measurement and Construct need to be clarified first.
Bramley, T. (2012) Commentary on Newton, P.E. Clarifying the consensus definition of validity.  Measurement: Interdisciplinary Research and Perspectives. 10(1-2), 42-45.
What if the grade boundaries on all A level examinations were set at a fixed proportion of the total mark?
Bramley, T. (2012). Paper presented at the Maintaining Examination Standards seminar, London, 28 March 2012.

The effect of manipulating features of examinees' scripts on their perceived quality

Bramley, T. (2012). The effect of manipulating features of examinees' scripts on their perceived quality. Research Matters: A Cambridge Assessment publication, 13, 18-26.

Expert judgment of the quality of examinees’ work plays an important part in standard setting, standard maintaining, and monitoring of comparability.  In order to understand and validate methods that rely on expert judgment, it is necessary to know what features of examinees’ work influence the experts’ judgments.  The controlled experiment reported here investigated the effect of changing four features of scripts from a GCSE Chemistry examination: i) the quality of written English; ii) the proportion of missing as opposed to incorrect responses; iii) the profile of marks in terms of fit to the Rasch model; and iv) the proportion of marks gained on the subset of questions testing 'good Chemistry'.  Expert judges ranked scripts in terms of perceived quality.  There were two versions of each script, an original version and a manipulated version (with the same total mark) where one of the four features had been altered.  The largest effect was obtained by a combination of iii) and iv): increasing the proportion of marks gained on ‘good Chemistry’ items, and increasing the number of correct answers to difficult questions at the expense of wrong answers to easy questions. The implications of the findings for operational standard maintaining procedures are discussed.

2011

Subject difficulty - the analogy with question difficulty

Bramley, T. (2011). Subject difficulty - the analogy with question difficulty. Research Matters: A Cambridge Assessment publication, Special Issue 2, 27-33. 

This article explores in depth one particular way of defining and measuring subject difficulty - the 'IRT approach'. First the IRT approach is briefly described. Then the analogy of using the IRT approach when the ‘items’ are examination subjects is explored. Next the task of defining difficulty from first principles is considered, starting from the simplest case of comparing two dichotomous items within a test. Finally, an alternative to the IRT approach, based on producing visual representations of differences in difficulty among just a few (three or four) examinations, is offered as an idea for future exploration.

Editorial

Bramley, T. (2011). Editorial. Research Matters: A Cambridge Assessment publication, Special Issue 2, 2.

In this Special Issue of Research Matters we present some of Cambridge Assessment’s recent thinking about comparability. The opening article gives an historical overview of comparability concerns showing how they have been expressed in different political and educational contexts in England over the last 100 years. The second article identifies and defines some widely used terms and shows how different methods of investigating comparability can be related to different definitions. The third article tries to find evidence to support the popular (mis)conception that A levels used to be norm-referenced but became criterion-referenced, and that this change was responsible for the rising pass rate. Another topic of recurring interest is whether, within a qualification type (e.g. GCSE or A level), subjects differ in difficulty. It always seems to have been easier to calculate indices of relative subject difficulty than to explain exactly what they mean. A recent approach has been to use the techniques of Item Response Theory, treating different exam subjects like different questions (items) on a test. The fourth article discusses whether this analogy works. It is an unavoidable fact of comparability research that often there is a need to compare things that are in many ways very different, such as vocational and academic qualifications. A sensible basis for comparison needs to be found, and the fifth article discusses one such basis – ‘returns to qualifications’ – that has so far been relatively rarely used by researchers in awarding bodies. The sixth article discusses some of the conceptual issues involved in linking tests to the Common European Framework of Reference for Languages (CEFR). The seventh article describes some of the issues that arose, and research undertaken by OCR in order to develop guidelines for grading procedures in 2011 that would be capable of achieving comparability. The final article takes an interesting step away from the academic literature on comparability to discuss how comparability issues are presented in the media, and to evaluate the contribution that programmes like “That’ll Teach ’em” can make to our understanding of comparability and standards.

Investigating and reporting information about marker reliability in high-stakes external school examinations
Bramley, T. and Dhawan, V. (2011). Abstract of presentation at the annual European Conference on Educational Research (ECER), Berlin, Germany, September 2011.
Estimates of reliability at qualification level for GCSE and A level examinations
Bramley, T. and Dhawan, V. (2011). Paper presented at the British Educational Research Association annual conference, University of London Institute of Education, September 2011.
Assessment instruments over time

Elliott, G., Curcin, M., Johnson, N., Bramley, T., Ireland, J., Gill, T. & Black, B. (2011). Assessment instruments over time. Research Matters: A Cambridge University Press & Assessment publication, A selection of articles, 2-4. First published in Research Matters, Issue 7, January 2009.

As Cambridge Assessment celebrated its 150th anniversary in 2008 members of the Evaluation and Psychometrics Team looked back at question papers over the years. Details of the question papers and examples of questions were used to illustrate the development of seven subjects: Mathematics, Physics, Geography, Art, French, Cookery and English Literature. Two clear themes emerged from the work across most subjects - an increasing emphasis on real-world contexts in more recent years and an increasing choice of topic areas and question/component options available to candidates.

The interrelations of features of questions, mark schemes and examinee responses and their impact upon marker agreement.
Black, B., Suto, I., and Bramley, T. (2011) Assessment in Education: Principles, Policy and Practice (Special Issue), 18, 3, 295-318
The effect of changing component grade boundaries on the assessment outcome in GCSEs and A levels

Bramley, T. and Dhawan, V. (2011). The effect of changing component grade boundaries on the assessment outcome in GCSEs and A levels. Research Matters: A Cambridge Assessment publication, 12, 13-18.

GCSE and A level assessments are graded examinations, where grade boundaries are set on the raw mark scale of each of the units/components comprising the assessment. These boundaries are then aggregated in a particular way depending on the type of assessment to produce the overall grades for the assessment. This article reports a simple 'sensitivity analysis' determining the effect on assessment grade boundaries of varying the (judgementally set) key grade boundaries on the units/components by ±1 mark. Two assessments with different structures were used - a tiered ‘linear’ GCSE, and a 6-unit ‘modular’ A level.

Rank ordering and paired comparisons - the way Cambridge Assessment is using them in operational and experimental work

Bramley, T. and Oates, T. (2011). Rank ordering and paired comparisons - the way Cambridge Assessment is using them in operational and experimental work. Research Matters: A Cambridge Assessment publication, 11, 32-35.

In this article we describe the method of paired comparisons and its close relative, rank-ordering. Despite early origins, these scaling methods have been introduced into the world of assessment relatively recently, and have the potential to lead to exciting innovations in several aspects of the assessment process. Cambridge Assessment has been at the forefront of these developments and here we summarise the current ‘state of play'.

Estimates of reliability of qualifications
Bramley, T. and Dhawan, V. (2011) Ofqual, Ofqual/11/4826, Coventry

2010

Towards a suitable method for standard-maintaining in multiple-choice tests: capturing expert judgement of test difficulty through rank-ordering
Curcin, M., Black, B. & Bramley, T. (2010) Association for Educational Assessment (AEA) - Europe, Oslo
The interrelations of features of questions, mark schemes and examinee responses and their impact on marker agreement
Suto, I., Bramley, T. & Black, B. (2010) European Conference on Educational Research (ECER), Helsinki.
Evaluating the rank-ordering method for standard maintaining
Bramley, T. and Gill, T. (2010) Research Papers in Education, 25, 3, 293-317
'Key discriminators' and the use of item level data in reporting

Bramley, T. (2010). 'Key discriminators' and the use of item level data in reporting. Research Matters: A Cambridge Assessment publication, 9, 32-38.

As more examination papers in general qualifications (GCSEs and A levels) are scanned and marked on screen, the marks on individual questions or question parts are collected automatically, and are referred to as item level data (ILD). The analysis of ILD is available for use in awarding meetings (where the grade boundaries are decided). This article discusses the theoretical rationale for using ILD in awarding, presents some possible formats for displaying data, and suggests ways in which the data could be used in practice.

Locating objects on a latent trait using Rasch analysis of experts’ judgments
Bramley, T. (2010) Probabilistic models for measurement in education, psychology, social science and health, Copenhagen

2009

The effect of manipulating features of examinees' scripts on their perceived quality

Bramley, T. (2009). Paper presented at the AEA-Europe annual conference, Balzan, Malta, 5-7 November 2009.

Standard-maintaining by expert judgement: using the rank-ordering method for determining the pass mark on multiple-choice tests
Curcin, M., Black, B. and Bramley, T. (2009) British Educational Research Association (BERA) Annual Conference
Mark scheme features associated with different levels of marker agreement

Bramley, T. (2009). Mark scheme features associated with different levels of marker agreement. Research Matters: A Cambridge Assessment publication, 8, 16-23.

This research looked for features of question papers and mark schemes associated with higher and lower levels of marker agreement at the level of the item rather than the whole paper. First, it aimed to identify relatively coarse features of question papers and mark schemes that could apply across a wide range of subjects and be objectively coded by someone without particular subject expertise or examining experience. It then aimed to discover which features were most strongly related to marker agreement, to discuss any possible implications for question paper (QP) and mark scheme (MS) design, and to relate the findings to the theoretical framework summarised in Suto and Nadas (2007).

Assessment instruments over time

Elliott, G., Curcin, M., Bramley, T., Ireland, J., Gill, T. and Black, B. (2009). Assessment instruments over time. Research Matters: A Cambridge Assessment publication, 7, 23-25.

As Cambridge Assessment celebrated its 150th anniversary in 2008 members of the Evaluation and Psychometrics Team looked back at question papers over the years. Details of the question papers and examples of questions were used to illustrate the development of seven subjects: Mathematics, Physics, Geography, Art, French, Cookery and English Literature. Two clear themes emerged from the work across most subjects - an increasing emphasis on real-world contexts in more recent years and an increasing choice of topic areas and question/component options available to candidates.

2008

Alternative approaches to National Assessment at KS1, KS2 and KS3
Green, S., Bell, J.F., Oates, T. and Bramley, T. (2008)
Assessment Instruments over Time
Elliott, G., Black, B. Ireland, J., Gill, T., Bramley, T., Johnson, N. and Curcin, M. (2008) International Association for Educational Assessment (IAEA) Conference, Cambridge
Mark scheme features associated with different levels of marker agreement
Bramley, T. (2008). British Educational Research Association (BERA) Annual Conference.
How accurate are examiners’ judgments of script quality?
Gill, T. & Bramley, T. (2008) British Educational Research Association (BERA) Annual Conference
Investigating a judgemental rank-ordering method for maintaining standards in UK examinations
Black, B., & Bramley, T. (2008). Research Papers in Education, 23(3), 357-373.
Using simulated data to model the effect of inter-marker correlation on classification consistency

Gill, T. and Bramley, T. (2008). Using simulated data to model the effect of inter-marker correlation on classification consistency. Research Matters: A Cambridge Assessment publication, 5, 29-36.

The marking of exam papers is never going to be 100% reliable unless all exams consist entirely of multiple-choice or other completely objective questions. Different opinions on the quality of the work or different interpretations of the mark schemes create the potential for candidates to receive a different mark depending on which examiner marks their paper. Of more concern is the potential for candidates to receive a different grade from a different examiner. The purpose of this study was to use simulated data to estimate the extent to which examinees might get a different grade for: i) different levels of correlation between markers and ii) different grade bandwidths.
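
A minimal sketch of this kind of simulation (hypothetical, not the study's code) is shown below: pairs of marks for the same scripts are drawn with a chosen inter-marker correlation, equally spaced grade boundaries are applied, and the proportion of candidates who would receive the same grade from both markers is estimated. The mark scale, correlations and bandwidths are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def grade(marks, bandwidth, max_mark=100):
    """Assign grade bands using equally spaced boundaries on a 0-100 scale."""
    boundaries = np.arange(0, max_mark + 1, bandwidth)
    return np.digitize(marks, boundaries[1:])   # 0 = lowest band

def classification_consistency(rho, bandwidth, n=100_000, mean=60, sd=15):
    """Proportion of candidates given the same grade by two correlated markers."""
    cov = [[sd**2, rho * sd**2], [rho * sd**2, sd**2]]
    marks = rng.multivariate_normal([mean, mean], cov, size=n)
    marks = np.clip(np.round(marks), 0, 100)
    return np.mean(grade(marks[:, 0], bandwidth) == grade(marks[:, 1], bandwidth))

for rho in (0.80, 0.90, 0.95, 0.99):
    for bw in (5, 10, 15):
        print(f"rho={rho:.2f}, bandwidth={bw:>2}: same grade for "
              f"{classification_consistency(rho, bw):.1%} of candidates")
```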

2007

Paired comparison methods
Bramley, T. (2007) In: P. Newton, J. Baird, H. Goldstein, H. Patrick, and P. Tymms (Eds.), Techniques for monitoring the comparability of examination standards, 246-294. London: QCA
Quantifying marker agreement: terminology, statistics and issues

Bramley, T. (2007). Quantifying marker agreement: terminology, statistics and issues. Research Matters: A Cambridge Assessment publication, 4, 22-28.

One challenge facing assessment agencies is choosing the appropriate statistical indicators of marker agreement for communicating to different audiences. This task is not made easier by the wide variety of terminology in use, and differences in how the same terms are sometimes used. The purpose of this article is to provide a brief overview of: i) the different terminology used to describe indicators of marker agreement; ii) some of the different statistics which are used; and iii) the issues involved in choosing an appropriate indicator and its associated statistic. It is hoped that this will clarify some ambiguities which are often encountered, and contribute to a more consistent approach in reporting research in this area.
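
As an illustration of the kinds of indicator the article distinguishes, the short sketch below computes a few commonly reported agreement statistics for two markers' item marks. The marks are made up; the point of the contrast is that exact agreement and mean absolute difference capture absolute agreement, whereas a correlation coefficient captures only relative consistency.

```python
import numpy as np

# Made-up marks awarded by two markers to the same ten responses.
marker_a = np.array([3, 5, 2, 4, 0, 5, 3, 2, 4, 1])
marker_b = np.array([3, 4, 2, 4, 1, 5, 2, 2, 4, 1])

exact = np.mean(marker_a == marker_b)                 # exact agreement rate
within_1 = np.mean(np.abs(marker_a - marker_b) <= 1)  # agreement within ±1 mark
mean_abs_diff = np.mean(np.abs(marker_a - marker_b))  # average size of discrepancy
correlation = np.corrcoef(marker_a, marker_b)[0, 1]   # consistency, not absolute agreement

print(f"Exact agreement:      {exact:.0%}")
print(f"Agreement within ±1:  {within_1:.0%}")
print(f"Mean abs. difference: {mean_abs_diff:.2f} marks")
print(f"Correlation:          {correlation:.2f}")
```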

Quality control of examination marking

Bell, J. F., Bramley, T., Claessen, M. J. A. and Raikes, N. (2007). Quality control of examination marking. Research Matters: A Cambridge Assessment publication, 4, 18-21.

As markers trade their pens for computers, new opportunities for monitoring and controlling marking quality are created. Item-level marks may be collected and analysed throughout marking. The results can be used to alert marking supervisors to possible quality issues earlier than is currently possible, enabling investigations and interventions to be made in a more timely and efficient way. Such a quality control system requires a mathematical model that is robust enough to provide useful information with initially relatively sparse data, yet simple enough to be easily understood, easily implemented in software and computationally efficient – this last is important given the very large numbers of candidates assessed by Cambridge Assessment and the need for rapid analysis during marking. In the present article we describe the models we have considered and give the results of an investigation into their utility using simulated data.

2006

Quality control of marking: Some models and simulations
Bell, J.F., Bramley, T., Claessen, M.J.A. and Raikes, N. (2006). Presented at the 32nd annual conference of the International Association for Educational Assessment (IAEA), Singapore, 21-26 May 2006.
Equating methods used in KS3 Science and English
Bramley, T. (2006) NAA technical seminar, Oxford

2005

Accessibility, easiness and standards
Bramley, T. (2005) Educational Research, 47, 2, 251-261
Accessibility, easiness and standards

Bramley, T. (2005). Accessibility, easiness and standards. Research Matters: A Cambridge Assessment publication, 1, 6-7.

This article is a summary of an article published in Educational Research in 2005.  Discussions about whether one year’s test is easier or more difficult than the previous year’s test can often get bogged down when the spectre of ‘accessibility’ raises its head. Is a ‘more accessible’ test the same as an ‘easier’ test? Are there any implications for where the cut-scores should be set if a test is deemed to be more accessible, as opposed to more easy? Is there any way to identify questions which are ‘inaccessible’?  The main purpose of the article was to use a psychometric approach to attempt to answer these questions.

A rank-ordering method for equating tests by expert judgement

Bramley, T. (2005). A rank-ordering method for equating tests by expert judgement. Research Matters: A Cambridge Assessment publication, 1, 7-8.

This article is a summary of an article published in the Journal of Applied Measurement in 2005. It builds on much research carried out at UCLES over the past ten years on the use of judgements in scale construction.  It introduces an extension of Thurstone's paired comparison method to rankings of more than two objects, in the context of mapping a cut-score from one test to another.
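
A rough flavour of the approach can be given in code. The sketch below is hypothetical and uses a Bradley-Terry fit as a simple stand-in for the Thurstone-style model discussed in the article: each judge's ranking of a small pack of scripts (drawn from two tests, here labelled A and B) is unpacked into the paired comparisons it implies, and a common quality scale is estimated from all the implied comparisons. The rankings and the fitting routine are illustrative only.

```python
import numpy as np
from itertools import combinations

# Each judge ranks a small pack of scripts from best to worst (made-up data);
# scripts prefixed A come from one test, B from the other.
rankings = [
    ["A1", "B2", "A3", "B4"],
    ["B1", "A2", "B3", "A4"],
    ["A4", "B4", "A1", "B1"],
    ["B3", "A3", "B2", "A2"],
]

scripts = sorted({s for r in rankings for s in r})
idx = {s: i for i, s in enumerate(scripts)}
wins = np.zeros((len(scripts), len(scripts)))   # wins[i, j]: i judged better than j

# Unpack each ranking into the paired comparisons it implies.
for r in rankings:
    for better, worse in combinations(r, 2):
        wins[idx[better], idx[worse]] += 1

# Fit a simple Bradley-Terry model by iterative scaling (Zermelo/MM updates).
strength = np.ones(len(scripts))
for _ in range(500):
    for i in range(len(scripts)):
        total_wins = wins[i].sum()
        denom = sum((wins[i, j] + wins[j, i]) / (strength[i] + strength[j])
                    for j in range(len(scripts)) if j != i)
        strength[i] = total_wins / denom
    strength /= np.exp(np.log(strength).mean())  # fix the origin of the scale

measures = np.log(strength)                      # logit-scale "quality" measures
for s in sorted(scripts, key=lambda s: -measures[idx[s]]):
    print(f"{s}: {measures[idx[s]]:+.2f}")
```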

A Rank-Ordering Method for Equating Tests by Expert Judgement
Bramley, T. (2005) Journal of Applied Measurement, 6, 2, 202-223

2001

The Question Tariff Problem in GCSE Mathematics
Bramley, T. (2001) Evaluation and Research in Education, 15, 2, 95-107
MVAT 2000 - Statistical Report
Bramley, T. (2001)

1998

The effects of structure on the demands in GCSE and A Level questions
Pollitt, A., Hughes, S., Ahmed, A., Fisher-Hoch, H. and Bramley, T. (1998) London: QCA.
Assessing changes in standards over time using Thurstone Paired Comparisons
Bramley, T., Bell, J.F., and Pollitt, A. (1998) Education Research and Perspectives, 25, 2, 1-23
Investigating A-level mathematics standards over time
Bell, J.F., Bramley, T. and Raikes, N. (1998).  Investigating A level mathematics standards over time. British Journal of Curriculum and Assessment, 8, 2, 7-11.

1997

What makes GCSE examination questions difficult? Outcomes of manipulating difficulty of GCSE questions
Fisher-Hoch, H., Hughes, S. and Bramley, T. (1997) British Educational Research Association (BERA) Annual Conference
Standards in A level Mathematics 1986-1996
Bell J F., Bramley, T. and Raikes, N. (1997). Presented at the British Educational Research Association (BERA) annual conference, York, UK, 11-14 September 1997.

Research Matters

Research Matters is our free biannual publication which allows us to share our assessment research, in a range of fields, with the wider assessment community.