Research Matters 29

Contents

  • Research Matters 29 - Foreword

    Oates, T. (2020). Foreword. Research Matters: A Cambridge Assessment publication, 29, 1.

    Download

  • Research Matters 29 - Editorial

    Bramley, T. (2020). Editorial. Research Matters: A Cambridge Assessment publication, 29, 1.

    Download

  • Accessibility in GCSE Science exams - Students' perspectives

    Crisp, V. and Macinska, S. (2020). Accessibility in GCSE Science exams - Students' perspectives. Research Matters: A Cambridge Assessment publication, 29, 2-10.

    As part of continued efforts to ensure inclusivity in assessment, OCR has developed a set of accessibility principles for question design in GCSE Science examinations, which has been applied since 2018. The principles are intended to help ensure that all students can demonstrate their knowledge, understanding and skills to the best of their ability. The aim of this research was to consider the effectiveness of the accessibility principles by investigating students’ perceptions of question features in terms of accessibility. Two versions of a short test were constructed using questions with and without the accessibility principles applied. Students in Year 11 (aged 15 to 16 years old) from four schools across England attempted the test and, of these, 57 were interviewed afterwards. Students were asked about question design features relating to the different accessibility principles and encouraged to talk about how accessible they felt the questions were and why. The results revealed that for most of the question features explored in this study, students’ perceptions of accessibility tended to align with expected effects. However, for three accessibility themes, the findings were neutral or mixed. 

    Download

  • A framework for describing comparability between alternative assessments

    Shaw, S. D., Crisp, V. and Hughes, S. (2020). A framework for describing comparability between alternative assessments. Research Matters: A Cambridge Assessment publication, 29, 17-22.

    The credibility of an Awarding Organisation’s products is partly reliant upon the claims it makes about its assessments and on the evidence it can provide to support such claims. Some of these claims relate to comparability. For example, for syllabuses with options, such as the choice to conduct coursework or to take an alternative exam testing similar skills, there is a claim that candidates’ overall results are comparable regardless of the choice made. This article describes the development and piloting of a framework that can be used, concurrently or retrospectively, to evaluate the comparability between different assessments that act as alternatives. The framework is structured around four types of assessment standards and is accompanied by a recording form for capturing declared comparability intentions and for evaluating how well these intentions have been achieved. The framework and recording form together are intended to:

    • provide a structure for considering comparability in terms of four established assessment standards

    • afford an opportunity for test developers to consider their intentions with respect to the comparability claims they wish to make

    • provide a list of factors (within each assessment standard) that are likely to contribute to the comparability of two alternative assessments 

    • give a structure for collecting a body of relevant information against these factors 

    • prompt an evaluation (on the part of the test developer) of how effectively the claims have been met

    Download

  • Using corpus linguistics tools to identify instances of low linguistic accessibility in tests

    Beauchamp, D. and Constantinou, F. (2020). Using corpus linguistics tools to identify instances of low linguistic accessibility in tests. Research Matters: A Cambridge Assessment publication, 29, 10-16.

    Assessment is a useful process as it provides various stakeholders (e.g., teachers, parents, government, employers) with information about students' competence in a particular subject area. However, for the information generated by assessment to be useful, it needs to support valid inferences. One factor that can undermine the validity of inferences from assessment outcomes is the language of the assessment material. For example, the use of excessively complex grammar and difficult vocabulary in the formulation of test questions may prevent students from displaying their true knowledge and skills (e.g., students who are not native speakers of the target language). In an attempt to support teachers and test developers in designing linguistically accessible assessment material, this study explored practical ways of investigating the linguistic complexity of test questions both at the level of vocabulary (lexical complexity) and grammar (syntactic complexity). The study compiled three corpora of examination questions and undertook automated lexical and syntactic analyses of these questions using software packages that are typically employed in the field of corpus linguistics.
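
    The abstract does not name the specific software packages or indices used, but the kind of lexical analysis it describes can be illustrated with a minimal sketch. The measures below (type-token ratio and mean word length) are common, simple proxies for lexical complexity; real corpus-linguistics tools compute many more refined indices, and the question text here is invented for illustration.

    ```python
    import re

    def lexical_measures(text):
        """Compute simple lexical complexity indicators for a test question.

        Type-token ratio (distinct words / total words) and mean word length
        are rough proxies for lexical complexity.
        """
        tokens = re.findall(r"[a-zA-Z']+", text.lower())
        types = set(tokens)
        ttr = len(types) / len(tokens) if tokens else 0.0
        mean_len = sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0
        return {"tokens": len(tokens), "ttr": ttr, "mean_word_length": mean_len}

    # Hypothetical exam question, for illustration only
    question = "Explain why the rate of reaction increases when temperature rises."
    print(lexical_measures(question))
    ```

    Questions with a high proportion of long or rare words would score higher on such indices, flagging them for review on accessibility grounds.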

    Download

  • Comparing small-sample equating with Angoff judgement for linking cut-scores on two tests

    Bramley, T. (2020). Comparing small-sample equating with Angoff judgement for linking cut-scores on two tests. Research Matters: A Cambridge Assessment publication, 29, 23-27.

    The aim of this study was to compare, by simulation, the accuracy of mapping a cut-score from one test to another by expert judgement (using the Angoff method) with the accuracy of a small-sample equating method (chained linear equating). As expected, the standard-setting method resulted in more accurate equating when we assumed a higher level of correlation between simulated expert judgements of item difficulty and empirical difficulty. For small-sample equating with 90 examinees per test, more accurate equating arose from using simple random sampling compared to cluster sampling at the same sample size. The overall equating error depended on where on the mark scale the cut-score was located. The simulations based on a realistic value for the correlation between judged and empirical difficulty (0.6) produced a similar overall error to small-sample equating with cluster sampling. Simulations of standard-setting based on a very optimistic correlation of 0.9 had the lowest error of all.
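
    The abstract does not reproduce the equating formulas, but chained linear equating has a standard textbook form: a linear link from the new test X to an anchor V (estimated in one group), chained with a linear link from V to the reference test Y (estimated in the other group). The sketch below uses invented summary statistics purely for illustration.

    ```python
    def linear_link(x, mean_from, sd_from, mean_to, sd_to):
        """Linear equating function: matches the means and SDs of two score scales."""
        return mean_to + (sd_to / sd_from) * (x - mean_from)

    def chained_linear_equate(x, s):
        """Chain two linear links: test X -> anchor V (group 1),
        then anchor V -> test Y (group 2)."""
        v = linear_link(x, s["mean_x1"], s["sd_x1"], s["mean_v1"], s["sd_v1"])
        return linear_link(v, s["mean_v2"], s["sd_v2"], s["mean_y2"], s["sd_y2"])

    # Illustrative (invented) group summary statistics
    stats = {"mean_x1": 50, "sd_x1": 10, "mean_v1": 20, "sd_v1": 4,
             "mean_v2": 22, "sd_v2": 5, "mean_y2": 55, "sd_y2": 12}
    print(chained_linear_equate(40, stats))  # cut-score of 40 on X mapped onto Y
    ```

    With samples as small as 90 examinees per test, the estimated means and SDs are noisy, which is the source of the equating error the simulations quantify.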

    Download

  • How useful is comparative judgement of item difficulty for standard maintaining?

    Benton, T. (2020). How useful is comparative judgement of item difficulty for standard maintaining? Research Matters: A Cambridge Assessment publication, 29, 27-35.

    This article reviews the evidence on the extent to which experts’ perceptions of item difficulties, captured using comparative judgement, can predict empirical item difficulties. This evidence is drawn from existing published studies on this topic and also from statistical analysis of data held by Cambridge Assessment. Having reviewed the evidence, the article then proposes a simple mechanism by which such judgements can be used to equate different tests, and evaluates the likely accuracy of the method. 
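
    The article's exact scaling procedure is not given in this summary, but comparative judgement data ("which of these two items is harder?") are commonly fitted with a Bradley-Terry model. The sketch below, with an invented judgement matrix, uses the classic iterative (Zermelo/MM) update to recover a difficulty scale from pairwise outcomes.

    ```python
    def bradley_terry(wins, n_iters=100):
        """Fit Bradley-Terry difficulty strengths from pairwise judgements.

        wins[i][j] = number of times item i was judged harder than item j.
        Returns one positive strength per item (higher = judged harder).
        """
        n = len(wins)
        p = [1.0] * n
        for _ in range(n_iters):
            new_p = []
            for i in range(n):
                w_i = sum(wins[i])  # total "wins" for item i
                denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                            for j in range(n) if j != i)
                new_p.append(w_i / denom if denom else p[i])
            total = sum(new_p)
            p = [x * n / total for x in new_p]  # rescale to fix the scale
        return p

    # Invented judgement counts for three items; item 0 is usually judged hardest
    wins = [[0, 8, 9], [2, 0, 7], [1, 3, 0]]
    print(bradley_terry(wins))
    ```

    The predictive question the article addresses is then how well such judged difficulties correlate with empirical item difficulties, and whether that correlation is high enough to support equating.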

    Download

  • Research News

    Peigne, A. (2020). Research News. Research Matters: A Cambridge Assessment publication, 29, 37-39. 

    A summary of recent conferences and seminars, Statistics Reports, Data Bytes, and research articles published since the last issue of Research Matters.

    Download

Data Bytes

A regular series of graphics from our research team, highlighting the latest research findings and trends in education and assessment.