Victoria Crisp

2024

A structured discussion of the fairness of GCSE and A level grades in England in summer 2020 and 2021.

Crisp, V., Elliott, G., Walland, E., & Chambers, L. (2024). A structured discussion of the fairness of GCSE and A level grades in England in summer 2020 and 2021. Research Papers in Education.

2023

An example of redeveloping checklists to support assessors who check draft exam papers for errors

Vitello, S., Crisp, V., & Ireland, J. (2023). An example of redeveloping checklists to support assessors who check draft exam papers for errors. Research Matters: A Cambridge University Press & Assessment publication, 36, 46-58. https://doi.org/10.17863/CAM.101744

Assessment materials must be checked for errors before they are presented to candidates. Any errors have the potential to reduce validity. For example, in the most extreme cases, an error may turn an otherwise well-designed exam question into one that is impossible to answer. In Cambridge University Press & Assessment, assessment materials are checked by multiple assessment specialists across different stages during assessment development. While human checkers are critical to this process, we must acknowledge that there is ample research showing the shortcomings of being human (e.g., we have cognitive biases, and memory and attentional limitations). It is important to provide assessment checkers with tools that help overcome or mitigate these limitations.

This article is about one type of checking tool – checklists. We describe a research-informed, collaborative project to support assessors in performing their checks of exam papers. This project focused on redesigning the instructional, training and task materials provided to assessors. A key part of this was to design checklists for assessors to use when performing their checks. In this article, we focus primarily on the approach that we took for these checklists in order to draw readers’ attention to the complexity that is involved in designing them and to provide a practical example of how research can be used strategically to inform key design decisions.

Research Matters 36: Autumn 2023

Foreword Tim Oates
Editorial Tom Bramley
The prevalence and relevance of Natural History assessments in the school curriculum, 1858–2000: a study of the Assessment Archives Gillian Cooke
The impact of GCSE maths reform on progression to mathematics post-16 Carmen Vidal Rodeiro, Joanna Williamson
An example of redeveloping checklists to support assessors who check draft exam papers for errors Sylvia Vitello, Victoria Crisp, Jo Ireland
An analysis of the relationship between Secondary Checkpoint and IGCSE results Tim Gill
Synchronous hybrid teaching: how easy is it for schools to implement? Filio Constantinou
Research News Lisa Bowett

The appliance of science: exploring the use of context in reformed GCSE science examinations

Crisp, V. & Greatorex, J. (2023). The appliance of science: exploring the use of context in reformed GCSE science examinations. Assessment in Education: Principles, Policy & Practice

2022

A structure for analysing features of digital assessments that may affect the constructs assessed

Crisp, V. & Ireland, J. (2022). A structure for analysing features of digital assessments that may affect the constructs assessed. Cambridge University Press & Assessment.

Creating better tests: Students’ views on the accessibility of different exam question design features

Crisp, V. & Macinska, S. (2022, July 14-17). Creating better tests: Students’ views on the accessibility of different exam question design features. Paper presented at the 10th European Conference on Education, UCL, London, UK and online.

2020

Writing and reviewing assessment questions on-screen: issues and challenges

Crisp, V. and Shaw, S. D. (2020). Writing and reviewing assessment questions on-screen: issues and challenges. Research Matters: A Cambridge Assessment publication, 30, 19-26.

For assessment contexts where both a paper-based test and an on-screen assessment are available as alternatives, it is still common for the paper-based test to be prepared first with questions later transferred into an on-screen testing platform. One challenge with this is that some questions cannot be transferred. One solution might be for questions to be drafted into the on-screen testing platform and later converted for the paper-based test. This research investigated the issues that might arise if question writers drafted questions directly into the on-screen testing platform and if questions were reviewed directly in the platform.

Six assessors with experience of setting and reviewing questions took part. After some familiarisation, each participant attended a research meeting where they drafted some questions into an on-screen testing platform and reviewed some questions in the platform. After each activity, participants completed a workload questionnaire and were interviewed.

The findings suggest that training and support would be important. Participants reported feeling restricted when setting items. Evidence suggested that setters may avoid certain item types, write shorter questions than normal or write less creative questions. Setting was also reported to be slower. However, these issues could reduce with greater experience. Overall, it seems that it would be possible for setters to create at least some of their questions within an on-screen testing platform. However, care would be needed to mitigate frustration, ensure question quality and ensure representation of all relevant constructs.

Research Matters 30: Autumn 2020

Foreword Tim Oates, CBE
Editorial Tom Bramley
A New Cambridge Assessment Archive Collection Exploring Cambridge English Exams in Germany and England in JPLO Gillian Cooke
Perspectives on curriculum design: comparing the spiral and the network models Jo Ireland, Melissa Mouthaan
Context matters—Adaptation guidance for developing a local curriculum from an international curriculum framework Sinead Fitzsimons, Victoria Coleman, Jackie Greatorex, Hiba Salem, Martin Johnson
Setting and reviewing questions on-screen: issues and challenges Victoria Crisp, Stuart Shaw
A way of using taxonomies to demonstrate that applied qualifications and curricula cover multiple domains of knowledge Irenka Suto, Jackie Greatorex, Sylvia Vitello, Simon Child
Research News Anouk Peigne

Should we be banking on it? Exploring potential issues in the use of ‘item’ banking with structured examination questions

Crisp, V., Shaw, S. and Bramley, T. (2020) Should we be banking on it? Exploring potential issues in the use of ‘item’ banking with structured examination questions. Assessment in Education: Principles, Policy & Practice (ahead of print).

Consultation regarding a potential GCSE in Natural History: Early findings

Cambridge Assessment (2020). Consultation regarding a potential GCSE in Natural History: Early findings. Cambridge, UK: Cambridge Assessment.

The Learning Passport: Curriculum Framework (Maths, Science, Literacy).

Cambridge Assessment. (2020). The Learning Passport: Curriculum Framework (Maths, Science, Literacy). Cambridge, UK: Cambridge Assessment.

Research Matters 29: Spring 2020

Foreword Tim Oates, CBE
Editorial Tom Bramley
Accessibility in GCSE Science exams – Students' perspectives Victoria Crisp and Sylwia Macinska
Using corpus linguistic tools to identify instances of low linguistic accessibility in tests David Beauchamp, Filio Constantinou
A framework for describing comparability between alternative assessments Stuart Shaw, Victoria Crisp, Sarah Hughes
Comparing small-sample equating with Angoff judgement for linking cut-scores on two tests Tom Bramley
How useful is comparative judgement of item difficulty for standard maintaining? Tom Benton
Research News Anouk Peigne

A framework for describing comparability between alternative assessments

Shaw, S. D., Crisp, V. and Hughes, S. (2020). A framework for describing comparability between alternative assessments. Research Matters: A Cambridge Assessment publication, 29, 17-22.

The credibility of an Awarding Organisation’s products is partly reliant upon the claims it makes about its assessments and on the evidence it can provide to support such claims. Some such claims relate to comparability. For example, for syllabuses with options, such as the choice to conduct coursework or to take an alternative exam testing similar skills, there is a claim that overall candidates’ results are comparable regardless of the choice made. This article describes the development and piloting of a framework that can be used, concurrently or retrospectively, to evaluate the comparability between different assessments that act as alternatives. The framework is structured around four types of assessment standards and is accompanied by a recording form for capturing declared comparability intentions and for evaluating how well these intentions have been achieved. The framework and recording form together are intended to:

• provide a structure for considering comparability in terms of four established assessment standards

• afford an opportunity for test developers to consider their intentions with respect to the comparability claims they wish to make

• provide a list of factors (within each assessment standard) that are likely to contribute to the comparability of two alternative assessments

• give a structure for collecting a body of relevant information against these factors

• prompt an evaluation (on the part of the test developer) of how effectively the claims have been met

Accessibility in GCSE Science exams - Students' perspectives

Crisp, V. and Macinska, S. (2020). Accessibility in GCSE Science exams - Students' perspectives. Research Matters: A Cambridge Assessment publication, 29, 2-10.

As part of continued efforts to ensure inclusivity in assessment, OCR has developed a set of accessibility principles for question design in GCSE Science examinations, which has been applied since 2018. The principles are intended to help ensure that all students can demonstrate their knowledge, understanding and skills to the best of their ability. The aim of this research was to consider the effectiveness of the accessibility principles by investigating students’ perceptions of question features in terms of accessibility. Two versions of a short test were constructed using questions with and without the accessibility principles applied. Students in Year 11 (aged 15 to 16 years old) from four schools across England attempted the test and, of these, 57 were interviewed afterwards. Students were asked about question design features relating to the different accessibility principles and encouraged to talk about how accessible they felt the questions were and why. The results revealed that for most of the question features explored in this study, students’ perceptions of accessibility tended to align with expected effects. However, for three accessibility themes, the findings were neutral or mixed.

Research Matters Special Issue 3: An approach to validation - republished with Afterword

Foreword Gordon Stobart
Editorial Tim Oates
An approach to validation: Developing and applying an approach for the validation of general qualifications (First published 2012) Stuart Shaw, Victoria Crisp
Introduction
Approach to validity
Framework development
International A level context
Methodological overview
Constructing an interpretive argument for International A level Physics
Gathering evidence to construct a validity argument for International A level Physics: Evidence 1–18
Summary of A level Physics validation findings and evaluation of the argument
Conclusions
References
Afterword (January 2020) Stuart Shaw, Victoria Crisp
References

Afterword - Document

Shaw, S., & Crisp, V. (2020). Afterword. Research Matters: A Cambridge Assessment publication, Special Issue 3 (First published 2012), 44-45.

It has been eight years since the publication of this special issue exemplifying ‘An approach to validation’ (and closer to ten years since the work it describes was conducted). Validation studies continue to be demanding activities, not helped by considerable variety in views about what validation should involve, what it can achieve and whom it should serve (Newton & Shaw, 2016). One thing is clear, however. There is an increasing demand for awarding bodies to demonstrate the quality of their qualifications and meeting this demand is no mean feat. Our main motivation for publishing this work was to provide a practical example for would-be validators by describing the framework (based on Kane, 2006) and methods that we applied in a validation study of International A level Physics.

2019

Spoilt for choice? Is it a good idea to let students choose which questions they answer in an exam?

Bramley, T., and Crisp, V. (2019). Spoilt for choice? Is it a good idea to let students choose which questions they answer in an exam? Presented at the 20th AEA-Europe Conference, Lisbon, Portugal, 13-16 November 2019.

A framework for describing comparability between alternative assessments

Shaw, S., Crisp, V. and Hughes, S. (2019). A framework for describing comparability between alternative assessments. Presented at the Association for Educational Assessment in Europe Annual Conference, Lisbon, Portugal, 13th to 16th November 2019.

The art of test construction: Can you make a good Physics exam by selecting questions from a bank?

Bramley, T., Crisp, V. and Shaw, S. (2019). The art of test construction: Can you make a good Physics exam by selecting questions from a bank? Research Matters: A Cambridge Assessment publication, 27, 2-8.

In the traditional approach to constructing a GCSE or A Level examination paper, a single person writes the whole paper. In some other contexts, tests are constructed by selecting questions from a bank of questions. In this research, we asked experts to evaluate the quality of Physics exam papers constructed in the traditional way, constructed by expert selection of items from a bank, and constructed by computer selection of items from a bank. Anecdotal evidence suggested a “compilation” process would be detrimental to the quality of this kind of exam. We wanted to test whether in fact assessment experts could distinguish between tests that had been created in the traditional way, and those that had been compiled by selection from a bank, when they were unaware of the method of construction.

Spoilt for choice? Issues around the use and comparability of optional exam questions

Bramley, T. and Crisp, V. (2019). Spoilt for choice? Issues around the use and comparability of optional exam questions. Assessment in Education: Principles, Policy and Practice, 26(1), 75-90.

A question of quality: Conceptualisations of quality in the context of educational test questions

Crisp, V., Johnson, M. and Constantinou, F. (2019) A question of quality: Conceptualisations of quality in the context of educational test questions. Research in Education, 105 (1), 18-41.

2018

Should we be banking on it? Exploring potential issues in the use of 'item' banking with structured examination questions

Crisp, V., Bramley, T. and Shaw, S. (2018). Should we be banking on it? Exploring potential issues in the use of 'item' banking with structured examination questions. Presented at the 19th annual AEA-Europe conference, Arnhem/Nijmegen, The Netherlands, 7-10 November 2018.

Insights into teacher moderation of marks on high-stakes non-examined assessments

Crisp, V. (2018). Insights into teacher moderation of marks on high-stakes non-examined assessments. Research Matters: A Cambridge Assessment publication, 25, 14-20.

Where teachers assess their students’ work for high-stakes purposes, their judgements are standardised through professional discussions with their colleagues - a process often known as internal moderation. This process is important to the reliability of results as any inconsistencies in the marking standards applied by different teachers within a school department can be problematic.

This research used interviews, a questionnaire and observations of mock internal moderation sessions to explore internal moderation practices in the context of school-based work contributing to high-stakes assessments. Teachers’ discussions focused on the location and sufficiency of relevant evidence in student work. This, along with reference to the mark scheme and discussing the meaning of assessment criteria, is consistent with Cook and Brown’s (1999) notion of tacit knowledge being made explicit and helping to create and refine ways of knowing. Thus, internal moderation acts as professional development for teachers as well as providing quality assurance. Around a quarter of teachers appear not to have opportunities to participate in internal moderation. Moderation by teachers is reported to be infrequently influenced by group dynamics, is thought to remove any personal bias, and teachers tended to report that the process worked well.

2017

Exploring the relationship between validity and comparability in assessment

Crisp, V. (2017). Exploring the relationship between validity and comparability in assessment. London Review of Education, 15(3), 523-535.

Multiple voices in tests: towards a macro theory of test writing

Constantinou, F., Crisp, V. and Johnson, M. (2017). Multiple voices in tests: towards a macro theory of test writing. Cambridge Journal of Education, 48(8), 411-426.

How do question writers compose external examination questions? Question writing as a socio-cognitive process

Johnson, M., Constantinou, F. and Crisp, V. (2017). How do question writers compose external examination questions? Question writing as a socio-cognitive process. British Educational Research Journal (BERJ). 43(4), 700-719.

The judgement processes involved in the moderation of teacher-assessed projects.

Crisp, V. (2017). The judgement processes involved in the moderation of teacher-assessed projects. Oxford Review of Education 43(1), 19-37.

2016

How do question writers compose examination questions? Question writing as a socio-cognitive process

Johnson, M., Constantinou, F. and Crisp, V. (2016). Paper presented at the AEA-Europe annual conference, Limassol, Cyprus, 3-5 November 2016

'Question quality': The concept of quality in the context of exam questions

Crisp, V., Johnson, M. and Constantinou, F. (2016). Paper presented at the AEA-Europe annual conference, Limassol, Cyprus, 3-5 November 2016

Writing questions for examination papers: a creative process?

Constantinou, F., Crisp, V. and Johnson, M. (2016). Paper presented at the 8th Biennial Conference of the European Association for Research in Learning and Instruction (EARLI) SIG 1 - Assessment and Evaluation, Munich, Germany, 24-26 August 2016

2015

Exploring the difficulty of mathematics examination questions for weaker readers

Crisp, V. (2015). Educational Studies, 41(3), 276-292.

Validity and comparability of assessment: how do these concepts relate?

Crisp, V. (2015) Paper presented at the British Educational Research Association (BERA) conference, Belfast, 15-17 September 2015

Reflections on a framework for validation – Five years on

Shaw, S. and Crisp, V. (2015). Reflections on a framework for validation – Five years on. Research Matters: A Cambridge Assessment publication, 19, 31-37.

In essence, validation is simple. The basic questions which underlie any validation exercise are: what is being claimed about the test, and are the claims warranted (given all of the evidence)? What could be more straightforward? Unfortunately, despite a century of theorising validity, it is still quite unclear exactly how much and what kind of evidence or analysis is required in order to establish a claim to validity. Despite Kane’s attempts to simplify validation by developing a methodology to support validation practice, one which is grounded in argumentation (e.g., Kane, 1992), and the “simple, accessible direction for practitioners” (Goldstein & Behuniak, 2011, p.36) provided by the Standards (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education [AERA, APA, & NCME], 2014), good validation studies still prove surprisingly challenging to implement.

In response, a framework for evidencing assessment validity in large-scale, high-stakes examinations and a set of methods for gathering validity evidence were developed in 2008/2009. The framework includes a number of validation questions to be answered by the collection of appropriate evidence and by related analyses. Both framework and methods were piloted and refined. Systematic implementation of the validation framework followed, employing two parallel validation strategies:

1. an experimental validation strategy which entails full post-hoc validation studies undertaken solely by research staff

2. an operational validation strategy which entails the gathering and synthesis of validation evidence currently generated routinely within operational processes.

Five years on, a number of issues have emerged which prompted a review of the validation framework and several conceptual and textual changes to the language of the framework. These changes strengthen the theoretical structure underpinning the framework.

This paper presents the revised framework, and reflects on the original scope of the framework and how this has changed. We also consider the suitability and meaningfulness of the language employed by the framework.

2014

Evaluating assessments in the 21st century: Reflections on a framework for validation – 5 years on.

Crisp, V., and Shaw, S. (2014). Evaluating assessments in the 21st century. Reflections on a framework for validation - 5 years on. Presented at the 15th annual conference of the Association for Educational Assessment in Europe, Tallinn, Estonia, 6-8 November 2014.

Judgement in the assessment of ‘harder to examine’ skills: what do assessors pay attention to?

Crisp, V. (2014). Judgement in the assessment of 'harder to examine' skills: what do assessors pay attention to? Presented at the 15th annual conference of the Association for Educational Assessment in Europe, Tallinn, Estonia, 6-8 November 2014.

Cultural and societal factors in high-performing jurisdictions

Crisp, V. (2014). Cultural and societal factors in high-performing jurisdictions. Research Matters: A Cambridge Assessment publication, 17, 29-41.

This article aims to provide insights into some of the cultural and societal contextual factors that influence education systems, using a number of high-performing jurisdictions (HPJs) as case studies. Consideration of the education and assessment systems of HPJs around the world has become a strategy of some interest during education reform and/or development. However, it has been noted that when doing so, societal and cultural features of the jurisdictions need to be considered (e.g. Elliott and Phuong-Mai, 2008; Alexander, 2010; Oates, 2010; Barber, Donnelly and Rizvi, 2012). The effects of a particular educational system may well be influenced by such factors, and as a result the system of one jurisdiction will not necessarily transfer the educational and achievement benefits if simply replicated in the jurisdiction undergoing change.

This article has been written using various secondary sources such as relevant articles, books and reports, newspaper articles, blog posts and other online material. A number of researchers have previously summarised and analysed the features of HPJs, including some of the cultural factors, to identify the possible reasons for the high achievements of students (at least on some of the measures that have been influential, such as PISA, TIMSS and PIRLS). Such work was particularly useful to the current article, key examples being the work of the Center on International Education Benchmarking and the book Lessons from PISA for the United States: Strong Performers and Successful Reformers in Education, produced by the Organisation for Economic Co-operation and Development (OECD).

Six jurisdictions were chosen as the focus for this exploration of cultural and societal factors. The focus jurisdictions were: Alberta (Canada), Shanghai (China), Hong Kong, Singapore, Victoria (Australia), and New Zealand. A few additional jurisdictions, for which cultural issues of interest were noted during the literature review for this article, are also mentioned briefly.

2013

Teacher views on the effects of the change from coursework to controlled assessment in GCSEs

Crisp, V. & Green, S. (2013) Educational Research and Evaluation: An International Journal on Theory and Practice, 19(8), 680-699.

The judgement processes involved in the moderation of teacher-assessed projects in a national assessment.

Crisp, V. (2013). The judgement processes involved in the moderation of teacher-assessed projects in a national assessment. Presented at the 14th annual conference of the Association for Educational Assessment in Europe, Paris, 7-9 November 2013.

Modelling question difficulty in an A Level Physics examination

Crisp, V. & Grayson, R. (2013) Research Papers in Education, 28(3), 346–372.

Criteria, comparison and past experiences: How do teachers make judgements when marking coursework?

Crisp, V. (2013) Assessment in Education: Principles, Policy & Practice, 20(1), 127-144

2012

A framework for evidencing assessment validity in large-scale, high-stakes international examinations

Shaw, S., Crisp, V. and Johnson, N. (2012) Assessment in Education: Principles, Policy and Practice, 19(2), 159-176

Applying methods to evaluate construct validity in the context of A level assessment

Crisp, V. and Shaw, S. (2012) Educational Studies, 38(2), 209-222

Controlled assessments in 14-19 Diplomas: Implementation and effects on learning experiences

Crisp, V. and Green, S. (2012) Educational Research and Evaluation, 18(4), 333-351

An investigation of rater cognition in the assessment of projects

Crisp, V. (2012) Educational Measurement: Issues and Practice, 31(3), 10-20

The effects of features of examination questions on the performance of students with dyslexia

Crisp, V., Johnson, M. and Novakovic, N. (2012) British Educational Research Journal, 38(5), 813-839.

The effects of the change from coursework to controlled assessment in GCSEs.

Crisp, V., and Green, S. (2012). The effects of the change from coursework to controlled assessment in GCSEs. Presented at the Annual Conference of the British Educational Research Association, University of Manchester, 4-6 September 2012.

The teacher as examiner: How do teachers make judgements when marking coursework?

Crisp, V. (2012). The teacher as examiner: How do teachers make judgements when marking coursework? Presented at the Annual Conference of the British Educational Research Association, University of Manchester, 4-6 September 2012.

2011

Exploring features that affect the difficulty and functioning of science exam questions for all candidates and specifically for those with reading difficulties

Crisp, V. (2011) Irish Educational Studies, 30, 3, 323-343

Item difficulty modelling: exploring the usefulness of this technique in a European context

Hopkin, R. and Crisp, V. (2011). Paper presented at the AEA-Europe annual conference, Belfast, November 2011.

Translating validation research into everyday practice: issues facing an international awarding body

Shaw, S. and Crisp, V. (2011). Paper presented at the 12th Annual Conference of the Association for Educational Assessment in Europe, Belfast, 10-12 November 2011.

The judgement processes involved in the assessment of project work by teachers

Crisp, V. (2011). Paper presented at the 12th Annual Conference of the Association for Educational Assessment in Europe, Belfast, 10-12 November 2011.

Modelling question difficulty in an A level Physics examination

Crisp, V. and Hopkin, R. (2011). British Educational Research Association, London

How valid is A level Physics? A wide-ranging evaluation of the validity of Physics A level assessments

Crisp, V. and Shaw, S. (2011). Paper presented at the British Educational Research Association annual conference, University of London Institute of Education, September 2011.

Practical issues in early implementation of the Diploma Principal Learning

Crisp, V. and Green, S. (2011). Practical issues in early implementation of the Diploma Principal Learning. Research Matters: A Cambridge Assessment publication, 12, 10-13.

This short article reports on some of the findings from an interview study conducted in the first year of implementation of the 14–19 Diplomas. The Diplomas were introduced by the Labour government as part of wider educational reforms (DfES, 2005a, 2005b). They were designed to prepare young people for the world of work or for independent study, and are intended to combine theoretical and applied learning, to provide different ways of learning, to encourage students to develop skills valued by employers and universities, and provide opportunities for students to apply skills to work situations in realistic contexts. They are also intended to contribute to ensuring that a wide range of appropriate learning pathways are available to young people, thus facilitating increased participation and attainment. The Diplomas are available at Levels 1, 2 and 3 and rather than being taught by an individual school or college, they are available through consortia consisting of a small group of schools and/or colleges working collaboratively. The Diploma is a composite qualification which is made up of the following elements: principal learning; generic learning; additional and specialist learning.

The current research focused on the Principal Learning (PL). The Principal Learning components are specific to a domain or ‘line of learning’. Learning through experience of simulated or real work contexts, through applying and practically developing skills, as well as theoretical learning, is emphasised. The PL components are assessed predominantly via assignments which are internally marked and externally moderated. Teaching of Diplomas in the first five ‘lines of learning’ began in September 2008 with a further five beginning in September 2009 and four in September 2010.

Six consortia running Phase 1 Diplomas in the first year of implementation took part in this research. At each consortium, one or more teachers and (in all but one case) a number of learners were interviewed about the learning that was occurring and various practicalities around implementation of the Diploma. This article reports on the latter.

An investigation of rater cognition in the assessment of projects

Crisp, V. (2011) American Educational Research Association (AERA) Annual Meeting, New Orleans

Tracing the evolution of validity in educational measurement: past issues and contemporary challenges

Shaw, S. and Crisp, V. (2011). Tracing the evolution of validity in educational measurement: past issues and contemporary challenges. Research Matters: A Cambridge Assessment publication, 11, 14-17.

Validity is not a simple concept in the context of educational measurement. Measuring the traits or attributes that a student has learnt during a course is not like measuring an objective property such as length or weight; measuring educational achievement is less direct. Yet, educational outcomes can have high stakes in terms of consequences (e.g., affecting access to further education), thus the validity of assessments is highly important.

Tracing this trajectory of evolution, particularly through key documents such as the validity/validation chapter in editions of Educational Measurement (Cureton, 1951; Cronbach, 1971; Messick, 1989; Kane, 2006) and the Standards for Educational and Psychological Testing (AERA, APA and NCME, 1954/1955, 1966, 1974, 1985, 1999), has been important to us as part of work to develop an approach to validation for general assessments.

The concept of validity is not a new one. Conceptualisations of validity are apparent in the literature from around the turn of the twentieth century, and since that time, they have evolved significantly. The earliest conceptions treated validity as a static property captured by a single statistic, usually an index of the correlation of test scores with some criterion (Binet, 1905; Pearson, 1896; Binet and Henri, 1899; Spearman, 1904). Through various re-conceptualisations, contemporary validity theory generally sees validity as concerning the appropriateness of the inferences drawn from, and uses made of, assessment outcomes, including some consideration of the consequences of test score use. This article traces the progress and changes in the theorisation of validity over time and the issues that led to these changes.

2010

Towards a model of the judgement processes involved in examination marking

Crisp, V. (2010) Oxford Review of Education, 36, 1, 1-21

Judging the grade: exploring the judgement processes involved in examination grading decisions

Crisp, V. (2010) Evaluation and Research in Education, 23, 1, 19-35

How valid are A levels? Findings from a multi-method validation study of an international A level in geography

Shaw, S. and Crisp, V. (2010) Association for Educational Assessment (AEA) - Europe, Oslo

A new model of assessment for 14 to 19 year olds: What do students and their teachers think of Diploma assessments?

Crisp, V. and Green, S. (2010) Association for Educational Assessment (AEA) - Europe, Oslo

The effects of controlled assessments in the new Diplomas on students' learning experiences

Crisp, V. and Green, S. (2010) A paper presented at the Chartered Institute of Educational Assessors Annual Conference, London, October 2010.

How hard can it be? Issues and challenges in the development of a validation method for traditional written examinations

Crisp, V. and Shaw, S. (2010) International Association for Educational Assessment (IAEA) Conference, Bangkok

Developing and piloting a framework for the validation of A levels

Shaw, S. and Crisp, V. (2010). Developing and piloting a framework for the validation of A levels. Research Matters: A Cambridge Assessment publication, 10, 44-47.

Validity is a key principle of assessment, a central aspect of which relates to whether the interpretations and uses of test scores are appropriate and meaningful (Kane, 2006). For this to be the case, various criteria must be achieved, such as good representation of intended constructs, and avoidance of construct-irrelevant variance. Additionally, some conceptualisations of validity include consideration of the consequences that may result from the assessment, such as effects on classroom practice. The kinds of evidence needed may vary depending on the intended uses of assessment outcomes. For example, if assessment results are designed to be used to inform decisions about future study or employment, it is important to ascertain that the qualification acts as suitable preparation for this study or employment, and to some extent predicts likely success.

This article reports briefly on the development, piloting and revision of a framework and methodology for validating general academic qualifications such as A levels. The development drew on previously proposed frameworks for validation from the literature, and the resulting framework and set of methods were piloted with International A level Geography. This led to revisions to the framework and use with A level Physics.

2009

Does assessing project work enhance the validity of qualifications? The case of GCSE coursework

Crisp, V. (2009) Educate, 9, 1, 16-26

Are all assessments equal? The comparability of demands of college-based assessments in a vocationally-related qualification

Crisp, V. and Novakovic, N. (2009) Research in Post-Compulsory Education, 14, 1, 1-18

Is this year's exam as demanding as last year's? Using a pilot method to evaluate the consistency of examination demands over time

Crisp, V. and Novakovic, N. (2009) Evaluation and Research in Education, 22, 1, 3-15

Using data from on-screen marking to consider the difficulty and functioning of mathematics examination questions for weaker readers

Crisp, V. (2009) Association for Educational Assessment (AEA) - Europe, Malta

A proposed framework for evidencing assessment validity in large-scale, high-stakes international examinations

Shaw, S., Crisp, V. & Johnson, N. (2009) Association for Educational Assessment (AEA) - Europe, Malta

An exploration of the effect of pre-release examination materials on classroom practice in the UK

Johnson, M. and Crisp, V. (2009) Research in Education, 82, 47-59

What was this student doing? Evidencing validity in A level assessments

Shaw, S. and Crisp, V. (2009) British Educational Research Association (BERA) Annual Conference

Objective questions in science GCSE: Exploring question difficulty, item functioning and the effect of reading difficulties

Crisp, V. (2009) British Educational Research Association (BERA) Annual Conference

2008

Improving students’ capacity to show their knowledge, understanding and skills in exams by using combined question and answer papers

Crisp, V. (2008) Research Papers in Education, 23, 1, 69-84

Exploring the nature of examiner thinking during the process of examination marking

Crisp, V. (2008) Cambridge Journal of Education, 38, 2, 247-264

Tales of the expected: the influence of students’ expectations on question validity and implications for writing exam questions

Crisp, V., Sweiry, E., Ahmed, A. and Pollitt, A. (2008) Educational Research, 50, 1, 95-115

The validity of using verbal protocol analysis to investigate the processes involved in examination marking

Crisp, V. (2008) Research in Education, 79, 1-12

The Development of a Formative Scenario-Based Computer Assisted Assessment Tool in Psychology for Teachers: The PePCAA Project

Crisp, V. and Ward, C. (2008) Computers and Education, 50, 1509-1526

Judging the grade: an exploration of the judgement processes involved in A level examination grading decisions: BERA abstract

Crisp, V. (2008) British Educational Research Association (BERA) Annual Conference

Towards a methodology for evaluating the equivalency of demands in vocational assessments between colleges/training providers: IAEA abstract

Crisp, V. & Novakovic, N. (2008) International Association for Educational Assessment (IAEA) Conference, Cambridge

A case of positive washback: an exploration of the effect of pre-release examination materials on classroom practice: ECER abstract

Johnson, M. & Crisp, V. (2008) European Conference on Educational Research (ECER), Gothenburg

Are all assessments equal? The comparability of demands of college-based assessments in a vocationally-related qualification: BERA abstract

Crisp, V. and Novaković, N. (2008) British Educational Research Association (BERA) Annual Conference

Do assessors pay attention to appropriate features of student work when making assessment judgements?

Crisp, V. (2008). Do assessors pay attention to appropriate features of student work when making assessment judgements? Research Matters: A Cambridge Assessment publication, 6, 5-9.

It is via the judgements of appropriate experts that assessment decisions are made, yet the actual thought processes involved during marking or grading are under-researched. This article draws on a study of the cognitive and socially-influenced processes involved in marking and grading A level geography examinations and pilot research into the marking of GCSE coursework by teachers. This data was used to investigate whether assessors pay attention to appropriate features of student work.

Verbal protocols of assessors’ thinking aloud whilst marking and grading work were collected and measures of marker agreement were obtained. The protocols were analysed in detail using appropriate coding schemes. From the behaviours identified, a tentative model of the marking process was developed, within which features of student work affecting judgements and social and personal reactions were identified. Whilst many features that appeared to influence evaluations were clearly focussed on the criteria intended for evaluation, some were not and could have influenced evaluations. Reactions to language use or legibility (when not assessing communication), personal or emotional responses and social responses sometimes occurred before marking decisions. The article discusses whether such responses could explain variations in marks from different examiners.

A review of literature regarding the validity of coursework and the rationale for its inclusion in the GCSE

Crisp, V. (2008). A review of literature regarding the validity of coursework and the rationale for its inclusion in the GCSE. Research Matters: A Cambridge Assessment publication, 5, 20-24.

Coursework was included in many GCSEs from their introduction in 1988 to increase the validity of assessment by providing wider evidence of student work and to enhance pupil learning by valuing skills such as critical thinking and independent learning (SEC, 1985). As the Secondary Examinations Council put it, ‘above all, the assessment of coursework can correspond much more closely to the scale of values in this wider world, where the individual is judged as much by his or her style of working and ability to cooperate with colleagues as by the eventual product’ (SEC, 1985, p. 6).

The validity and reliability of the assessment of GCSE coursework have come under much discussion since its introduction, with the focus of concerns changing over time. At the inception of the GCSE, the main threats anticipated were possible unreliability of teacher marking, possible cheating and concern that girls were favoured (see QCA, 2006a). Now, concerns about consistency across similar subjects, fairness and authenticity (including the issues of internet plagiarism and excessive assistance from others), tasks becoming overly-structured (and hence reducing learning benefits) along with the overall burden on students across subjects, have led to a review of coursework by the Qualifications and Curriculum Authority (QCA).

This article reviews relevant literature using the stages of assessment described by Crooks, Kane and Cohen (1996) to structure discussion of possible improvements to the validity of assessment as a result of including a coursework element within GCSE specifications and possible threats to validity associated with coursework.

Investigating the judgemental marking process: an overview of our recent research

Suto, I., Crisp, V. and Greatorex, J. (2008). Investigating the judgemental marking process: an overview of our recent research. Research Matters: A Cambridge Assessment publication, 5, 6-9.

This article gives an overview of a number of linked studies which explored the process of marking GCSE and A-level examination questions from a number of different angles. Key aims of these studies were to provide insights into how examiner training and marking accuracy could be improved, as well as reasoned justifications for how item types could be assigned to different groups of examiners in the future. The research studies combined several approaches, exploring both the information that people attend to when marking items and the sequences of mental operations involved. Examples include studies that used the think-aloud method to identify the cognitive marking strategies entailed in marking student responses, or to explore the broader socio-cognitive influences on the marking process. Other examples explored the relationship between cognitive marking strategy complexity and marking accuracy.

This article brings together the findings from these various related studies to summarise the influences and processes that have been identified as important to the marking process from the research conducted so far.

2007

‘The demands of exam syllabuses and question papers’, in: P. Newton, J. Baird, H. Goldstein, H. Patrick, and P. Tymms (Eds.)

Pollitt, A., Ahmed, A. and Crisp, V. (2007) Techniques for monitoring the comparability of examination standards. London: QCA.

The use of annotations in examination marking: opening a window into markers’ minds

Crisp, V. and Johnson, M. (2007) British Educational Research Journal, 33(6), 943–961

The effects of features of GCSE questions on the performance of students with dyslexia

Crisp, V., Johnson, M. and Novakovic, N. (2007) British Educational Research Association (BERA) Annual Conference

Do assessors pay attention to appropriate features of student work when making assessment judgements?

Crisp, V. (2007) International Association for Educational Assessment (IAEA) Conference, Azerbaijan

Comparing the decision-making processes involved in marking between examiners and between different types of examination questions

Crisp, V. (2007) British Educational Research Association (BERA) Annual Conference

Researching the judgement processes involved in A-level marking

Crisp, V. (2007). Researching the judgement processes involved in A-level marking. Research Matters: A Cambridge Assessment publication, 4, 13-18.

The marking of examination scripts by examiners is a key part of the assessment process in many assessment systems. Despite this, there has been relatively little work to investigate the process of marking at a cognitive and socially-framed level. Improved understanding of the judgement processes underlying current assessment systems would also leave us better prepared to anticipate the likely effects of various innovations in examining systems such as moves to on-screen marking.

An AS level and an A2 level geography exam paper were selected. Six experienced examiners who usually mark at least one of the two papers participated in the research. Examiners marked fifty scripts from each exam at home with the marking of the first ten scripts for each reviewed by the relevant Principal Examiner. This reflected normal marking procedures as far as possible. Examiners later came to meetings individually where they marked four or five scripts in silence and four to six scripts whilst thinking aloud for each exam, and were also interviewed.

The findings of this research support the view that assessment involves processes of actively constructing meaning from texts as well as involving cognitive processes. The idea of examining as a practice that occurs within a social framework is supported by the evidence of some social, personal and affective responses. Aspects of markers’ social histories as examiners and teachers were evident in the comparisons that they made and perhaps more implicitly in their evaluations. The overlap of these findings with aspects of various previous findings helps to validate both current and previous research, thus aiding the continued development of an improved understanding of the judgement processes involved in marking.

2006

Examiners’ annotations: practice and purpose

Crisp, V. and Johnson, M. (2006). Examiners’ annotations: practice and purpose. Research Matters: A Cambridge Assessment publication, 2, 11-14.

The processes of reading and writing are recognised to be inextricably intertwined. Writing helps to support cognitive demands made upon the reader whilst processing a text (e.g., O’Hara, 1996; Benson, 2001). Examiners annotate scripts whilst marking (e.g., underlining, circling, using abbreviations or making comments) and this may reflect the cognitive support for comprehension building that annotations can provide. There is also some existing evidence that annotations might act as a communicative device in relation to accountability and that annotating might have a positive influence on markers’ perceptions and affect their feelings of efficacy.

This research investigated the use of annotations during marking and the role that annotations might be playing in the marking process. Six mathematics GCSE examiners and six business studies GCSE examiners who had previously been standardised to mark the paper were recruited. Examiners initially marked ten scripts which were then reviewed by their Team Leader. Examiners then marked a further 46 (Business Studies) or 40 (Mathematics) scripts.

The examiners later attended individual meetings with researchers. The session began with each examiner marking a small number of new scripts to re-familiarise themselves with the examination paper and mark scheme. A researcher then observed each examiner as they continued marking a few further scripts. Each examiner was interviewed about their use of annotations.

The findings portray a clear sense that markers in both subjects believed that annotating performed two distinct functions. The first appeared to be justificatory, communicating the reasons for their marking decisions to others. This mirrors the statutory requirements for awarding bodies to establish transparent, accountable procedures which ensure quality, consistency, accuracy and fairness. The second purpose was to support their thinking and marking decisions. In addition to helping markers with administrative aspects of marking (for example, keeping a running tally of marks), there are claims that annotations also support higher order reading comprehension processes.

Can a picture ruin a thousand words? The effects of visual resources in exam questions

Crisp, V. and Sweiry, E. (2006) Educational Research, 48, 2, 139-154

2005

Can a picture ruin a thousand words? The effects of visual resources and layout in examination questions

Crisp, V. and Sweiry, E. (2005). Can a picture ruin a thousand words? The effects of visual resources and layout in examination questions. Research Matters: A Cambridge Assessment publication, 1, 11-15.

Visual resources, such as pictures, diagrams and photographs, can sometimes influence students’ understanding of an examination question and their responses (Fisher-Hoch, Hughes and Bramley, 1997). If visual resources do have a disproportionately large influence on the development of mental models, this has implications in examinations where students’ ability to process material effectively is already compromised by test anxiety (Sarason, 1988). Students need to understand questions in the way intended in order to have a fair opportunity to display their knowledge and skills.

This research explored the effects of visual resources in a number of exam questions. 525 students, aged 16 years, sat an experimental science test under examination conditions. The test included six questions involving graphical or layout elements. For most of the questions, two versions were constructed in order to investigate the effects of changes to visual resources on processing and responses. Some of the students were interviewed after they had taken the test.

The analysis of the example questions in this study, along with others the authors have studied, suggests that two variables in particular play a decisive role in the effect of visual resources on the way examination questions are processed and answered. The first of these is the relative salience or prominence of the key elements. Secondly, the student must believe that the element is relevant to the answer. One factor in determining this is past test experience, which provides expectations regarding under what circumstances visual resources are relevant.

Constructing meaning from school mathematics texts: potential problems and the effect of teacher mediation

Crisp, V. (2005) British Educational Research Association (BERA) Annual Conference

The use of annotations in examination marking: opening a window into markers' minds

Crisp, V. and Johnson, M. (2005) British Educational Research Association (BERA) Annual Conference

The PePCAA project: Formative scenario-based CAA in psychology for teachers

Crisp, V. and Ward, C. (2005) Ninth International Computer Assisted Assessment Conference, Loughborough University

2004

Could Comparative Judgements Of Script Quality Replace Traditional Marking And Improve The Validity Of Exam Questions?

Pollitt, A. and Crisp, V. (2004) British Educational Research Association (BERA) Annual Conference

2003

Can a picture ruin a thousand words? Physical aspects of the way exam questions are laid out and the impact of changing them.

Crisp, V. and Sweiry, E. (2003) British Educational Research Association (BERA) Annual Conference

2002

Tales of the Expected: The Influence of Students’ Expectations on Exam Validity

Sweiry, E., Crisp, V., Ahmed, A. and Pollitt, A. (2002) British Educational Research Association (BERA) Annual Conference
