In the previous blog we looked at whether more online assessment could make the high-stakes GCSE and A Level exam system in England more robust. In this blog we consider whether more continuous assessment could do so.
GCSEs and A Levels in their current format are described as ‘linear’. This means that most or all of the assessment occurs at the end of the course in a single examination session, where candidates usually sit two or three exam papers (components) in each subject. In England, the main exam session is in May and June. A problem with the linear system is that it is particularly vulnerable to events such as lockdowns that prevent large numbers of students being in the same place at the same time. In contrast, in a ‘modular’ system, the learning and assessment are packaged into units. Before the latest reforms in England (i.e. from 2000 to 2016), A Levels and GCSEs were modular, with sessions in January and June. The assessment of most courses consisted of four to six units. Subject to various restrictions, candidates could sit (and re-sit) different combinations of units in different sessions throughout the course. A Uniform Mark Scale was used to allow results from units taken in different sessions to be combined (aggregated) to give an overall qualification outcome.
At first sight it seems obvious that a modular system should be more robust than a linear one, in that it diminishes the impact of disruption to any particular exam session. Assuming some sessions can proceed normally, most candidates should have some externally assessed evidence (the unit grades) to form a basis for any judgements about missed assessments. However, in situations like the current pandemic, when lockdown restrictions have been applied, eased and then applied again, it is debatable whether a modular system would have done anything other than increase the problems unmanageably. In terms of valid assessment, both systems have strengths and weaknesses. It is worth noting that one of the reasons for returning to a linear system was to reduce the ‘assessment burden’ for students and schools, thus freeing up more time for teaching and learning.
A more radical way of making the assessment more continuous would be to rely more on teacher assessment. Teacher or school assessed grades, based on a range of evidence, have now been announced as the means by which GCSEs and A Levels will be graded in England and some other countries around the world in 2021. Aspects of the proposals have led more vocal commentators to predict a ‘car crash’, but if the process confounds their expectations and goes well, there is likely to be a debate about whether teacher assessment should play more of a role in future. There are many ways in which the role of teacher assessment could be increased. They can be categorised along four different dimensions:
- The ‘continuousness’ of assessment – how often teachers collect evidence that counts towards the final assessment: from low (exams in school at the end of the course) to high (every piece of classwork and homework marked/graded and aggregated into the final grade).
- The proportion of teacher assessment in the final grade, from low to high.
- The role of the exam board, from low (collect entries, collect grades, print certificates) to high (provide questions and tasks, technology platforms, training for teachers, external moderation, quality assurance and monitoring, standard setting and maintaining processes, dealing with appeals).
- The ‘agency’ of students – how much control they have over what evidence is used as the basis of their assessment: from low (all decided by the teacher) to high (all decided by the student).
The more continuous the assessment, the more robust the system would be to disruptions preventing lots of students being in the same place at the same time. The main drawbacks would be in ensuring a high quality of teacher assessment (consistently applied across schools), maintaining standards, and commanding public confidence in the fairness of the results. Not insignificant challenges! Increasing the involvement of exam boards may help mitigate some of these concerns, but without any external assessment at all their hands would be somewhat tied.
Systems with a high proportion of teacher assessment in the final grade could change the teacher-student relationship, potentially in a bad way. One rarely-mentioned advantage of external exams is that the teacher and the student are clearly on the same side, like a coach training an athlete for a major competition. Pursuing the analogy, if the competition were cancelled it is doubtful whether coaches’ estimates of their athletes’ performance would command much public acceptance as a basis for determining places in the competition. If teachers have the power to determine their students’ futures, it puts them in a potentially compromising position, opening up possibilities for bribery, intimidation or favouritism. If this were combined with a highly continuous system where evidence from every piece of homework or class test could count towards the teacher’s judgement, it might also create a somewhat oppressive ‘always on’ assessment environment which could militate against the creativity, exploration and learning from failure that achievement generally requires. Allowing the students some element of control over what counts towards their final assessment could help address this.
More teacher assessment might increase the utility of the grades for predicting future achievement by including (either explicitly or implicitly) qualities and attributes that are hard to assess in exams. In the USA, high school grade point average (HSGPA) is often used for college (university) admission alongside performance on standardised tests such as SAT or ACT. A study at the University of California found that HSGPA was both a better predictor of college performance than the SAT, and relatively more likely to select students from lower socioeconomic categories. Of course, this might not be true of comparisons between teacher assessed grades and A Level or GCSE exams because the latter are very different from the SAT. Indeed, a recent study of applicants to medical school found that actual A Level grades correlated better with undergraduate (and postgraduate) outcomes than their school’s predicted grades.
Systems that involve elements of both external assessment and teacher assessment could therefore seem to be a way to spread the risk from a disruptive event and thereby increase robustness, although potentially invoking issues of unfairness and high workload. Several high performing jurisdictions from around the world already combine external assessment with teacher assessment, and in some of them the external assessment is used to moderate (standardise) the teacher assessment. Williamson (2016) discusses statistical moderation of teacher assessment using examples from different Australian states, Hong Kong and South Africa. She notes that validity and transparency often need to be traded off to some extent to achieve public acceptability.
Cambridge Assessment and Cambridge University Press have recently developed a set of outline principles that we believe will help all those interested in the future of teaching, learning and assessment. Following up the outcomes of the 2020 and 2021 cohorts who were unable to take exams will give us some more insight into the validity of teacher assessed grades for predicting future performance, and will provide a valuable contribution to this debate.