In this blog, Senior Professional Development Manager James Beadle reflects on why it currently feels like we are experiencing ‘steam engine time’ in the assessment landscape, and what professionals working in assessment need to keep in mind to steer our way ahead.
Read time: 8 minutes
Why is there so much change happening now?
The assessment landscape and broader educational systems are entering a significant period of potential change. It often feels that AI is at the forefront of these changes, from the use of natural language models to facilitate marking in language exams, to ongoing debates about how, if at all, students should be using it within assessments. But other developments are also taking place. In the UK, the exams regulator Ofqual has recently launched a consultation on the introduction of onscreen assessments for GCSEs and A Levels in England, with the first digital examinations potentially taking place in 2030. In the USA, the SAT exam, used for college admissions, has been digital since March 2024.
At a broader level, groups like Rethinking Assessment are increasingly advocating for significant reform of assessment systems, and transnational programmes such as the OECD’s PISA now test areas such as creative thinking and readiness for lifelong learning, alongside the more ‘traditional’ areas of mathematics, reading and science.
Many of these developments are not novel. In the late 1980s, Cambridge Assessment was exploring the use of computer-based assessment for vocational qualifications. In 1966, Ellis Page published ‘The imminence of … Grading Essays by Computer’, making the case for the automated grading of essays. At around the same time, E. Paul Torrance introduced a set of assessments designed to measure creativity in elementary-aged students in Minnesota. So why does it feel that we are now experiencing an avalanche of changes?
Lessons from steam engine time
Charles Fort coined the term ‘steam engine time’ to describe a similar onslaught of changes seen in the 18th century. Despite earlier iterations, it was not until 1712 that Thomas Newcomen developed the first practical steam engine. He was followed in 1766 by James Watt, who introduced a wave of improvements to engine design, heralding a period of rapid development and the first industrial revolution – bringing in ‘steam engine time’.
So why did this not occur over a thousand years earlier? The short answer is that the steam engine was not an isolated invention. Watt’s improved engines depended on precisely bored metal cylinders, something only possible using John Wilkinson’s boring machine, invented in 1774. New techniques in areas like crop rotation drove the agricultural revolution of the early 1700s, increasing farm output and freeing up labour; this in turn facilitated the industrial revolution, with its demand for coal and the corresponding steam engines required to pump water out of mine shafts. It was not one single innovation, but rather multiple factors coming together in a relatively short space of time.
The development of generative AI followed a similar pattern. In the last decade there has been a convergence of factors: the ‘transformer architecture’ (which enables models to process entire language sequences in parallel, significantly improving training times and the complexity of the data they can handle), the availability of large data sets on which to train models, and the increased availability of cloud computing, allowing models to be hosted remotely on customised servers. It was the combination of these that led to ChatGPT ‘suddenly appearing’ on the scene in November 2022.
Likewise, a large number of separate developments are combining to foster change in the assessment landscape.
Whilst digital assessment is nothing new, developments in computing and mobile phone technologies mean candidates are increasingly sitting on-screen assessments on their own devices at home, rather than in testing centres. Combined with the use of AI for marking, this potentially allows assessments to be sat ‘on-demand’, with results then issued near-instantaneously.
With universities and employers increasingly looking for candidates to demonstrate skills such as self-regulation, critical thinking and collaboration, new assessments that can accurately report on these are likely to be needed.
Finally, in a world where it feels like the pace of change is accelerating rather than slowing down, many of us will likely need to reskill and gain new certifications at various points in our lives, creating increased demand for assessment more broadly.
Change comes with corresponding challenges. Whilst digital assessments can be more accessible for some candidates, they can raise additional barriers for others.
What does it mean for an AI marker to be reliable – is it simply a measure of how often it agrees with an expert human rater? Or do we also need to explore why it might sometimes disagree, and if particular groups of candidates might be affected? Can we allow students to use AI in our assessments without undermining their fundamental integrity?
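To make the reliability question a little more concrete, one common starting point is to compare an AI marker’s scores against an expert human rater using simple agreement statistics, and then to check whether that agreement holds across different groups of candidates. The sketch below is purely illustrative – the marks and candidate groups are invented, and it simply uses scikit-learn’s Cohen’s kappa implementation – rather than a description of any particular marking system.

```python
# Illustrative only: comparing hypothetical AI marks against human marks.
# Assumes scikit-learn is installed; all scores and groups are made up.
from sklearn.metrics import cohen_kappa_score

human = [3, 4, 2, 5, 3, 4, 1, 2, 5, 3]   # expert human rater's marks
ai    = [3, 4, 3, 5, 3, 3, 1, 2, 4, 3]   # AI marker's marks for the same scripts
group = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]  # hypothetical candidate groups

# Raw percentage agreement: how often the two markers give the same mark.
agreement = sum(h == a for h, a in zip(human, ai)) / len(human)
print(f"Exact agreement: {agreement:.0%}")

# Cohen's kappa adjusts for the agreement expected by chance alone.
# (weights='quadratic' is often used when marks are ordinal.)
print(f"Cohen's kappa: {cohen_kappa_score(human, ai):.2f}")

# A headline figure can hide differences: check agreement within each group.
for g in sorted(set(group)):
    pairs = [(h, a) for h, a, gg in zip(human, ai, group) if gg == g]
    rate = sum(h == a for h, a in pairs) / len(pairs)
    print(f"Group {g} agreement: {rate:.0%}")
```

Even in this toy example, the overall agreement figure can look reassuring while agreement for one group of candidates is noticeably lower – which is exactly why the questions about why an AI marker disagrees, and for whom, matter as much as how often it agrees.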
What should assessment professionals do next?
So what should the next steps be for anyone who works in the field of assessment? In this time of change, there are two important things to consider:
- What questions do you need to be asking?
- What, ultimately, are you trying to measure?
What questions do you need to be asking?
Corbin et al (2025) argue that AI presents a ‘wicked problem’ for assessment. For those unfamiliar with the term, ‘wicked problems’ are those that are difficult or impossible to solve for a range of reasons, such as incomplete information or changing requirements.
One of the characteristics of wicked problems is that the way they are described determines their possible solutions. If we view the problem as ‘students will use AI to commit malpractice’, then solutions likely focus on exam security, whilst if we view the problem as ‘how do we incorporate authentic use of AI into our qualifications’, then solutions may instead look at assessment redesign.
This isn’t necessarily unique to AI: the question ‘how do we adapt our current assessments so that they work onscreen?’ is likely to lead to a different set of answers than the question ‘what opportunities do digital tools provide us to better assess the subject of concern?’. The questions we ask ultimately help shape how we view this period of change: is it bringing new problems, or new opportunities?
So how do you know what questions you should be asking? This leads to the second area of consideration:
What, ultimately, are you trying to measure?
Ultimately, the purpose of most high-stakes assessments is to carry out some form of measurement. The most important principle within assessment is that of validity: how well do our assessments measure what they claim to measure?
Dawson et al (2024) exemplify this approach in their paper ‘Validity matters more than cheating’, which highlights that a narrow focus on preventing malpractice can sometimes undermine the overall validity of our assessments, for example by no longer accurately reflecting the construct or subject of concern.
To establish the validity of an assessment, it is first necessary to have a firm understanding of what it is you are trying to measure. This can be complex; subjects change, and the skills required to be an engineer now, for example, are not necessarily the same as those required thirty years ago. Other areas can be difficult to define: what exactly is self-regulation, or what does good teamwork look like?
But by having a firm comprehension of the area we are trying to assess, and an understanding of key assessment principles, we give ourselves the necessary foundation to not only begin exploring other questions regarding our assessments, but to also know what questions we should be asking.
If you’re interested in learning more about assessment principles and the purposes of assessment, you may be interested in our recently redeveloped course A101: Introducing the Principles of Assessment. The redevelopments include some new case studies and examples that reflect how the increasing use of AI interacts with assessments.