Predicting student performance: can assessment data tell us about the future?

When we think of test scores and grades, we often think of them as the end-point of a process: they tell us how students performed in a test.

However, sometimes we can use the same data to tell us something about what might happen in the future. In this blog, Matthew Carroll, Senior Researcher, Cambridge University Press & Assessment, explores what assessment data can tell us about the future and the care that is required in interpreting these predictions.

How can early test scores predict future academic performance?

Let’s imagine a maths progress test that students take at age 14. The students receive a score across different areas of maths, and an overall score that indicates their current attainment level. Teachers and students interpret and use these results directly, such as using them to identify which areas students need more support in.

Two years later, the same students take their GCSEs, and their Maths grades are recorded. When we compare the progress test scores they achieved two years earlier with their GCSE grades, it appears that students who got higher scores on the earlier test achieved higher grades in GCSE Maths. If this relationship is strong enough, we can, subject to certain assumptions, use the earlier progress test scores to predict later performance in GCSEs.

The fundamental idea is that we define a relationship between outcomes on two tests, one taken earlier that we will predict from, and one taken later that we will predict to. To do this, we use a statistical method, like regression, to quantify the exact nature of the relationship.

What makes a good prediction from assessment data?

Once we’ve quantified the relationship, when presented with a given ‘predictor’ score – here, progress test scores – we can say what the likely outcome is – here, GCSE Maths grades. Of course, this wouldn’t make sense for the students that already took their GCSEs – we know what grades they got! But in the coming years, we could use new students’ progress test scores to get ‘predictions’ of their likely GCSE grade.

This means that by combining data from different tests, and using our knowledge of statistics, we can get more from the data. For example, maybe the grade predictions can help to motivate the students, or support them when thinking about future subject choices.

Although this seems like a fairly simple and intuitive process, it’s actually a bit more complicated – and we need to take care at each step.

First, we need to ensure that each test is of good quality in its own right, such as having appropriate reliability and validity. If our data is poor quality, our predictions will not be good.
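
Reliability can be measured in several ways. One common measure of internal consistency is Cronbach's alpha; the sketch below computes it from item-level marks. The marks are invented for illustration, and alpha is only one of many possible reliability checks, not necessarily the one any particular test uses.

```python
from statistics import variance


def cronbach_alpha(item_scores: list[list[float]]) -> float:
    """Cronbach's alpha; item_scores[i][j] is student i's mark on item j."""
    k = len(item_scores[0])  # number of items on the test
    items = list(zip(*item_scores))  # transpose rows (students) into item columns
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical item-level marks for six students on a four-item test
marks = [
    [2, 3, 3, 2],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [2, 2, 3, 1],
]
print(f"Cronbach's alpha = {cronbach_alpha(marks):.2f}")
```

Values closer to 1 indicate that the items rank students consistently; validity, by contrast, cannot be read off a single statistic like this.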

Second, we must think carefully about the data:

Are the students we’re using to define the relationship representative of the group we want to make predictions about?
Are there reasons the relationship might change from year to year, like changing test difficulty?

We need to be confident that the relationship is not specific to the exact sample of data we used.

Third, we should make sure we’re using statistical methods appropriately. Does the data meet the necessary assumptions of the methods, and have we evaluated the results properly? If the relationship is weak, or if the data doesn’t meet the requirements of the statistical method, we shouldn’t be making predictions!

How accurate are grade predictions and what should we be aware of?

Once we’ve ensured our data and statistical methods are sound, we should be able to go ahead and make predictions. But even then, we need to interpret the predictions carefully.

If our relationship is based on a regression, our predicted values indicate the average outcome for a student with a given ‘predictor’ score. There will also be ‘uncertainty’ around the prediction – it will not just be a single value, but a range of possible values. This means that a student might achieve higher or lower than predicted and it would still be within the range of expectations.

We must also remember that our predictions are based on a previously observed relationship. Maybe a student substantially outperforms their predicted grade. This could simply be because such an outcome is possible, even if it is statistically unlikely. Equally, maybe the very act of giving a predicted grade changes student behaviour, so they study harder. Ultimately, it is just a statistical relationship!

We have seen how predictions can be made using assessment data, and how this can get more out of the data, but also how we must take care interpreting predictions. Understanding assessment data and statistical methods can be very useful when thinking about predictions. Importantly, understanding the way predictions were generated reminds us of something crucial: no matter how much data we have, the future is always uncertain.

If you want to learn more about assessment prediction, you can take our online course, which covers these topics and much more in the area of assessment data and statistics.