Setting examinations that are fit for the future - Michael O'Sullivan

Michael O'Sullivan delivered the following speech in Mandarin at the Education Supervision and Evaluation Seminar at Beijing University of Technology. 
 
Cambridge International Examinations is the only international examination body in the world which is still owned by a university. As the University of Cambridge we have a history of 800 years, and throughout that time the examination of students has been an integral part of the work of the University. But for the last 150 years we have also played a distinct role in providing examination systems in all subjects of study for use in schools all over the world.

Our work today serves over 10,000 schools in 160 countries, providing not just examinations and internationally recognised qualifications, but curriculum, learning materials, teacher training, and support for schools to help them provide students with an education fit for the 21st century. We are involved in providing examinations used in state school systems in several countries, such as Singapore. Our exams are also widely used in private schools offering international curriculum programmes all over the world. And in several countries, from Egypt to Kazakhstan, we are helping governments develop new curriculums and examinations for the 21st century.

This is a very topical time to be talking about examinations in Beijing. Next week, once again, millions of young Chinese face the once-in-a-lifetime chance to take the gaokao. If they are successful, they will achieve their dreams and those of their parents by gaining admission to a top Chinese university. And let me remind you of another, more historical date. This year it is 110 years since the Qing Government abolished the Chinese Imperial Examination, which by then had operated for 1,300 years as the basis for meritocratic admission to the higher levels of the civil service in China. This was a model widely credited as the archetype of the modern public examination systems which nowadays feature in almost every society on earth.

With the perspective of history, we can reflect on the huge impact on civilisation and society of high-stakes examination systems. The Imperial Examination system clearly played a profound positive role in the extension of a common literary and philosophical culture to the whole of the Chinese world. It fostered notions of social mobility, meritocracy, and the importance of diligent study. And yet it later came to be condemned in China for its lack of attention to science and technology, and so was in part blamed for national disaster.

The modern gaokao, in contrast, places great emphasis on scientific and mathematical knowledge. But as society changes, calls for reform of the gaokao are heard ever more loudly. Many question whether it is fair enough. Does it test what students most need to know? Does it equip students with an international perspective? Not everyone would answer such questions in the affirmative.

Specialists in education assessment sometimes like to claim that “what gets tested, gets taught”. It is an attractive idea, especially if you are trying to sell examinations. But of course it is wrong. If it were true, then it would be sufficient to make examinations more difficult, in order for students to learn more. Actually that is quite a common mistake made by education ministries in some countries.

A more accurate formulation of this idea, in my experience, is “what does not get tested usually does not get taught.” So we had better be careful in designing examinations but also aware that, in designing education, we cannot start from examinations.

At Cambridge we believe that school education is best conceptualised as a triangular relationship between curriculum content, pedagogy and assessment. All improvement and reform in education needs to deal with these three interdependent factors in a coherent way, and little can be achieved through changing one of the three factors on its own.

The heart of education is the content of the curriculum - the definition of what we want students to learn. This is typically expressed as a system of knowledge and skills: knowing core facts, being able to express relationships between facts, and being able to carry out processes such as calculation, analysis, criticism, interpretation. Lately there has been some pointless controversy about whether knowledge or skills are more important. In truth, all of the things in which we are interested in education are a complex combination of skills and knowledge. Reading is a skill, but it requires knowledge of the structure of language and the meaning of words. Scientific observation is a skill, and yet requires knowledge to look for the right things in the right way. We might as well debate whether air or water is more important to human survival as argue about whether knowledge or skills are more important in education.

The Cambridge curriculum for any subject is very explicit about what we call “learning objectives” - detailed descriptions of what students are expected to learn at each stage of the curriculum. Examinations are designed on the basis of “assessment objectives” linked to the learning objectives. In this way, we try to ensure that the examinations encourage, recognise and reward the required learning, rather than allowing the examination to become an end in itself.

An effect of proceeding in this way is that our examinations are quite difficult to mark. Because learning objectives in chemistry include, for example, the ability to carry out scientific experiments, some of the examination must be done in a laboratory and marked through observation. To take another example, in history the learning objectives at age 18 include historical interpretation, and so students must in part be assessed through essays which need to be marked by highly trained examiners. It is expensive and difficult to examine large numbers of students in this way, and appeals for changes of marks are not uncommon and need to be processed fairly. But we still use these methods widely because we believe that encouraging and rewarding useful learning is more important than cost or convenience in the design of examinations. I have to admit, however, that it might not be practical to apply such methods widely to the gaokao in China. The scale of the gaokao and the overwhelming importance of fairness seem to me to set practical limits to the design of the examination.

Let me also acknowledge a common criticism of examinations, which points out that some things which are hard to assess by examinations, such as students’ practical ability in science or their ability to research a topic in depth, are much easier for teachers to assess. This criticism usually also includes the point that students’ examination performance can vary on the day for unpredictable reasons, whereas the teacher knows the student’s true ability. For all these reasons, at Cambridge we agree that assessment by teachers can in many situations be more useful than assessment by examinations. However, for certain high-stakes purposes, for example, university entrance selection, or any assessment which is used to measure the performance of teachers as well as students, we do not think it is practical to build a reliable assessment system based on teachers’ assessment of students.

And finally a brief comment on two major current trends in education assessment and a reflection on what they might mean for the issues discussed above.

Firstly, the assessment of so-called “21st century skills”. At Cambridge we have researched all the literature we can find on 21st century skills and undertaken our own studies. While not all these skills are really 21st century, many of them, such as problem solving and collaboration, are certainly important for students’ success at university and beyond. They should therefore be encouraged, and one way to encourage them is to test them in examinations, as many of us are trying to do. However, care is needed. At this stage of research, there is no consensus as to what “problem solving”, even in the context of mathematics, actually means. And certainly there is no wide agreement on what exactly “collaboration” means. We should therefore proceed with caution in making claims about our ability to rank and score students for their problem-solving or collaboration skills. It is important to adopt an experimental approach. “Experimental” does not just mean trying things out first; it means trying hard to prove that a theory or method is wrong, and only reluctantly accepting after much experimentation that it may be largely right. Providers of educational examinations do not always behave in that way.

The second trend is the rapidly growing use of digital technologies in education assessment. This is on the whole a very positive trend, offering potential for better, faster, cheaper, more sensitive assessment of the students of the future. Computer adaptive testing and virtual realities, for example, both open new possibilities in educational assessment. It is not impossible to imagine a future of education and assessment which is 100% digital, an idea which we are already testing at Cambridge.



There is, however, some danger linked to digital assessment, if it is introduced in the wrong ways. One danger is that it offers the possibility of very frequent, low-cost, fast-marked assessment of students. In the USA, the ambitiously named “No Child Left Behind” movement turned into something of a crisis when it came to be implemented principally through increasingly frequent administration of low-cost standardised tests. Aptly described by the Chinese proverb ba miao zhu zhang (pulling up seedlings to help them grow), this way of proceeding proved to be distracting and demoralising and it brought little educational benefit. Children do not learn at the same speed, and there is no point in testing them too often, and certainly no point in frequently testing what is not taught. That is indeed an example of the “what gets tested, gets taught” fallacy.

Another risk, if digital assessment is badly implemented, is that important skills are not tested, because it is currently easier to test some things digitally than others. For example, high-quality machine-marking of long written answers is still in the early stages of development and in most contexts not yet good enough for high-stakes examinations. We should beware of reducing the quality of assessment by rushing to use digital technologies that still require further research and development.

Ultimately, design of a good examination depends on what the examination will be used for, and this must be identified with precision and clarity. Sometimes the same examination is used for too many different purposes: for evaluating students, for making selection, for evaluating teachers and schools. Generally speaking, the simpler and clearer the purpose of the examination, the more likely that it can be designed well.

As for reforming the gaokao, I think that because the purpose of the gaokao is clear – university entrance selection in China – great improvement in design is in fact possible. However, I think “big bang” reform of the gaokao might be a bad idea, for several reasons. Sudden comprehensive change would be hard for society to accept as fair. Technologies available to support examinations are gradually improving. Sudden massive change in the examination will also make it very difficult to determine whether student performance has improved or deteriorated. And innovations will need to be evaluated, and on the basis of evaluation they can always be further improved.

I would recommend the following strategies for consideration:
- Greater clarity of learning objectives underpinning the examination design
- Extensive research to establish predictive validity of the examination components in relation to academic performance at university
- Greater optionality, so that students can be examined in subjects they wish to study at university, not just in general subjects
- Gradual increase in digital assessment, to improve the sensitivity and efficiency of the assessment

Michael O'Sullivan
Chief Executive, Cambridge International Examinations
