School exams: have ‘standards’ really fallen?
We have now concluded our online discussion on examination standards and published a report and recommendations.
On 29 November 2010, an Early Day Motion ‘EDM 1099: Exam Standards’ was tabled by Ian Mearns MP. The Early Day Motion draws attention to the report’s recommendations and calls on the Government to act on these recommendations.
We hope that the recommendations will inform future policy. Thank you to everyone who contributed and engaged with the debate.
As reported in The Guardian, Daily Telegraph, Daily Mail and on BBC News Online, Cambridge Assessment hosted an open and frank debate on 29 April 2010 in order to clarify public understanding of the different examination standard issues. Over 100 people including teachers, assessment experts, employers and journalists attended. The debate was streamed live and nearly 1000 people watched the proceedings online. Watch the exam standards debate here.
Tim Oates, Group Director of Assessment Research and Development, Cambridge Assessment, commented that exam boards bowing to political pressure to make GCSEs and A levels more “accessible” could be one reason for the increase in top grades.
Other panellists included: Professor Roger Murphy, Centre for Developing and Evaluating Lifelong Learning, University of Nottingham; John Bangs, Assistant Secretary of Education, National Union of Teachers; Professor Gordon Stobart, Institute of Education, University of London; and Anastasia de Waal, Director of Family and Education at Civitas.
As reported in the TES, The Daily Telegraph, Times Online, The Independent, Daily Mail and Independent on Sunday in the run up to the debate, examination standards – and the perception of them – are of principal concern to society and dominate many educational and media debates.
To kick off the discussion we asked several education experts to comment on a paper written by Tim Oates at Cambridge Assessment about standards in public examinations. Click on the links below to read Tim’s paper and their responses:
- Tim Oates, Group Director of Assessment Research and Development, Cambridge Assessment
- John Bangs, Assistant Secretary of Education, National Union of Teachers
- Professor Roger Murphy, School of Education, Centre for Developing and Evaluating Lifelong Learning, University of Nottingham
- Anastasia de Waal, Director of Family and Education at Civitas
You may also find the podcast of the discussion held at our Parliamentary Seminar useful.
Comments…
(comments in chronological order)
30 Comments
Tom Bramley
Posted February 24, 2010 at 10:05 am
I would suggest separating Tim Oates’s ‘standards of demand’ into ‘standards of demand’ and ‘grading standards’. A tiered GCSE illustrates the difference. The higher tier is at a higher standard of demand than the foundation tier, but since grade C is available on both tiers, the grading standard of the grade C boundary needs to be in some sense the same on both. The examinee needs to obtain more of the marks on the foundation tier than on the higher tier to obtain the ‘C’. The interesting question then arises of whether the same inferences are warranted about the knowledge and skills of a pupil with a grade C, regardless of which tier they entered. If there are topics and skills on the higher tier not tested and (hence?) not taught on the foundation tier, the same inferences might not be warranted. I think this issue underlies some of the current debate around ‘stretch and challenge’ at A-level. This is not simply a case of adding an extra boundary (A*) at the top end – there seems to be a desire to draw the inference that a pupil with an A* has not merely gained a lot of marks, but that they have demonstrated mastery of more advanced topics and skills. In other words, this is not analogous to creating an extra grade category at the top of the foundation tier (C*?!), but changing the standard of demand of the foundation tier to make it more like the higher tier.
I concur with Tim Oates, and not with Roger Murphy, on the question of whether evaluating public examinations is a ‘highly technical area’. Deep thinking and clear thinking are necessary to define and clarify terms (like ‘standards’ and ‘demands’), to decide what kind of standards should be set and maintained, and what kind of evidence is relevant for setting and maintaining them. Some of this analysis will certainly be highly technical, but it is vital in order that the ‘users’ of the results can use them appropriately.
Irenka Suto
Posted February 24, 2010 at 10:06 am
I’ve always been fascinated by the relative lack of public scrutiny of standards (demand, content, performance etc.) within HE. Somehow, there seems to be more trust in university standards. Is this due to there being less political interference? I’d agree with Roger Murphy that we can’t precisely answer the question of how the degree standards of different universities compare. Given that employers frequently require job applicants to achieve a 2.1, regardless of where it was obtained, I’m surprised that there isn’t more general concern. Or is this lack of obsession and scrutiny actually healthier than the debate we’re having at the level of GCSE and A-level?
Diane Purkiss
Posted May 14, 2010 at 2:59 pm
Irenka Suto, universities constantly scrutinise each other’s standards through the external examining system. This is absolutely not a jolly club; external examiners will criticise even the most prestigious courses if they do not meet accepted standards. That said, the kind of work undertaken in the ancient universities is often significantly more – just more. English students at Oxford, for example, write 1.5 essays per week. Elsewhere the norm is 1.5 essays a term, or 2 per semester.
Gill Elliott
Posted February 24, 2010 at 11:51 am
Tim Oates alludes to changes in high-stakes assessments affecting the processes for setting and maintaining standards. Inevitably and necessarily, there is an evolution of curriculum content and assessment methods as society alters and as new technologies advance. However, the rate of change in recent years has been great and this leads to several issues. Firstly, when changes to assessment structures occur at great pace how, when evaluating supposed changes in standards, do we evaluate exactly what contextual factors have had an impact?
Furthermore, once these factors are taken into account, is it acceptable in the public eye to conclude that it is simply not possible to compare like with like? Secondly, is there a danger that many changes to assessment structures in relatively short periods will erode public confidence, which in turn will lead to pressure for more changes, and ultimately turn into a vicious circle? How can we create sufficient stability in the system to effectively measure changes in standards? Finally, we need to determine exactly how the purposes of high stakes assessments have changed, and address the question of what purpose, exactly, we want them to fulfil in future. This is an issue which is at the heart of the debate on standards.
Beth Black
Posted February 24, 2010 at 11:52 am
It’s the old adage – if you want to measure change, don’t change the measure.
There’s a whole debate to be had about the relative weighting of sources of evidence (statistics versus professional judgement of quality of candidate performance) in both standard maintaining and comparability activities. Which methods most effectively combine the two and give the best (most valid) answers/outcomes?
However, I think Tim Oates raises a very important and pertinent point regarding the current pressures (from Ofqual) to rely even more upon statistics in awarding and further squeeze out the role of judgement. I wonder whether this possibly appears to contradict Ofqual’s own Code of Practice (by which awarding bodies are bound): “Para 6.14 – Each boundary must be set using professional judgement. The judgement must reflect the quality of candidates’ work, informed by the relevant technical and statistical evidence”?
Roger Murphy
Posted February 24, 2010 at 6:44 pm
Tom Bramley has entered the debate, but I think slightly misunderstood my difference with one aspect of the position as set out by Tim Oates. As I understood Tim’s argument he was stating that UK public examinations should be seen as an exact science, and that highly technical work could be undertaken to achieve that ‘exact science’ position, where presumably there would never be any need for further debate about the meaning of annual GCSE/A-level results. I took Tim to be arguing that if we tried harder with our technical apparatus then the ‘problem’ would be overcome. This is the point which I don’t agree with, as I believe that public examination results will always be approximate and involve variable elements, which cannot be entirely controlled through the application of scientific methods. My alternative approach is to agree that we should make UK public examinations as fair and meaningful as we can, but accept that grade comparisons (across subjects, years and Awarding Bodies) will always be approximate. This means that we need as a society to accept that the information which we get from public examinations is approximate and should not be treated as carrying more meaning than it actually does. So if you want to know something about a student, or a school, or the teaching of a subject at GCSE level, for example, examination results are worth looking at, as one piece of evidence, but they need to be interpreted with care as they are not highly precise and accurate scientific measurements of educational achievement.
Tom is quite correct in stating that highly technical work can be conducted on examination statistics for example. I have nothing against such work, but don’t let us be deluded into thinking that such work will turn UK public examination grades into something they can never become. Such grades are often useful approximate guides to some aspects of the achievement of individual and groups of students. However to understand much from even a single grade achieved by one individual, you actually need to know quite a lot about that individual, the teaching they have received, and how their performance in that exam relates to their performance in other educational settings.
Mark Simmons
Posted March 3, 2010 at 9:13 pm
I have a strong sense that the standards of teaching and learning (for transferable understanding) in my own subject are being eroded over time, and I think I understand parts of the mechanism which is causing this.
“Performativity” incentives in the shape of the league tables and associated target setting drive our highly profitable examining businesses to bring to market the most accessible and predictable (and re-sittable) examinations they can get away with. Teachers/schools will choose the most “teachable to” examinations to maximise their own performance outputs (grades). Teaching to the test is an increasingly exact science and is leading to GCSE maths students (and now a new generation of maths teachers coming through; I am a PGCE tutor) who have been taught to the test very well. Many believe that there are correct ways to approach problems in maths of particular types – their own teachers did not want to confuse them. Few have much experience of genuine investigation, problem solving or creativity in mathematics.
Perhaps the publishing of school league tables should be criminalised?
Malcolm Swan
Posted March 4, 2010 at 10:06 am
I get frustrated with discussions of ‘standards’. Examinations only sample content, and currently (in maths anyway) only assess performance on fragmented, prompted, automated routine skills. Standards on these skills get better over time, as teachers get better at predicting questions and teaching to the test, but standards on conceptual understanding and on non-routine problem solving of a less structured kind (that require chains of reasoning) are not improving. In addition, the competition between awarding bodies means that no one body wants to rock the boat. Just two pieces of evidence here:
“ …. the rising trends in attainment are not generally being matched by identifiable improvements in pupils’ understanding of mathematics or in the quality of teaching. Evidence suggests that strategies to improve test and examination performance, including ‘booster’ lessons, revision classes and extensive intervention, coupled with a heavy emphasis on ‘teaching to the test’, succeed in preparing pupils to gain the qualifications but are not equipping them well enough mathematically for their futures. It is of vital importance to shift from a narrow emphasis on disparate skills towards a focus on pupils’ mathematical understanding. “ (Mathematics: Understanding the score, Ofsted, 2008)
“Rising scores in secondary maths examinations grades in England over the past 30 years do not appear to stem from real increases in mathematical understanding, a major new research study from King’s and the University of Durham has found. The analysis of 3,000 secondary pupils’ performance in algebra, ratio and decimals tests conducted last year suggests that there has been little overall change in maths attainment since 1976.” (Bera Press release, 2009)
My point is that some discussion is needed as to the domain and grain size of a ‘standard’, before any discussion of rising or falling standards becomes meaningful.
Becky B
Posted March 24, 2010 at 3:36 pm
The Sykes Review – as featured in most of today’s press – doesn’t seem to have provided a lot of evidence. I wonder which set of standards they are talking about?
Jennie Roberts
Posted March 24, 2010 at 4:16 pm
I think the debate on standards (in terms of student attainment) is a very interesting and important issue – always has been, always will be! The examiners’ reports from 1858 show that school leavers demonstrated “little indication of an acquaintance with the best elementary mathematics works” and in the 1920s their “punctuation was almost universally deficient or valueless”. Ultimately, it is as important today – if not more so – that standards are maintained and students have the skills and knowledge that society demands. Looking forward to seeing how the debate addresses this matter.
Tim Oates
Posted March 26, 2010 at 9:20 am
In response to Roger Murphy and Tom Bramley, I can see the problem that Roger has with the notion of ‘exact science’. Sloppy drafting on my part, I am afraid. I can see Roger’s objection to the fact that this appears to fail to recognise (i) the necessary optimisation of measurement in real, operational settings (the inevitable compromises which arise – eg between precision and cost) and (ii) the existence of measurement error/imprecision. And I agree – it is essential that these are both acknowledged and managed. I think the problem was with my drafting; when I mentioned the need to assert ‘assessment as an exact science’ I meant not that each and every act of measurement should be exact (this can vary according to purpose etc), but that the ‘science’ of the conceptualisation, production, and evaluation of assessments should be ‘an exact science’. I believe that technical validation should have a premier place in assessment development – we should understand (with precision) just what each assessment is measuring and how it is performing as an assessment. As I mention above, this does not mean that each assessment needs to attain perfect precision (not least because this is typically unattainable) but that there needs to be great precision in the science which we bring to bear on each assessment, in order to understand how it is behaving (and thus, how fair and accurate it is, in relation to how fair and accurate it claims or needs to be). For example, I have big problems with APP (Assessing Pupil Progress) – no evaluation prior to substantial roll-out. When so much hangs on the National Curriculum level which each child attains, the classification error in APP is simply unknown – just how many pupils assessed via APP receive the wrong level? Could be fine, could be disastrous – does anyone know? This is where failing to understand the need for precise appraisal and evaluation seems to bite hard. Roger and Tom – what do you think…?
Roger Murphy
Posted April 6, 2010 at 5:51 pm
Thanks for your further response Tim. It would be good to nail where we all stand on these issues ahead of the live event on April 29th. I accept what you say about ‘sloppy drafting’, but wonder if there is more to discuss than that. As Malcolm Swan said in his post the problem with most debates about standards is that the individuals involved in the debate often have different definitions of ‘the standard’ which they are interested in. So if we take something like a traditionalist arguing with a moderniser about whether exam results in English and Mathematics show that educational standards are improving……..say in a context where the national profile of exam results has got better over a period of say 20 years. They will never agree because essentially the traditionalist will probably favour the style of exams that were set 20 years before, and place much greater store by the validity of those results, and quite likely disagree with the moderniser over the issue of whether the new style of exams is in any way an improvement on the previous style from 20 years before.
So where does that leave the ‘exact science’ argument? I suppose you could say that in order to conduct a scientific comparison of educational standards over a 20 year period then you need a clear cut comparison of like with like – say two similar groups of students taking similar exams but 20 years apart. As we know that kind of comparison means nothing if the curriculum has changed, and most of us involved in education believe that the curriculum does need to change to reflect changes in knowledge, changes in society, and indeed advances in our understanding of how to improve student learning.
So I have nothing against scientific approaches to educational assessment. If we are going to conduct ‘high stakes’ educational assessments then we really should make a big effort to design them well, mark the outcomes fairly, and publish the results accurately and clearly. However when people then want to make comparisons between results in different subjects and different years, for example, then we need to be aware of the limitations of our scientific assessment approaches. In such situations the exact standards being assessed in these different subjects or in different years were almost certainly not the same, and should not therefore be compared as though they are.
In summary then I have no problem in encouraging people to carry out specific educational assessments carefully and systematically – or people calling that a scientific approach. However when we are debating ‘School Exams: Have standards really fallen?’, we need to be careful to realise that the topic is complex and not all that amenable to an ‘exact science’ approach. School exam results only give us a very approximate view of the state of educational standards, whatever definition we care to use. That isn’t solely because of ‘measurement errors’, although they do play a part. It is also because educational standards are themselves highly complex and difficult to define to the satisfaction of a wide audience. Educational standards certainly need to be defined differently in different curriculum areas, and even within a single curriculum area they need to change over time to reflect the way the world is changing, along with schools and teaching and learning. Finally we all need to have realistic expectations about what national examinations, and indeed national curriculum assessment results can or cannot tell us.
Jonathan Wells
Posted April 8, 2010 at 9:22 am
I don’t think we can say for definite that standards have declined, it looks like they have, it feels like they have, but it’s difficult to compare the increasing use of guided ways in which students work through GCSE these days.
What has changed is that schools have got extremely good at teaching students the exam strategy to take a D to a C. A standard comment I get when talking and presenting in schools is “I can’t teach something unless I know what the assessment looks like”.
Schools are so focussed on helping borderline students jump to the C grade that learning takes second place to passing. The negative effect this has is that employers (and I am one of those) see students with supposedly good passes in Maths and English who can’t knock 15% discount off a price without a calculator and can’t write a sentence that makes good sense and is spelt correctly.
So I think it’s a combination of the way in which students are led through an exam and the widespread use of simple strategies to win marks that makes exams very much easier to pass rather than the questions being easier.
I have a direct interest in Functional Skills as my company is developing resources for these new qualifications. They are getting a bad press amongst teachers because the assessments demand that students use problem solving to come up with solutions rather than “tick box” right or wrong answers. Students struggle because they can’t use methods to narrow down the multiple choice options and instead have to rely on their own knowledge to argue and justify a solution – just like real life.
Exam boards are desperate to water down these Functional Skills qualifications to make them easy to pass and indeed cheap to mark. I hope and pray that Ofqual stand firm and insist on keeping solution based problem solving scenarios as the basis for Functional Skills.
Indeed I would go further and have Ofqual insist that exam boards change the assessment styles and methods at least once a year so that teachers cannot teach to the test and students have to really learn and master the skills.
HJ
Posted April 8, 2010 at 3:26 pm
It’s not clear whether standards have actually fallen, but it is clear that a lower standard is required for the same grade, due to grade inflation.
It’s possible that standards have risen but that this rise is exaggerated by grade inflation; that standards have stayed broadly the same but grade inflation wrongly indicates that they have risen; or that they have fallen, but grade inflation has been so severe that students appear to get better results anyway.
We also need to separate overall standards from standards in individual subjects. Students now tend to take more GCSEs and AS/A2 subjects than they once did. The syllabus content may thus be lower in each subject (it certainly is in physics and maths – my specialities), but the education is broader. This may or may not be a good thing.
Tom Bramley
Posted April 9, 2010 at 12:57 pm
Roger Murphy and Tim Oates’s recent posts have been very helpful and have shown that probably there is little difference in our views. I do wonder a bit about Roger’s use of the word ‘approximate’. To say that exam results give only a ‘very approximate view of the state of educational standards’ suggests to me that there are other things that might give a more precise view. If there are, then what are they? I suspect that what is meant is that it is not possible to frame the question precisely enough for an answer to even be defined – for the reasons that Roger, Tim and many of the other post-ers have picked up on.
Roger Murphy
Posted April 12, 2010 at 9:01 am
Some more good posts and maybe we are managing to define the general areas where the difficulties and disagreements arise. I suspect my position probably aligns better with that of Malcolm, Jonathan and HJ than with the stance of Tom and Tim. I do think that examination results data can be potentially dangerous, because it is so susceptible to misinterpretation. It is in that sense that I think that examination grades always have the potential to lead us to inappropriate conclusions, and are therefore best regarded as being approximate. Most examinations require the person taking them to undertake a sample of tasks, and do not attempt to provide a comprehensive examination of the area under assessment. Give the candidate a different sample of tasks and almost certainly they will produce a different performance. Give them the same tasks on a different day and their performance may vary again. Give their responses to a range of different examiners to mark and again you will probably get different judgements about the quality of their performance. All of those things apply to most types of GCSE and A-level examinations. In my view they make the grades approximate rather than highly precise measures of educational achievement. However the fact that they are approximate doesn’t mean that there is an alternative way of getting a much more accurate set of information. Educational assessment is both difficult and pragmatic. The state of an individual’s learning of a subject is a complicated thing to capture and summarise – say in a single letter grade. Also we only want to spend a reasonable amount of time and resources on assessing learning, because we are generally more interested in supporting learning rather than assessing it. For these reasons we make pragmatic choices about the time we spend on assessment, and the types of assessment tasks we include in large scale public examinations, and generally the sample of assessment tasks used is limited.
For these reasons there are very specific well defined questions which we can ask about educational achievement, which will only receive approximate answers. How good is my child at Mathematics? Is my child better at learning Mathematics than I was thirty years ago? Has the national standard of students’ achievement in mathematics improved over the last thirty years? Those are all interesting and clear questions to which we are only likely to get approximate answers from the limited public examination data, to which we have such ready access.
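To make the point about approximate grades concrete, here is a minimal sketch of how a small wobble in marks can change a grade for a candidate near a boundary. Everything in it is an assumption for illustration only: the grade boundaries, the size of the mark variation, and the idea that every observed mark within that range is equally likely. None of it comes from a real examination or from the posts above.

```python
from fractions import Fraction

# Hypothetical grade boundaries (minimum mark for each grade).
# These numbers are invented for illustration, not taken from any real exam.
BOUNDARIES = [(70, "A"), (60, "B"), (50, "C"), (40, "D")]


def grade(mark):
    """Return the letter grade for a mark, or 'U' below the lowest boundary."""
    for minimum, letter in BOUNDARIES:
        if mark >= minimum:
            return letter
    return "U"


def misclassification_rate(true_mark, max_error):
    """Share of equally likely observed marks (true_mark +/- max_error)
    that would receive a different grade from the one true_mark earns.

    The uniform spread of observed marks is a simplifying assumption.
    """
    true_grade = grade(true_mark)
    observed = range(true_mark - max_error, true_mark + max_error + 1)
    wrong = sum(1 for m in observed if grade(m) != true_grade)
    return Fraction(wrong, len(observed))


# A candidate just above the hypothetical C boundary versus one mid-band.
near_boundary = misclassification_rate(true_mark=51, max_error=3)
mid_band = misclassification_rate(true_mark=55, max_error=3)
print(near_boundary, mid_band)
```

Under these made-up numbers the candidate sitting one mark above the boundary would be given a different grade in two of the seven possible outcomes, while the mid-band candidate would never be. The size of the effect depends entirely on the assumed variation, which is exactly the quantity that is hard to know in practice.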
DAVID VINTER
Posted April 14, 2010 at 5:53 pm
It seems a fact to me that pupils don’t do anything like the amount of reading that I did 60 years ago. Having started local council school at age 3yrs and 2 months, I was taught to read by two 10 year old girls. I regarded it as fun–every new book you learned more! Our limited supply of books at home came weekly from the local library, it was wartime, no one had any money, but we sat around the evening fire eating toast and reading. I could easily handle H.G. Wells at age 9.
We were not however consumed by TV and calculators—-we really knew our tables, as I still do!
Simon Lebus
Posted April 16, 2010 at 7:35 am
Roger is right. We do indeed need to have realistic expectations about what national examinations can tell us. I believe this question lies at the heart of the standards debate. When, as Group Chief Executive of Cambridge Assessment, I think about the work of our UK exam board OCR and our international exam board CIE, it is evident that both of them pay meticulous attention to setting and maintaining standards, using for the purpose a highly sophisticated combination of judgement and statistics. Our large Research Division similarly spends a great deal of its time examining awarding through statistical processes. And yet those who can be described as the users of qualifications – Higher Education and employers – continue to claim that standards have fallen. It is obvious therefore that there is a dissonance here and that something is not quite right.
Could it be that the standards debate has bifurcated, or split, and that there are now two sides talking about fundamentally different things, each believing that the other has misunderstood?
On one side we have teachers, educationalists and assessment professionals. They may occasionally argue amongst themselves on technical points (the 2008 GCSE Science spat is just such an example) but, in essence, they ‘know’ that the system is mechanically fine. On the other side there are the politicians and, more importantly, Higher Education, Royal and other professional Societies and some sections of some subject communities such as SCORE, who perceive that something is wrong at the macro level. There are two major reasons for this split and clues to what they are can be found in The Cambridge Approach, Cambridge Assessment’s statement of principles.
Firstly, the Approach says: “All assessments originated and operated by the Group, are underpinned by an integrated model of design, development, administration and evaluation. Fundamental to this is: 1) Clear statement of purpose…”
The purposes of the two main general qualifications, GCSE and GCE A Level, have become confused over the past 25 years. Originally designed in 1951 as a selection test for admission to HE, A Level has had to adapt as the proportion of 18 year olds taking it has risen from 6.8 to 46.3%. A qualification designed for the top end of the ability range therefore has now to serve the needs of a much broader constituency. This is to say nothing of the diversification and expansion in the number of subjects examined – from 32 to 69 – and an increasing focus on skills as much as knowledge.
GCSEs have suffered similarly, with the National Curriculum which it examines straying from its original purpose, now contriving to be over elaborate and simultaneously short of solid content. And as ever more students stay on in education or training (now a legal requirement) GCSEs have almost ceased to be an end of schooling examination and become certificates that validate one’s future specialist path.
Secondly, the Cambridge Approach states “Design processes for assessments from Cambridge Assessment must be predicated on: 2) Precision in the specification of the content of the assessment. 3) Consultation with all relevant parties.”
Since the creation of the National Curriculum in 1988, the British state has taken upon itself an ever-increasing role in mediating between subject communities, HE, professional societies, employers, teachers and examination designers in defining the content of syllabuses and their examinations. Although the National Curriculum extends only to 16 the long hand of central control continues thereafter in the form of highly prescriptive, regulator specified A Level qualification and subject criteria. And because the state or its nominated bodies are the mediators, they see their role as carefully balancing all parties, regardless of the purposes of the qualification.
The upshot of these two processes is that the ‘users’ of qualifications have been divorced from the producers, with access permitted only through the mediation of the state or its apparatus. The divorced producers have continued to carry out a difficult and arcane task with ever increasing accuracy but with little direct contact with users to help them re-balance that precision with some healthy macro overviews of the purpose of the exercise. It is therefore not surprising that we end up with two very different views as to what is happening in standards.
The only way to maintain standards is for the government to stand aside and let HE, employers and subject specialists talk directly to exam boards once again.
J A Sutherland
Posted April 20, 2010 at 12:48 pm
It’s just politics, not statistics. Politics resulted in the 50-year veiling of the Soviet, not Nazi, guilt over Katyn. Equally politics led Khrushchev to assert that Russia would overtake the U.S. economically. I was trying to get a left-wing deputy Head to admit that the grade inflation was well under way —in 1993! The tilt to softer subjects had started then, too. Not more than about one-third of people take kindly to academe. “My” brilliant carpenter has written one letter in the last 30 years, and I would not choose a Wrangler to knock a nail in straight. “Humankind cannot bear very much reality”– Eliot.
John Bell
Posted April 22, 2010 at 2:16 pm
On June 7, 1951, some nervous sixth-formers completed the last paper of the first A-level maths exam produced by the University of Cambridge Local Examinations Syndicate (now part of Cambridge Assessment).
The examination contained questions on statistics. The candidates were asked to calculate a correlation and comment on the result. To receive a pass mark on the paper candidates would have to obtain good answers on just three other questions on the paper.
About half a century later, candidates, many of them from under-funded comprehensives and having received “trendy” modern teaching methods, entered for the Midland Examining Group’s GCSE examination in statistics (also part of Cambridge Assessment). The candidates who were expected to get grades A-D encountered a similar question on the higher-tier paper. They were expected to plot the data, calculate the regression line, analyse the residuals, and comment on them. The GCSE question was one of 15 compulsory questions.
Clearly the GCSE candidates were expected to do more in less time and on a smaller part of the paper. Does this mean that GCSE statistics in 1993 is equivalent to A-level maths in 1951? Had these candidates reached the “gold standard” before starting A-level? No sensible person would answer yes to these questions. It is patently unreasonable.
The exams are not equivalent for several reasons. The GCSE candidates benefitted from technology in the form of calculators. Obviously in 1951, candidates would have had to calculate the correlation coefficient by hand and to get full marks on the paper would have had to do it correctly in less than 18 minutes.
Statistics was still a relatively new subject in 1951. Older maths teachers in those days would have completed their formal maths education before many common statistical techniques had come into general usage. Fisher’s classic text on experimental design had only been published sixteen years earlier.
There is also the question of the representativeness of the questions. The other A-level questions from 1951 would still be considered very demanding today. The range of maths covered was broad. The GCSE candidates followed a course that only covered statistics. It is doubtful that many GCSE candidates would, for example, be able to expand a Maclaurin series. This question was only part of one 1951 A-level question.
If it is easy to reject the idea that finding similarities between old A-level questions and recent GCSE questions is evidence of an improvement in standards, what about the reverse situation? What should be concluded when a current A-level question is found on an old GCSE or equivalent exam? The press would argue that this is evidence of a failing examination system. However, the arguments about the structure of the examination, the breadth of the syllabus, and the time spent on teaching the content of questions still apply. There is no reason why content found in a GCSE specification would not be found in an undergraduate course if it was not in a GCSE course taken by all the students prior to higher education.
The point is that examinations have to change. Priorities change. Technology changes. Knowledge changes. However, it is important that the process of change is controlled, documented and evaluated. This requires the involvement of many of the stakeholders, careful deliberation and a commitment to evidence-based educational policy.
As the sixth-formers in 1951 would know, “Tempora mutantur, nos et mutamur in illis”.
John Bell
Posted April 22, 2010 at 3:54 pm
For those of you not fortunate enough to have endured an education in the classics,
“Tempora mutantur, nos et mutamur in illis” can be taken to mean
“Times change, and we too change with them”, or more precisely “The times are changed and we too are changed in them (or during them)” or since the verb means both “to move” and “to change”, “The times move [on], and we move [along] in them.”
catsick
Posted April 25, 2010 at 2:36 am
I don’t know why we don’t just switch to grading pupils in percentiles. It is easy to do, and that way it is very clear there can be no grade inflation; this makes it easy for everyone to evaluate the quality of a result in a fair way.
SRS
Posted April 26, 2010 at 11:50 am
In response to Catsick: examining boards could do this. However, is it fair to grade in that way if you know that the weakest 50% of the cohort no longer takes the exam (or vice versa)? Even in the dark ages of the 1970s, subjects that were taken by very strong cohorts (for example Latin) were graded to take into account the ability of the cohort.
If examination boards are unable to grade on the basis of some notion of fitness-for-purpose then the value of examining becomes questionable. Working out some means of ensuring that appropriate compromises can be made between the varied purposes to which examinations are put is the core of the issue, and simple answers simply don’t deliver this.
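The cohort effect raised here can be sketched with a small calculation. This is only an illustration under stated assumptions: the marks are invented, and `percentile_rank` is a hypothetical helper using one common definition (the share of the cohort scoring strictly below a given mark). It shows the same raw mark landing on very different percentiles once the weakest half of the cohort stops entering.

```python
from bisect import bisect_left

def percentile_rank(mark, cohort_marks):
    """Percentage of the cohort scoring strictly below `mark`."""
    ordered = sorted(cohort_marks)
    return 100.0 * bisect_left(ordered, mark) / len(ordered)

# Hypothetical raw marks for a full cohort of 10 candidates.
full_cohort = [22, 31, 38, 45, 52, 58, 63, 70, 77, 85]
# The same exam when the weakest half of the cohort no longer enters.
strong_cohort = sorted(full_cohort)[len(full_cohort) // 2:]

mark = 63
rank_full = percentile_rank(mark, full_cohort)
rank_strong = percentile_rank(mark, strong_cohort)
print(f"Full cohort: {rank_full:.0f}th percentile")
print(f"Strong cohort only: {rank_strong:.0f}th percentile")
```

The candidate's performance is identical in both cases; only the make-up of the entry has changed.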
George
Posted April 25, 2010 at 2:13 pm
In 1990 Business Studies was passable in 3-4 weeks:
http://www.osl-ltd.co.uk/pressimages/david&tony.gif
A few years later the teaching process was refined so it just takes 4 hours:
http://www.osl-ltd.co.uk/pressimages/alexhobbs.jpg
Mainstream schools still take 2 years and sneer at those who take less than a year. Despite appearances, in the private sector we don’t always aim at maximising profits. 4 hours’ tuition is a heck of a lot cheaper than 1-2 years in ‘establishment’ private sector schools – or even the resources pupils need to provide for the state sector.
Does this mean standards have fallen?
In some subjects it was not possible for them to fall any further – it’s just been a case of the rest catching up!
(Economics (OCR) takes 2-3 weeks for the whole A2; Accountancy takes about the same time.)
Note: I teach Business Studies at the college listed in the website address.
James Boyd
Posted April 26, 2010 at 10:33 am
Within the existing institutional framework all of the problems highlighted above will continue. The fact that we have so-called “public” examinations and national standards is part of the problem. As noted by Bastiat, monopoly tends to paralyse everything which it touches, and qualifications are no exception. Until there is a variety of competing private branded qualifications available on the market, these largely pointless debates will continue. See my chapter on Qualification Inflation in The Broken University – What is seen and what is not seen in higher education. ASI, 2010.
Hilary Phelps
Posted April 29, 2010 at 5:25 pm
I think that Tim Oates was asking earlier today for evidence of change agendas possibly affecting standards. One problem that has arisen since the syllabus changes of 2008 in my subject, Psychology, is errors in new textbooks. Colleagues and I have found errors of various types in AS textbooks from two major publishers (in two different syllabi, incidentally) and in an A2 textbook from one of those publishers. While the syllabus changes in themselves have seemed healthy, the rushing out of poorly edited textbooks does seem to me to threaten standards especially in a relatively new A level subject like mine. If the standard textbooks are unreliable, how can exam papers be fairly yet rigorously marked?
Hilary Phelps
Posted May 3, 2010 at 10:05 am
Amidst all this, what is the impact on exam grade statistics of girls’ improved performance? My impression as a teacher is that they are less inhibited about fulfilling their academic potential than they were 35 years ago.
Another point, on the other hand, is the international view of our educational standards. I have anecdotal evidence that the reputation of A levels in Germany, for example, is not very high. Wouldn’t any move to a percentile approach risk further eroding the perceived value of UK qualifications abroad?
Dr Phil Budgell
Posted May 15, 2010 at 9:02 am
A purpose too far?
In 1996 I visited schools in Sweden. At that time I was interested in the power of ‘pupil referencing’ in determining the relative success of different subjects and individual teachers. At one point in a discussion with a school principal I suggested that we could not only compare the effectiveness of individual departments, but that we could compare the effectiveness of individual teachers within departments. He looked at me with total bewilderment and then, like Oscar Homolka (playing the head of the Ministry for State Security [STASI]) speaking to Michael Caine (playing Harry Palmer) in ‘Funeral in Berlin’, he said, ‘You ask such strange questions English! Why would you want to do that?’
Why indeed? Examination results and test scores are merely social constructs. They can only approximate what pupils know, understand and can do. At best they are moderated subjective judgements; they are not hard objective facts; they are only as they appear or as they are thought to be. One can choose one’s metaphor.
(a) From 1 Corinthians 13: the clarity of knowledge, understanding and attainment is obscure and ‘can only be seen through a glass darkly’. However, in examining and testing pupils, we are still children, who speak as children, understand as children and think as children: we have yet to become men and put away childish things. We continue to behave as if public examinations report hard objective facts.
(b) From Plato’s ‘Allegory of the Cave’ in The Republic: we are like prisoners in a cave looking at the back of the cave, watching shadows projected on the wall by things passing in front of a fire behind them. The shadows are as close as we get to seeing reality. Unlike Plato, in examining and testing pupils there is no escape from the cave and we have ‘no access to the true form of reality’. Rather we only have access to the mere shadows seen by the prisoners.
Dual Purpose Examinations
Traditionally the system of pupil examinations in England served two purposes:
• it accredited the stage of education that had just been completed and
• it functioned as an entry requirement for the next stage.
The General Certificate of Secondary Education still accredits 11-16 secondary education and functions as an entry requirement for post-16 education, while General Certificate of Education ‘A’ levels accredit post-16 education and function as an entry requirement for higher education. This has been true for so long that it is part of our ‘taken for granted’ world. But it was not always so and is still not the case in many other countries. In the United States, for example, your High School Diploma serves only to accredit your high school education. Entry to higher education is determined by your scores on the SAT Reasoning Tests (previously the Scholastic Aptitude Tests or Scholastic Assessment Test). In some European countries, the High School Diploma similarly accredits only your high school education and each university sets its own entrance examinations.
This dual purpose is one reason why there has always been a tension in the examination system. The requirements of an examination that accredits one stage may be in conflict with the requirements of an examination that acts as an entry requirement for the next stage.
Some Statistical Considerations
All tests (Key Stage 2 SATs, GCSEs and GCE ‘A’ levels) face problems of reliability and validity. This was as true when only 20% of pupils sat for GCE ‘O’ levels and 5% sat for GCE ‘A’ levels as it is now. But it didn’t seem to matter then – well, not to anybody but the individual pupil!
Expressed simply, the reliability of a test is a measure of its consistency while the validity of a test is a measure of its accuracy. But, of course, it is not that simple; even GCSE, for example, has to satisfy the requirements of different aspects of reliability:
• inter-examiner reliability refers to the extent to which different markers would award the same grades to a GCSE script;
• test-retest reliability refers to the extent to which a single candidate would, under the same conditions, get the same mark on the same examination paper on another occasion;
• inter-method reliability refers to the extent to which a single candidate would get the same grade using different methods of assessment, e.g. end-of-course examinations and continuous assessment and
• internal reliability refers to the extent to which different questions on an examination paper produce consistent marks.
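As a rough illustration of the first of these, inter-examiner reliability, the sketch below compares the grades two markers give to the same ten scripts. The grades are invented, and the choice of statistics is an assumption rather than anything prescribed above: simple exact agreement, and Cohen's kappa, which is one widely used way of discounting the agreement two markers would reach by chance.

```python
from collections import Counter

def exact_agreement(grades_a, grades_b):
    """Proportion of scripts given the same grade by both markers."""
    matches = sum(a == b for a, b in zip(grades_a, grades_b))
    return matches / len(grades_a)

def cohens_kappa(grades_a, grades_b):
    """Agreement corrected for the agreement expected by chance.

    Undefined (division by zero) if chance agreement is exactly 1,
    e.g. when both markers give every script the same single grade.
    """
    n = len(grades_a)
    observed = exact_agreement(grades_a, grades_b)
    count_a = Counter(grades_a)
    count_b = Counter(grades_b)
    # Chance agreement: product of each marker's rate of using a grade.
    expected = sum(count_a[g] * count_b[g] for g in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical grades awarded to the same ten scripts by two markers.
marker_1 = ["A", "B", "B", "C", "C", "C", "D", "D", "B", "A"]
marker_2 = ["A", "B", "C", "C", "C", "D", "D", "D", "B", "B"]

agreement = exact_agreement(marker_1, marker_2)
kappa = cohens_kappa(marker_1, marker_2)
print(f"Exact agreement: {agreement:.2f}")
print(f"Cohen's kappa:   {kappa:.2f}")
```

The gap between the two figures is the point: raw agreement flatters the markers because some matches would occur even if grades were assigned at random.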
There are, however, even more aspects of validity that apply, to a greater or lesser extent, to tests and examinations taken at school:
• construct validity refers to the extent (usually assessed by using statistical techniques) to which tests or examinations do actually measure the content or learning objectives of the syllabus;
• content validity (a non-statistical type of validity) refers to the extent to which the content of a test or examination matches the content or learning objectives of the syllabus;
• representation validity, also known as translation validity, refers to the extent to which abstract theoretical constructs in the content or learning objectives of the syllabus can be turned into tests or examinations taken at school;
• face validity refers to the extent to which a test or examination appears to measure a certain criterion; however, it does not guarantee that the test or examination does actually measure the content or learning objectives;
• criterion validity refers to the correlation between the test or examination and a criterion variable (or variables) taken as representative of the content or learning objectives. For example, it might compare the test or examination results with course work or teacher assessment;
• concurrent validity refers to the extent to which the test or examination correlates with other measures of the same construct that are measured at the same time. For example, modules or other papers taken at the same time;
• predictive validity refers to the extent to which a test or examination (e.g. Key Stage 2 SATs) can predict the results of a test or examination (e.g. GCSEs) taken at some time in the future;
• intentional validity refers to the extent to which the content and learning objectives of the syllabus and the test or examination adequately determine the level appropriate for an 18 year old school leaver or university entrant;
• external validity refers, for example, to the extent to which test or examination results can be held to be meaningful outside the context of the education system. In other words, what do they tell employers about prospective employees?
The English system of public examinations (excluding Standard Assessment Tests) with its dual purpose has a deservedly high reputation for the extent to which it manages the conflicting demands of reliability and validity – even, sometimes, the conflicting demands of different types of reliability and different types of validity. But, in a statistical sense, there are and always have been errors: for example, when a pupil passes an examination when s/he should have failed or fails an examination when s/he should have passed. These two types of errors (with their various names) are illustrated in the table below.
| | The pupil did meet the requirements | The pupil did not meet the requirements |
| --- | --- | --- |
| The pupil passes the examination | OK | Type I error (Type α error; error of the first type; false positive) |
| The pupil fails the examination | Type II error (Type β error; error of the second type; false negative) | OK |
Of course, once more, it is not that simple. Pupils do not just pass or fail examinations; they are awarded a wide range of grades. But the table can be adapted to deal with a more realistic situation.
| | The pupil did meet the requirements for a grade A | The pupil did not meet the requirements for a grade A |
| --- | --- | --- |
| The pupil is awarded a grade A | OK | Type I error |
| The pupil is not awarded a grade A | Type II error | OK |
The tables above characterise Type I and Type II errors. However, what is not immediately obvious is that strategies to minimize these errors are often contradictory: any attempt to minimize Type I errors will increase the likelihood of Type II errors and vice versa. For example, strategies to minimize the probability of a pupil not being awarded a Grade A when they should have been will increase the probability of a pupil being awarded a Grade A when they should not have been.
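The trade-off can be made concrete with a small simulation. Everything numerical here is an assumption made for illustration: attainment is drawn from a normal distribution, the observed mark is attainment plus random marking noise, and the "true" requirement for a grade A is set at 70. Moving the awarded grade boundary (`cutoff`) up or down then shifts errors from one type to the other.

```python
import random

def error_rates(true_scores, observed_marks, requirement, cutoff):
    """Return (type_i_rate, type_ii_rate) for one grade boundary.

    Type I:  awarded the grade without meeting the requirement.
    Type II: met the requirement but not awarded the grade.
    """
    n = len(true_scores)
    pairs = list(zip(true_scores, observed_marks))
    type_i = sum(t < requirement and o >= cutoff for t, o in pairs)
    type_ii = sum(t >= requirement and o < cutoff for t, o in pairs)
    return type_i / n, type_ii / n

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical "true" attainment for 5000 pupils.
true_scores = [rng.gauss(60, 12) for _ in range(5000)]
# Observed mark = true attainment plus measurement noise (an assumption).
observed_marks = [t + rng.gauss(0, 5) for t in true_scores]

requirement = 70  # hypothetical true standard for a grade A
results = [
    (cutoff, *error_rates(true_scores, observed_marks, requirement, cutoff))
    for cutoff in (66, 68, 70, 72, 74)
]
for cutoff, type_i, type_ii in results:
    print(f"cutoff {cutoff}: Type I {type_i:.3f}, Type II {type_ii:.3f}")
```

A lenient boundary lets in more pupils who did not meet the requirement; a severe one excludes more who did. No choice of boundary removes both kinds of error while the marks contain noise.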
The table can also be adapted to deal with the different types of reliability and validity discussed above – but one example will suffice.
| | The test does have content validity | The test does not have content validity |
| --- | --- | --- |
| The test is assumed to have content validity | OK | Type I error |
| The test is assumed not to have content validity | Type II error | OK |
The example has been chosen because many teachers of English pointed out each year that the Key Stage 3 SATs in English lacked content validity – they did not test the content and learning objectives of Key Stage 3 English Programmes of Study. They certainly manifested low inter-method reliability (a low correlation with teacher assessment) and almost certainly suffered from low representation or translation validity (the Programmes of Study were couched in terms that rendered it very difficult to turn them into test questions). In the final analysis the Key Stage 3 SATs in English exhibited nothing more than face validity.
The tension in the examination system that is a result of its dual purpose can be interpreted in terms of the conflicting demands of reliability and validity. For an examination that accredits a stage of education that has been completed, construct, content, representation, criterion and concurrent validity are particularly important. On the other hand, for an examination that functions as an entry requirement for the next stage in a young person’s life, predictive, intentional and external validity are more important.
The system in the United States, where the High School Diploma and the SAT Reasoning Test have distinct and separate purposes, illustrates this distinction. The High School Diploma meets minimal requirements; to graduate from high school you have to do little more than attend and meet minimal standards of literacy and numeracy. The High School Diploma certainly achieves face validity and it does allow the pupil to participate in the razzamatazz of the High School Graduation Ceremony – and, maybe, that is its main purpose. The SAT Reasoning Test (more akin to an intelligence test) has only to meet the requirements of predictive, intentional and external validity. In meeting these requirements, it is essentially a single test and virtually content free; it is independent of syllabus and course requirements.
Contrast this with GCSE and GCE ‘A’ levels – particularly the latter. Not only do our public examinations have a dual purpose and have to meet the requirements of reliability and validity; they also have to provide an element of choice: different and independent examining boards, different subjects and even different specifications within a subject. It is the requirement of choice within the system that provides the final challenge to concurrent validity:
• is AQA English Language the same standard as Edexcel English Language;
• is ‘A’ level Physics the same standard as ‘A’ level Media Studies or
• is AQA ‘A’ level Psychology (Specification A) the same standard as AQA ‘A’ level Psychology (Specification B)?
Interim Summary (well, interim until the introduction of league tables)
The public examination system has the dual purpose of accrediting the stage of education that has been completed and functioning as an entry requirement for the next stage – when, in practice, these requirements may be contradictory.
The public examination system has to meet the requirements (in all their complexity) of reliability and validity – when, in practice, these requirements may be contradictory.
The public examination system has to minimize Type I and Type II errors – when, in practice, strategies to minimize the likelihood of one type of error will lead to an increase in the likelihood of the other.
And it has to meet all these requirements while, at the same time, providing an element of choice.
The debate so far has been conducted within the domain of applied social science and, despite having to balance contradictory requirements along seemingly orthogonal dimensions, the public examination system was fit for purpose, performed remarkably well and, as has been pointed out above, had a very good international reputation.
A purpose too far
It has been argued, by both Conservative and Labour Governments, that the introduction of National Curriculum Key Stage Standard Assessment Tests and school league tables was intended to:
• provide teachers with information that would enable them to improve their pupils’ attainment and
• provide parents with the information that would enable them to make an informed choice about their children’s education.
It may, however unlikely it is, be just an unplanned or ‘unintended consequence’ of the introduction of school league tables and the publication of key stage test and examination results, but the current debate about standards in education can only be interpreted and ‘made sense of’ within the domain of politics – rather than applied social science. The politicization of education and the plethora of data (much of it yet to attain the status of information) have led to education performance becoming part of a framework for the accountability of schools and the rhetoric of successful policy implementation.
Now school performance data has to meet the criteria of reliability and validity, or maybe just face validity, within the politics of education – but in doing so its reliability and validity within the domain of the applied social science of tests and examinations has been severely compromised. A number of examples will suffice.
Both Conservative and Labour Governments have relied on the percentage of pupils achieving ‘5 or more A*-C grades’ as their ‘gold standard’ in the measurement of the success of individual schools, local authorities and the nation, and hence of government policy. It may have face validity and facilitate easy ‘sound bites’, but ‘5 or more A*-C grades’ is a fundamentally flawed measure. It suffers from a severe ‘threshold effect’. A school may go from:
• 50% of pupils achieving 5 C grades to 50% achieving 10 A* grades or
• 50% of pupils achieving 5 G grades to 100% achieving 10 D grades;
both enormous improvements; but neither would register on the ‘gold standard’. On the other hand, it is easy to construct a scenario in which 10% of the pupils increasing their grade in mathematics from a D to a C would lead to a significant improvement in the percentage of pupils achieving ‘5 or more A*-C grades’. The use of 5 or more A*-C grades has exacerbated the problem of ‘grade creep’ and the focus on the small number of pupils on the D/C boundary.
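The threshold effect is easy to reproduce with a toy calculation. The school below is entirely hypothetical (ten pupils, invented grade profiles) and `pct_five_good` is an illustrative helper, not an official definition of the measure. The first two scenarios mirror the examples above; the third shows one borderline pupil in ten moving from a D to a C in a single subject.

```python
GOOD_GRADES = {"A*", "A", "B", "C"}

def pct_five_good(cohort):
    """Percentage of pupils with 5 or more grades at A*-C."""
    hits = sum(
        1 for grades in cohort
        if sum(g in GOOD_GRADES for g in grades) >= 5
    )
    return 100.0 * hits / len(cohort)

# Assumed profile for pupils who never reach the threshold: 5 D grades.
others = [["D"] * 5 for _ in range(5)]

# Scenario 1: half the school moves from 5 Cs to 10 A*s.
before = [["C"] * 5 for _ in range(5)] + others
after = [["A*"] * 10 for _ in range(5)] + others

# Scenario 2: from half with 5 Gs to everyone with 10 Ds.
# (The other half's starting grades are an assumption.)
low_before = [["G"] * 5 for _ in range(5)] + [["G"] * 4 for _ in range(5)]
low_after = [["D"] * 10 for _ in range(10)]

# Scenario 3: one pupil in ten turns a D in mathematics into a C.
edge_before = before[:5] + [["C", "C", "C", "C", "D"]] + others[:4]
edge_after = before[:5] + [["C", "C", "C", "C", "C"]] + others[:4]

for label, a, b in [
    ("5 Cs -> 10 A*s", before, after),
    ("5 Gs -> 10 Ds", low_before, low_after),
    ("one D -> C", edge_before, edge_after),
]:
    print(f"{label}: {pct_five_good(a):.0f}% -> {pct_five_good(b):.0f}%")
```

Only the smallest of the three improvements moves the headline figure, which is why attention concentrates on pupils at the D/C boundary.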
| | The pupil did meet the requirements for a grade C | The pupil did not meet the requirements for a grade C |
| --- | --- | --- |
| The pupil is awarded a grade C | OK | Type I error |
| The pupil is not awarded a grade C | Type II error | OK |
In terms of the table above, it would be really important for the school to ensure that Type II errors are minimised. However, as suggested earlier, this would increase the likelihood of Type I errors. But the pay-off matrix is clear. In the short term, there is a penalty for not minimizing Type II errors but no penalty to the school for increasing Type I errors – in fact it is in the school’s interest to do so!! This can be multiplied across the education system.
In the early years of the 1997 Labour Government, the National Strategies in Literacy and Numeracy were introduced. Within one year the percentage of pupils attaining three level 4s in the Key Stage SATs increased – in almost every school, in almost every local authority and therefore nationally – by 10%. Would that every national policy in education were so successful!! However, such a systematic improvement, achieved so quickly, raised major questions about both the reliability and validity of the improvement. Was this improvement real or just a function of a shift in grade boundaries? Any challenge to the legitimacy of the ‘improvement’ was met with a response within the domain of politics rather than that of applied social science. “Why do you insist on demeaning our teachers who have worked so hard to achieve this improvement?” It was the claiming of an artificial improvement in standards of attainment that was demeaning of teachers rather than the questioning of an artificial claim.
With the publication of secondary school league tables and the resulting illusion of successful and failing schools that was created, there was an oft-repeated assertion that it wasn’t a ‘level playing field’. In small towns in Gloucestershire, Leicestershire or North Yorkshire, comprehensive education could be implemented by establishing one school that took all children from the surrounding 5/6 miles. In large cities, implementing comprehensive education was deeply problematic. One neighbourhood ‘comprehensive’ school may serve an area in which up to 90% of parents had degrees or professional qualifications. Its nearest neighbourhood ‘comprehensive’ may serve an area suffering from severe social deprivation. The ability distribution and the standards of attainment of the pupils entering the two schools could be the mirror image of each other. The illusion of ‘successful’ schools and parental choice exacerbated the problem – the flow of children was in one direction only. Few ‘middle class’ parents chose to send their children to schools in disadvantaged areas; it was only aspirational working class parents (denied their traditional access to grammar schools) that tried to send their children to schools serving middle class areas.
In this context, it was not surprising that the Contextual Value Added measure was included in the School Performance Tables. Once more however, it was fundamentally flawed. The only measures of prior attainment that were nationally available were the Key Stage 2 Standard Assessment Tests – whichever aspect is adopted, these tests have the lowest reliability and validity of all. The attempt to introduce a value-added measure was discredited before the ink was dry – and the introduction of a ‘level playing field’ set back for the life of the next parliament!
Work done at the time indicated that, if good prior attainment data was available, up to 95% of the difference between schools could be accounted for by the pupils who attended the school. Expressed in a slightly different way, the same work indicated that the electoral ward that a pupil lived in was a bigger (much bigger) determiner of standards of attainment than the school that they attended. This brought an immediate challenge to the organisation of comprehensive education in urban environments. However, and despite significant work undertaken at the University of Durham and elsewhere, it was easier in a culture of accountability to use the failure of individual headteachers to account for ‘low standards of attainment’ rather than systemic failure. Maybe Gasworks Comprehensive has very ‘low standards of attainment’ but this is a function of the failure to implement comprehensive education in a way that doesn’t guarantee its ‘failure’. Maybe Gasworks Comprehensive should be closed; but not because of the failure of its headteacher – it should never have existed in the first place!!
Fit for Purpose
The politicization of education and the resulting conflation, by central government, of the role of purchaser and provider has led to multiple problems with the system of public examinations. In addition to the legitimate challenge of validity and reliability:
• across boards,
• across subjects and
• across specifications;
challenges that could be met with an acceptable measure of certainty; there is now an impossible challenge of reliability and validity across time and a demand for even greater reliability and validity across boards, subjects and specifications.
Changes in the headline measures of performance:
• the percentage of pupils achieving 5 or more A*-C grades at GCSE or
• the percentage of pupils achieving 3 or more A grades at GCE ‘A’ level;
are such that they are no longer perceived, certainly by the media, to have any face validity. Furthermore, the increase in the percentage of pupils achieving 3 or more A grades at GCE ‘A’ level is such that the Higher Education Sector, the prime end-users of GCE ‘A’ levels, no longer believe that they have any predictive validity.
Our system of public examinations has a limited bandwidth within a short time window. The challenges raised earlier may be interesting. But the politicians, and the media, are demanding that our system of public examinations address issues that it is no more fit to answer than A Question of Sport is fit to answer questions about whether Wayne Rooney is a better footballer than George Best or a better sportsman than Roger Federer!! However, it doesn’t matter too much if the panellists on A Question of Sport make fools of themselves or get the answer wrong – the pay-off matrix has no serious rewards or punishments.
The politicization of education, however, actually distorts the system. To pursue one metaphor introduced earlier, the politicians are guilty of committing:
• Type IIIa errors – addressing the wrong issue;
• Type IIIb errors – addressing the right issue too late;
• Type IVa errors – providing bad answers to the right question
or even
• Type IVb errors – providing good answers to the wrong question.
More graphically even:
• the dark glass through which we look is now a distorting lens or
• our cave is now a hall of mirrors at a funfair!!
Sue Fishwick
Posted May 30, 2010 at 11:00 pm
No doubt the debate as to whether standards have fallen over time will rumble on for years to come. After watching the debate, I find the question almost irrelevant. Surely it is more important and productive to look forward; we need to ask how the fairness and relevance of assessment can be improved to meet the needs of current and future students. Someone needs to ask those awkward questions such as, “Why does an examination board deem it acceptable practice for a Principal Examiner to re-mark a paper from a college which he has already personally sampled?” Surely this involvement throughout the process jeopardises his impartiality. This question could appear a touch petty, but when considered with the fact that the whole cohort entry for this particular unit was marked significantly lower than the national average (4% Grade A), and all other units were higher than the national figures, I feel it warrants an answer. Unfortunately this very scenario cost my daughter a place at Oxford. I’m afraid you’ll never convince me that the mechanics of the process are working!
Tim Oates
Posted June 2, 2010 at 9:44 am
As I said during the debate we have had a period of constant change in the structure and content of qualifications, and one of my concluding points was that if you effect continual or inappropriate and unnecessary change of qualifications, it makes holding any standard extremely difficult.
There have been many supportive emails and comments commending the position. So here’s the list as it currently stands of unnecessary changes and/or changes which threaten standards:
1. Calculators in and out of GCSE maths.
2. Modularisation across the whole system, without piloting, in 2000 (awards in 2002).
3. Wholesale modularisation in GCSE in 2008 (awards in 2010).
4. The drive towards reduction in the gross number of qualifications in the English system, plus the use of the concept of ‘coherence’ – overturned operationally by the decision to allow FE centres awarding powers.
5. Overblown content of the NC followed by extreme, inappropriate reductions (eg KS4 Science) – this impacts on ABs principally due to GCSE requirements to cover NC.
6. Core skills and functional skills as a hurdle.
7. Languages in and out of the curriculum requirement.
8. Shifts to and from coursework, and in the form of coursework; the 2002 GCSE criteria from QCA, determining that all GCSEs should include coursework, contributed to bringing coursework into disrepute through invalid/unpopular inclusion and an increase in assessment volumes.
9. Different messages re equivalences and changes in equivalences.
10. Reduction from 6 units to 4 (again wholesale system change rather than per subject); AB proposals for 4-unit qualifications over-ridden in 2000.
11. Reduction in assessment time from 4 hours to 3 hours in 2002.
12. The pace of the accreditation cycle, leading to changes after one year of a specification – GW highlighting that change cycle now five years rather than ten.
13. The fight over the form of stretch and challenge and A*.
14. Constant change in the name, but not the form and content, of core skills, key skills, functional skills.
15. GNVQs, AVCEs, Diplomas. No stability in the vocational route. Constant academic drift in successive transformations of the vocational route.
16. In national assessment, mental maths.
17. Split and merger of Eng Lit and Eng Lang and current perverse performance table rules.
18. Unreasonable subject changes – eg http://www.tes.co.uk/article.aspx?storycode=337286
19. Maths tiering – access to grade C in tiered examinations.
20. Fluctuations in the assessment of SPG in GCSEs.
21. Vacillations on Diplomas=only qual versus Diplomas=distinctive route for vocational.
22. AS standards-rating fixed at 50% of AS+A2.
23. Decision by QCA to combine maths and further maths – subsequently overturned.
24. Detail in Languages – in February 2007 it was agreed that a pilot of Oral Language Modifiers (OLMs) would be held in summer 2007. There were three salient features: (a) the use of a sample of students, (b) it should establish whether the use of OLMs compromised the integrity of the assessment, (c) it would establish how a wide range of disabled students would operate with OLMs. Five weeks after this agreement, letters were written changing all three features: (a) there would be no sampling but all deaf candidates would be eligible, (b) it would not establish the impact of OLMs on the integrity of the assessment but would simply inform good practice, (c) no students with disabilities other than deafness would be included. Awarding bodies thus found themselves committed to what was effectively a different project upon which they had not been consulted.