Can and should we use different assessments for different purposes?

07/01/2014
Professor Paul Newton, Professor of Educational Assessment, Institute of Education, University of London

Having agreed to post some thoughts in response to the question of whether we can and should use different assessments for the purposes of certificating students, school accountability and measuring system improvement, I turned to Andrew Hall’s opening blog for inspiration. Andrew is keen to encourage blue skies thinking about the future of educational assessment in England, and has invited us to start by considering “what a really great assessment system would look like” in a way that is “unbounded by the reality of how the system is today”. In an attempt to be constructively provocative, I decided to reflect upon the meaning of ‘blue skies’ thinking in this context.

Over the years, I’ve had plenty to say about the uses of educational assessments. I’ve warned that an assessment that is fit for one purpose may be substantially less fit for another and might be entirely unfit for others. I’ve explained that even a procedure intended specifically to measure system improvement could serve many different kinds of purpose, with each purpose implying quite different assessment design decisions. Presumably, then, blue skies thinking about the characteristics of a really great assessment system ought to conclude that it comprises multiple, discrete assessment procedures, each engineered to support a particular purpose. After all, a really great assessment system would be as fit as possible for each and every purpose; and maximum fitness across the range of different uses could only be guaranteed if the system incorporated a range of different assessment procedures.

Yet, if this is blue skies thinking about the future of educational assessment, then it is not for me. An inevitable risk of blue skies thinking is that we set our sights too high. A ‘really great’ system is probably too high an aspiration; a ‘good enough’ system is more realistic. When we aspire to a system that is good enough, we open our minds to trade-offs, to the realistic appraisal of costs against benefits. Conversely, in the blue skies world, the temptation is to be overly simplistic and idealistic; for instance, to insist that an assessment system should do no harm. In the real world, we should be prepared to accept that any assessment system will inevitably do some harm; even though, on balance, its benefits ought significantly to outweigh its costs. Blue skies thinking tends, ironically, to be black and white. The real world is not like this. The real world is grey.

So I am an advocate of ‘grey skies’ thinking. Grey skies thinking welcomes messiness. It acknowledges that we struggle even to articulate our policy goals, let alone to agree upon them, or to agree how best to achieve them. Fundamental to grey skies thinking is not abstraction from the complexity of the real world, but immersion in it. It involves thinking through the potential consequences of alternative assessment approaches in as much detail as possible. It means attempting to anticipate potential ‘fault lines’ and to gauge their likely severity. It means attempting to identify a broad range of social and educational impacts from alternative assessment approaches and to gauge their likely prevalence. It means focusing public debate on the prioritisation of policy objectives: How important are the various decisions that need to be made on the basis of assessment results and, therefore, how much assessment inaccuracy are we prepared to tolerate? How serious are the various impacts associated with alternative assessment approaches and, therefore, how tolerant of them should we be? In other words, what are we prepared to compromise on, and what are we not prepared to compromise on? Grey skies thinking suggests that it may be more fruitful to start by considering the really calamitous rather than the really great.

So, returning to my brief, can and should we use different assessments for the purposes of certificating students, school accountability and measuring system improvement? As I mentioned earlier, one blue skies answer to this question is an emphatic ‘yes’ – which is to invoke the ‘maximum accuracy’ principle. But an equally legitimate blue skies answer is an emphatic ‘no’ – which is to invoke the ‘collect once, use more than once’ principle, as Ofsted recently put it. Both of these answers are overly simplistic. The grey skies answer is neither an emphatic ‘yes’ nor an emphatic ‘no’ because the real world is far more complicated and messy than that. To provide plausible answers to this question we need grey skies thinkers who are willing and able to grapple with the kind of comprehensive and typically uncomfortable cost-benefit analyses that are fundamental to good policy making.