Could technology render external assessment irrelevant?

18/02/2014
John Ingram, Managing Director, RM Assessment & Data

“If I had asked people what they wanted, they would have said faster horses.”  So, reputedly, said Henry Ford on the topic of innovation. Regardless of the quote’s authenticity, it’s a useful reminder to step outside the norm from time to time and wonder what a bolt from the blue would do to our day-to-day existence.

Technology has already streamlined our assessment processes. According to Ofqual, onscreen marking is now the main type of marking for general qualifications in the UK. Onscreen marking involves scanning exam papers and distributing them digitally to examiners, who mark them using specialist software. In 2012, 66% of the nearly 16 million exam scripts in England, Wales and Northern Ireland were marked this way. Onscreen marking is also gaining in popularity in other territories: RM’s onscreen marking system has been used by awarding organisations in Eastern Europe, North America, Asia and Australasia.

As well as reducing the time and risk involved in transporting exam papers to and fro, onscreen marking improves reliability by automatically adding up the marks. Teams of examiners can be monitored in real time, with the system stopping under-performing markers from marking further questions.
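The two checks described above — automatic totalling of marks and stopping markers who drift from the agreed standard — can be illustrated with a short sketch. RM’s actual system is not public, so the function names, the use of pre-marked “seed” scripts as the monitoring mechanism, and the thresholds here are all illustrative assumptions, not the real implementation:

```python
def total_marks(item_marks):
    """Sum the per-question marks for one script, so no
    examiner ever has to add up a total by hand."""
    return sum(item_marks.values())

def should_stop_marker(seed_results, tolerance=1, max_misses=2):
    """seed_results: list of (mark_given, agreed_mark) pairs for
    pre-marked 'seed' scripts slipped into a marker's queue.
    Flag the marker for stopping if they miss the agreed mark by
    more than `tolerance` on more than `max_misses` seeds."""
    misses = sum(1 for given, agreed in seed_results
                 if abs(given - agreed) > tolerance)
    return misses > max_misses
```

In a live system the flag would pause the marker’s queue pending review by a senior examiner, rather than simply discarding their work.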

On the whole, however, onscreen marking is just a smarter way of assessing hand-written exams. The fact that it can also be used to mark computer-based tests, coursework and audio-visual files is becoming less relevant in a country such as England where the emphasis is on linear assessment and paper-based exams, at least where school exams are concerned.

Let’s call onscreen marking of exams ‘faster horses’, then; it’s better than marking by hand but it doesn’t revolutionise the way we evaluate learning. So what’s the ‘motorcar’? Tests taken on computer? Countries such as Denmark and Norway have introduced computer-based testing for national exams. The next round of PISA tests in 2015 will be taken on computer. Moving from paper to computers does feel like progress – until you look around you.

The world has moved on to tablets, smartphones and – those clunky phrases – the ‘internet of things’ and ‘the internet of customers’. Which could mean that while we polish our current system to its highest possible sparkle, waiting in the wings is a disruptor which will render it irrelevant.

It’s perhaps natural that in education, where the stakes are so high, there can be fear of technology. There’s a worry that hi-tech can mean low quality – quicker, shorter, and more superficial assessment. But that needn’t be the case.

We’re already seeing glimmers of new ways of experiencing and demonstrating learning. Open badges add context to academic achievement. MOOCs offer access to expertise from all over the world. There will always be a place for face-to-face teaching and core subjects, but the way we learn is becoming broader, more granular, more accessible. With digitisation comes the expectation of immediacy: on-demand exams, instant results, instant certificates to share online.

For education to exploit technology for our children’s benefit, we need to learn from other fields. So far this year we’ve seen babygrows that monitor temperature and breathing. Contact lenses that measure glucose levels. Even toothbrushes that tell tales to your dentist when you’ve been less than thorough. It isn’t too much of a stretch to imagine multiple data streams which continually monitor a student’s development and trigger a feedback loop to help them gain the required level of attainment. Meaning a one-off, external exam is rendered unnecessary. Will it happen by 2025?  To answer that with any certainty I’d need to ditch my smartphone and dig out the crystal ball.

How can we improve reliability of assessment?


11/02/2014
Alastair Pollitt, Principal Researcher, CamExam

I lost my faith in marking on 7th June, 1996, when – as a researcher recently arrived in England – I attended my first Marker Coordination Meeting. The point of this meeting was to make sure that all the markers working on one exam paper were interpreting the mark scheme in the same way, to make the marking “fair”. One of the Principal Examiners began his session by telling the markers, “Your job is to mark exactly as I would if I were marking the script. You are clones of me: it is not your job to think.”

What a chilling message. Is this how to encourage experienced and motivated professional teachers to carry on marking exam scripts? If I had been there as a marker I would have felt humiliated. School-teachers are highly educated and trained, and most of those present that day had many years of experience helping pupils develop their science ability. Their level of commitment to education was certainly higher than average (no one took on the task of marking just for the money!). Yet they were being told to stop thinking, to behave like mere automata. This cannot be the best way to use the experience and wisdom of the profession: there must be a better way.

The fundamental problem is the very notion of ‘marking’, which converts the proper process of judging how well a pupil has performed into the dubious process of counting how many things they got ‘right’. Is it even possible to assess the quality of a pupil’s science ability by counting? Are there not aspects of ‘being good at science’ that cannot be counted?

Not everything that can be counted counts, and not everything that counts can be counted. (William Bruce Cameron, 1963; often attributed to Einstein)

The simple truth is that marking reliability cannot be improved significantly without destroying validity. Lord Bew recently reviewed the marking of National Curriculum tests for the Secretary of State, and concluded:

we feel that the criticism of the marking of writing is not principally caused by any faults in the current process, but is due to inevitable variations of interpreting the stated criteria of the mark scheme when judging a piece of writing composition. (pp 60-61)

This is true of most exams, not just of writing in English. In every question we ask markers to make a judgement: is this answer worth 0 or 1? Or 2? Or …? Trying to make these judgements reliable relentlessly drives assessment down the cul de sac of counting what can be counted, of identifying “objective” indicators of quality rather than judging quality itself. Referring to exactly this issue Donald Laming, a Cambridge psychologist, wrote:

There is no absolute judgement. All judgements are comparisons of one thing with another. (2004)

What can we do instead? Why not take Bew and Laming seriously? Stop marking: let the examiners make direct comparisons between two pieces of work, or let them rank several pieces. We have long known that teachers can rank order their pupils with high reliability and high validity; when I began my career by creating commercial tests of reading and maths, it was standard practice to report the correlations of the scores with teachers’ rankings as proof of validity. This is what it means to be an expert teacher: being able to make trustworthy judgements of how good two pupils are by comparing samples of their work.
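Turning a pile of pairwise judgements into a rank order is a solved statistical problem. The article does not name a model, but the Bradley-Terry model is one standard choice for exactly this task; the sketch below (with illustrative data) estimates a quality score for each script from a list of (winner, loser) comparisons using the classic minorisation-maximisation update:

```python
def bradley_terry(comparisons, n_items, iters=200):
    """Estimate a 'quality' score per item from pairwise judgements
    under the Bradley-Terry model: P(i beats j) = p_i / (p_i + p_j).
    comparisons: list of (winner_index, loser_index) pairs."""
    wins = [0] * n_items
    for winner, _ in comparisons:
        wins[winner] += 1

    p = [1.0] * n_items  # start with all scripts judged equal
    for _ in range(iters):
        denom = [0.0] * n_items
        for i, j in comparisons:
            d = 1.0 / (p[i] + p[j])
            denom[i] += d
            denom[j] += d
        # MM update: wins divided by the summed comparison weights
        p = [wins[k] / denom[k] if denom[k] else p[k]
             for k in range(n_items)]
        s = sum(p)
        p = [x * n_items / s for x in p]  # fix the scale
    return p
```

Ranking the scripts by their estimated scores recovers the order implied by the judgements, and in practice the model also tells you how consistent the judges were — the reliability figure that marking struggles to deliver.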

Since most of our examiners are expert teachers, why not get them to behave like experts, instead of robots? Our exams will not only be more reliable, but more valid too.