"How do you know it's working?" This is a key question for all educational practices to attempt to answer. Children's lives, as well as a good deal of time and money, are at stake. We need to know whether our teaching is being effective at producing the outcomes we espouse. This is not an irksome question dreamed up by remote bands of reductionistic bureaucrats, designed to stress out hard-working teachers and their students. It should be a burning issue for every school principal and teacher in every school. However, finding and collecting forms of evidence that actually get at what you value, whilst having no unintended negative side-effects, is harder than you think. Here are some pointers as to how to get the data you need whilst doing no harm to anything else.
1. The question applies not just to innovations but to the status quo. Good old-fashioned chalk'n'talk is as much in need of evidence as any new-fangled idea. Does 'rote learning' work? For which students, and in what subjects? Is students' Ethical Understanding improving? Both are fair game. Assessment should not be used as a stick with which to beat any innovation you don't immediately like the sound of (although it often is).
2. We always have to bear in mind that the tail of assessment inevitably wags the dog of learning. High-stakes tests, by which schools, teachers and individual students are judged, and on the results of which much will hang, are bound to influence what and how teachers teach and students learn. Good evidence should not only indicate whether what we are doing is working; it should also drive the kind of teaching and learning that we want to be happening in schools. Many crude 'measures' are not so benign. It is perfectly possible to design indicators of the growth of the General Capabilities (such as performance on pencil-and-paper tests of Intercultural Understanding) that do not correlate at all with how well teenagers actually treat new arrivals in Australia.
3. A prime example of point 2 is Goodhart's Law, which states that 'when a measure becomes a target, it ceases to be a good measure'. If you judge a museum by the number of visitors, it will start counting the couriers and maintenance people. When the number of operations becomes a target, hospitals are tempted to opt for easier and faster operations. If you judge a school by its examination results, don't be surprised if principals find ways to game the system. There are many ways of technically achieving targets that have negative effects on the spirit of the enterprise. Many assessment systems have been produced by people who seem to lack a basic understanding of human nature. Data scientist Roman Shraga says, 'You should strive to create the best possible measures that look at performance from multiple angles while always maintaining scepticism and inquiry' (Shraga 2014).
4. You can't decide what 'works' until you know the purpose of the evidence. Who do you want to be informed or convinced by the data, and to do what as a result? Is your evidence designed to appease national officials who are disdainful of anything that doesn't have a number (however bogus) attached to it? Are you trying to push up your school's NAPLAN scores (and to hell with the General Capabilities)? Are you trying to stimulate formative conversations with students themselves about the growth of their own capabilities? A good measure for any one of these goals is very unlikely to be a good performance indicator for the others. Some people wrongly think that the kind of teaching that raises NAPLAN performance is essentially different from the kind of teaching that develops the General Capabilities – that, in other words, there is no 'what works' that can hit both targets at the same time.
5. The words you use to talk about evidencing 'what works' matter a good deal. Assessing? Measuring? Evidencing? Tracking? Illustrating? Monitoring? They all carry different connotations, send you off in different directions, and may import different unintended consequences. 'Assessing' has a very summative, evaluative tone about it, and may steer you towards data that has the appearance (but maybe only that) of objectivity and infallibility. 'Measuring' commits you to quantitative data, and thus to ignoring perfectly good qualitative sources of evidence that in other spheres – the business world, for example – are routinely treated as entirely valid and appropriate contributions to an 'annual appraisal'. Both 'assessing' and 'measuring' can make it seem that sending kids out from school with a label round their necks like 'Level 4 in Critical and Creative Thinking' is a sane and legitimate thing to do. 'Evidencing' – the most neutral word – should include any or all of computerised data on resilience, teacher judgements, student self-evaluations, e-portfolios, 360-degree reports from a student's footie coach, mum and best friend, and a dozen other smart ideas that help to home in on what it is you really want to know.
Follow these five guidelines and, whether you are a little primary school in the outback or a government department, they should see you right.