Thursday, July 30, 2015

Student Evaluations of Teaching .... Why?

I had a long chat with a co-worker the other night and it was a good chat. I'll leave her name out of this since it is not fair to tar her with my thoughts, particularly since she may not share some or even all of them. We were talking about Student Evaluations of Teaching (SETs), those course-end evaluations that students at universities complete. Many universities require them; some require that they be included in packages submitted for tenure or promotion. Mount Allison does not, although we are required to complete a self-report and be assessed by our respective deans every two years, and most people include SETs in their reports. There has been remarkable controversy lately over SETs that, in my view, is frankly misplaced. Why? That is what I want to explain in this post, but I'll tip my hand at the outset: mandatory SETs that are assessed by deans will not accomplish their goals. Hence, they become a hoop through which people jump rather than a meaningful part of work in the post-secondary environment. What is more, we have alternatives, and people who are looking to meet certain goals can use those alternatives to better end. SETs are not a waste of time, but if you are looking to (a) improve teaching, (b) provide greater accountability, or (c) ensure a student voice in the evaluation/assessment of profs ... SETs will meet none of these objectives. Hence, promoting them is not a good idea when we can more profitably direct our attention to mechanisms that will work.

One of the things about SETs that surprises me is how the debate -- at Mount A at least -- has become frozen in time. It has, in short, not kept up with the scholarship of teaching and learning as it pertains to SETs. Once, when I first started out in this gig (let's say 15-20 years ago), there was a vigorous debate about the accuracy of SETs: do the scores students accord to profs accurately reflect teaching ability -- that is, competence -- and so can they be used as a tool of evaluation? This debate is old and settled. Anyone who keeps debating it is, frankly, missing out on a lot of scholarship. The scholarship on SETs falls into three camps. There are ardent defenders of their accuracy. There are ardent rejectors of their accuracy. Both are small groups. In other words, just about no one takes one side or the other in an extreme form. Instead, most people fall into a third camp that we could call "yes ... but ...." The basic premise of this perspective -- which represents the mainstream of SETs assessment -- is the following:


  • SETs are valuable tools that cannot be taken in isolation. They must be combined with other forms of assessment. In other words, to use them by themselves as your sole or even dominant teaching assessment tool would be wrong and would create inaccuracies. It would be like using a year-end poll instead of having an on-going democracy. 
  • SETs cannot be assessed outside of a chronological framework; that is, one needs a time span, ideally several years, for SETs to make sense. In other words, the trend is more important than any single number. This point should not surprise us because it is a standard point of statistical analysis. Ask Michael Adams if you don't believe me. It is the pattern into which the numbers fit that is important.
  • SETs require interpretation. You cannot just look down a range of numbers, see that one prof has a 4.1/5 average evaluation (we use a standard 5-point scale at Mount A) and determine that that person is better than someone who had a 3.9/5. Why? We have no longitudinal data, we don't know how many students each of these profs taught, we don't know their stages of career, we don't know the courses, we don't know how they provided extra help, and we don't know whether or not the students in their courses had the prerequisites. In other words, all we have is a number outside of context. What if the first person had six students and taught only one class of advanced, motivated students who happened to collectively be a really nice and deferential bunch who assumed that the prof knew what they were doing? What if the other prof laboured for hours with hundreds of students in core courses that students hated taking but were required for their degrees, providing endless hours of extra help and marking, say, hundreds of papers? (I'm not exaggerating. I mark hundreds -- seriously -- of papers each year.) Would we consider it fair and accurate that the first prof's numbers could have changed dramatically if a single student had changed their SETs? Don't believe me ... take some time and do the math yourself (or see the quick sketch after this list). What is more, we believe that everything requires interpretation. Geographers interpret space; biologists interpret living things; historians interpret the past; sociologists ... society. After we teach this to our students ... why would we suddenly believe that it would not apply to our jobs? After we spend hours teaching students statistical analysis, political inquiry, and quantitative methods ... why would we ditch it all as if everything that we taught were unimportant?
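Here is a minimal sketch of that math. The class sizes and scores are hypothetical, invented purely for illustration, but the arithmetic is the point: a single student's revised rating moves a six-person seminar's average by two-thirds of a point, while the same change in a 200-student course barely registers.

    # Hypothetical illustration: how much one student's rating can move a
    # class average, at a small and a large class size.

    def mean(scores):
        return sum(scores) / len(scores)

    small = [5, 4, 4, 4, 4, 4]                # one seminar of six students
    large = [5] * 20 + [4] * 100 + [3] * 80   # two hundred students in core courses

    for name, scores in [("Prof A (n=6)", small), ("Prof B (n=200)", large)]:
        before = mean(scores)
        scores[0] = 1          # a single student revises a 5 down to a 1
        after = mean(scores)
        print(f"{name}: {before:.2f} -> {after:.2f} (shift {before - after:.2f})")

    # Prof A (n=6): 4.17 -> 3.50 (shift 0.67)
    # Prof B (n=200): 3.70 -> 3.68 (shift 0.02)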
Put together, no one who has seriously studied this issue says "yeah, go ahead, look at one number and that will tell you whether person X is good or bad at their job." My point is this: in having a debate about whether or not we should have SETs and whether or not deans should look at numbers to assess faculty, we are missing an opportunity to leave an old debate in the past and have a new conversation that serves all of us (students, faculty, administrators, the institution) better. To get to this conversation we need to ask this question: why have SETs?

I've encountered two answers to this question, one of which I accept and one of which I reject. The one I reject is this: we assess everything, so why not use this tool to assess teaching? I reject that argument for the following reasons:

  • We don't actually assess everything. Most professions are certified, go through probation, and then are approved to work in that profession. We do this with faculty; it is called tenure. After certification, though, we don't assess most professions regularly. We don't assess the ice cream we eat, the plumber who comes to our door, or the mechanic to whom we take our car. The idea that we assess everything is, therefore, wrong.
  • Why would we use a method of assessment that might be inaccurate? Even if we agree that assessment is good and useful, we want that assessment to be accurate. Simply implementing a SETs-based assessment policy will, therefore, do nothing but let us say we have a policy, the accuracy of which we cannot guarantee. Does that inspire your confidence? Imagine using that line for a different profession. We have imposed a method of assessment on our fire department, but we cannot guarantee that it is actually accurate, and so we cannot actually tell you whether or not they'll put out the fire if you call. Does that instil faith in the fire department? What about a medical professional? Yes, we've assessed your doctor and they passed, but we can't actually say whether or not this assessment is accurate, so if you are sick you might or might not get the right advice. Hmmm ...
Sometimes I hear people say "we have to do something." Yes, we do, but let's do something that meets our goals, not something that we do just for the sake of doing it, because that is a waste of time and resources.

The correct answer to that question is that we need to assess teaching to improve teaching. There is absolutely no other reason to do so. (Pause and think about that if you disagree with me and you'll see that the statement is accurate. If we are not assessing teaching to improve it, what other possible end could we have in mind?) Now we are on the same wavelength. And now you can see why accuracy, change over time, and context are so important. Without these things, we run the risk of drawing wrong conclusions and making changes that would actually hamper good teaching rather than enhance it. For instance, a person who is getting better at teaching every year might be someone who is worthy of commendation even if their numbers are not quite as good as those of someone who has basically flatlined. A person who innovates -- say, with regard to technology in the classroom, experiential learning, or mentoring -- might run the risk of a horrible failure, but we might all be glad that she ran that risk because it helps us. Should we condemn her for that?
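To make that improving-versus-flatlined comparison concrete, here is a rough sketch with invented five-year numbers. A single-year snapshot favours the second prof; the trend tells a different story.

    # Hypothetical five-year SET averages for two profs (invented numbers).
    years = [2011, 2012, 2013, 2014, 2015]
    prof_x = [3.2, 3.4, 3.6, 3.8, 4.0]   # steadily improving
    prof_y = [4.1, 4.1, 4.1, 4.1, 4.1]   # strong, but flatlined

    def per_year_change(years, scores):
        # Crude trend: end-to-end change per year. A regression slope
        # would be the more careful version of the same idea.
        return (scores[-1] - scores[0]) / (years[-1] - years[0])

    print(f"Prof X: latest {prof_x[-1]:.1f}, trend {per_year_change(years, prof_x):+.2f}/yr")
    print(f"Prof Y: latest {prof_y[-1]:.1f}, trend {per_year_change(years, prof_y):+.2f}/yr")
    # Prof X: latest 4.0, trend +0.20/yr
    # Prof Y: latest 4.1, trend +0.00/yr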

What I would suggest is that, as part of this new conversation, we consider our objectives and think about the range of ways we can meet them. We could approach our own development on an opt-in basis, knowing that there will be people who will not opt in right away, and some who potentially never will. I never mind it if someone says to me "you know, Andrew, I'm going to see if this works before I sign on." So, if someone is skeptical and wants to wait until we have evidence of success ... well, heck, that strikes me as just plain good sense.

We could begin by looking at what SETs are used for in the scholarship of teaching and learning as a first step. We might discover that there is a whole field of research and that we don't need to reinvent the wheel. We might encourage team assessments, student open fora on teaching, open fora on conceptions of what constitutes good teaching or even what constitutes a class, experiential learning (done right), different forms of extra help, faculty involvement in the scholarship of teaching and learning, faculty involvement in teaching conferences, and the development of new and flexible forms of teaching that meet student needs. In other words, there is a whole bunch of stuff that we can and should do to improve teaching ... why not start doing it? SETs and some mode of thinking about them might be useful, but there is so much more that we can do to encourage student voices, provide fora so they can be heard, and encourage improvements in the quality of education. Yet, if we spend all our time discussing SETs and their accuracy and whether or not a dean should look at them ... well, none of the rest of this stuff -- from student voices on down -- gets done.

I might blog more on this in the future, but one last thing ... there is a role for deans in this and I honestly don't understand why it is not being played right now. Why not encourage good teaching? Why not go to workshops and teaching conferences? Why not check in with faculty as the term goes along? One of the big problems with using SETs to evaluate teaching is this: imagine, for instance, and for the sake of argument, that it works and it does help us separate a good teacher from a bad one. Even if that were true ... it does so only after the fact. In other words, and assuming the accuracy of SETs, we discover that Professor X (who got a bad evaluation) was not educating his students only after the course was done. By this time, the prof -- allowing that he was a bad prof -- might have cost students their scholarships, or forced them to repeat courses, or even driven them from the subject because he was so bad. SETs, in other words, have a built-in flaw: they are reactive. We need to be proactive in developing good teaching. And, I might say, I am on board with anyone who wants to find ways of doing that.
