Five thoughts about Student Evaluations (occasioned by sexism in RateMyProfessor.com)
This post is occasioned by Ben Schmidt’s wonderful tool for exploring gender differences on words used in student evaluations on RateMyProfessor.
1. RateMyProfessor is a huge but awful data set. Why awful? First, I’d bet that well under 1% of the students who take a course rate the professor on RateMyProfessor. For example, I went to graduate school in the department of psychology at the University of Virginia. Many of the professors there have taught many, many students in their careers. I would guess that 90% of the department has taught at least 1,000 students, with many in the several thousands. Even professors who have only been there for 5 years are likely to have taught large General Psychology classes of 200 or so, in addition to smaller seminars. Yet the most rated professor has 90 ratings, with most having fewer than 50. My adviser, a dedicated and decorated professor with numerous teaching awards, had 33 ratings. Second, the students who go on RateMyProfessor are not a random sample; they will tend to include more students who are upset about having their expectations dashed, or students who really loved the class. I’m sure there are a few dedicated students out there who fill out RateMyProfessor for all their professors, but I’d still imagine the sample of student evaluations on this site is horribly biased. Third, this isn’t all colleges, but a sample, likely weighted toward schools that are heavy users of RateMyProfessor. Ben has a graph of the top 50 schools here.
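The self-selection worry above can be sketched with a toy simulation (hypothetical numbers, not real RateMyProfessor data): suppose every student has a true level of satisfaction, but only students with strong feelings, very happy or very upset, bother to post a rating. The mean of the posted ratings can then land well away from the class’s true mean.

```python
import random

random.seed(42)

# Hypothetical illustration, not real data: 1,000 students with a "true"
# satisfaction score on a rough 1-5 scale.
students = [random.gauss(3.5, 1.0) for _ in range(1000)]

# Only the extremes bother to rate: the very upset or the very happy.
raters = [s for s in students if s < 2.0 or s > 4.5]

class_mean = sum(students) / len(students)
rated_mean = sum(raters) / len(raters)

print(f"share of students who rated: {len(raters) / len(students):.0%}")
print(f"true class mean: {class_mean:.2f}")
print(f"mean of posted ratings: {rated_mean:.2f}")
```

With these made-up thresholds only a fraction of the class rates at all, and the mean of the posted ratings drifts away from the true class mean; change the thresholds and even the direction of the bias changes, which is exactly why a non-random sample like this is hard to interpret.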
2. Even given those caveats, I think it is still really interesting to explore, and to note the differences (and similarities) among disciplines, not just across genders. For example, look at “good,” where the gap between male and female professors is about the same in every discipline. But then look at “caring,” where fields differ: health sciences, math, and chemistry have more females noted as caring, as does philosophy. But look at political science and music. Huh. You can also see huge across-the-board differences with “brilliant.” Another interesting one with a near-constant difference across all fields: “we.”
3. Words aren’t always what they seem. The word “genius” shows a huge gender difference, highlighted in this New York Times piece. While the implication that men are perceived as smarter is hard to miss (and a common finding in this kind of research, in other areas of ratings as well), it makes me think of the specific circumstances of student evaluations. I tend to read mentions of “genius” or of a professor’s intelligence as negative indicators of teaching quality. “He understands many things I do not understand” is all well and good when you are hiring somebody, but not when that person is supposed to be teaching you. A student should be amazed by what they came to know (how much I learned), not by how much the teacher already knew. Yes, it is possible for a student to be amazed both by the blinding intelligence of a teacher and by how much they were taught, but I’d argue that, more often than not, a student remarks upon the genius of their professor when the professor has demonstrated their own intelligence rather than making the student confident in their own ability to learn.
4. Student evaluations are mostly measures of student feelings, not of student learning. I’d argue that the two are sometimes correlated, but not nearly as closely as we might like to think. RateMyProfessor leans into this by asking for three ratings: Helpfulness, Clarity, and Easiness. While these are good indicators of student feelings, they can run counter to student learning. Both easiness and clarity can indicate that a course felt smooth, with little cognitive or emotional struggle. Some students value challenge and struggle, and recognize that they are necessary steps on the path to learning; others do not, and if they don’t feel a resolution to their discomfort during the class, they will vent on the evaluations.
5. The big data approach is worthwhile and interesting, but so is the small data. I’d urge higher ed journalists to take some time to read a hundred or so random comments on RateMyProfessors.com. Many reflect more on the student than on the professor. Some kinds of comments are unfortunately quite common:
Hard if you do not read textbook
have to read the book thoroughly in order to get a decent grade
No attendance policy, which is helpful
Don’t sleep in class. He hates that.
But every now and then you can see that a student was challenged, yet felt adequately supported and learned a lot. From my reading of RateMyProfessors (and of my own evaluations), this is less common than most observers of this kind of data seem to assume.