Matt discusses the disaster of grades in The Final Countdown
I just finished grading three problems worth of the final exam (the other two TAs are taking care of the rest), and I think the exam can be safely described as a debacle. It was a disaster. The scores haven’t been tallied up yet, but I think there’s a good chance the mean score will be within one standard deviation of zero. And, I dunno, about 4 or 5 standard deviations away from 100.
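To get a feel for what "within one standard deviation of zero" would look like, here is a minimal sketch. The scores are invented purely for illustration (the real ones hadn't been tallied), and the helper name `sd_distances` is my own. Since scores can't go below zero, a mean that close to zero generally requires a very lopsided distribution, with most students near the bottom and a few outliers.

```python
from statistics import mean, pstdev

# Invented scores out of 100, chosen only to illustrate the arithmetic.
# These are NOT the actual exam results.
scores = [0, 0, 0, 5, 10, 60]

def sd_distances(scores, top=100):
    """Return (distance of the mean from 0, distance of the mean from `top`),
    both measured in population standard deviations."""
    m = mean(scores)
    s = pstdev(scores)
    return m / s, (top - m) / s

from_zero, from_top = sd_distances(scores)
print(f"mean = {mean(scores):.1f}, sd = {pstdev(scores):.1f}")
print(f"{from_zero:.2f} SDs above zero, {from_top:.2f} SDs below 100")
```

With these made-up numbers the mean sits less than one standard deviation above zero and a bit more than four below 100, which is the shape of result the quote is describing.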
One mitigation technique I’ve mentioned before is to keep a database of questions, so that you know roughly what results to expect. That doesn’t necessarily work well in a university environment, though, and it doesn’t apply here, since it was already noted that the exam was written from scratch.
But there’s another option. I’ve taken and TA’d several classes where grades were neither curved, strictly speaking, nor assigned on a standard “90 and above is an A” basis. The professor who taught the class for which I TA’d pointed out that the students had a hard time adjusting to the idea that the average score was going to be about 50, since they had never encountered that system before. When it first happened to me as an undergraduate, the professor put it rather succinctly: why bother asking a lot of questions that everybody can answer? If 65 is the lowest passing score, then you’re spending a whole bunch of points on questions that don’t demonstrate adequate knowledge of the material. The idea was to cut out 50 points’ worth of those, add in questions that do require “passing” knowledge to answer, and adjust the grade cutoffs accordingly. The exams were much more thorough in testing comprehension, since you could ask more questions about a particular topic. It’s not unlike the strategy taken during oral exams: keep asking questions until the examinee can’t answer them anymore. That’s when you’ve tested the depth of knowledge and comprehension.
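As a rough sketch of what adjusting the grading might mean in practice, compare a conventional cutoff table with one built for an exam whose average is meant to land near 50. Only the 90 and 65 in the standard table come from the discussion above. Every other number, and the names `STANDARD`, `ADJUSTED` and `letter_grade`, are assumptions made up for illustration, not the cutoffs any of these classes actually used.

```python
# Cutoffs are (minimum score, letter), listed from highest to lowest.
# STANDARD: 90 for an A and 65 as the lowest pass are from the text;
# the B and C lines are assumed.
STANDARD = [(90, "A"), (80, "B"), (70, "C"), (65, "D")]

# ADJUSTED: entirely hypothetical cutoffs for an exam designed so that
# the average score is about 50.
ADJUSTED = [(75, "A"), (60, "B"), (45, "C"), (30, "D")]

def letter_grade(score, cutoffs):
    """Return the letter for the first cutoff the score meets, else 'F'."""
    for minimum, letter in cutoffs:
        if score >= minimum:
            return letter
    return "F"

for score in (90, 60, 50, 25):
    print(score, letter_grade(score, STANDARD), letter_grade(score, ADJUSTED))
```

The same raw score means very different things under the two tables, so a student can't interpret a score until the cutoffs have been published.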
I have a lot of sympathy towards this idea (in fact both classes I’ve TAed for used this method). However, I think there are a few arguments against it which bear consideration:
1. Discouragement. It’s very discouraging to take a test and afterward say to yourself, “I got about 60% right, so my grade is somewhere between A+ and D-.” Frequently the cutoffs aren’t determined until the end of the semester, when final grades are being assigned, so people have no idea what their grade is for the entire semester.
2. Learning. Exams are imbued with huge importance compared to homework, so they stick in students’ minds better. I can still remember questions I got wrong and right on tests in classes where I can’t recall a single homework assignment. If you give the students a bunch of problems they can do, then they get more practice doing them. If you give them a bunch of questions they can’t, then they don’t even attempt most of them.
3. Pacing. Undergraduates are frequently quite bad at estimating how difficult a problem is, so tests designed with this framework may end up testing how well students decide “this problem is too hard, I should skip it” rather than how well they know the material. (Of course those are related, but not the same thing.)
On the other hand, this method does mean that you’re much less likely to make a test too easy (which I think is probably worse than making it too hard, since students can’t do better than getting all the questions right). It also means that all levels of students can be challenged by the exam, so the smartest ones still have to study.
I think that take-home exams alleviate some of these problems, but have issues all their own.
You know Arrow’s no-go (impossibility) theorem, which says you can’t have a voting system that obeys three reasonable conditions at once? I bet something similar holds for grading: there will always be some combination of student performances which leads to a completely unacceptable result.