
  2. 5

    I was a TA for a course that involved Haskell programming, and we made heavy use of QuickCheck (similar to Hypothesis, but for Haskell rather than Python) to test each student’s code. A couple of things to take into account:

    • If you’re going down this route, try to write your questions and tests at the same time. If you instead hand out the questions, give the students a week to write solutions, and only then try to come up with tests to help you grade them, you’ll probably hit annoyances that make testing harder (e.g. needing custom data generators).

    • Tests are great for checking whether code does or doesn’t do the job, but that’s of relatively low importance in education.
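    To illustrate the custom-generator point, here is a minimal QuickCheck sketch (the exercise and names are hypothetical, not from the actual course): a property about inserting into a sorted list needs its own generator, because the default generator for lists almost never produces sorted input.

```haskell
import Test.QuickCheck
import Data.List (insert, sort)

-- Custom generator: produce an arbitrary list, then sort it,
-- so the property is exercised on the inputs it actually cares about.
sortedList :: Gen [Int]
sortedList = sort <$> arbitrary

-- Property: inserting into a sorted list keeps it sorted.
prop_insertKeepsSorted :: Int -> Property
prop_insertKeepsSorted x =
  forAll sortedList $ \xs ->
    let ys = insert x xs
    in ys == sort ys

main :: IO ()
main = quickCheck prop_insertKeepsSorted
```

    Writing the test alongside the question would have flagged the need for `sortedList` before the assignment went out.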

    Regarding this second point, the main benefit we got from using automated tests in this way was that they freed up a lot of the (very limited) one-on-one time we had with each student. Rather than wasting time figuring out if the code works, we would run the tests to ‘get that out of the way’ in a few seconds, then have an in-depth discussion about some aspects of the code.

    If a test failed, we could talk about that: “why do you think it gave this output?”, “did you expect that part to work?”, “talk me through what you think this piece of code is doing”, etc. If all of the tests pass, we could pick some aspect of the code and ask about it.

    Many students seemed annoyed that they didn’t just pass/fail based on whether their code worked, but the discussions were more important. Some memorable moments:

    • One student’s “explanation” of their code was “that’s what someone put on Facebook”. I didn’t care if they got it from elsewhere, but since they couldn’t even begin to explain it I couldn’t give them any marks.
    • Another had used function composition in part of their code (written like foo . bar in Haskell), which we hadn’t covered. I spotted this and asked them what the dot does, and they didn’t know (they said they took it from Stack Overflow).
    • A few students wrote things like if foo > bar == True then ... and insisted it was “right” because it gave the correct answer. In these cases I didn’t give full marks, since IMHO it indicated a lack of understanding: == True is a useless no-op. If failing code can get partial credit for doing some things right, then passing code can get partial credit for doing some things wrong.
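    For reference, the two constructs from those anecdotes look like this (function names are illustrative; note that in Haskell the comparison actually needs parentheses to parse, since > and == are both non-associative at the same fixity):

```haskell
-- Function composition: (foo . bar) x is foo (bar x).
foo :: Int -> Int
foo = (* 2)

bar :: Int -> Int
bar = (+ 1)

composed :: Int -> Int
composed = foo . bar      -- \x -> foo (bar x); composed 3 == 8

-- Comparing a Bool against True changes nothing: (b == True) == b.
redundant :: Int -> Int -> Bool
redundant a b = (a > b) == True   -- same result as the direct version

direct :: Int -> Int -> Bool
direct a b = a > b

main :: IO ()
main = print (composed 3, redundant 2 1, direct 2 1)
```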
    1. 4

      Same here. We took it a step further, though: we organized the faults that property-based testing uncovered in student code into equivalence classes. This helped us give students feedback and sometimes informed grades.

      Here is the rubric we used (this does not contain weights, which varied): https://www.cs.tufts.edu/comp/105-2017s/coding-rubric.html — we probably wrote a couple hundred words of feedback to each student for each assignment. It was amazing.
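      A toy sketch of the equivalence-class idea (all names and submissions hypothetical, not the actual course machinery): run each submission against a set of labelled checks standing in for properties, and use the set of failed checks as the fault’s equivalence class.

```haskell
import Data.List (sort)

-- Hypothetical student submissions of a sorting function.
studentA, studentB :: [Int] -> [Int]
studentA = sort              -- correct
studentB = reverse . sort    -- right elements, wrong order

-- Labelled checks standing in for QuickCheck properties.
checks :: [(String, ([Int] -> [Int]) -> Bool)]
checks =
  [ ("keeps length", \f -> length (f [3,1,2]) == 3)
  , ("is ordered",   \f -> f [3,1,2] == [1,2,3])
  ]

-- The fault class of a submission: the names of the checks it fails.
faultClass :: ([Int] -> [Int]) -> [String]
faultClass f = [name | (name, p) <- checks, not (p f)]

main :: IO ()
main = mapM_ (print . faultClass) [studentA, studentB]
```

      Submissions with the same fault class can then receive the same prepared feedback.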

      1. 3

        One thing a few CS classes did when I was an undergrad (admittedly at a kind of unusual university) was to let students submit to the autograder ahead of time. It was fully unattended and would email back in a few minutes with either “passed all tests” or something like “when we feed your program X, it outputs Y, which isn’t right”. You still didn’t necessarily get a perfect grade if you passed all the autograder tests, but it avoided students handing in code broken in trivial ways, at least if they were diligent enough to run the tests and fix the obvious bugs first (and if they weren’t, they could hardly complain about the grade).

        This was early 2000s, so the autograder was just a battery of hand-written tests accumulated over the years to catch the most common mistakes, not anything as fancy as QuickCheck, but it was pretty effective I think at, like you say, avoiding the feedback being entirely about trivial correctness issues.
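        That style of autograder can be sketched in a few lines (everything here is illustrative, not the original system): run the submission on each stored input and report the first mismatch in the same “we fed it X, got Y” shape.

```haskell
-- Hypothetical hand-written test battery for an Int -> Int exercise:
-- pairs of (input, expected output).
battery :: [(Int, Int)]
battery = [(0, 0), (2, 4), (5, 10)]

-- Submission under test (here: a correct doubling function).
student :: Int -> Int
student = (* 2)

-- Report either full success or the first failing case.
grade :: (Int -> Int) -> String
grade f =
  case [(x, f x) | (x, y) <- battery, f x /= y] of
    []           -> "passed all tests"
    (x, got) : _ ->
      "when we feed your program " ++ show x
        ++ ", it outputs " ++ show got ++ ", which isn't right"

main :: IO ()
main = putStrLn (grade student)
```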

        1. 2

          > submit to the autograder ahead of time. It was fully unattended and would email back in a few minutes with either “passed all tests” or something like “when we feed your program X, it outputs Y, which isn’t right”.

          I think that would be nice to have for beginners in any language, accompanied by some standard text and examples. It would definitely save time tracking down their early mistakes. Beginners can try to figure things out themselves, but at that stage they barely know the basics; better to help them there, so their own debugging happens on material they should understand by then.