Graded Quizzes
Starting with v0.4.0, quizzes embedded in nbgrader Manually Graded Task
cells can be graded automatically. Student responses are recorded into a
responses.json sidecar file as the student answers, and nbgrader
autograde evaluates them against the instructor’s answer key with
partial-credit support.
The instructor only needs to:
- Register CreateQuiz at the front of the GenerateAssignment preprocessor list (see nbgrader Pipeline).
- Write the quiz inside a Manually Graded Task cell, using exactly the same authoring flow as the self-checking quiz in Quiz Syntax.
Because nbgrader requires every Manually Graded Task cell to carry a
grade_id, the preprocessor promotes the quiz to graded mode by
default: correctness feedback is hidden (so students can’t guess
their way through), each answer click is persisted to the sidecar, and
an autograder cell is auto-generated. No separate test cell needs
to be written by hand. To opt a quiz out on a per-quiz basis, use
graded=false in the quiz header (see Mixing graded and self-check
quizzes below).
How it works
At generate_assignment time, CreateQuiz emits one code cell
per quiz region. The cell’s visible source (in the release) is just
the usual display_quiz(...) call. The cell’s hidden-tests block —
stripped from the release by nbgrader’s ClearHiddenTests and
restored at autograde time by OverwriteCells — embeds the answer
key as a Python literal and invokes grade_quiz(). A bare
_result.score at the end of the cell feeds nbgrader’s partial-credit
machinery (utils.determine_grade). The task cell’s own points
field is left untouched — it remains available for any manual grading
of the surrounding task content (prose, code, essay) alongside the
auto-graded quiz score.
from nbgrader_jupyterquiz.display import display_quiz
display_quiz("#notebook:0.0", grade_id="quiz-1-autograded")
### BEGIN HIDDEN TESTS
from nbgrader_jupyterquiz import grade_quiz
_questions = [...] # answer key, embedded by the preprocessor
_result = grade_quiz("quiz-1-autograded", questions=_questions)
_result.display_review()
print(f"Score: {_result.score}/{_result.max_score}")
_result.score
### END HIDDEN TESTS
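Why does the bare _result.score matter? Below is a simplified sketch of the idea. It is not nbgrader's code, and partial_grade_from_outputs is a hypothetical name. It assumes that an error output scores zero, that an execute_result whose text/plain parses as a number is taken as the score (capped at the cell's maximum in this sketch), and that a cell that ran cleanly without such a result earns full points. The print(...) line and display_review() produce stream and display outputs, which this logic ignores; only the final bare expression yields an execute_result.

```python
from typing import Any


def partial_grade_from_outputs(outputs: list[dict[str, Any]], max_points: float) -> float:
    """Simplified, hypothetical grading of an autograded code cell's outputs."""
    # Any error means the cell failed: no credit.
    if any(out.get("output_type") == "error" for out in outputs):
        return 0.0
    # A numeric execute_result (the bare final expression) is the score.
    for out in outputs:
        if out.get("output_type") != "execute_result":
            continue
        text = out.get("data", {}).get("text/plain", "")
        try:
            return min(float(text), float(max_points))
        except ValueError:
            continue  # non-numeric result: ignored in this sketch
    # Ran cleanly without a numeric result: full credit.
    return float(max_points)


# Outputs resembling the graded-quiz cell: printed line, HTML review, bare score.
outputs = [
    {"output_type": "stream", "name": "stdout", "text": "Score: 2.5/5.5\n"},
    {"output_type": "display_data", "data": {"text/html": "<div>review</div>"}},
    {"output_type": "execute_result", "data": {"text/plain": "2.5"}},
]
print(partial_grade_from_outputs(outputs, max_points=5.5))  # 2.5
```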
When the student answers a question in the browser, the rendered JS
recorder writes (or updates) responses.json in the same directory
as the notebook via the Jupyter server’s contents API. Entries are
keyed by grade_id, so multiple quizzes in the same assignment
coexist in one file.
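A minimal Python sketch of what the recorder does on each click. The inner layout (question number mapped to the recorded answer under each grade_id key) is an assumption; the real sidecar schema is not specified here, and update_sidecar and contents_api_put_body are illustrative names. The request body follows the Jupyter server contents API model for a text file (type, format, content), which a client would send with PUT to /api/contents/<path>/responses.json.

```python
import json
from typing import Any


def update_sidecar(sidecar: dict[str, Any], grade_id: str, qnum: int, answer: Any) -> dict[str, Any]:
    """Return a new sidecar with `answer` recorded for question `qnum` of `grade_id`.

    The {grade_id: {question number: answer}} layout is an assumption.
    """
    updated = {key: dict(entry) for key, entry in sidecar.items()}
    updated.setdefault(grade_id, {})[str(qnum)] = answer
    return updated


def contents_api_put_body(sidecar: dict[str, Any]) -> dict[str, str]:
    """Request body for PUT /api/contents/<path>/responses.json (text file model)."""
    return {"type": "file", "format": "text", "content": json.dumps(sidecar, indent=2)}


sidecar: dict[str, Any] = {}
sidecar = update_sidecar(sidecar, "quiz-1-autograded", 1, ["Paris"])
sidecar = update_sidecar(sidecar, "quiz-2-autograded", 1, 42)
sidecar = update_sidecar(sidecar, "quiz-1-autograded", 1, ["Lyon"])  # a re-click overwrites
print(sorted(sidecar))  # ['quiz-1-autograded', 'quiz-2-autograded']
```

Keying by grade_id is what lets both quizzes share one file without clobbering each other.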
At autograde time, nbgrader restores the hidden-tests block from the
gradebook master, re-executes the cell, and reads the sidecar.
grade_quiz grades each question all-or-nothing and returns a QuizResult
whose .score, the value of the cell’s final bare expression, becomes the
cell’s partial-credit grade.
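The relationship between per-question results and the quiz score can be mirrored with two small dataclasses modelled on QuizResult and QuestionResult in the API reference. This is a sketch, not the package's source: the Sketch suffix marks the classes as hypothetical, and storing each question's weight under a "points" key in the question dict is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class QuestionResultSketch:
    qnum: int
    question: dict[str, Any]
    recorded: Any
    correct: bool

    @property
    def points(self) -> float:
        # Assumed: the weight lives under "points"; unweighted questions count 1.
        return self.question.get("points", 1)

    @property
    def earned(self) -> float:
        # All-or-nothing per question.
        return self.points if self.correct else 0


@dataclass
class QuizResultSketch:
    grade_id: str
    details: list[QuestionResultSketch] = field(default_factory=list)

    @property
    def score(self) -> float:
        return sum(q.earned for q in self.details)

    @property
    def max_score(self) -> float:
        return sum(q.points for q in self.details)

    @property
    def passed(self) -> bool:
        return self.score == self.max_score


result = QuizResultSketch("quiz-1-autograded", [
    QuestionResultSketch(1, {"points": 3}, ["A"], correct=True),
    QuestionResultSketch(2, {"points": 0.5}, ["B"], correct=False),
    QuestionResultSketch(3, {"points": 2}, 42, correct=True),
])
print(f"Score: {result.score}/{result.max_score}")  # Score: 5/5.5
```

Here score and max_score are plain sums, and passed reduces to comparing them.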
Where the answer key lives
A graded quiz auto-enables hide_correctness, which has two
effects. The browser shows neutral “Selected” / “Recorded” feedback
instead of green/red, and — equally important — the answer key is
not embedded into the release notebook at all. The display JSON
keeps each question’s text, choices, and pedagogical feedback strings,
but drops the correct flags on multiple-choice answers and the
value / range matchers on numeric answers. A student
inspecting the DOM cannot recover the key.
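A minimal redaction sketch, assuming jupyterquiz-style question dicts in which each answer may carry correct, value, and range keys; the exact schema the package uses is not documented here, and redact_questions is a hypothetical name. It deep-copies its input so the instructor's answer key stays intact for the hidden-tests block.

```python
import copy
from typing import Any

# Answer-revealing keys in this sketch's assumed schema.
SECRET_KEYS = ("correct", "value", "range")


def redact_questions(questions: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of `questions` with answer-revealing keys removed."""
    redacted = copy.deepcopy(questions)  # leave the instructor's key untouched
    for question in redacted:
        for answer in question.get("answers", []):
            for key in SECRET_KEYS:
                answer.pop(key, None)
    return redacted


answer_key = [
    {"question": "Capital of France?", "type": "multiple_choice",
     "answers": [{"answer": "Paris", "correct": True}, {"answer": "Berlin", "correct": False}]},
    {"question": "6 x 7?", "type": "numeric",
     "answers": [{"type": "value", "value": 42, "correct": True},
                 {"type": "default", "feedback": "Hint: multiply."}]},
]
print(redact_questions(answer_key)[0]["answers"])  # [{'answer': 'Paris'}, {'answer': 'Berlin'}]
```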
The autograder gets its copy from a different place: the
### BEGIN HIDDEN TESTS block above, which embeds the full
_questions = [...] literal. ClearHiddenTests strips this
block when nbgrader produces the release notebook, and
OverwriteCells restores it from the gradebook master at autograde
time. So the answer key travels along the instructor side of the
pipeline — never along the student side.
If you need correctness feedback to be visible (e.g. a self-check
quiz inside a task cell), set hide_correctness=false explicitly.
That re-enables the green / red UI and ships the full answer key
into the release JSON, since the JS needs it to colour buttons. The
trade-off is intentional: hide mode is the secure mode.
Threat model and what the security boundary is
The redaction protects against a student reading the answer key from the released notebook. The threats it stops:
- Inspecting the DOM: opening browser DevTools, expanding the hidden span, and reading the embedded JSON. Without redaction, the student would see "correct": true on the right answers. With redaction, the JSON contains text, choices, and feedback, but no field that distinguishes correct from incorrect.
- Reading the notebook source as a file: opening the released .ipynb in a text editor. Same protection: the redacted JSON is what’s on disk.
- Reverse-engineering from base64: the encoded=true default base64-encodes the JSON before embedding, but base64 is obfuscation, not encryption. window.atob(...) decodes it trivially, and a determined student would do so. Redaction is what removes the answer; encoding only costs the student an extra step.
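To see why encoding is not a barrier, here is the Python equivalent of window.atob(...) applied to a base64-encoded quiz payload. The payload is made up for illustration, and the package's actual embedding format may differ in detail.

```python
import base64
import json

# A made-up, already-redacted quiz payload.
redacted = [{"question": "Capital of France?", "answers": [{"answer": "Paris"}, {"answer": "Berlin"}]}]

# Roughly what an encoded=true embedding might store.
encoded = base64.b64encode(json.dumps(redacted).encode("utf-8")).decode("ascii")

# One call recovers the JSON, so encoding hides nothing from a determined reader.
decoded = json.loads(base64.b64decode(encoded))
print(decoded == redacted)  # True
```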
What redaction does not protect:
- Per-answer feedback strings still ship. The redaction drops correct flags and numeric matchers, but feedback strings stay in the JSON because the JS needs them to render the Selected: … state. That makes feedback a leaky channel in graded mode: an instructor who writes + "Paris" (Correct!) or - "Berlin" (No, Berlin is in Germany.) defeats the redaction by spelling out which answer is which. The rule of thumb: omit per-answer feedback on graded questions, and reserve it for graded=false self-check quizzes, where correctness is openly visible anyway. A bare - (Hint: ...) default line on a numeric question is safe, because it fires on any unmatched submission and the JS overrides per-answer feedback in hide mode regardless; anything that distinguishes one answer from another is a leak.
- Network-loaded quiz JSON is not redacted. If you pass a JSON URL or filename to display_quiz() outside the preprocessor pipeline, the redaction step is skipped and the JSON is loaded verbatim. Stand-alone use of display_quiz is for self-check, not graded mode.
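Before releasing a graded assignment, an instructor could scan the question dicts for answer-specific feedback. The sketch uses a hypothetical helper, leaky_feedback, and assumes jupyterquiz-style answers in which the numeric catch-all line has type "default"; every other non-empty feedback string is reported as a potential leak.

```python
from typing import Any


def leaky_feedback(questions: list[dict[str, Any]]) -> list[tuple[int, str]]:
    """List (question number, feedback) pairs attached to specific answers."""
    findings = []
    for qnum, question in enumerate(questions, start=1):
        for answer in question.get("answers", []):
            if answer.get("type") == "default":
                continue  # catch-all fallback: not tied to one answer
            feedback = answer.get("feedback")
            if feedback:
                findings.append((qnum, feedback))
    return findings


graded_questions = [
    {"answers": [{"answer": "Paris", "feedback": "Correct!"}, {"answer": "Berlin"}]},
    {"answers": [{"type": "value", "value": 42}, {"type": "default", "feedback": "Hint: multiply."}]},
]
print(leaky_feedback(graded_questions))  # [(1, 'Correct!')]
```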
The security model is “the answer key never reaches the student’s
machine in graded mode.” That’s the whole boundary the package
controls. Editing responses.json doesn’t help a student because
the file records their submissions, not correctness scores; the
autograder grades them against an answer key the student never sees.
Graded use as a real assessment is fine, with the same caveat that
applies to any take-home exam: the package can’t stop a student
from getting outside help.
Mixing graded and self-check quizzes
Sometimes a Manually Graded Task cell contains both work that is
graded by hand (prose, code) and a self-check quiz that shouldn’t
contribute to the score. Mark the quiz with graded=false:
#### Quiz graded=false
* (SC) "Self-check — not graded"
+ "A"
- "B"
#### End Quiz
An ungraded quiz:
- emits a plain display_quiz(...) cell: no nbgrader metadata on the generated cell, no hidden-tests block, no gradebook entry;
- shows correctness feedback (green / red), as in v0.3.x self-checking mode;
- does not render points badges unless a question carries an explicit {N} marker (in which case the badge is shown per the usual rule);
- leaves the task cell’s own points untouched; those remain available for manual grading of whatever the task cell actually grades.
graded=false and hide_correctness are independent. The
interesting combinations:
| Configuration | Correctness feedback | Auto-graded |
|---|---|---|
| Quiz header (defaults) | Hidden (Selected/Deselected) | Yes |
| hide_correctness=false | Visible (green / red, answer key shipped to browser) | Yes |
| graded=false | Visible (green / red) | No |
| graded=false hide_correctness=true | Hidden (Selected/Deselected) | No (study mode) |
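The four rows can be read as one rule: unless hide_correctness is set explicitly, it follows graded. The sketch below encodes that rule as inferred from the table, not taken from the package (QuizMode and resolve_mode are hypothetical names), along with the point that hiding correctness is what keeps the answer key out of the browser.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class QuizMode:
    correctness_visible: bool
    answer_key_shipped: bool
    autograded: bool


def resolve_mode(graded: bool = True, hide_correctness: bool | None = None) -> QuizMode:
    # Inferred default: graded quizzes hide correctness, ungraded ones show it.
    hide = graded if hide_correctness is None else hide_correctness
    # Visible correctness needs the key in the browser; hidden mode withholds it.
    return QuizMode(correctness_visible=not hide, answer_key_shipped=not hide, autograded=graded)


print(resolve_mode(graded=False))
# QuizMode(correctness_visible=True, answer_key_shipped=True, autograded=False)
```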
Per-question points
Questions can carry individual point weights via the {N} marker on
the question line. N is any positive number, including decimals
like {0.5}:
#### Quiz
* (SC) {3} "Worth three points"
+ "A"
- "B"
* (SC) {0.5} "Half-point warm-up"
+ "A"
- "B"
* (NM) {2} "Two-point numeric"
+ <42>
#### End Quiz
Points display as a small badge next to each question. The cell’s
total nbgrader.points is the sum across questions (in the example
above: 3 + 0.5 + 2 = 5.5). Unweighted quizzes — where no question
carries a {N} marker — render without badges and implicitly treat
each question as worth 1 point.
When at least one question in a quiz has {N}, the preprocessor
propagates the default {1} onto every unmarked sibling, so all questions
show a badge. Weighted and unweighted questions can be mixed freely.
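A sketch of how the {N} markers could be read and totalled, assuming the marker sits between the type tag and the quoted question text as in the example above. The regular expression and the parse_points name are illustrative, not the package's parser. Integer markers are kept as int and decimal markers as float, mirroring the QuestionResult.points description in the API reference.

```python
from __future__ import annotations

import re

# Illustrative pattern: * (TYPE) {N} "text", with the {N} marker optional.
QUESTION = re.compile(r'^\*\s+\((?P<kind>[A-Z]+)\)\s+(?:\{(?P<points>\d+(?:\.\d+)?)\}\s+)?"')


def parse_points(quiz_lines: list[str]) -> tuple[list[int | float], bool]:
    """Return per-question points and whether point badges should be shown."""
    markers = []
    for line in quiz_lines:
        match = QUESTION.match(line.strip())
        if match:
            markers.append(match.group("points"))  # None when no {N} marker
    show_badges = any(m is not None for m in markers)
    points: list[int | float] = []
    for marker in markers:
        if marker is None:
            points.append(1)  # default weight for unmarked questions
            continue
        value = float(marker) if "." in marker else int(marker)
        if value <= 0:
            raise ValueError(f"point weights must be positive, got {marker}")
        points.append(value)
    return points, show_badges


quiz = [
    "#### Quiz",
    '* (SC) {3} "Worth three points"', '+ "A"', '- "B"',
    '* (SC) {0.5} "Half-point warm-up"', '+ "A"', '- "B"',
    '* (NM) {2} "Two-point numeric"', "+ <42>",
    "#### End Quiz",
]
points, show_badges = parse_points(quiz)
print(points, sum(points), show_badges)  # [3, 0.5, 2] 5.5 True
```

A quiz with no markers yields a weight of 1 per question and no badges, matching the unweighted case described above.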
The feedback view
After nbgrader autograde, the autograder cell’s output contains a
static HTML review showing, per question:
- the student’s selection(s),
- the correct answer(s),
- which were picked / which were missed,
- per-question points earned / maximum.
This review is preserved in the cell output, so nbgrader
generate_feedback includes it in the per-student feedback HTML.
API
- nbgrader_jupyterquiz.grade_quiz(grade_id: str, *, questions: list[dict[str, Any]] | None = None, notebook_path: str | Path | None = None) -> QuizResult
Grade the quiz identified by grade_id against recorded responses.
Parameters
- grade_id : str
The sidecar key under which student responses are recorded. The auto-generated test cells use "<task_grade_id>-autograded".
- questions : list of dict, optional
Answer key: the list of question dicts (as produced by the parser) for this quiz. When provided, the notebook is not read. This is the path taken by auto-generated test cells (see CreateQuiz), which embed the answer key directly in their ### BEGIN HIDDEN TESTS block.
- notebook_path : str or Path, optional
Path to the notebook containing the quiz task cell, for the fallback case where questions is None. Defaults to the only .ipynb file in the current working directory.
Returns
- QuizResult
Grading result with per-question details. If the sidecar is missing or the quiz has no recorded responses, every question is reported as incorrect (score 0/max).
- class nbgrader_jupyterquiz.grader.autograde.QuizResult(grade_id: str, details: list[QuestionResult])
Outcome of grading every question in a quiz region.
- details: list[QuestionResult]
- display_review() -> None
Emit an HTML review of the quiz into the current cell output.
Intended to be called from the auto-generated hidden-tests block of a graded-quiz cell. When nbgrader generate_feedback converts the autograded notebook to HTML, the review appears inline with the score so students can see which answers were correct, which they picked, and which they missed.
The output is pure static HTML with scoped inline CSS, so it renders correctly in any browser without a running Jupyter server or kernel.
- grade_id: str
- property max_score: float
Compute the sum of per-question point values.
Returns
- float
Total points this quiz can yield.
- property passed: bool
Return True when every question in the quiz was answered correctly.
Returns
- bool
True iff score == max_score.
- property report: str
Return a multi-line textual summary of the grade.
Returns
- str
Human-readable breakdown of per-question outcomes.
- property score: float
Compute the sum of per-question points earned.
Returns
- float
Total points the student earned on this quiz.
- class nbgrader_jupyterquiz.grader.autograde.QuestionResult(qnum: int, question: dict[str, Any], recorded: Any, correct: bool)
Outcome of grading a single question.
- correct: bool
- property earned: float
Compute the points earned on this question (all-or-nothing per question).
Returns
- float
self.points if correct, else 0.
- property expected: Any
Return a human-readable representation of the expected answer.
Returns
- Any
List of expected answer texts or numeric values/ranges.
- property points: float
Return the maximum points for this question (defaults to 1).
Returns
- float
Positive point weight. Integer values (from {N}) come through as int; fractional values ({0.5}) as float.
- qnum: int
- question: dict[str, Any]
- property question_type: str
Return the canonical type string of the underlying question.
Returns
- str
"multiple_choice", "many_choice", "numeric", "string", or "unknown".
- recorded: Any
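The notebook_path fallback of grade_quiz (the only .ipynb file in the current working directory) might look like the sketch below. find_only_notebook is a hypothetical name, and raising FileNotFoundError for zero candidates and ValueError for several is this sketch's choice; the package's actual behaviour in those cases is not documented here.

```python
from __future__ import annotations

from pathlib import Path


def find_only_notebook(directory: str | Path = ".") -> Path:
    """Return the single .ipynb file in `directory` (hypothetical helper)."""
    candidates = sorted(Path(directory).glob("*.ipynb"))
    if not candidates:
        raise FileNotFoundError(f"no .ipynb file found in {directory}")
    if len(candidates) > 1:
        # Sketch choice: refuse to guess which notebook holds the quiz.
        names = ", ".join(p.name for p in candidates)
        raise ValueError(f"several notebooks in {directory}: {names}; pass notebook_path")
    return candidates[0]
```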
Limitations
- Jupyter server required. The response recorder persists answers via fetch('/api/contents/...'). Any client that routes through a standard jupyter_server works: JupyterLab 4, Notebook 7, classic Notebook. The VS Code Jupyter extension is not supported: it uses a kernel-direct protocol and does not expose the contents API. Students who must use VS Code will see the quiz render, but their answers will not persist; instructors running graded assessments should require JupyterLab or Notebook.
- Points are all-or-nothing per question. Students earn the full per-question points iff the answer is fully correct. Partial credit within a single question (e.g., “selected 2 of 3 correct MC options”) is not awarded; that would require splitting a many-choice question into separately graded atoms.
- Manual edits to responses.json are not detected. The grading code trusts the sidecar. If the sidecar is tampered with between submission and autograde, the altered responses will be graded. For high-stakes assessments, rely on nbgrader’s secure exchange rather than student-local editing.
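The all-or-nothing rule for a many-choice question amounts to a set comparison. earned_points is a hypothetical helper: it counts the selection as correct only when it matches the expected answers exactly, regardless of order.

```python
def earned_points(selected: list[str], expected: list[str], points: float = 1) -> float:
    """All-or-nothing: full points only if the selection equals the expected set."""
    return points if set(selected) == set(expected) else 0


expected = ["A", "B", "C"]
print(earned_points(["C", "A", "B"], expected, points=3))  # 3 (order does not matter)
print(earned_points(["A", "B"], expected, points=3))       # 0 (2 of 3 earns nothing)
```

Selecting all three correct options plus a wrong one also earns 0, since the sets differ.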