Graded Quizzes

Starting with v0.4.0, quizzes embedded in nbgrader Manually Graded Task cells can be graded automatically. Student responses are recorded into a responses.json sidecar file as the student answers, and nbgrader autograde evaluates them against the instructor’s answer key with partial-credit support.

The instructor only needs to:

Register CreateQuiz at the front of the GenerateAssignment preprocessor list (see nbgrader Pipeline).
Write the quiz inside a Manually Graded Task cell — exactly the same authoring flow as the self-checking quiz in Quiz Syntax.

Because nbgrader requires every Manually Graded Task cell to carry a grade_id, the preprocessor promotes the quiz to graded mode by default: correctness feedback is hidden (so students can’t guess their way through), each answer click is persisted to the sidecar, and an autograder cell is auto-generated. No separate test cell needs to be written by hand. To opt a quiz out on a per-quiz basis, use graded=false in the quiz header (see Mixing graded and self-check quizzes below).

How it works 

At generate_assignment time, CreateQuiz emits one code cell per quiz region. The cell’s visible source (in the release) is just the usual display_quiz(...) call. The cell’s hidden-tests block — stripped from the release by nbgrader’s ClearHiddenTests and restored at autograde time by OverwriteCells — embeds the answer key as a Python literal and invokes grade_quiz(). A bare _result.score at the end of the cell feeds nbgrader’s partial-credit machinery (utils.determine_grade). The task cell’s own points field is left untouched — it remains available for any manual grading of the surrounding task content (prose, code, essay) alongside the auto-graded quiz score.

from nbgrader_jupyterquiz.display import display_quiz
display_quiz("#notebook:0.0", grade_id="quiz-1-autograded")
### BEGIN HIDDEN TESTS
from nbgrader_jupyterquiz import grade_quiz
_questions = [...]  # answer key, embedded by the preprocessor
_result = grade_quiz("quiz-1-autograded", questions=_questions)
_result.display_review()
print(f"Score: {_result.score}/{_result.max_score}")
_result.score
### END HIDDEN TESTS
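
As a rough illustration of how a cell source of this shape could be assembled, the sketch below formats the answer key with repr() so that it round-trips as a Python literal. The helper name build_autograder_source and the placeholder question dicts are assumptions made for this example; this is not the preprocessor code itself.

```python
import ast


def build_autograder_source(quiz_ref: str, grade_id: str, questions: list) -> str:
    """Hypothetical helper: assemble a cell source shaped like the example above."""
    visible = [
        "from nbgrader_jupyterquiz.display import display_quiz",
        f"display_quiz({quiz_ref!r}, grade_id={grade_id!r})",
    ]
    hidden = [
        "### BEGIN HIDDEN TESTS",
        "from nbgrader_jupyterquiz import grade_quiz",
        # repr() of plain dicts, lists, strings, numbers and booleans
        # is itself a valid Python literal.
        f"_questions = {questions!r}",
        f"_result = grade_quiz({grade_id!r}, questions=_questions)",
        "_result.display_review()",
        'print(f"Score: {_result.score}/{_result.max_score}")',
        "_result.score",
        "### END HIDDEN TESTS",
    ]
    return "\n".join(visible + hidden)


# Placeholder answer key; the field names are assumed for illustration.
questions = [{"question": "Capital of France?", "answers": [
    {"answer": "Paris", "correct": True},
    {"answer": "Berlin", "correct": False},
]}]
source = build_autograder_source("#notebook:0.0", "quiz-1-autograded", questions)

# The embedded literal parses back into the same structure.
key_line = next(line for line in source.splitlines() if line.startswith("_questions = "))
recovered = ast.literal_eval(key_line.split(" = ", 1)[1])
```

Embedding the key as a literal keeps the hidden-tests block self-contained: at autograde time the cell needs nothing beyond the sidecar and the package import.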

When the student answers a question in the browser, the rendered JS recorder writes (or updates) responses.json in the same directory as the notebook via the Jupyter server’s contents API. Entries are keyed by grade_id, so multiple quizzes in the same assignment coexist in one file.
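
The recorder itself is JavaScript, but the idea can be sketched in Python. The example below merges one answer into a sidecar dictionary keyed by grade_id and builds the kind of body a PUT to the Jupyter server contents API expects for a text file. The per-quiz layout (question number mapped to the recorded answer) and both helper names are assumptions; the actual sidecar schema is not shown here.

```python
import json


def merge_response(sidecar: dict, grade_id: str, qnum: int, answer) -> dict:
    """Return a copy of the sidecar with one answer recorded under grade_id."""
    updated = {key: dict(value) for key, value in sidecar.items()}
    # Assumed layout: {grade_id: {question number as string: recorded answer}}
    updated.setdefault(grade_id, {})[str(qnum)] = answer
    return updated


def contents_api_payload(sidecar: dict) -> dict:
    """Body for a PUT to /api/contents/<dir>/responses.json (text file model)."""
    return {"type": "file", "format": "text", "content": json.dumps(sidecar, indent=2)}


sidecar = {}
sidecar = merge_response(sidecar, "quiz-1-autograded", 0, ["Paris"])
sidecar = merge_response(sidecar, "quiz-2-autograded", 0, 42)
# A later click on the same question overwrites the earlier entry.
sidecar = merge_response(sidecar, "quiz-1-autograded", 0, ["Berlin"])

payload = contents_api_payload(sidecar)
```

Because each quiz owns its own top-level key, updating one quiz never disturbs the entries of another.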

At autograde time, nbgrader restores the hidden-tests block from the gradebook master and re-executes the cell. grade_quiz then reads the sidecar, grades each question all-or-nothing, and returns a QuizResult whose .score (the final bare expression in the cell) becomes the cell’s partial-credit grade.

Where the answer key lives 

A graded quiz auto-enables hide_correctness, which has two effects. The browser shows neutral “Selected” / “Recorded” feedback instead of green/red, and — equally important — the answer key is not embedded into the release notebook at all. The display JSON keeps each question’s text, choices, and pedagogical feedback strings, but drops the correct flags on multiple-choice answers and the value / range matchers on numeric answers. A student inspecting the DOM cannot recover the key.

The autograder gets its copy from a different place: the ### BEGIN HIDDEN TESTS block above, which embeds the full _questions = [...] literal. ClearHiddenTests strips this block when nbgrader produces the release notebook, and OverwriteCells restores it from the gradebook master at autograde time. So the answer key travels along the instructor side of the pipeline — never along the student side.

If you need correctness feedback to be visible (e.g. a self-check quiz inside a task cell), set hide_correctness=false explicitly. That re-enables the green / red UI and ships the full answer key into the release JSON, since the JS needs it to colour buttons. The trade-off is intentional: hide mode is the secure mode.
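
A minimal sketch of the redaction step described in this section, assuming question dicts with an "answers" list whose entries carry "correct", "value", and "range" fields. Those field names and the function name redact_for_display are assumptions for illustration, not the package internals.

```python
import copy


def redact_for_display(questions: list, hide_correctness: bool = True) -> list:
    """Return the display copy of a quiz; the input answer key is left untouched."""
    display = copy.deepcopy(questions)
    if not hide_correctness:
        return display  # visible-feedback mode ships the full key
    for question in display:
        for answer in question.get("answers", []):
            answer.pop("correct", None)      # drop correctness flags
            if question.get("type") == "numeric":
                answer.pop("value", None)    # drop exact-value matcher
                answer.pop("range", None)    # drop range matcher
    return display


key = [
    {"question": "Capital of France?", "type": "multiple_choice", "answers": [
        {"answer": "Paris", "correct": True},
        {"answer": "Berlin", "correct": False},
    ]},
    {"question": "6 * 7 = ?", "type": "numeric", "answers": [
        {"type": "value", "value": 42, "correct": True},
    ]},
]
released = redact_for_display(key)
```

Question text and choices survive in released, while nothing that separates right from wrong answers remains; key itself stays intact for the instructor side of the pipeline.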

Threat model and what the security boundary is 

The redaction protects against a student reading the answer key from the released notebook. The threats it stops:

Inspecting the DOM — opening browser DevTools, expanding the hidden span, and reading the embedded JSON. Without redaction, the student would see "correct": true on the right answers. With redaction, the JSON contains text, choices, and feedback, but no field that distinguishes correct from incorrect.
Reading the notebook source as a file — opening the released .ipynb in a text editor. Same protection: the redacted JSON is what’s on disk.
Reverse-engineering from base64: the encoded=true default base64-encodes the JSON before embedding, but base64 is obfuscation, not encryption. window.atob(...) decodes it trivially, and a determined student would do so. Redaction is what removes the answer; encoding only costs the student one extra step.
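
To see why encoding alone offers no protection, the short sketch below round-trips an unredacted quiz through base64 using only the Python standard library. The quiz content is a made-up placeholder.

```python
import base64
import json

# Placeholder quiz with the correctness flags still present (i.e. not redacted).
quiz = [{"question": "Capital of France?", "answers": [
    {"answer": "Paris", "correct": True},
    {"answer": "Berlin", "correct": False},
]}]

encoded = base64.b64encode(json.dumps(quiz).encode("utf-8")).decode("ascii")

# Anyone holding the encoded string can reverse it without a key.
decoded = json.loads(base64.b64decode(encoded))
leaked = [a["answer"] for a in decoded[0]["answers"] if a["correct"]]
```

The decoded structure is identical to the original, correctness flags included, which is why the flags have to be removed before encoding.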

What redaction does not protect:

Per-answer feedback strings still ship. The redaction drops correct flags and numeric matchers, but feedback strings stay in the JSON because the JS needs them to render the Selected: … state. That makes feedback a leaky channel in graded mode: an instructor who writes + "Paris" (Correct!) or - "Berlin" (No, Berlin is in Germany.) defeats the redaction by spelling out which answer is which. The rule of thumb: omit per-answer feedback on graded questions; reserve it for graded=false self-check quizzes, where correctness is openly visible anyway. A bare - (Hint: ...) default line on a numeric question is safe — it fires on any unmatched submission and the JS overrides per-answer feedback in hide mode regardless — but anything that distinguishes one answer from another is a leak.
Network-loaded quiz JSON is not redacted. If you pass a JSON URL or filename to display_quiz() outside the preprocessor pipeline, the redaction step is skipped — the JSON is loaded verbatim. Stand-alone use of display_quiz is for self-check, not graded mode.
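
The feedback rule of thumb above lends itself to a simple pre-release check. The sketch below flags answers that carry their own feedback string while skipping a shared default hint. The helper leaky_feedback and the dict fields it inspects ("answers", "feedback", "type": "default") are assumptions for illustration; the package is not known to ship such a check.

```python
def leaky_feedback(questions: list) -> list:
    """Return (question index, answer) pairs whose feedback could reveal correctness."""
    findings = []
    for qnum, question in enumerate(questions):
        for answer in question.get("answers", []):
            if answer.get("type") == "default":
                continue  # shared fallback hint, identical for every unmatched answer
            if answer.get("feedback"):
                findings.append((qnum, answer.get("answer", answer.get("value"))))
    return findings


graded_quiz = [
    {"question": "Capital of France?", "answers": [
        {"answer": "Paris", "correct": True, "feedback": "Correct!"},
        {"answer": "Berlin", "correct": False, "feedback": "No, Berlin is in Germany."},
    ]},
    {"question": "6 * 7 = ?", "type": "numeric", "answers": [
        {"type": "value", "value": 42, "correct": True},
        {"type": "default", "feedback": "Hint: ..."},
    ]},
]
flagged = leaky_feedback(graded_quiz)
```

Here both answers of the first question are flagged, while the numeric question with only a default hint passes.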

The security model is “the answer key never reaches the student’s machine in graded mode.” That’s the whole boundary the package controls. Editing responses.json doesn’t help a student because the file records their submissions, not correctness scores; the autograder grades them against an answer key the student never sees. Graded use as a real assessment is fine, with the same caveat that applies to any take-home exam: the package can’t stop a student from getting outside help.

Mixing graded and self-check quizzes 

Sometimes a Manually Graded Task cell contains both work that is graded by hand (prose, code) and a self-check quiz that shouldn’t contribute to the score. Mark the quiz with graded=false:

#### Quiz graded=false
* (SC) "Self-check — not graded"
  + "A"
  - "B"
#### End Quiz

An ungraded quiz:

emits a plain display_quiz(...) cell — no nbgrader metadata on the generated cell, no hidden-tests block, no gradebook entry;
shows correctness feedback (green / red), as in v0.3.x self-checking mode;
does not render points badges unless a question carries an explicit {N} marker (in which case the badge is shown per the usual rule);
leaves the task cell’s own points untouched — those remain available for manual grading of whatever the task cell actually grades.

graded=false and hide_correctness are independent. The interesting combinations:

| Configuration | Correctness feedback | Auto-graded |
| --- | --- | --- |
| Quiz header (defaults) | Hidden (Selected/Deselected) | Yes |
| `#### Quiz hide_correctness=false` | Visible (green / red, answer key shipped to browser) | Yes |
| `#### Quiz graded=false` | Visible (green / red) | No |
| `#### Quiz graded=false hide_correctness=true` | Hidden (Selected/Deselected) | No (study mode) |
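
The four combinations in the table follow from one rule: hide_correctness defaults to whatever graded is unless it is set explicitly. A small sketch of that rule, with resolve_quiz_mode as a hypothetical name and the rule itself inferred from the table:

```python
from typing import Optional


def resolve_quiz_mode(graded: bool = True, hide_correctness: Optional[bool] = None) -> dict:
    """Derive the effective quiz mode from the header flags (inferred rule)."""
    if hide_correctness is None:
        # An unset flag follows graded: graded quizzes hide, self-check quizzes show.
        hide_correctness = graded
    return {
        "feedback": "hidden" if hide_correctness else "visible",
        "auto_graded": graded,
    }


defaults = resolve_quiz_mode()
visible_graded = resolve_quiz_mode(hide_correctness=False)
self_check = resolve_quiz_mode(graded=False)
study_mode = resolve_quiz_mode(graded=False, hide_correctness=True)
```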

Per-question points 

Questions can carry individual point weights via the {N} marker on the question line. N is any positive number, including decimals like {0.5}:

#### Quiz
* (SC) {3} "Worth three points"
  + "A"
  - "B"
* (SC) {0.5} "Half-point warm-up"
  + "A"
  - "B"
* (NM) {2} "Two-point numeric"
  + <42>
#### End Quiz

Points display as a small badge next to each question. The cell’s total nbgrader.points is the sum across questions (in the example above: 3 + 0.5 + 2 = 5.5). Unweighted quizzes — where no question carries a {N} marker — render without badges and implicitly treat each question as worth 1 point.

When at least one question in a quiz has {N}, the preprocessor propagates the default {1} onto every sibling that lacks a marker, so every question shows a badge. Marked and unmarked questions can be mixed freely.
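
The point rules in this section can be sketched as follows, representing each question by its {N} value or None when the marker is absent. The function names are hypothetical.

```python
def effective_points(markers: list) -> list:
    """Replace missing {N} markers with the implicit weight of 1."""
    return [1 if marker is None else marker for marker in markers]


def show_badges(markers: list) -> bool:
    """Badges are rendered once any question carries an explicit marker."""
    return any(marker is not None for marker in markers)


# The example quiz above: {3}, {0.5}, {2}
weighted_total = sum(effective_points([3, 0.5, 2]))

# One marked question among unmarked siblings: the siblings count as 1 each.
mixed = effective_points([3, None, None])
mixed_total = sum(mixed)

# No markers at all: every question is worth 1 and no badges are shown.
plain = [None, None]
plain_total = sum(effective_points(plain))
```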

The feedback view 

After nbgrader autograde, the autograder cell’s output contains a static HTML review showing, per question:

the student’s selection(s),
the correct answer(s),
which were picked / which were missed,
per-question points earned / maximum.

This review is preserved in the cell output, so nbgrader generate_feedback includes it in the per-student feedback HTML.

API 

nbgrader_jupyterquiz.grade_quiz(grade_id: str, *, questions: list[dict[str, Any]] | None = None, notebook_path: str | Path | None = None) → QuizResult

Grade the quiz identified by grade_id against recorded responses.

Parameters

grade_id (str): The sidecar key under which student responses are recorded. The auto-generated test cells use "<task_grade_id>-autograded".
questions (list of dict, optional): Answer key, i.e. the list of question dicts (as produced by the parser) for this quiz. When provided, the notebook is not read. This is the path taken by auto-generated test cells (see CreateQuiz), which embed the answer key directly in their ### BEGIN HIDDEN TESTS block.
notebook_path (str or Path, optional): Path to the notebook containing the quiz task cell, for the fallback case where questions is None. Defaults to the only .ipynb file in the current working directory.

Returns

QuizResult: Grading result with per-question details. If the sidecar is missing or the quiz has no recorded responses, every question is reported as incorrect (score 0/max).

class nbgrader_jupyterquiz.grader.autograde.QuizResult(grade_id: str, details: list[QuestionResult])

Outcome of grading every question in a quiz region.

details: list[QuestionResult]

display_review() → None

Emit an HTML review of the quiz into the current cell output.

Intended to be called from the auto-generated hidden-tests block of a graded-quiz cell. When nbgrader generate_feedback converts the autograded notebook to HTML, the review appears inline with the score so students can see which answers were correct, which they picked, and which they missed.

The output is pure static HTML with scoped inline CSS — it renders correctly in any browser without a running Jupyter server or kernel.

grade_id: str

property max_score: float

Compute the sum of per-question point values.

Returns

float: Total points this quiz can yield.

property passed: bool

Return True when every question in the quiz was answered correctly.

Returns

bool: True iff score == max_score.

property report: str

Return a multi-line textual summary of the grade.

Returns

str: Human-readable breakdown of per-question outcomes.

property score: float

Compute the sum of per-question points earned.

Returns

float: Total points the student earned on this quiz.

class nbgrader_jupyterquiz.grader.autograde.QuestionResult(qnum: int, question: dict[str, Any], recorded: Any, correct: bool)

Outcome of grading a single question.

correct: bool

property earned: float

Compute the points earned on this question (all-or-nothing per question).

Returns

float: self.points if correct, else 0.

property expected: Any

Return a human-readable representation of the expected answer.

Returns

Any: List of expected answer texts or numeric values/ranges.

property points: float

Return the maximum points for this question (defaults to 1).

Returns

float: Positive point weight. Integer values (from {N}) come through as int; fractional values ({0.5}) as float.

qnum: int

question: dict[str, Any]

property question_type: str

Return the canonical type string of the underlying question.

Returns

str: "multiple_choice", "many_choice", "numeric", "string", or "unknown".

recorded: Any
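
How the documented properties relate to one another can be sketched with two stand-in dataclasses. These are illustrative stand-ins, not the library classes: the class names carry a Sketch suffix, and reading the weight from a "points" key in the question dict is an assumption.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class QuestionResultSketch:
    qnum: int
    question: dict
    recorded: Any
    correct: bool

    @property
    def points(self) -> float:
        # Assumed storage: a "points" key, defaulting to 1 when absent.
        return self.question.get("points", 1)

    @property
    def earned(self) -> float:
        # All-or-nothing per question.
        return self.points if self.correct else 0


@dataclass
class QuizResultSketch:
    grade_id: str
    details: list

    @property
    def score(self) -> float:
        return sum(detail.earned for detail in self.details)

    @property
    def max_score(self) -> float:
        return sum(detail.points for detail in self.details)

    @property
    def passed(self) -> bool:
        return self.score == self.max_score


result = QuizResultSketch("quiz-1-autograded", [
    QuestionResultSketch(0, {"points": 3}, ["A"], True),
    QuestionResultSketch(1, {"points": 0.5}, ["B"], False),
    QuestionResultSketch(2, {"points": 2}, 42, True),
])

# With no recorded responses, every question counts as incorrect.
unanswered = QuizResultSketch("quiz-1-autograded", [
    QuestionResultSketch(0, {"points": 3}, None, False),
    QuestionResultSketch(1, {}, None, False),
])
```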

Limitations 

Jupyter server required. The response recorder persists answers via fetch('/api/contents/...'). Any client that routes through a standard jupyter_server works: JupyterLab 4, Notebook 7, classic Notebook. The VS Code Jupyter extension is not supported — it uses a kernel-direct protocol and does not expose the contents API. Students who must use VS Code will see the quiz render but their answers will not persist; instructors running graded assessments should require JupyterLab or Notebook.
Points are all-or-nothing per question. Students earn the full per-question points iff the answer is fully correct. Partial credit within a single question (e.g., “selected 2 of 3 correct MC options”) is not awarded — that would require splitting a many-choice question into separately-graded atoms.
Manual edits to `responses.json` are not detected. The grading code trusts the sidecar: if it is tampered with between submission and autograde, the altered responses are what gets graded. For high-stakes assessments, rely on nbgrader’s secure exchange to protect submitted files rather than on the sidecar staying unedited.
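
The all-or-nothing limitation above can be illustrated for a many-choice question. The sketch assumes that a selection counts as fully correct only when it equals the expected set exactly; grade_many_choice is a hypothetical helper, not part of the package API.

```python
def grade_many_choice(expected: set, selected: set, points: float = 1) -> float:
    """Award full points only for an exact match (assumed comparison rule)."""
    return points if set(selected) == set(expected) else 0


full = grade_many_choice({"A", "B", "C"}, {"A", "B", "C"}, points=3)

# Two of three correct options selected: no partial credit is awarded.
partial = grade_many_choice({"A", "B", "C"}, {"A", "B"}, points=3)
```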