2026-06-13 · 8 min read

Natalia Veretenyk - UX Academy instructor

Heuristic Evaluation Template and Checklist (Free)

A heuristic evaluation is one of the highest-value things a UX designer can do in a single day - but only if you record findings consistently. This template gives you the exact setup, scoring sheet, and step-by-step process to run one from scratch.

Heuristic evaluations are popular because they are fast, inexpensive, and do not require recruiting participants. They work by having trained evaluators inspect an interface against a set of established usability principles - most commonly Nielsen's 10 usability heuristics. If you want the theory first, read our heuristic evaluation guide and our overview of usability in UX design.

This post gives you the practical side: who runs the evaluation, what goes in the spreadsheet, and how to score findings so they actually influence product decisions.


What you need before you start

Before anyone opens the interface, agree on three things:

Scope. Define which flows or screens are in scope. "The whole app" is too vague. "The sign-up flow, the onboarding checklist, and the account settings page" is actionable.

Heuristics reference. Every evaluator uses the same set of principles. Nielsen's 10 heuristics are the industry standard. Print them or link them in a shared doc so evaluators are not paraphrasing from memory.

Evaluator team. Recruit 3-5 people with UX knowledge - ideally a mix of generalist UX designers and, where possible, someone with domain expertise in the product area. They must evaluate independently before comparing notes.


The findings sheet: columns you need

The findings sheet is the core of your heuristic evaluation template. Every row is one issue. Keep the columns below - they cover everything you need to prioritise and present findings.

| Column | What to record |
|---|---|
| Issue ID | Sequential number (HE-001, HE-002, etc.) |
| Screen / Flow | The specific page or step where the issue was found |
| Heuristic violated | Which of Nielsen's 10 heuristics applies (use the number and name) |
| Description | A plain-language description of the problem |
| Screenshot or annotation | File name or link to the annotated screenshot |
| Evaluator | Initials of the person who found it |
| Frequency | How often does a user encounter this? (Rare / Occasional / Frequent) |
| Impact | How hard is it to recover? (Low / Medium / High) |
| Persistence | Does it recur across the session? (Once / Recurring) |
| Severity score | 0-4 (see scale below) |
| Recommended fix | A short, concrete suggestion |
| Status | Open / In progress / Resolved |

You do not need anything more complicated than a shared Google Sheet with these columns. Avoid elaborate colour-coding schemes - they slow you down and do not add information.
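
If you would rather generate the blank sheet than type the headers by hand, a short script can write the header row to a CSV file that Google Sheets will import. This is only a minimal sketch: the file name, the `COLUMNS` constant, and the `write_blank_sheet` function are illustrative names, not part of any official template.

```python
import csv

# Column headers copied from the findings sheet table above.
COLUMNS = [
    "Issue ID",
    "Screen / Flow",
    "Heuristic violated",
    "Description",
    "Screenshot or annotation",
    "Evaluator",
    "Frequency",
    "Impact",
    "Persistence",
    "Severity score",
    "Recommended fix",
    "Status",
]


def write_blank_sheet(path):
    """Write a CSV file that contains only the header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(COLUMNS)


if __name__ == "__main__":
    # Hypothetical file name; use whatever suits your project.
    write_blank_sheet("heuristic-evaluation-findings.csv")
```

Import the resulting file into a new Google Sheet and share one copy per evaluator so the individual passes stay independent.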


Nielsen's severity rating scale (0-4)

Severity is the most important number in your sheet. It determines what gets fixed before launch and what goes in the backlog.

| Score | Label | What it means |
|---|---|---|
| 0 | Not a problem | Evaluator considered it but decided it does not affect usability |
| 1 | Cosmetic | Minor issue; fix only if time allows |
| 2 | Minor | Low priority; user can work around it |
| 3 | Major | Important to fix; will cause significant frustration or errors |
| 4 | Catastrophe | Must be fixed before release; blocks users or causes serious errors |

When multiple evaluators have scored the same issue, average their scores. If scores diverge by more than 1 point, discuss before averaging - disagreement usually signals that the problem description is ambiguous or that evaluators interpreted the heuristic differently.
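
The averaging rule is simple enough to check with a few lines of code. The sketch below assumes that "diverge by more than 1 point" means the gap between the highest and lowest score for an issue; `combine_scores` and `max_spread` are hypothetical names chosen for this example.

```python
from statistics import mean


def combine_scores(scores, max_spread=1):
    """Return (average, needs_discussion) for one issue's severity scores.

    scores: severity ratings on the 0-4 scale, one per evaluator.
    needs_discussion is True when the highest and lowest score differ
    by more than max_spread points (an assumed reading of the rule).
    """
    if not scores:
        raise ValueError("at least one score is required")
    if any(s < 0 or s > 4 for s in scores):
        raise ValueError("scores must be between 0 and 4")
    spread = max(scores) - min(scores)
    return round(mean(scores), 1), spread > max_spread


print(combine_scores([3, 3, 4]))  # (3.3, False)
print(combine_scores([1, 3, 4]))  # (2.7, True)
```

In the second case the average alone would hide a real disagreement, which is why the flag is worth tracking alongside the number.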

Severity is a composite of three factors:

  • Frequency - does this happen to most users or only in an edge case?
  • Impact - if a user hits this problem, how badly does it disrupt their task?
  • Persistence - does the user encounter it once or every time they use that part of the interface?

A problem that is frequent, high-impact, and persistent is almost certainly a 3 or 4. A problem that is rare, easy to recover from, and non-recurring is usually a 1.


Step-by-step process

Step 1 - Brief the evaluators (30 minutes)

Share the scope document, the heuristics reference, and the blank findings sheet. Walk through one example issue together so everyone understands how to fill in each column. Agree a deadline for individual review (usually 2-3 working days).

Step 2 - Individual evaluation (1-2 hours per evaluator)

Each evaluator works through the defined scope alone. They should move slowly and deliberately - this is not a speed test. Typical output: 10-30 issues per evaluator for a moderately complex interface.

Encourage evaluators to capture screenshots as they go. An issue with no screenshot is harder to act on, especially for developers.

Step 3 - Consolidate findings (1 hour)

One person (the facilitator) merges all individual sheets into a master list. Duplicate issues - where two or more evaluators spotted the same problem - are collapsed into a single row. Note how many evaluators flagged each issue: an issue found by four out of five evaluators is likely more serious than one found by only one.
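
If the individual sheets are exported as rows of data, the counting part of consolidation can be scripted. Deciding which findings describe the same problem is still a human judgement, so this sketch assumes the facilitator has already tagged matching findings with a shared `key`. The field names and the sample findings are made up for illustration.

```python
from collections import defaultdict


def consolidate(findings):
    """Collapse findings that share a 'key' into one row each.

    findings: list of dicts with 'key', 'description', and 'evaluator'.
    Returns one row per key, sorted by how many evaluators flagged it.
    """
    grouped = defaultdict(list)
    for finding in findings:
        grouped[finding["key"]].append(finding)

    rows = []
    for key, group in grouped.items():
        evaluators = sorted({f["evaluator"] for f in group})
        rows.append({
            "key": key,
            # Keep the first description; the facilitator can reword it.
            "description": group[0]["description"],
            "evaluators": evaluators,
            "found_by": len(evaluators),
        })
    return sorted(rows, key=lambda r: r["found_by"], reverse=True)


# Illustrative data only.
findings = [
    {"key": "signup-password-rules",
     "description": "Password rules appear only after a failed submit",
     "evaluator": "AB"},
    {"key": "signup-password-rules",
     "description": "No password requirements shown up front",
     "evaluator": "CD"},
    {"key": "settings-no-undo",
     "description": "Deleting a saved card has no undo",
     "evaluator": "AB"},
]

for row in consolidate(findings):
    print(row["key"], row["found_by"], row["evaluators"])
```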

Step 4 - Severity scoring session (1 hour)

Run a short group call to agree severity scores. Work through the consolidated list together. For each issue: read the description aloud, look at the screenshot, and ask each evaluator for their score. Average the scores or discuss to consensus.

Step 5 - Prioritise and present

Sort the findings sheet by severity score, descending. Group 4s and 3s as immediate fixes; 2s as next sprint; 1s as backlog. Present to stakeholders with the screenshots - visual evidence makes it far easier to get sign-off on design changes.
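
The sort-and-group step can be sketched in code as follows. One assumption to note: averaged scores are often not whole numbers, and this example places them in the bucket of the whole number below (so 2.7 counts as a 2). Scores below 1 are labelled "Not a problem", following the scale's own wording. The function names and sample issues are hypothetical.

```python
def bucket_for(severity):
    """Map a 0-4 severity score to a priority group."""
    if severity >= 3:
        return "Immediate fix"
    if severity >= 2:
        return "Next sprint"
    if severity >= 1:
        return "Backlog"
    return "Not a problem"


def prioritise(issues):
    """Return issues sorted by severity, descending, each tagged with a bucket."""
    ranked = sorted(issues, key=lambda i: i["severity"], reverse=True)
    return [{**i, "bucket": bucket_for(i["severity"])} for i in ranked]


# Illustrative data only.
issues = [
    {"id": "HE-001", "severity": 2.7},
    {"id": "HE-002", "severity": 4},
    {"id": "HE-003", "severity": 1},
]

for issue in prioritise(issues):
    print(issue["id"], issue["severity"], issue["bucket"])
```

If your team would rather round 2.7 up to a 3, change the thresholds in `bucket_for`; what matters is that everyone applies the same rule.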


Quick-reference checklist

Use this before and after each evaluation to make sure nothing is missed.

Setup checklist

  • [ ] Scope defined in writing and shared with all evaluators
  • [ ] Nielsen's 10 heuristics printed or linked for every evaluator
  • [ ] Blank findings sheet shared (with all columns above)
  • [ ] Evaluators confirmed as independent (no group review before individual pass)
  • [ ] Screenshot tool agreed (Figma annotation, Lightshot, OS screenshot, etc.)

During evaluation

  • [ ] Working through each screen in the defined scope
  • [ ] Logging every issue, even minor ones (severity scoring comes later)
  • [ ] Recording which heuristic is violated for every issue
  • [ ] Capturing a screenshot or annotation for every issue
  • [ ] Not discussing findings with other evaluators until the consolidation step

After individual evaluation

  • [ ] Individual sheet submitted to facilitator by agreed deadline
  • [ ] Facilitator has merged all sheets and removed exact duplicates
  • [ ] Each remaining issue has at least one screenshot
  • [ ] Group severity scoring session is booked

After scoring

  • [ ] Every issue has a severity score (0-4)
  • [ ] Issues are sorted by severity, descending
  • [ ] Recommended fix column completed for all 3s and 4s
  • [ ] Findings presented to product/design/engineering with visual evidence
  • [ ] Status column updated as fixes are made

Common mistakes to avoid

Skipping the independent phase. If evaluators discuss findings before completing their individual pass, they anchor on each other's issues and miss problems they would have caught working alone. Keep evaluation independent until the consolidation step.

Conflating description and fix. The description column should describe the problem from the user's perspective. The recommended fix column is where you suggest a solution. Mixing them makes it harder to discuss trade-offs with engineers.

Treating all 10 heuristics as equally applicable. Some heuristics - particularly "Error prevention" and "Help users recognise, diagnose, and recover from errors" - tend to surface more issues in transactional flows. "Aesthetic and minimalist design" tends to surface more in information-heavy dashboards. Evaluators should still check all 10, but calibrate attention to the type of interface.

Over-scoring to get stakeholder attention. Every issue inflated to a 4 dilutes the urgency of genuine catastrophes. Reserve 4 for problems that genuinely block users. If everything is a catastrophe, nothing is.


Learn heuristic evaluation in a live UX course

Heuristic evaluation is a core skill in professional UX practice - and one of the methods you will apply hands-on in our Beginner UX Design course.

UX Academy runs live online UX courses from the UK, with a maximum of 15 people per cohort so you get direct feedback on your own work. The next cohort starts 5 September 2026.

Want to see the course in action first? Join our free UX masterclass - a live session where you can ask questions and get a feel for how we teach.