2026-06-18 · 12 min read
Natalia Veretenyk, UX Academy instructor
Usability Testing: A Practical Guide for UX Designers
If you want to know whether a design works, the most reliable method is to watch a real person try to use it.
Stakeholder opinion, designer intuition, and internal review all have their place. But usability testing is the closest thing UX has to a ground truth. Five minutes of watching someone struggle to find the checkout button tells you more than an hour of alignment meetings.
This guide covers the full process: what usability testing is, when to run it, how to recruit participants, how to write a test script, what to do during a session, and how to report what you find. It is written for designers who already know usability testing exists and want to know how to actually do it.
What usability testing is — and what it is not
Usability testing is an observational research method. You give participants realistic tasks to complete on a product, prototype, or competitor product, and you watch what happens. You are not asking people what they think of the design. You are watching what they do.
This matters because what people say and what people do are often completely different. In a focus group, participants will tell you a navigation menu "makes sense". In a usability test, you will watch them click the wrong section three times before giving up.
Usability testing is not:
- Market research — you are not surveying attitudes or preferences at scale
- A focus group — you are not gathering opinions in a group discussion
- A demo — you are not showing people how the product works
The point is to surface usability problems: friction, confusion, errors, and dead ends that real users encounter when they try to complete real tasks.
When to run usability testing
Usability testing fits two different moments in the design process.
Formative testing happens during design, before anything ships. You test early wireframes, low-fidelity prototypes, or even paper sketches. The goal is to inform design decisions — to find problems early, when fixing them is cheap. If a navigation structure confuses participants in a wireframe, you know before you have built anything.
Summative testing happens after launch, or at the end of a design sprint. You are evaluating how well the design performs against defined criteria — task completion rates, error rates, time on task. This is closer to benchmarking: you want to know how the current version compares to a previous one, or to a competitor.
Most design teams run formative testing far too rarely and summative testing almost never. If you are only testing after launch, you are missing the most valuable window.
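The summative measures named above (task completion rate, error rate, time on task) are simple to compute once each attempt is logged. A minimal sketch, assuming a hypothetical `TaskAttempt` record and made-up session data; the field names are illustrative and not taken from any particular tool:

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TaskAttempt:
    """One participant's attempt at one task (hypothetical structure)."""
    participant: str
    completed: bool
    errors: int
    seconds: float


def summarise(attempts):
    """Return completion rate, errors per attempt, and median time for successful attempts."""
    total = len(attempts)
    completed = [a for a in attempts if a.completed]
    return {
        "completion_rate": len(completed) / total,
        "errors_per_attempt": sum(a.errors for a in attempts) / total,
        # Time on task is reported for successful attempts only, so that
        # abandoned attempts do not distort the figure.
        "median_seconds_completed": median(a.seconds for a in completed) if completed else None,
    }


# Made-up data for illustration.
attempts = [
    TaskAttempt("P1", True, 0, 42.0),
    TaskAttempt("P2", True, 2, 95.0),
    TaskAttempt("P3", False, 3, 180.0),
    TaskAttempt("P4", True, 1, 61.0),
]
summary = summarise(attempts)
print(summary)
```

Using the median rather than the mean for time on task is a design choice: one very slow participant would otherwise dominate a small sample.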
Moderated vs unmoderated testing
The most important structural decision is whether to run moderated or unmoderated sessions.
Moderated testing
A facilitator runs a live session with one participant at a time — in person or via video call using tools like Lookback. You introduce the session, set tasks, observe, and ask follow-up questions as the participant works.
Moderated testing produces richer data. When a participant hesitates, you can ask what they are thinking. When they take an unexpected path, you can explore why. It is the right choice for early-stage exploratory work, when you do not yet know what questions to ask or what problems to look for.
The trade-off is time. Recruiting, scheduling, and running 5 moderated sessions takes days. Analysing them takes more.
Unmoderated testing
Participants complete tasks independently, typically via a tool like Maze or Lyssna. Their screen is recorded, clicks are tracked, and they may answer follow-up survey questions after each task. You review the recordings and data afterwards.
Unmoderated testing is faster and cheaper. You can send a test to 20 participants on a Monday morning and have results by Tuesday. It works well when you have a specific interaction to test — a checkout flow, a navigation change, a new feature — and you want breadth rather than depth.
The limitation is that you cannot follow up in the moment. If a participant does something unexpected, you will not find out why until the debrief question, if at all.
How many participants do you need?
In a 2000 article for the Nielsen Norman Group, building on his earlier research with Tom Landauer, Jakob Nielsen argued that 5 participants reveal approximately 85% of usability problems in a qualitative test. This is one of the most cited and most misunderstood findings in UX research.
The key word is qualitative. The 5-user rule applies when your goal is to identify usability problems — not to measure them precisely. Running 5 participants, fixing the most critical issues, and running another 5 is more valuable than running 20 participants in a single round. Each round surfaces new problems in the updated design.
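The roughly 85% figure comes from a simple discovery model in which every participant has the same independent chance of running into any given problem. A minimal sketch, assuming the per-participant discovery rate of about 31% that Nielsen reports as an average across projects; the rate for your own product may be quite different, and the function name is illustrative:

```python
def share_of_problems_found(participants: int, discovery_rate: float = 0.31) -> float:
    """Expected share of usability problems seen at least once.

    Assumes each participant independently uncovers any given problem
    with probability `discovery_rate` (0.31 is an assumed average).
    """
    return 1 - (1 - discovery_rate) ** participants


for n in (1, 3, 5, 10, 15):
    print(f"{n:>2} participants: {share_of_problems_found(n):.0%}")
```

The curve flattens quickly: each extra participant in the same round mostly repeats problems you have already seen, which is why several small rounds tend to beat one large one.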
For quantitative benchmarking — measuring task completion rates, error rates, or satisfaction scores with statistical confidence — 5 participants is not nearly enough. You need 20 or more, depending on the confidence intervals you need.
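To see why small samples cannot support precise measurement, it helps to put a confidence interval around a completion rate. The sketch below uses the Wilson score interval, which is one common choice for small samples and not the only valid one; the 95% level (z = 1.96) is an assumption:

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# The same observed 80% completion rate at two sample sizes.
for successes, n in ((4, 5), (16, 20)):
    low, high = wilson_interval(successes, n)
    print(f"{successes}/{n} completed: {low:.0%} to {high:.0%}")
```

With 4 of 5 participants succeeding, the interval runs from roughly 38% to 96%, which is too wide to benchmark against anything. With 16 of 20 it narrows to roughly 58% to 92%: still broad, but usable for comparing versions.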
For most early-stage and mid-stage design work, 5 participants per round is the right starting point.
For a broader overview of research methods and when to use them, see our guide to UX research methods.
Recruiting participants
Recruitment platforms
The fastest way to reach screened participants is through a recruitment platform:
- UserTesting — large panel, quick turnaround, higher cost; good for unmoderated studies
- Prolific — academic-grade panel, excellent for demographic targeting, lower cost per participant
- Respondent.io — specialist and B2B panels; useful when you need domain expertise (e.g. finance, healthcare)
These platforms handle consent, payments, and no-shows, which removes significant admin overhead.
Guerrilla and internal recruiting
For early-stage testing, speed matters more than perfect participant matching. Colleagues, friends, local coffee shop patrons, or social media followers can surface obvious usability problems in a low-fidelity prototype. This is sometimes called guerrilla testing — informal, fast, and cheap.
The caveat: guerrilla participants may not match your actual users. Do not use them to validate a final design. Use them to stress-test early concepts.
Screener surveys
A screener is a short survey participants complete before being accepted into a study. It filters out people who do not match your target audience — by role, experience level, device use, or behaviour.
A well-written screener saves you from spending an hour with a participant who turns out to be irrelevant to your research questions. It also helps filter out professional "research participants" who game recruitment platforms by selecting whatever answers get them accepted.
Keep screeners short: 5 to 8 questions. Ask about behaviour ("how often do you...") rather than self-assessed skill ("are you an expert at...?").
Incentives
In the UK, a typical rate for a 45-60 minute usability session is £30 to £50. Amazon vouchers and cash equivalents work well. For professional or specialist participants — developers, clinicians, financial advisers — rates are higher.
Pay participants fairly. Undervaluing their time affects recruitment rates and the quality of engagement you get in sessions.
Writing a test script
The test script is the backbone of the session. Writing it badly is the most common reason usability tests fail.
Start with your research questions
Before you write a single task, write down what you are trying to learn. "Does the checkout flow work?" is too vague. "Do participants understand the difference between the deposit payment and the full payment, and do they know what happens next?" is a research question you can write tasks around.
Tasks vs questions
A task is something the participant does: "You want to find a UX design course that starts in September. Show me how you would do that."
A question is something you ask: "What would you expect to happen after you click that button?"
Tasks drive behaviour. Questions uncover reasoning. A good test script uses both — but tasks come first.
Write tasks in plain language. Do not use the product's own terminology in the task description (if you are testing navigation labels, do not use those labels in the task). Make tasks realistic: give participants a scenario, not an instruction.
Bad task: "Navigate to the pricing page."
Good task: "You are thinking about enrolling. You want to find out how much it costs and what is included. Show me what you would do."
Think-aloud protocol
Ask participants to narrate their thinking as they work through tasks. This is the think-aloud protocol, and it is the most valuable source of data in a moderated session.
The instruction sounds simple: "As you work through the tasks, please say out loud what you are thinking, what you are looking at, and what you are trying to do. There are no right or wrong answers — we are testing the design, not you."
Most participants will need reminding during the session. That is normal.
Avoid leading questions
Leading questions corrupt your data. "Did you find that button hard to find?" assumes there was difficulty. "What happened when you tried to find the button?" does not.
If a participant asks you a question during the session ("Should I click here?"), redirect: "What would you do if I weren't here?"
Warm-up questions
Start every session with a few minutes of warm-up questions: what the participant does, how they use similar products, how confident they feel with technology. This relaxes participants and gives you context for interpreting their behaviour.
Running the session
Your role as facilitator is to observe, not to help. This is harder than it sounds. When a participant is struggling, every instinct tells you to point them in the right direction. Resist it. Their struggle is your data.
Stay neutral. Do not react to what participants say or do in ways that signal approval or concern. If they say "I think this design is terrible," respond with "That's helpful, thank you — what specifically felt off?"
Useful facilitator phrases:
- "What are you thinking right now?"
- "What would you expect to happen next?"
- "What would you do if this were your own device?"
- "You said you expected X — can you say more about that?"
If you can, bring a second person to take notes while you facilitate. Doing both at once means you will miss things. If you are running the session solo, record it (with consent) and take sparse notes during the session; you can fill in detail from the recording.
Analysing findings
Raw observations from a usability test are not findings. They are data. Analysis is what turns observations into insights.
Affinity mapping
After testing, gather all your observations — notes, quotes, video clips — and group them by theme. This is affinity mapping. You are looking for patterns: multiple participants struggling with the same step, the same piece of terminology causing confusion, the same interaction creating errors.
Do this as a team if you can. Two people grouping observations independently will produce more robust themes than one person working alone.
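Once observations are tagged with a theme, the pattern-spotting step can be partly mechanised: count how many distinct participants hit each theme. A minimal sketch with made-up notes; the theme tags and the tuple layout are hypothetical, and the grouping itself still needs human judgement:

```python
from collections import defaultdict

# Hypothetical session notes: (participant, theme tag, observation).
observations = [
    ("P1", "resources-label", "Looked under 'Courses' for templates"),
    ("P2", "resources-label", "Said 'Resources' sounded like help articles"),
    ("P2", "deposit-vs-full", "Unsure which payment option was selected"),
    ("P3", "resources-label", "Opened 'Resources' last, after three other menus"),
    ("P3", "resources-label", "Asked where course materials live"),
    ("P4", "deposit-vs-full", "Expected a confirmation step after the deposit"),
]


def participants_per_theme(notes):
    """Count distinct participants per theme, most widespread theme first."""
    people_by_theme = defaultdict(set)
    for participant, theme, _text in notes:
        # A set means one participant counts once, however many notes they produced.
        people_by_theme[theme].add(participant)
    counts = [(theme, len(people)) for theme, people in people_by_theme.items()]
    return sorted(counts, key=lambda item: (-item[1], item[0]))


theme_counts = participants_per_theme(observations)
print(theme_counts)
```

Counting participants rather than notes matters: one talkative participant can generate many observations about a problem nobody else encountered.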
Severity ratings
Not every usability problem needs fixing before the next release. Severity ratings help you prioritise:
- 1 — Cosmetic: minor annoyance, does not affect task completion
- 2 — Minor: causes hesitation or confusion, but participant recovers
- 3 — Major: causes significant difficulty; some participants fail the task
- 4 — Critical: prevents task completion; participants cannot recover without help
Focus your immediate effort on 3s and 4s.
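The 1 to 4 scale above translates directly into a prioritised backlog. A minimal sketch, assuming a hypothetical `Finding` record and made-up findings; breaking ties by the number of participants affected is one reasonable choice, not a rule from the scale itself:

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """A usability problem with a severity rating (hypothetical structure)."""
    title: str
    severity: int  # 1 = cosmetic, 2 = minor, 3 = major, 4 = critical
    participants_affected: int


def fix_first(findings, threshold=3):
    """Keep findings at or above the threshold, most severe and most widespread first."""
    urgent = [f for f in findings if f.severity >= threshold]
    return sorted(urgent, key=lambda f: (-f.severity, -f.participants_affected))


# Made-up findings for illustration.
findings = [
    Finding("Low-contrast footer links", 1, 2),
    Finding("Course materials hidden under 'Resources'", 3, 3),
    Finding("Deposit and full payment look identical", 4, 2),
    Finding("Date picker defaults to last month", 2, 4),
]
backlog = fix_first(findings)
for f in backlog:
    print(f.severity, f.title)
```

Note that the date picker issue affects the most participants but stays out of the immediate backlog, because participants recovered from it on their own.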
Behaviour vs interpretation
Be precise about what you observed versus what you infer. "The participant clicked the wrong button" is an observation. "The participant did not understand the hierarchy" is an interpretation. Both are useful, but they belong in different parts of your analysis.
For more on complementary methods that sit alongside usability testing — particularly for identifying design problems without recruiting participants — see our guide to heuristic evaluation.
Reporting findings
A usability test report is only useful if it leads to action. Keep it focused.
For each finding, structure your report around three things:
- The problem — what happened, and with which task
- The evidence — how many participants, and what they did or said (direct quotes are powerful)
- The recommendation — a specific, actionable design change
Avoid vague recommendations like "improve the navigation." Instead: "Rename 'Resources' to 'Tools and templates' — three of five participants did not expect course materials to live under 'Resources'."
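If you keep findings in a structured form, the problem, evidence, and recommendation layout is easy to enforce. A minimal sketch, assuming a hypothetical `ReportFinding` record; it reuses the 'Resources' example above, and the problem wording is illustrative:

```python
from dataclasses import dataclass, field


@dataclass
class ReportFinding:
    """One report entry: problem, evidence, recommendation (hypothetical structure)."""
    problem: str
    participants_affected: int
    participants_total: int
    recommendation: str
    quotes: list = field(default_factory=list)

    def render(self) -> str:
        """Format the finding as plain text for a report or ticket."""
        lines = [
            f"Problem: {self.problem}",
            f"Evidence: {self.participants_affected} of {self.participants_total} participants",
        ]
        lines += [f'  "{quote}"' for quote in self.quotes]
        lines.append(f"Recommendation: {self.recommendation}")
        return "\n".join(lines)


finding = ReportFinding(
    problem="Participants did not expect course materials to live under 'Resources'",
    participants_affected=3,
    participants_total=5,
    recommendation="Rename 'Resources' to 'Tools and templates'",
)
report_text = finding.render()
print(report_text)
```

Making the recommendation a required field is deliberate: a finding without a specific proposed change cannot be recorded at all.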
Video clips
Short clips of real users struggling are more persuasive to stakeholders than any slide deck. A 30-second clip of a participant clicking around in confusion, followed by "I have no idea where to go from here," makes a stronger case for a design change than a bar chart of task completion rates.
Tools like Dovetail let you tag and clip video highlights during analysis, then share them directly with stakeholders as a highlight reel.
Tools
- Maze — unmoderated prototype testing; integrates with Figma; good for quantitative task data
- Lyssna — unmoderated testing; strong for first-click tests and preference tests
- Lookback — moderated remote sessions; observer rooms, live streaming for stakeholders
- UserZoom / UserTesting — enterprise-grade platforms with built-in participant panels
- Dovetail — qualitative synthesis; video tagging, affinity mapping, insight sharing
For a broader look at the UX design toolkit, see our guide to UX design tools.
Usability testing sits within a wider UX design process that includes usability principles and a range of research methods. No single method tells the whole story — usability testing is most powerful when it is part of a regular research rhythm, not a one-off event.
Learn by doing
Reading about usability testing is a start. Running a real test — recruiting participants, facilitating a live session, synthesising findings, presenting to stakeholders — is where the skill actually develops.
The Intermediate UX Design and UX Career Track courses at UX Academy include hands-on usability testing modules. You plan and run real tests on live client briefs, with instructor coaching throughout. You leave with session recordings, a synthesised findings report, and the experience of having done it properly — not just read about it.