runbookify

Agent-Level CSAT Scorecard

Build an internal tool that rolls up customer-satisfaction scores by agent and team — with sample size and ticket-type context baked in — so coaching is based on real signal, not one bad rating.

Beginner · An afternoon
Builds on: Next.js (App Router) on Vercel, Supabase (Postgres, Storage, Auth + RLS), Resend (email)

What you'll build

A private, login-protected scorecard that computes per-agent and per-team CSAT with volume and ticket-mix context, flags low-confidence scores, and lets a manager approve which signals become coaching items before anything is shared.

Before you start

  • A CSV or Google Sheet of CSAT responses, each linked to an agent and a ticket type
  • Free Vercel, Supabase, and Resend accounts (the plan walks you through each)
  • Claude Code installed and ready to go

The problem this kills

You want to coach your support team, but the data fights you. One agent has a 100% CSAT — on four tickets. Another sits at 82% — across three hundred tickets, half of them angry billing escalations nobody enjoys. Raw averages make the part-timer look like a star and your hardest worker look mediocre. So coaching conversations turn into arguments about whether the number is even fair, and a single furious one-star rating can color a whole quarter.

The honest answer is that CSAT only means something with context: how many responses, and what kind of tickets. Most teams don't have a tool that shows that context, so they either over-trust the number or ignore it entirely. Both are bad. You end up coaching on noise.

What you'll build

A small, private web app, just for your team, that takes your CSAT responses, links each one to the agent and the ticket type, and rolls everything up into a clear scorecard by agent and by team. Crucially, it shows the things that make a score trustworthy or not: how many responses sit behind each number, and the mix of ticket types each agent handled. Any score backed by fewer responses than a minimum you set gets hidden or clearly caveated, so nobody gets judged on five ratings.
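
You won't have to write this yourself, but if you're curious what the rollup might look like, here is a minimal TypeScript sketch. It is an illustration under assumptions, not the plan's actual code: the field names, the `scoreByAgent` function, and treating each response as simply satisfied or not are all hypothetical. It reuses the two agents from the story above (4 of 4 versus 246 of 300).

```typescript
// Hypothetical shape of one survey response; every field name is an assumption.
interface CsatResponse {
  responseId: string;
  agent: string;
  ticketType: string;
  satisfied: boolean; // e.g. a 4 or 5 on a 5-point survey
}

interface AgentScore {
  agent: string;
  responses: number;                 // sample size behind the score
  csatPercent: number | null;        // null when hidden for low sample size
  lowConfidence: boolean;
  ticketMix: Record<string, number>; // responses per ticket type
}

// Roll responses up by agent and hide any score with fewer than minSample responses.
function scoreByAgent(rows: CsatResponse[], minSample: number): AgentScore[] {
  const byAgent = new Map<string, CsatResponse[]>();
  for (const row of rows) {
    const bucket = byAgent.get(row.agent) ?? [];
    bucket.push(row);
    byAgent.set(row.agent, bucket);
  }

  const scores: AgentScore[] = [];
  byAgent.forEach((bucket, agent) => {
    const satisfied = bucket.filter((r) => r.satisfied).length;
    const ticketMix: Record<string, number> = {};
    for (const r of bucket) {
      ticketMix[r.ticketType] = (ticketMix[r.ticketType] ?? 0) + 1;
    }
    const lowConfidence = bucket.length < minSample;
    scores.push({
      agent,
      responses: bucket.length,
      csatPercent: lowConfidence
        ? null
        : Math.round((satisfied * 100) / bucket.length),
      lowConfidence,
      ticketMix,
    });
  });
  return scores;
}

// Made-up sample data: a part-timer with 4 happy tickets, a veteran with 300.
const sampleRows: CsatResponse[] = [];
for (let i = 0; i < 4; i++) {
  sampleRows.push({ responseId: `p-${i}`, agent: "part-timer", ticketType: "how-to", satisfied: true });
}
for (let i = 0; i < 300; i++) {
  sampleRows.push({
    responseId: `v-${i}`,
    agent: "veteran",
    ticketType: i % 2 === 0 ? "billing" : "how-to",
    satisfied: i < 246,
  });
}

const sampleScores = scoreByAgent(sampleRows, 30);
console.log(sampleScores);
```

With a minimum of 30 responses, the part-timer's perfect score is hidden and flagged as low confidence, while the veteran's 82% is shown alongside an even split of billing and how-to tickets.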

Then it does the part that actually matters for coaching: it surfaces candidate "signals" (an agent trending low on a specific ticket type, a real dip versus last period), and a manager reviews and approves which of those become coaching items — before a single one is shared with an agent. The AI suggests; a human decides.
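
For a feel of how a "real dip versus last period" could be separated from noise, here is a rough TypeScript sketch. Everything in it is an assumption made for illustration: the `PeriodTally` shape, the `findDips` name, and the rule itself (both periods must meet the minimum sample size, and the drop must be at least a set number of percentage points). Each candidate starts as `pending`, because a manager still has to approve it.

```typescript
// Hypothetical per-period tally for one agent on one ticket type.
interface PeriodTally {
  agent: string;
  ticketType: string;
  satisfied: number;
  total: number;
}

interface CandidateSignal {
  agent: string;
  ticketType: string;
  previousPercent: number;
  currentPercent: number;
  status: "pending"; // candidates are never shared until a manager approves
}

// Multiply before dividing so round numbers stay exact.
const percent = (t: PeriodTally): number => (t.satisfied * 100) / t.total;

// Flag agent/ticket-type pairs whose CSAT fell by at least minDropPoints,
// ignoring any pair where either period has fewer than minSample responses.
function findDips(
  previous: PeriodTally[],
  current: PeriodTally[],
  minSample: number,
  minDropPoints: number
): CandidateSignal[] {
  const previousByKey = new Map<string, PeriodTally>();
  for (const t of previous) {
    previousByKey.set(`${t.agent}|${t.ticketType}`, t);
  }

  const signals: CandidateSignal[] = [];
  for (const now of current) {
    const before = previousByKey.get(`${now.agent}|${now.ticketType}`);
    if (!before) continue; // nothing to compare against
    if (before.total < minSample || now.total < minSample) continue; // too few ratings
    const drop = percent(before) - percent(now);
    if (drop >= minDropPoints) {
      signals.push({
        agent: now.agent,
        ticketType: now.ticketType,
        previousPercent: percent(before),
        currentPercent: percent(now),
        status: "pending",
      });
    }
  }
  return signals;
}
```

The sample-size check runs before the comparison on purpose: an agent who went from 3 of 3 to 0 of 3 looks dramatic, but that is exactly the kind of noise the scorecard is meant to keep out of coaching.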

What's inside the Implementation Plan

The plan is a complete, paste-and-go runbook for Claude Code. You don't write code — you paste, answer questions, and approve.

It opens by interviewing you about your support operation — your survey tool, how agents and teams are named, your ticket-type categories, your typical and peak response volumes, what counts as a "fair" sample size for you, and your messy edge cases (transfers, reopened tickets, surveys with no agent attached). It reflects a short tailored spec back to you and waits for your thumbs-up, so the tool fits your team instead of a generic template. From there it builds the database, the import, the scoring math, the review-and-approve screen, the exports, and the optional email — each step ending in a ready-to-copy prompt.

The governance it includes (this is the point)

This isn't a throwaway spreadsheet macro. The plan builds in the controls that make a people-data tool safe to actually use:

  • Login so only your team can open it.
  • Row-level security so each manager sees only their own organization's data.
  • A complete audit trail — who imported, who approved which signal, and when.
  • A human-in-the-loop approval gate — the AI drafts coaching signals, the manager reviews and approves, and only approved items are shared. Nothing reaches an agent automatically.
  • Duplicate guards keyed on the response ID, so re-importing the same export can't double-count a rating.
  • Fairness guards — scores below your minimum sample size are hidden or caveated, and ticket-type mix is shown so difficulty isn't ignored.
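
Two of the controls above, the duplicate guard and the approval gate, are easy to picture in code. The TypeScript below is an in-memory sketch under stated assumptions, not the plan's implementation: `importResponses`, `approveSignal`, `shareable`, and all field names are hypothetical, and a database-backed build would more likely enforce the duplicate rule with a uniqueness rule on the response ID.

```typescript
// Hypothetical imported row; responseId is the survey tool's own ID.
interface ImportedResponse {
  responseId: string;
  agent: string;
  satisfied: boolean;
}

// Duplicate guard: a response ID that is already stored is skipped, not re-counted.
function importResponses(
  store: Map<string, ImportedResponse>,
  incoming: ImportedResponse[]
): { inserted: number; skipped: number } {
  let inserted = 0;
  let skipped = 0;
  for (const row of incoming) {
    if (store.has(row.responseId)) {
      skipped++;
      continue;
    }
    store.set(row.responseId, row);
    inserted++;
  }
  return { inserted, skipped };
}

type SignalStatus = "pending" | "approved" | "rejected";

interface Signal {
  id: string;
  agent: string;
  status: SignalStatus;
  approvedBy?: string; // audit trail: who approved
  approvedAt?: string; // audit trail: when (ISO timestamp)
}

// Approval returns a new record carrying who approved and when.
function approveSignal(signal: Signal, manager: string, when: Date): Signal {
  return {
    ...signal,
    status: "approved",
    approvedBy: manager,
    approvedAt: when.toISOString(),
  };
}

// Approval gate: only approved signals are ever eligible to be shared.
function shareable(signals: Signal[]): Signal[] {
  return signals.filter((s) => s.status === "approved");
}
```

Importing the same export twice through `importResponses` leaves the store unchanged the second time, and anything still `pending` or `rejected` never passes `shareable`.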

Who it's for

Support managers and team leads who run performance and coaching conversations and want them grounded in fair, defensible numbers — not a single bad rating or a flattering small sample. If you can use a spreadsheet and follow instructions, you can build this.

You've got this — paste the first prompt and let the plan interview you.
