Training Effectiveness (Kirkpatrick) Tracker
Build an internal tool that rolls reaction, learning, behavior, and results data into one effectiveness scorecard per training program - with a manager sign-off before leadership ever sees it.
A private web app where you collect training data across the four Kirkpatrick levels, auto-compute an effectiveness scorecard per program run, review and approve it, share it with leadership, and export the whole dataset to CSV.
Before you start
- A free Vercel account
- A free Supabase account
- A free Resend account
- Your existing training data in a spreadsheet (survey scores, assessment results, behavior ratings)
The problem this kills
You run real training programs, and at the end you get a stack of "smile sheets" - happy-face survey scores that tell you people enjoyed the session and absolutely nothing about whether it changed anything. Then leadership asks the question you dread: "Is this training actually working? What did we get for the money?" And all you can point to is attendance and an average satisfaction score.
The Kirkpatrick model has been the answer to that question for decades - measure four levels: reaction (did they like it?), learning (did they actually learn it?), behavior (did they change what they do on the job?), and results (did a business metric move?). The problem is the bookkeeping. Reaction scores live in one survey tool, assessment results in another, behavior change is a manager survey you send 30 and 60 days later, and the business metric is buried in someone else's dashboard. Stitching it all into a defensible scorecard, by hand, for every program run, is a part-time job nobody has time for.
This tool does the stitching for you - and keeps you honest about what the numbers really mean.
What you'll build
A private, login-protected web app for your L&D team that:
- Tracks programs and their individual runs (cohorts / sessions), so you can see how effectiveness trends across runs of the same program.
- Collects data at up to four levels per run - and lets you skip the levels that don't apply, because not every program needs all four.
- Computes an effectiveness scorecard per run: a clear per-level score plus a rolled-up rating, with the math shown so it's never a black box.
- Puts a human approval gate in front of leadership: you review the scorecard - especially the behavior and results interpretation - and approve it before it can be shared.
- Flags the caveats automatically - for example, that a business metric moving after training shows correlation, not causation - so you never overclaim.
- Exports the full effectiveness dataset to CSV for your records or your existing reporting.
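To make "the math shown" concrete, here is a minimal sketch of how a per-run scorecard could be rolled up. Everything in it is an assumption for illustration: the type names, the idea that each level is already normalized to a 0-100 score, and the default weights are hypothetical, and your own plan may settle on a different rubric after the interview.

```typescript
// Hypothetical types and weights for illustration; tune them to your own rubric.
type Level = "reaction" | "learning" | "behavior" | "results";

// Assumption: each collected level has already been normalized to a 0-100 score.
type LevelScores = Partial<Record<Level, number>>;

interface Scorecard {
  perLevel: LevelScores;
  overall: number | null; // null when no level has data
  levelsUsed: Level[];
  caveats: string[];
}

// Assumed default weights: later Kirkpatrick levels count for more.
const DEFAULT_WEIGHTS: Record<Level, number> = {
  reaction: 1,
  learning: 2,
  behavior: 3,
  results: 4,
};

function computeScorecard(
  scores: LevelScores,
  weights: Record<Level, number> = DEFAULT_WEIGHTS,
): Scorecard {
  const levels: Level[] = ["reaction", "learning", "behavior", "results"];
  // Skipped levels are left out entirely rather than counted as zero.
  const levelsUsed = levels.filter((l) => typeof scores[l] === "number");

  let weightedSum = 0;
  let weightTotal = 0;
  for (const level of levelsUsed) {
    weightedSum += (scores[level] as number) * weights[level];
    weightTotal += weights[level];
  }
  // Weighted average over the levels that have data, rounded to one decimal.
  const overall =
    weightTotal > 0 ? Math.round((weightedSum / weightTotal) * 10) / 10 : null;

  // Attach the correlation caveat whenever a results score is present.
  const caveats: string[] = [];
  if (scores.results !== undefined) {
    caveats.push(
      "The business metric moved alongside the training; that is correlation, not proof of causation.",
    );
  }
  return { perLevel: scores, overall, levelsUsed, caveats };
}
```

With these assumed weights, a run that only has reaction 80 and learning 70 rolls up to (80 x 1 + 70 x 2) / 3, which rounds to 73.3. Leaving skipped levels out of the average, rather than scoring them as zero, is one way to honor "skip the levels that don't apply" without punishing the program for it.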
What's inside the Implementation Plan
The plan is a single file you paste into an AI coding agent (Claude Code). It walks the agent - and you - from an empty folder to a working app over a weekend.
- It starts by interviewing you about your business. Before it writes a line of code, the plan makes the agent ask about your actual programs, the survey and assessment tools you use, how you name and number things, your typical and peak cohort sizes, exactly how you rate on-the-job behavior change, and which business metric you'd tie to results. It reads back a short tailored spec and waits for your thumbs-up. The tool you get fits your L&D operation - it is not a generic template.
- A clean, copy-paste prompt at the end of every step, so you're never guessing what to type next.
- A four-level data model, scoring logic you can tune, and validations shaped by your interview answers.
- The full governance build (login, per-org data isolation, audit trail, approval gate, duplicate guards).
- A "No API yet?" fallback that lets you import each level's data from a Google Sheet or CSV today - no integration required - and export a clean scorecard CSV.
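As a rough idea of what the scorecard CSV export in the last bullet could look like, here is a small sketch. The row shape and column names are hypothetical, not the plan's actual schema; the quoting follows the usual CSV convention of wrapping a field in double quotes when it contains a comma, a quote, or a line break.

```typescript
// Hypothetical row shape for the scorecard export; column names are assumptions.
interface ScorecardRow {
  program: string;
  run: string;
  reaction: number | null;
  learning: number | null;
  behavior: number | null;
  results: number | null;
  overall: number | null;
  status: string;
}

// Quote a field when it contains a comma, a double quote, or a line break.
function csvField(value: string | number | null): string {
  if (value === null) return ""; // skipped levels export as empty cells
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsv(rows: ScorecardRow[]): string {
  const columns: (keyof ScorecardRow)[] = [
    "program", "run", "reaction", "learning",
    "behavior", "results", "overall", "status",
  ];
  const lines = [columns.join(",")]; // header row first
  for (const row of rows) {
    lines.push(columns.map((c) => csvField(row[c])).join(","));
  }
  return lines.join("\r\n");
}
```

Exporting skipped levels as empty cells, rather than as 0, keeps the spreadsheet honest: a blank means "not measured", not "scored zero".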
The governance it includes (this is the point)
Anything you put in front of leadership has to be trustworthy, so this is built in from the start, not bolted on:
- Login so only your team can open the tool.
- Row-level security so each organization only ever sees its own training data.
- A complete audit trail - who entered which scores, who edited an interpretation, who approved a scorecard, and when.
- A hard human-in-the-loop approval gate - the app drafts the scorecard, you (the L&D manager) review and approve the behavior and results interpretation, and only then can it be marked shared / presented.
- Duplicate guards so the same level's data for the same program run can't be entered twice and silently double-count (dedupe key: program + run + level).
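Two of the rules above, the duplicate guard and the approval gate, are easy to picture in code. The sketch below is an in-memory illustration only, with hypothetical class and function names; in the real app you would more likely enforce the same rules in the database (for instance with a unique constraint on program, run, and level) so they hold no matter which screen writes the data.

```typescript
// Hypothetical in-memory sketch; a real build would enforce these rules in the database.
type KirkpatrickLevel = "reaction" | "learning" | "behavior" | "results";
type ScorecardStatus = "draft" | "approved" | "shared";

// Dedupe key from the list above: program + run + level.
function dedupeKey(programId: string, runId: string, level: KirkpatrickLevel): string {
  return JSON.stringify([programId, runId, level]);
}

class LevelDataStore {
  private seen = new Set<string>();

  // Returns false instead of accepting a second entry for the same key.
  add(programId: string, runId: string, level: KirkpatrickLevel): boolean {
    const key = dedupeKey(programId, runId, level);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}

interface AuditEntry {
  actor: string;
  action: string;
  at: string; // ISO timestamp
}

class ScorecardWorkflow {
  status: ScorecardStatus = "draft";
  readonly audit: AuditEntry[] = [];

  approve(managerId: string): void {
    if (this.status !== "draft") {
      throw new Error(`Cannot approve a scorecard that is ${this.status}`);
    }
    this.status = "approved";
    this.log(managerId, "approved");
  }

  share(actorId: string): void {
    // The gate: sharing is refused until a manager has approved.
    if (this.status !== "approved") {
      throw new Error("Scorecard must be approved before it can be shared");
    }
    this.status = "shared";
    this.log(actorId, "shared");
  }

  // Every state change leaves a who / what / when record.
  private log(actor: string, action: string): void {
    this.audit.push({ actor, action, at: new Date().toISOString() });
  }
}
```

The point of modeling status as draft, approved, shared is that there is no path from draft straight to shared: calling `share` on a draft throws, and every successful transition lands in the audit list.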
Who it's for
L&D and training managers who have to justify training to leadership with more than smile sheets - people who already collect some of this data but spend too long stitching it together, and who want a defensible, repeatable effectiveness story for every program run.
You've got this - paste the first prompt and let the plan interview you.