Subject-Line A/B Test Tracker
Plan subject-line A/B tests, log the real results, declare a winner with a sanity-checked significance flag, and grow a searchable library of what actually wins - so your testing stops being a guessing game.
A private, team-only web app where you plan a subject-line test, enter or import the sends/opens/clicks per variant, see the computed winner with a 'big enough to trust?' flag, approve the conclusion, and file it as a tagged learning you can search forever.
Before you start
- A free Supabase account
- A free Vercel account
- A free Resend account (for notifications)
- Your last few subject-line tests and their numbers (or a CSV of results)
The problem this kills
You test subject lines. Everybody does. "Emoji or no emoji?" "Short or long?" "Does urgency beat curiosity?" You pick a winner in your head, send the campaign, and then... the result evaporates. Three months later you run the exact same test again because nobody wrote down what happened. Worse: you "called a winner" on a test where one variant went to 40 people, which is basically a coin flip dressed up as data.
The result is testing theater. You do the motions, you feel scientific, and you learn nothing that sticks. Your open-rate lift over a year is a rounding error because every lesson leaks out the bottom of the bucket.
This kills that. Every test gets planned on purpose, every result gets logged in one place, every "winner" gets a blunt honesty check on sample size, and every conclusion becomes a permanent, searchable learning tagged by theme. Next quarter you start from what you already know instead of starting over.
What you'll build
A small, private web app - just for your team - that runs the whole loop:
- Plan a test: name it, write the hypothesis ("emoji in subject lifts opens"), define your variants (A / B / more), and set the audience split.
- Enter the results: type in sends, opens, and clicks per variant, or import a results CSV straight from your email platform.
- See the verdict: the tool computes open rate and click rate per variant, picks the apparent winner, and shows a plain-English significance flag - including a loud warning when your sample is too small to call anything.
- Approve the conclusion: you, the human, read the verdict and either approve it as a real learning or send it back. Nothing gets filed as a "lesson" until a person signs off.
- Build the library: approved learnings stack up, tagged by theme (urgency, emoji, length, personalization...), so you can search "what do we know about emoji?" and get answers.
- Export anytime: one click gives you a clean CSV of your whole learnings library.
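To make the "See the verdict" step concrete, here is a minimal Python sketch of how a winner and a trust flag could be computed from sends and opens. It is an illustration, not the app's actual math: it assumes a pooled two-proportion z-test on open rate, a hypothetical minimum of 1,000 sends per variant, and a 0.05 cutoff. The names `Variant`, `verdict`, and `MIN_SENDS_PER_VARIANT` are invented for this example.

```python
import math
from dataclasses import dataclass

# Assumed threshold, not from the plan: below this, refuse to call a winner.
MIN_SENDS_PER_VARIANT = 1000


@dataclass
class Variant:
    name: str
    sends: int
    opens: int

    @property
    def open_rate(self) -> float:
        return self.opens / self.sends if self.sends else 0.0


def two_sided_p_value(a: Variant, b: Variant) -> float:
    """Pooled two-proportion z-test on open rate; returns the two-sided p-value."""
    if a.sends == 0 or b.sends == 0:
        return 1.0
    pooled = (a.opens + b.opens) / (a.sends + b.sends)
    se = math.sqrt(pooled * (1 - pooled) * (1 / a.sends + 1 / b.sends))
    if se == 0:
        return 1.0
    z = (a.open_rate - b.open_rate) / se
    # For a standard normal, 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def verdict(a: Variant, b: Variant, alpha: float = 0.05) -> dict:
    """Pick the apparent winner and attach a plain-English trust flag."""
    winner = a if a.open_rate >= b.open_rate else b
    if min(a.sends, b.sends) < MIN_SENDS_PER_VARIANT:
        flag = "too small to call"
    elif two_sided_p_value(a, b) < alpha:
        flag = "big enough to trust"
    else:
        flag = "not significant"
    return {"apparent_winner": winner.name, "flag": flag}
```

Under these assumptions, a test where each variant went to 40 people comes back as "too small to call" no matter how far apart the open rates look. The threshold and cutoff are the kind of thing the interview step would tailor to your send volumes.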
What's inside the Implementation Plan
The plan is a complete, paste-and-go runbook for Claude Code. You don't write code - you paste, answer questions, and approve.
The best part, and the thing generic templates skip: the plan opens by interviewing you about your business. Before it builds a single screen, the agent asks how you run tests today, which email platform you use, what your subject-line numbers actually look like, your typical and peak send volumes, what counts as a "winner" for your team, and the messy exceptions (resends, A/B/C tests, win-by-clicks-not-opens). Then it reads back a short spec, waits for your thumbs-up, and tailors the data model and the math to your world - not a one-size-fits-nobody demo.
Inside you'll find: a clear definition of done, an accounts checklist, the tailored interview, the stack, a simple architecture diagram, and step-by-step build sections - each ending with a ready-to-copy prompt. It closes with how to verify everything works and the no-API CSV fallback so you can build it today regardless of which email tool you use.
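As a rough picture of what the CSV fallback involves, the Python sketch below parses a results export and rejects rows that cannot be right. The column names (`variant`, `sends`, `opens`, `clicks`) and the function name `parse_results_csv` are assumptions for illustration; real exports differ by email platform, and the header mapping would need adjusting to match yours.

```python
import csv
import io

# Assumed header names; rename to match your platform's export.
REQUIRED_COLUMNS = {"variant", "sends", "opens", "clicks"}


def parse_results_csv(text: str) -> list:
    """Parse per-variant results and reject rows with impossible counts."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    rows = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, raw in enumerate(reader, start=2):
        sends, opens, clicks = (int(raw[k]) for k in ("sends", "opens", "clicks"))
        if not (0 <= opens <= sends and 0 <= clicks <= sends):
            raise ValueError(f"line {line_no}: opens and clicks must be between 0 and sends")
        rows.append(
            {"variant": raw["variant"].strip(), "sends": sends, "opens": opens, "clicks": clicks}
        )
    return rows
```

Failing loudly on a bad row, rather than skipping it, keeps a half-imported file from quietly becoming a result of record.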
The governance it includes (this is the point)
This isn't a toy. The plan bakes in the controls that make a tool trustworthy enough to actually run your decisions through:
- Login so only your team can get in.
- Row-level security so you only ever see your own organization's tests and learnings.
- A full audit trail - who planned a test, who entered results, who approved a conclusion, and exactly when.
- A human-in-the-loop approval gate - the tool computes the winner, but a person must approve before it's saved as a learning. The machine drafts; you decide.
- Duplicate guards - each test ID has exactly one result of record, so the same test can't be logged twice and quietly corrupt your library.
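The sketch below shows, in plain Python, how three of these controls fit together: the duplicate guard, the approval gate, and the audit trail. It is an in-memory illustration only. In the built app these rules would presumably be enforced in the database rather than in application memory, and the `Library` class and its method names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Library:
    results: dict = field(default_factory=dict)    # test_id -> result of record
    learnings: list = field(default_factory=list)  # approved conclusions only
    audit_log: list = field(default_factory=list)  # (when, who, action, test_id)

    def _audit(self, actor: str, action: str, test_id: str) -> None:
        self.audit_log.append((datetime.now(timezone.utc), actor, action, test_id))

    def record_result(self, test_id: str, result: dict, actor: str) -> None:
        # Duplicate guard: one test ID, one result of record.
        if test_id in self.results:
            raise ValueError(f"test {test_id} already has a result of record")
        self.results[test_id] = result
        self._audit(actor, "entered_results", test_id)

    def approve(self, test_id: str, conclusion: str, tags: list, approver: str) -> None:
        # Approval gate: nothing becomes a learning without a logged result
        # and a named human approver.
        if test_id not in self.results:
            raise KeyError(f"test {test_id} has no result to approve")
        if not approver:
            raise PermissionError("a named approver is required")
        self.learnings.append(
            {"test_id": test_id, "conclusion": conclusion, "tags": list(tags), "approved_by": approver}
        )
        self._audit(approver, "approved_conclusion", test_id)
```

The point of the design is that `learnings` can only grow through `approve`, so every entry in the library traces back to a logged result and a person who signed off.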
Who it's for
Email marketers, newsletter operators, and lifecycle / CRM folks who run subject-line tests but never log them - and want to turn scattered guesses into a compounding library of what works. If you've ever re-run a test you already ran, this is for you. No coding background needed.
You've got this - paste the first prompt and let the agent interview you.