Roster Data-Quality Validator: Catch Bad Employee Data Before It Hits Payroll
Run a battery of checks on your employee roster — missing fields, bad formats, broken manager links, duplicate people, impossible dates, out-of-band pay — and get a ranked fix list that HR approves before any correction is exported.
A web tool in four steps: you import a roster; the AI runs your data-quality rules and flags every issue with a suggested fix, ranked by severity; HR reviews each correction in a before/after view and approves it; and the tool exports a cleaned roster CSV plus a full issues report, with a complete audit trail of who changed what.
Before you start
- A Supabase account (free)
- A Vercel account (free)
- A Resend account (free)
- Your employee roster as a CSV or Google Sheet
- A reference list of valid departments, locations, and managers
- Claude Code or any AI coding agent
The problem this kills
Somebody hands you the employee roster and it's a minefield. Two people share the same employee ID. A manager column points at someone who left last year. A hire date lands in 1900 because of a spreadsheet auto-format. A salary has an extra zero. Half the "Location" values are spelled three different ways. There's a "Jon Smith" and a "John Smith" who are clearly the same person — or maybe not.
None of it looks dangerous until it flows downstream. Then payroll pays the wrong band, the org chart draws a broken reporting line, a headcount report comes out wrong in front of leadership, and a benefits file gets rejected because a required field was blank. You end up cleaning the same roster by hand every cycle, and the same errors keep coming back. You don't need to be a developer to put a stop to this.
What you'll build
A simple internal web tool. You import your roster (CSV or Google Sheet) and a small reference list of your valid departments, locations, and managers. You tell the tool your rules — which fields are required, what good formats look like, and your salary band ranges. The tool then runs a battery of validations: missing required fields, bad formats (email, phone, dates, IDs), values that aren't on your reference lists, manager links that point nowhere or loop, impossible or contradictory dates (termination before hire, future birth dates), out-of-band salaries, and near-duplicate people (same name + date of birth, or same email, even when the employee ID differs).
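To make a few of those checks concrete, here is a minimal sketch in Python. It is an illustration only, not the code the Implementation Plan generates: the field names (employee_id, manager_id, hire_date and so on), the department list, and the salary bands are all made-up assumptions that your own roster and rules would replace. It assumes dates have already been parsed into date values.

```python
import re
from datetime import date

# Hypothetical rule config; real field names, lists, and bands come from your own roster.
REQUIRED = ["employee_id", "email", "hire_date", "department"]
VALID_DEPARTMENTS = {"Finance", "Engineering"}
SALARY_BANDS = {"Finance": (40_000, 120_000), "Engineering": (60_000, 200_000)}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately loose format check


def check_row(row, today):
    """Return a list of (field, problem) tuples for one roster row."""
    issues = []
    for field in REQUIRED:
        if not str(row.get(field) or "").strip():
            issues.append((field, "missing required value"))
    email = row.get("email") or ""
    if email and not EMAIL_RE.match(email):
        issues.append(("email", "bad format"))
    dept = row.get("department")
    if dept and dept not in VALID_DEPARTMENTS:
        issues.append(("department", "not on reference list"))
    hire = row.get("hire_date")
    term = row.get("termination_date")
    born = row.get("birth_date")
    if hire and term and term < hire:
        issues.append(("termination_date", "before hire date"))
    if born and born > today:
        issues.append(("birth_date", "in the future"))
    band = SALARY_BANDS.get(dept)
    salary = row.get("salary")
    if band and salary is not None and not (band[0] <= salary <= band[1]):
        issues.append(("salary", "outside band"))
    return issues


def manager_link_issues(rows):
    """Flag manager IDs that point nowhere, or chains that run into a reporting loop."""
    manager_of = {r["employee_id"]: r.get("manager_id") for r in rows}
    issues = []
    for emp_id, mgr_id in manager_of.items():
        if mgr_id and mgr_id not in manager_of:
            issues.append((emp_id, "manager not in roster"))
            continue
        seen, current = {emp_id}, mgr_id
        while current:  # walk up the chain until it ends or repeats
            if current in seen:
                issues.append((emp_id, "reporting loop"))
                break
            seen.add(current)
            current = manager_of.get(current)
    return issues
```

Each check returns findings rather than changing anything, which keeps the validation step separate from the approval step that follows.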
Every issue lands on a ranked worklist — critical problems first — each with a plain-language explanation and a suggested fix. HR reviews each one, sees the exact before/after, and clicks Approve or leaves it for manual handling. Nothing is ever overwritten automatically. Once HR approves, the tool produces a cleaned roster CSV in the exact columns your system expects, plus an issues report showing everything found, fixed, and skipped — and logs who approved what, and when.
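The ranking and the approval gate can be pictured with a short Python sketch. Everything here is a hypothetical shape chosen for illustration: the Issue fields, the three severity tiers, and the audit entry layout are assumptions, not the schema the tool actually uses. The point it shows is that only issues a named reviewer has approved reach the cleaned copy, and the original roster is never modified.

```python
from dataclasses import dataclass
from typing import Optional

SEVERITY_ORDER = {"critical": 0, "warning": 1, "info": 2}  # assumed tiers


@dataclass
class Issue:
    employee_id: str
    field: str
    before: str
    suggested: str
    severity: str
    approved_by: Optional[str] = None  # stays None until a reviewer approves
    approved_at: Optional[str] = None  # timestamp recorded at approval time


def ranked(issues):
    """Order the worklist so critical problems come first."""
    return sorted(issues, key=lambda issue: SEVERITY_ORDER[issue.severity])


def apply_approved(roster, issues):
    """Build a cleaned copy of the roster from approved fixes only.

    roster maps employee_id to a dict of field values. The input is left
    untouched; unapproved issues are skipped and stay for manual handling.
    """
    cleaned = {emp_id: dict(row) for emp_id, row in roster.items()}
    audit = []
    for issue in issues:
        if issue.approved_by is None:
            continue
        cleaned[issue.employee_id][issue.field] = issue.suggested
        audit.append({
            "employee_id": issue.employee_id,
            "field": issue.field,
            "before": issue.before,
            "after": issue.suggested,
            "approved_by": issue.approved_by,
            "approved_at": issue.approved_at,
        })
    return cleaned, audit
```

Returning the audit entries alongside the cleaned rows means the export and the log come from the same pass, so they cannot drift apart.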
What's inside the Implementation Plan
A start-to-finish runbook you paste into an AI coding agent. It opens by interviewing you about your business — your current roster process, the systems and spreadsheets involved, the real field names and ID/code conventions in your data, your typical and peak headcount, your exact required-field and band rules, and your messy edge cases — so the tool is tailored to your roster, not a generic template. It reflects a short spec back to you for a thumbs-up before it builds anything.
From there it walks you through every step with a ready-to-copy prompt: standing up the database with row-level security, the import screen, the rules/reference config, the validation engine (including fuzzy duplicate detection), the ranked review queue with before/after and the approval gate, the audit log, the cleaned-CSV and issues-report exports, and email notifications via Resend. It closes with how to verify the whole thing works and a CSV-first fallback so it's fully buildable today with no integration to your HRIS.
The governance it includes (this is the point)
- Login so only your HR team can use the tool.
- Row-level security so each organization only ever sees its own roster data.
- A complete audit trail — who imported, who approved which fix, what changed, and when.
- A hard human-in-the-loop approval gate: the AI drafts every suggested fix, a person reviews the before/after and approves, and only approved changes make it into the cleaned export. Ambiguous cases are left for manual handling.
- Duplicate guards so the same roster file can't be processed twice and the same person can't slip in under two IDs.
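As a rough picture of how the duplicate guards might work, the Python sketch below fingerprints an uploaded file and pairs up rows that look like the same person. It is one possible approach under stated assumptions, not the tool's confirmed method: the field names, the in-memory set of fingerprints, and the 0.85 name-similarity threshold are illustrative choices, and the similarity score comes from Python's standard difflib rather than any specific fuzzy-matching technique named in the plan.

```python
import hashlib
from difflib import SequenceMatcher
from itertools import combinations


def file_fingerprint(csv_bytes):
    """SHA-256 of the uploaded bytes; identical files give identical fingerprints."""
    return hashlib.sha256(csv_bytes).hexdigest()


def is_repeat_import(csv_bytes, seen):
    """Return True if this exact file was already imported; otherwise remember it."""
    fingerprint = file_fingerprint(csv_bytes)
    if fingerprint in seen:
        return True
    seen.add(fingerprint)
    return False


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not hide a match."""
    return " ".join((text or "").lower().split())


def names_similar(a, b, threshold=0.85):
    """Assumed threshold; tune it against your own known duplicates."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def possible_duplicates(rows):
    """Pair up rows with different IDs that share an email, or a birth date plus a similar name."""
    pairs = []
    for a, b in combinations(rows, 2):  # pairwise scan; fine for a sketch, slow on huge rosters
        if a["employee_id"] == b["employee_id"]:
            continue
        email_a, email_b = normalize(a.get("email")), normalize(b.get("email"))
        same_email = bool(email_a) and email_a == email_b
        same_person = (
            bool(a.get("birth_date"))
            and a.get("birth_date") == b.get("birth_date")
            and names_similar(a["name"], b["name"])
        )
        if same_email or same_person:
            pairs.append((a["employee_id"], b["employee_id"]))
    return pairs
```

The pairs are only candidates. Whether "Jon Smith" and "John Smith" really are one person is exactly the kind of ambiguous case that belongs in front of a human reviewer.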
Who it's for
HR operations and HRIS admins who inherit messy rosters and spend every cycle fighting the same downstream payroll and reporting errors. If you can fill in a spreadsheet and describe your own rules, you can build and run this.
You've got this — paste the first prompt and let the agent interview you.