Expense Categorization — The Menu

The job

The accounting team today spends three to five hours a week on transaction categorization. Bank feeds dump 100+ items. Credit card logs 50+. Invoices that posted to suspense accounts need sorting. Each one gets a line in the chart of accounts, a cost center tag, maybe a project code. A human scans each line and assigns it.

When the station runs this dish well, that work is gone. Ninety-five percent of transactions route to the right place automatically. The remaining five percent flag to human eyes because the station hit the confidence threshold you wrote down. That human looks at ten ambiguous ones instead of 500 certain ones.

The difference between a working recipe and a broken one is simple. The Chef defines the threshold before deployment, not after. A transaction is flagged when confidence drops below 85 percent. A business-meal receipt with no vendor name flags. A $5,000 credit posted to an account that normally carries a debit balance flags. A transaction that could belong to two equally valid accounts flags. Everything else flows.
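The flag rules above can be sketched in a few lines. This is a minimal illustration, not the station's actual logic: the Suggestion shape, the flag_reasons helper, and the 0.05 tie margin are all assumptions made for the example. Only the 85 percent threshold comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_THRESHOLD = 0.85  # from the text: written down before deployment


@dataclass
class Suggestion:
    """One categorization proposed by the station (hypothetical shape)."""
    account: str
    confidence: float                  # 0.0 to 1.0
    vendor: Optional[str] = None
    runner_up_confidence: float = 0.0  # score of the second-best account


def flag_reasons(s: Suggestion,
                 threshold: float = CONFIDENCE_THRESHOLD,
                 tie_margin: float = 0.05) -> list[str]:
    """Return every reason this suggestion needs human eyes (empty = flows)."""
    reasons = []
    if s.confidence < threshold:
        reasons.append("below confidence threshold")
    if not s.vendor:
        reasons.append("missing vendor name")
    # tie_margin is an assumed cutoff for "two equally valid accounts"
    if s.confidence - s.runner_up_confidence < tie_margin:
        reasons.append("two accounts nearly tied")
    return reasons


if __name__ == "__main__":
    receipt = Suggestion("6100 Meals", 0.92, vendor=None, runner_up_confidence=0.03)
    print(flag_reasons(receipt))
```

An empty list means the transaction flows. Anything else goes to the queue with the reasons attached, so the reviewer sees why it was flagged.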

The recipe

All seven ingredients still apply. The leverage on this dish is Guardrails (Ingredient #3). The Chef defines the confidence threshold. The station routes below-threshold transactions to human eyes. Without that guardrail, wrong categorizations stay in the ledger.

Training matters. The station learns from your chart of accounts and your naming convention. Context matters. When the bank feed includes vendor name and description, accuracy improves. Examples matter. Show the station five ambiguous categorizations from your own history. The station learns how you resolve them. Output Over Process matters. You don’t specify the steps. You specify the destination: every transaction in the right account within one hour of posting.
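One way to picture how the chart of accounts, the examples, and the destination come together is a single block of context handed to the station. The sketch below assumes the station accepts plain-text instructions. The build_context function, its arguments, and the wording of the request are hypothetical, not a documented interface.

```python
def build_context(chart: dict[str, str],
                  examples: list[tuple[str, str]],
                  transaction: str) -> str:
    """Assemble chart, resolved examples, and the new item into one text block."""
    lines = ["Chart of accounts:"]
    lines += [f"- {code}: {name}" for code, name in chart.items()]
    lines.append("Resolved examples from our own history:")
    lines += [f"- {description} -> {code}" for description, code in examples]
    # State the destination, not the steps (Output Over Process).
    lines.append(f"Categorize this transaction: {transaction}")
    lines.append("Reply with one account code and a confidence from 0 to 1.")
    return "\n".join(lines)


if __name__ == "__main__":
    chart = {"4000": "Professional Services", "6100": "Meals"}
    examples = [("Consultant Invoice", "4000")]
    print(build_context(chart, examples, "ACME CONSULTING INV 1042"))
```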

How to build it

Export your chart of accounts with account codes, names, and descriptions. The station reads your structure, not a generic template. If you have project codes or cost-center tags, include those too.
Define the confidence threshold. If the station is less than 85 percent confident, it routes to human eyes instead of auto-categorizing. Write this down before the first transaction ever runs.
Pull 20 transactions from last month. Ten straightforward ones (obvious categorization). Ten ambiguous ones (could be two accounts, missing context, or unusual pattern). Manually categorize them. Show the station. This is your training set.
Set the guardrail routing. Transactions below threshold go to a named queue or person. A real person. Not a folder that builds up.
Test on a mock run. Feed the station this week’s transactions. See where the confidence flags land. Any flags that should have been auto-routed? Any auto-routes that look wrong? Adjust.
Go live with one bank feed first. Watch for three days. Let the station categorize. The person managing the queue should see fewer than five flags a day at steady state. If flags are running higher, add to the training set or improve the context in the feed. Lower the threshold only if it sits above 85 percent, and never take it below that floor.
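The build steps above can be sketched as a small routing script for the mock run. It assumes the chart of accounts was exported as CSV with a code column and that each suggestion arrives as a dict with account_code and confidence keys. Those names, and the reviewer address, are placeholders for illustration.

```python
import csv
import io
from collections import Counter

CONFIDENCE_THRESHOLD = 0.85           # from the text
QUEUE_OWNER = "reviewer@example.com"  # hypothetical: one named person


def load_chart_of_accounts(csv_text: str) -> dict[str, dict[str, str]]:
    """Index the exported chart by account code (column names are assumed)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["code"]: row for row in reader}


def route(suggestion: dict, chart: dict,
          threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Return 'auto' or the queue owner for one suggested categorization."""
    known_account = suggestion["account_code"] in chart
    if known_account and suggestion["confidence"] >= threshold:
        return "auto"
    return QUEUE_OWNER  # below threshold or unknown account: human eyes


def mock_run(suggestions: list[dict], chart: dict) -> Counter:
    """Tally where a batch of suggestions would land."""
    return Counter(route(s, chart) for s in suggestions)


if __name__ == "__main__":
    chart = load_chart_of_accounts(
        "code,name,description\n6100,Meals,Business meals\n6200,Software,SaaS\n"
    )
    batch = [
        {"account_code": "6100", "confidence": 0.97},
        {"account_code": "6200", "confidence": 0.80},
    ]
    print(mock_run(batch, chart))
```

Sending unknown account codes to the reviewer is a design choice in this sketch: a confident suggestion for an account that is not in your chart is still a wrong answer.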

What breaks it

Threshold too low. You set confidence at 75 percent to reduce human review. The station auto-categorizes 30 transactions wrong because 75 percent confident still means one-in-four odds of being wrong. You find the misses at month-end when variance is too high. Raise the threshold to 85 percent minimum. The five percent of transaction volume that needs human eyes is cheaper than the variance catch-up.
Training set is too generic. The station never saw your company’s transaction patterns. You have a business line called “Professional Services.” The bank feed says “Consultant Invoice.” The station can’t match them without examples. Pull a real training set from your own history, not an import template.
Guardrail routing points to a shared inbox. Flagged transactions go to the inbox. The inbox is used by six people. Nobody owns the queue. Transactions sit for two days before someone categorizes them. Instead, route flags to one person. That person’s job for 30 minutes a day is to review queued transactions. Ownership matters.
The station never learns from fixes. The accounting person categorizes a flagged transaction. They don’t tell the station. The station flags the same pattern next week. The feedback loop died. Every fix updates the recipe, or the accuracy never improves.
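The feedback loop in the last item can be as simple as saving every reviewer decision where the station will see it next time. A minimal sketch, assuming an in-memory store keyed by the transaction description. ExampleStore and its methods are hypothetical names, and a real setup would persist the fixes somewhere durable.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExampleStore:
    """Reviewer fixes kept as worked examples (hypothetical in-memory store)."""
    examples: dict[str, str] = field(default_factory=dict)

    @staticmethod
    def _key(description: str) -> str:
        # Normalize case and whitespace so the same pattern matches next week.
        return " ".join(description.lower().split())

    def record_fix(self, description: str, account: str) -> None:
        """Save how a human resolved a flagged transaction."""
        self.examples[self._key(description)] = account

    def lookup(self, description: str) -> Optional[str]:
        """Return the account a human chose for this pattern, if any."""
        return self.examples.get(self._key(description))


if __name__ == "__main__":
    store = ExampleStore()
    store.record_fix("Consultant  Invoice", "4000 Professional Services")
    print(store.lookup("consultant invoice"))
```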

When it’s working

At week four, the station categorizes 95 percent of transactions in real time. The human queue averages four to five flags a day. The person managing the queue takes 30 minutes to clear it. Month-end close is a day faster because variance reconciliation is easier. Spot checks on random transactions show zero errors on auto-categorized items.

The signal that the recipe is sharp: the accounting person can scan the flagged-transaction queue and categorize them in one sitting, without context-switching or double-checking. If they’re spending an hour on the queue, something in the training or guardrails needs tightening.
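The week-four numbers above can be checked with simple arithmetic. The sketch below assumes you can count total transactions, flags, business days, and spot checks for the week. The weekly_health and is_sharp helpers are illustrative names, and the targets are the ones stated in this section.

```python
def weekly_health(total: int, flagged: int, business_days: int,
                  spot_checked: int, spot_errors: int) -> dict[str, float]:
    """Compute the three numbers this section watches."""
    if total <= 0 or business_days <= 0:
        raise ValueError("total and business_days must be positive")
    return {
        "auto_rate": (total - flagged) / total,
        "flags_per_day": flagged / business_days,
        # No spot checks yet means no measured errors, not proof of zero errors.
        "spot_error_rate": spot_errors / spot_checked if spot_checked else 0.0,
    }


def is_sharp(health: dict[str, float]) -> bool:
    """Targets from the text: 95 percent auto, about five flags a day, zero errors."""
    return (health["auto_rate"] >= 0.95
            and health["flags_per_day"] <= 5
            and health["spot_error_rate"] == 0.0)


if __name__ == "__main__":
    week = weekly_health(total=500, flagged=25, business_days=5,
                         spot_checked=40, spot_errors=0)
    print(week, is_sharp(week))
```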

Monday Move

Export your chart of accounts with all the codes and naming. Define your confidence threshold: 85 percent minimum. Pull 20 transactions from last month. Manually categorize them. That’s your training set. Set up a dedicated person for the flagged-transaction queue. The station is running on Monday.

Dish 1 of 10 on the Finance Station. Build-note leverage: Guardrails (Ingredient #3).