Document Classification — The Menu

The job

Files come in as PDFs, images, scanned documents. A human flips through them and decides: this is an invoice, that’s a contract, this one is an application. Then they sort them into folders or flag them for the right person. This takes four to six hours a week.

The dish reads the inbound document. It classifies the type (invoice, contract, employment application, regulatory form, support request, partnership proposal). It tags it with metadata (vendor name, document date, priority level based on your rules). It routes it to the right folder or person. The human never touches the file unless the station’s confidence is below your threshold. By week two, the inbox processing goes from a daily task to a once-a-week review.

Plated well, this looks like: a consistent filing system where every document is where you expect it. Metadata that the rest of the business can search by. Urgent items routed immediately. Low-confidence documents held for human eyes without breaking the flow. The filing happens in the background. The human approves exceptions.

The recipe

All seven ingredients still apply. The leverage on this dish is Context (Ingredient #2). The station needs to know your filing system. Not just the categories, but the rules that decide which category each document lands in. Context isn’t the document. Context is the schema.
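
One way to see "context is the schema" is to write the schema down as data the station can be handed. What follows is a minimal sketch, not a prescribed format: the `Category` class, the `build_schema` helper, and the wording of each rule are assumptions made for illustration. The metadata fields and the ten-category ceiling come from the build steps under "How to build it".

```python
from dataclasses import dataclass, field

MAX_CATEGORIES = 10  # ceiling named under "How to build it"


@dataclass
class Category:
    name: str   # a category you already use
    rule: str   # plain-language rule that decides membership
    metadata_fields: list[str] = field(default_factory=list)


def build_schema(categories: list[Category]) -> dict[str, Category]:
    """Index categories by name and enforce the ten-category ceiling."""
    if len(categories) > MAX_CATEGORIES:
        raise ValueError(
            f"{len(categories)} categories; keep it to {MAX_CATEGORIES} or fewer"
        )
    names = [c.name for c in categories]
    if len(set(names)) != len(names):
        raise ValueError("duplicate category name")
    return {c.name: c for c in categories}


# Illustrative entries. The rule wording is an assumption; write your own.
SCHEMA = build_schema([
    Category("invoice", "Requests payment for goods or services delivered.",
             ["vendor_name", "date", "amount", "po_number"]),
    Category("contract", "Sets out terms that both parties sign.",
             ["vendor", "signature_date", "key_terms"]),
])
```

The rule lives next to the category name on purpose. A list of names alone is the generic system; the rule is what makes it yours.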

Training (Ingredient #1) sits underneath. Your house standard for how you name files, where priority documents live, what metadata you use. Examples (Ingredient #4) are load-bearing. Show the station five documents for each category and it picks up the visual patterns that separate an invoice from a contract.

Guardrails (Ingredient #3) are the second lever. Any document the station scores below eighty-five percent confidence gets escalated to a human instead of auto-routed. Output Over Process (Ingredient #5) is simple here. “Route this document correctly.” Not “apply these rules in this order.”

How to build it

Define your filing system. Write down the document categories you actually use. Invoices. Contracts. Applications. Regulatory. Support. Don’t invent categories for the station. Use the ones you have. Ten categories maximum or the schema becomes noise.
Define metadata rules. For each category, write down what you extract. Invoice: vendor name, date, amount, PO number. Contract: vendor, signature date, key terms. This is the data structure the station will return.
Pull five examples per category from your archive. This is the training set. Real documents from your business. Invoice with the PO written three ways. Contract with the signature page scanned backwards. Applications in three different formats. This variety teaches the station what matters.
Set confidence thresholds. The station returns a confidence score. Below eighty-five percent equals human review. Above ninety-five percent equals auto-route. Eighty-five to ninety-five is a “suggest and wait for approval” middle ground. These numbers are your guardrails.
Build the routing rules. Invoices go to the accounts-payable folder and notify the AP manager. Contracts go to the contracts folder and notify legal. Applications go to HR with a flag if salary is above the threshold. The destination is the output.
Point the station at the inbound folder. Real documents start flowing in. The first week, monitor every classification. The station is learning your patterns. By week two, the confidence score tells you whether it’s ready.
Set the exception workflow. When confidence is in the middle range, the station routes to a human approval queue with the classification highlighted. This is the pass work. Someone reviews and confirms or corrects. Every correction teaches the station.
Measure accuracy weekly. Count the auto-routed documents that land in the right place. Count the corrections humans made. Below ninety-five percent accuracy after week two means one or more categories need sharper definition.
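
The threshold and routing steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the station's actual interface: the `Classification` shape, the `triage` and `route` function names, the folder names, and the contacts in `ROUTES` are all hypothetical. Only the two cut-offs (eighty-five and ninety-five percent) come from the steps themselves.

```python
from dataclasses import dataclass

REVIEW_BELOW = 0.85      # below this: human review
AUTO_ROUTE_ABOVE = 0.95  # above this: auto-route

# Hypothetical folder names and contacts; swap in your own.
ROUTES = {
    "invoice": ("accounts-payable", "AP manager"),
    "application": ("hr", "HR"),
}


@dataclass
class Classification:
    category: str
    confidence: float  # assumed to be on a 0.0 to 1.0 scale


def triage(confidence: float) -> str:
    """Map a confidence score onto one of the three bands."""
    if confidence < REVIEW_BELOW:
        return "human_review"
    if confidence > AUTO_ROUTE_ABOVE:
        return "auto_route"
    return "suggest_and_wait"  # 0.85 through 0.95 inclusive


def route(result: Classification) -> dict:
    """Decide the action, the destination folder, and who gets notified."""
    if result.category not in ROUTES:
        # A category with no routing rule always goes to a human.
        return {"action": "human_review", "folder": None, "notify": None}
    folder, notify = ROUTES[result.category]
    action = triage(result.confidence)
    if action == "auto_route":
        return {"action": action, "folder": folder, "notify": notify}
    # Held for a human: carry the suggested folder, notify nobody yet.
    return {"action": action, "folder": folder, "notify": None}
```

For example, `route(Classification("invoice", 0.99))` files to accounts-payable and notifies the AP manager, while the same invoice at 0.90 waits in the approval queue with the folder already suggested.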

What breaks it

No schema exists. You want the station to classify documents but you’ve never written down your categories. It defaults to a generic system. Then by week three the team is quietly reorganizing documents and the station never learns.
Confidence thresholds set wrong. Threshold too high and every document lands in the approval queue. Threshold too low and misclassified documents pollute your filing system. The number matters. Start at eighty-five percent and adjust weekly based on what human reviewers are correcting.
Examples are too clean. You give the station five perfect invoices and then real invoices arrive with smudged dates, missing vendor info, and handwritten amounts. The station was trained on ideals. Real data confuses it. Dirty examples from your archive teach it reality.
No feedback mechanism. The team quietly reclassifies documents the station got wrong. Nobody tells the station. Six months later it’s still misclassifying the same document types because the miss never made it back to training.
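
A feedback mechanism does not need to be elaborate. The sketch below assumes corrections are kept as simple records; the `Correction` class and both helper functions are hypothetical names for illustration, and how the flagged documents get back into the station's example set depends on your tooling.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    doc_id: str
    predicted: str  # what the station chose
    corrected: str  # what the human changed it to


def repeat_misses(corrections, min_count=2):
    """Return (predicted, corrected) pairs seen at least min_count times,
    most frequent first."""
    pairs = Counter(
        (c.predicted, c.corrected)
        for c in corrections
        if c.predicted != c.corrected
    )
    return [(pair, n) for pair, n in pairs.most_common() if n >= min_count]


def docs_for_example_set(corrections, pair):
    """List the corrected documents for one confusion pair.
    These are candidates to add to that category's examples."""
    return [c.doc_id for c in corrections if (c.predicted, c.corrected) == pair]
```

The point of the log is the repeat. One miss is noise. The same miss twice is a category that needs a sharper rule or a dirtier example.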

When it’s working

By week two, eighty percent of documents auto-route without human intervention. The exceptions land in a clear queue with the station’s confidence score visible. Each round of human confirmations and corrections lifts the station’s accuracy by three to five percentage points. By week four, accuracy is above ninety-five percent. The filing system is consistent. Metadata is complete. The four to six hours a week of manual filing shrinks to a once-a-week review of exceptions.
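
The two numbers in that picture can be computed from a weekly log. This is a minimal sketch that assumes one definition of accuracy: auto-routed documents that no human later corrected, divided by all auto-routed documents. The `Outcome` record and `weekly_report` function are hypothetical names, and your own definition may differ.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    auto_routed: bool  # True if no human touched it before filing
    corrected: bool    # True if a human later changed the classification


def weekly_report(outcomes, accuracy_target=0.95):
    """Compute the auto-route rate and the accuracy of auto-routed documents."""
    total = len(outcomes)
    auto = [o for o in outcomes if o.auto_routed]
    auto_rate = len(auto) / total if total else 0.0
    right = sum(1 for o in auto if not o.corrected)
    accuracy = right / len(auto) if auto else 0.0
    return {
        "auto_route_rate": auto_rate,
        "accuracy": accuracy,
        # Below target suggests one or more categories need sharper definition.
        "needs_sharper_categories": accuracy < accuracy_target,
    }
```

Run it on the same day each week so the numbers are comparable from one review to the next.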

The signal that the recipe is sharp: a new document type arrives (a new vendor form, a regulatory filing you’ve never seen) and the station gets it right on the first try because it learned the classification patterns, not the specific files.

Monday Move

Pull the last ten documents that came in today. Manually classify them into five categories. That’s your schema. Show the station. Have it classify twenty more while you watch it work. This week, document why it’s getting any wrong. By Friday, you’ll know if the schema needs sharpening or if the examples need variety.

Dish 1 of 10 on the Operations Station. Build-note leverage: Context (Ingredient #2).