Automated Document Classification Workflow
In many business scenarios, you’ll receive various document types through a single channel (like an upload portal) and need to process them differently. Document classification is the crucial first step that determines which extraction template to apply.
Use Case: Financial Document Processing
Let’s explore a common use case: an accounting firm that processes various financial documents for clients:
- Invoices from vendors that need to be recorded and paid
- Receipts for expense reimbursements and tax documentation
- Bank statements showing account activity
- Purchase orders documenting approved purchases
1
Create document extraction templates
Before classification, create extraction templates for each document type you handle.
For our use case, you would create templates for:
- Invoice extraction template
- Receipt extraction template
- Bank statement extraction template
- Purchase order extraction template
2
Create a classification object
Use the /classification endpoint to create a classification object that can identify your document types.
The response will include an id that you’ll use to classify documents.
3
Run classification on a document
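As a rough sketch, the two requests involved so far (creating the classification object, then classifying a document with the returned id) might be assembled as shown here. Only the /classification endpoint and the returned id come from this guide. The base URL, the "/run" sub-path, the payload field names, and the helper function names are all assumptions made for illustration, so check them against the actual API reference before use.

```python
import json

# Hypothetical base URL; replace with your provider's actual API host.
BASE_URL = "https://api.example.com"


def build_create_classification_request(options):
    """Build the request that creates a classification object (step 2).

    `options` is a list of (name, description) pairs. The payload field
    names ("options", "name", "description") are assumptions.
    """
    return {
        "method": "POST",
        "url": f"{BASE_URL}/classification",
        "json": {
            "options": [
                {"name": name, "description": description}
                for name, description in options
            ]
        },
    }


def build_classify_request(classification_id, document_url):
    """Build the request that classifies one document (step 3).

    The "/run" sub-path and the "document_url" field are assumptions.
    """
    return {
        "method": "POST",
        "url": f"{BASE_URL}/classification/{classification_id}/run",
        "json": {"document_url": document_url},
    }


# Document types from the accounting use case, plus a catch-all.
DOCUMENT_TYPES = [
    ("INVOICE", "Vendor bill with line items, quantities, prices, and an amount due"),
    ("RECEIPT", "Proof of payment for a completed purchase"),
    ("BANK_STATEMENT", "Periodic list of account transactions and balances"),
    ("PURCHASE_ORDER", "Buyer-issued document authorizing a purchase"),
    ("OTHER", "Any document that does not match the types above"),
]

create_request = build_create_classification_request(DOCUMENT_TYPES)
# "cls_123" stands in for the id returned by the create call.
classify_request = build_classify_request("cls_123", "https://example.com/upload.pdf")

print(json.dumps(create_request, indent=2))
print(json.dumps(classify_request, indent=2))
```

Each dictionary uses the keyword names that `requests.request` accepts (`method`, `url`, `json`), so it could be sent with `requests.request(**create_request, headers=...)` once the real host and authentication headers are known.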
4
Route document to appropriate template
Based on the classification result, route the document to the appropriate extraction template.
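A minimal routing sketch is a lookup from the predicted document type to a template id. The template ids and the shape of the classification result (a dictionary with the predicted type under "label") are hypothetical; adapt them to the response your API actually returns.

```python
# Hypothetical template ids; use the ids of the templates created in step 1.
TEMPLATE_BY_TYPE = {
    "INVOICE": "tmpl_invoice",
    "RECEIPT": "tmpl_receipt",
    "BANK_STATEMENT": "tmpl_bank_statement",
    "PURCHASE_ORDER": "tmpl_purchase_order",
}


def route_document(classification_result):
    """Return the extraction template id for a result, or None.

    Assumes the result is a dict with the predicted type under "label".
    None means no template matched (for example, the type was OTHER).
    """
    label = classification_result.get("label")
    return TEMPLATE_BY_TYPE.get(label)


print(route_document({"label": "INVOICE"}))  # tmpl_invoice
print(route_document({"label": "OTHER"}))    # None
```

Returning None for unmatched types lets the caller send those documents to manual review instead of forcing them through the wrong template.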
🎉 You’ve now successfully classified a document and routed it to the
appropriate extraction template!
Classification Performance Tips
- Use diverse training samples: When creating a classification object, include examples of each document type with different layouts, formatting, and quality levels.
- Include an OTHER option: Create an OTHER option to handle documents that don’t match any of your predefined types. This helps prevent important documents from being misclassified.
- Be specific with descriptions: Write detailed descriptions for each document type that highlight unique identifying features (e.g., “Invoice: Contains line items with quantities and prices”).
- Update your options regularly: Add new document variations to your classification object as you encounter them to continuously improve accuracy.
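Two of these tips can be checked mechanically before a classification object is created. The sketch below assumes the options are kept as a plain mapping from type name to description. The function name and the five-word minimum are arbitrary illustrations, not requirements of the API.

```python
def check_options(options, min_description_words=5):
    """Return a list of warnings for a {name: description} mapping of options."""
    warnings = []
    # Tip: include an OTHER option for documents that match no known type.
    if "OTHER" not in options:
        warnings.append("missing OTHER option")
    # Tip: be specific with descriptions (word count is a crude proxy).
    for name, description in options.items():
        if len(description.split()) < min_description_words:
            warnings.append(f"description for {name} is too short")
    return warnings


options = {
    "INVOICE": "Contains line items with quantities and prices",
    "RECEIPT": "Receipt",
}
for warning in check_options(options):
    print(warning)
# missing OTHER option
# description for RECEIPT is too short
```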