AI OCR for Tax Documents: Invoices, Balance Sheets & Tables
Pricing
from $450.00 / 1,000 tax documents extracted
Extract structured data from invoices, receipts, balance sheets and tabular PDFs with AI. Returns issuer, dates, totals, taxes and tables as JSON. Upload a file or pass URLs; batch or real-time API.
Developer
Acme AI
Maintained by Community
🧾 AI OCR for Tax Documents (Invoices, Balance Sheets & Tables)
Turn invoices, receipts, balance sheets, bank statements and tabular PDFs into clean, structured JSON with AI. Upload a file or pass document URLs, and get back the document type, issuer/recipient, dates, totals, taxes, line-item tables and a summary - ready for your accounting system or spreadsheet.
🎯 Built for tax and accounting teams. Not a generic text dump: the AI detects the document type and extracts the fields that matter, plus the tables, while preserving the meaning carried by the layout.
What you get (per document)
| Field | Description |
|---|---|
| documentType | invoice, receipt, balance_sheet, income_statement, bank_statement, purchase_order, table, other |
| issuerName / issuerTaxId | Vendor/company and tax ID (VAT, CNPJ, EIN...) |
| recipientName / recipientTaxId | Buyer/customer and tax ID |
| documentNumber, issueDate, dueDate | Document identification |
| currency, subtotal, taxAmount, totalAmount | Monetary fields (plain numbers) |
| tables[] | Extracted tables (line items, balances...) with columns + rows |
| keyValues | Any other labelled fields (payment terms, account no., period...) |
| summary | One-line description |
| fileMetadata | type, sizeBytes, pageCount (PDF) |
How to use
Upload a file in the input, or pass URLs for batch:
{"documentUrls": ["https://example.com/invoice.pdf","https://example.com/receipt.jpg"]}
Supports PDF, PNG, JPG and WebP. Up to 50 documents per run (send larger volumes via sequential calls). PDFs are read natively (multi-page); images are auto-optimized before analysis.
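The 50-document limit means a larger list has to be split before it is sent. A minimal sketch of that split, assuming the input shape shown above (`documentUrls`); the helper name `chunk_urls` and the constant `MAX_DOCS_PER_RUN` are illustrative and not part of the Actor:

```python
from typing import Iterator, List

MAX_DOCS_PER_RUN = 50  # per-run limit stated in this README


def chunk_urls(urls: List[str], size: int = MAX_DOCS_PER_RUN) -> Iterator[dict]:
    """Yield one input payload per run, each holding at most `size` URLs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield {"documentUrls": urls[start:start + size]}


# Example: 120 URLs become three sequential runs.
urls = [f"https://example.com/doc-{i}.pdf" for i in range(120)]
payloads = list(chunk_urls(urls))
print([len(p["documentUrls"]) for p in payloads])  # [50, 50, 20]
```

Each payload can then be sent as its own run, one after another.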
Pricing
Charged per document successfully extracted (event tax-document-extracted). Documents that fail to download or can't be read are not charged.
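To budget a run, the charge can be estimated from the results. The sketch below assumes two things that are not guaranteed by this page: that the listed "from" price of $450.00 per 1,000 documents applies to your plan, and that items with `"success": true` in the output are the ones that trigger the billable event. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal
from typing import Iterable, Mapping

# Assumption: the listed "from" price; your actual rate may differ.
PRICE_PER_1000 = Decimal("450.00")


def estimate_cost(
    items: Iterable[Mapping[str, object]],
    price_per_1000: Decimal = PRICE_PER_1000,
) -> Decimal:
    """Estimate the charge, counting only items marked as successful."""
    billable = sum(1 for item in items if item.get("success") is True)
    return (price_per_1000 * billable / 1000).quantize(Decimal("0.01"))


results = [
    {"documentUrl": "https://example.com/invoice.pdf", "success": True},
    {"documentUrl": "https://example.com/receipt.jpg", "success": True},
    {"documentUrl": "https://example.com/blurry.png", "success": False},
]
print(estimate_cost(results))  # 0.90
```

`Decimal` is used instead of floats so the estimate does not pick up binary rounding noise.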
Example output
[
  {
    "documentUrl": "https://example.com/invoice.pdf",
    "success": true,
    "documentType": "invoice",
    "issuerName": "ACME Ltda",
    "issuerTaxId": "12345678000190",
    "recipientName": "Globex Inc",
    "documentNumber": "INV-2024-001",
    "issueDate": "2024-03-15",
    "dueDate": "2024-04-15",
    "currency": "USD",
    "subtotal": 1100.0,
    "taxAmount": 150.0,
    "totalAmount": 1250.0,
    "tables": [
      {
        "title": "Line items",
        "columns": ["description", "qty", "unitPrice", "total"],
        "rows": [
          { "description": "Consulting", "qty": 10, "unitPrice": 110, "total": 1100 }
        ]
      }
    ],
    "keyValues": { "paymentTerms": "Net 30" },
    "summary": "Invoice from ACME Ltda to Globex Inc, total USD 1250.",
    "fileMetadata": { "type": "pdf", "sizeBytes": 84210, "pageCount": 1 },
    "failureReason": null,
    "processedAt": "2026-01-01T12:00:00.000Z",
    "error": null
  }
]
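Once you have an item like the one above, two common next steps are a sanity check on the monetary fields and exporting a table for a spreadsheet. The following is a rough sketch using only the field names from the example output; the helper names `totals_consistent` and `table_to_csv` are hypothetical, and the check assumes the simple case where subtotal plus tax equals the total (documents with discounts or fees may legitimately fail it).

```python
import csv
import io
import math


def totals_consistent(doc: dict, tolerance: float = 0.01) -> bool:
    """Return True when subtotal + taxAmount matches totalAmount within tolerance."""
    parts = (doc.get("subtotal"), doc.get("taxAmount"), doc.get("totalAmount"))
    if any(part is None for part in parts):
        return False  # cannot verify when a field is missing
    subtotal, tax, total = parts
    return math.isclose(subtotal + tax, total, abs_tol=tolerance)


def table_to_csv(table: dict) -> str:
    """Render one extracted table (columns + rows) as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=table["columns"],
        extrasaction="ignore",  # skip row keys that are not listed in columns
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(table["rows"])
    return buffer.getvalue()


# Trimmed copy of the example output item.
doc = {
    "subtotal": 1100.0,
    "taxAmount": 150.0,
    "totalAmount": 1250.0,
    "tables": [
        {
            "title": "Line items",
            "columns": ["description", "qty", "unitPrice", "total"],
            "rows": [
                {"description": "Consulting", "qty": 10, "unitPrice": 110, "total": 1100}
            ],
        }
    ],
}

print(totals_consistent(doc))  # True
print(table_to_csv(doc["tables"][0]), end="")
```

Items that fail the check are good candidates for manual review before they reach an accounting system.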
FAQ
Which documents work best?
Clear digital PDFs and sharp scans/photos. Very low-resolution or handwritten documents may not be readable - the reason is reported in failureReason.
Does it handle multi-page PDFs?
Yes. PDFs are read natively, including tables and layout, across pages.
Can I upload a file directly?
Yes - use the upload field in the input, or call the API with a document URL.
Can I call it in real time?
Yes. The Standby endpoint POST /extract responds synchronously. See below.
🔌 API integration
Batch run:
curl -X POST "https://api.apify.com/v2/acts/acme-ai~ocr-tax-document-ai/run-sync-get-dataset-items?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"documentUrls":["https://example.com/invoice.pdf","https://example.com/receipt.jpg"]}'
Standby (POST /extract):
curl -X POST "https://acme-ai--ocr-tax-document-ai.apify.actor/extract" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  --compressed \
  -d '{"documentUrls":["https://example.com/invoice.pdf","https://example.com/receipt.jpg"]}'
For the Standby endpoint, the token goes in the Authorization: Bearer header, never in the URL.
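The same Standby call can be made from Python with only the standard library. This is a sketch under stated assumptions: the URL, header names and request body come from the curl example above, while the function names, the 120-second timeout and the assumption that the response body is JSON are illustrative and not confirmed by this page.

```python
import json
import urllib.request
from typing import Iterable

# Standby URL taken from the curl example above.
STANDBY_URL = "https://acme-ai--ocr-tax-document-ai.apify.actor/extract"


def build_extract_request(
    document_urls: Iterable[str], token: str, url: str = STANDBY_URL
) -> urllib.request.Request:
    """Build the POST /extract request with the token in the header, not the URL."""
    body = json.dumps({"documentUrls": list(document_urls)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def extract(document_urls: Iterable[str], token: str, timeout: float = 120.0):
    """Send the request and decode the response (assumed here to be JSON)."""
    request = build_extract_request(document_urls, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (performs a real network call, so it is left commented out):
# items = extract(["https://example.com/invoice.pdf"], token="YOUR_APIFY_TOKEN")
```

Keeping request construction separate from sending makes it easy to verify in a unit test that the token never ends up in the URL.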
Notes
This Actor analyzes documents you provide. You are responsible for having the right to process them and any personal or financial data they may contain.