AI OCR for Tax Documents: Invoices, Balance Sheets & Tables

Pricing

from $450.00 / 1,000 tax documents extracted

Extract structured data from invoices, receipts, balance sheets and tabular PDFs with AI. Returns issuer, dates, totals, taxes and tables as JSON. Upload a file or pass URLs; batch or real-time API.

Developer

Acme AI

Maintained by Community

🧾 AI OCR for Tax Documents (Invoices, Balance Sheets & Tables)

Turn invoices, receipts, balance sheets, bank statements and tabular PDFs into clean, structured JSON with AI. Upload a file or pass document URLs, and get back the document type, issuer/recipient, dates, totals, taxes, line-item tables and a summary - ready for your accounting system or spreadsheet.

🎯 Built for tax and accounting teams. This is not a generic text dump: the AI detects the document type and extracts the fields that matter, along with the tables, while preserving the meaning carried by the layout.


What you get (per document)

| Field | Description |
| --- | --- |
| documentType | invoice, receipt, balance_sheet, income_statement, bank_statement, purchase_order, table, other |
| issuerName / issuerTaxId | Vendor/company and tax ID (VAT, CNPJ, EIN...) |
| recipientName / recipientTaxId | Buyer/customer and tax ID |
| documentNumber, issueDate, dueDate | Document identification |
| currency, subtotal, taxAmount, totalAmount | Monetary fields (plain numbers) |
| tables[] | Extracted tables (line items, balances...) with columns + rows |
| keyValues | Any other labelled fields (payment terms, account no., period...) |
| summary | One-line description |
| fileMetadata | type, sizeBytes, pageCount (PDF) |

How to use

Upload a file in the input, or pass URLs for batch:

{
  "documentUrls": [
    "https://example.com/invoice.pdf",
    "https://example.com/receipt.jpg"
  ]
}

Supports PDF, PNG, JPG and WebP. Up to 50 documents per run (send larger volumes via sequential calls). PDFs are read natively (multi-page); images are auto-optimized before analysis.
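
The 50-document limit means larger batches have to be split before they are submitted. A minimal Python sketch of that splitting, assuming the limit applies to the length of the `documentUrls` array; the helper name `chunk_urls` is hypothetical and not part of the Actor:

```python
from typing import Iterator

MAX_DOCS_PER_RUN = 50  # per-run limit stated above


def chunk_urls(urls: list[str], size: int = MAX_DOCS_PER_RUN) -> Iterator[dict]:
    """Yield one input payload per run, each holding at most `size` URLs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield {"documentUrls": urls[start:start + size]}


if __name__ == "__main__":
    urls = [f"https://example.com/doc-{i}.pdf" for i in range(120)]
    payloads = list(chunk_urls(urls))
    # 120 URLs are split into runs of 50, 50 and 20 documents.
    print([len(p["documentUrls"]) for p in payloads])
```

Each yielded payload can then be sent as its own run, one after another.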


Pricing

Charged per document successfully extracted (event tax-document-extracted). Documents that fail to download or can't be read are not charged.
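
Because only successful extractions are billed, the cost of a run can be estimated from its output. A rough sketch, assuming the listed starting price of $450.00 per 1,000 documents (about $0.45 each) and that every output item carries the `success` flag that appears in the example output; the actual rate may differ by plan, and `estimate_charge` is a hypothetical helper:

```python
from decimal import Decimal

# Assumption: the listed "from" price; the rate on your plan may differ.
PRICE_PER_1000 = Decimal("450.00")


def estimate_charge(items: list[dict], price_per_1000: Decimal = PRICE_PER_1000) -> Decimal:
    """Estimate the charge for a run: only items with success == True count."""
    billable = sum(1 for item in items if item.get("success") is True)
    return (price_per_1000 * billable / 1000).quantize(Decimal("0.01"))


if __name__ == "__main__":
    run_output = [
        {"documentUrl": "https://example.com/invoice.pdf", "success": True},
        {"documentUrl": "https://example.com/blurry.jpg", "success": False},
    ]
    # One billable document out of two.
    print(estimate_charge(run_output))
```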


Example output

[
  {
    "documentUrl": "https://example.com/invoice.pdf",
    "success": true,
    "documentType": "invoice",
    "issuerName": "ACME Ltda",
    "issuerTaxId": "12345678000190",
    "recipientName": "Globex Inc",
    "documentNumber": "INV-2024-001",
    "issueDate": "2024-03-15",
    "dueDate": "2024-04-15",
    "currency": "USD",
    "subtotal": 1100.0,
    "taxAmount": 150.0,
    "totalAmount": 1250.0,
    "tables": [
      {
        "title": "Line items",
        "columns": ["description", "qty", "unitPrice", "total"],
        "rows": [
          { "description": "Consulting", "qty": 10, "unitPrice": 110, "total": 1100 }
        ]
      }
    ],
    "keyValues": { "paymentTerms": "Net 30" },
    "summary": "Invoice from ACME Ltda to Globex Inc, total USD 1250.",
    "fileMetadata": { "type": "pdf", "sizeBytes": 84210, "pageCount": 1 },
    "failureReason": null,
    "processedAt": "2026-01-01T12:00:00.000Z",
    "error": null
  }
]
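
Since the monetary fields come back as plain numbers, a quick consistency check before importing into an accounting system or spreadsheet is straightforward. The sketch below assumes the output shape shown above, with table rows as objects keyed by column name; `totals_match` and `table_to_csv` are hypothetical helper names, and the tolerance is an arbitrary choice:

```python
import csv
import io


def totals_match(doc: dict, tolerance: float = 0.01) -> bool:
    """Return True when subtotal + taxAmount equals totalAmount within tolerance."""
    parts = (doc.get("subtotal"), doc.get("taxAmount"), doc.get("totalAmount"))
    if any(value is None for value in parts):
        return False  # cannot verify when a field is missing
    subtotal, tax, total = parts
    return abs(subtotal + tax - total) <= tolerance


def table_to_csv(table: dict) -> str:
    """Render one entry of tables[] as CSV text, using its columns as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=table["columns"],
        extrasaction="ignore",  # skip row keys that are not listed in columns
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(table["rows"])
    return buffer.getvalue()


if __name__ == "__main__":
    doc = {
        "subtotal": 1100.0, "taxAmount": 150.0, "totalAmount": 1250.0,
        "tables": [{
            "title": "Line items",
            "columns": ["description", "qty", "unitPrice", "total"],
            "rows": [{"description": "Consulting", "qty": 10, "unitPrice": 110, "total": 1100}],
        }],
    }
    print(totals_match(doc))
    print(table_to_csv(doc["tables"][0]))
```

Documents that fail the check are good candidates for manual review rather than automatic import.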

FAQ

Which documents work best? Clear digital PDFs and sharp scans/photos. Very low-resolution or handwritten documents may not be readable - the reason is reported in failureReason.

Does it handle multi-page PDFs? Yes. PDFs are read natively, including tables and layout, across pages.

Can I upload a file directly? Yes - use the upload field in the input, or call the API with a document URL.

Can I call it in real time? Yes. The Standby endpoint POST /extract responds synchronously; see the API integration section below.


🔌 API integration

Batch run:

curl -X POST "https://api.apify.com/v2/acts/acme-ai~ocr-tax-document-ai/run-sync-get-dataset-items" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"documentUrls":["https://example.com/invoice.pdf","https://example.com/receipt.jpg"]}'

Standby (POST /extract):

curl -X POST "https://acme-ai--ocr-tax-document-ai.apify.actor/extract" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  --compressed \
  -d '{"documentUrls":["https://example.com/invoice.pdf","https://example.com/receipt.jpg"]}'

The token goes in the Authorization: Bearer header, never in the URL.
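
For callers who prefer Python over curl, the same Standby request can be sketched with only the standard library. This mirrors the Standby curl example above; the helper names `build_extract_request` and `extract` are hypothetical, the timeout is an arbitrary choice, and the response is assumed to be JSON (its exact shape is not specified here):

```python
import json
import urllib.request

STANDBY_URL = "https://acme-ai--ocr-tax-document-ai.apify.actor/extract"


def build_extract_request(token: str, document_urls: list[str]) -> urllib.request.Request:
    """Build the POST /extract request with the token in the Authorization header."""
    body = json.dumps({"documentUrls": document_urls}).encode("utf-8")
    return urllib.request.Request(
        STANDBY_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # token stays out of the URL
            "Content-Type": "application/json",
        },
    )


def extract(token: str, document_urls: list[str], timeout: float = 120.0):
    """Send the request and decode the JSON response (assumed response format)."""
    request = build_extract_request(token, document_urls)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would then look like `extract("YOUR_APIFY_TOKEN", ["https://example.com/invoice.pdf"])`. Keeping request construction separate from sending makes it easy to inspect the headers without touching the network.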


Notes

This Actor analyzes documents you provide. You are responsible for having the right to process them and any personal or financial data they may contain.