Invoice Data Extractor

Pricing

from $40.00 / 1,000 results

An AI-powered Actor for extracting structured data from invoices, receipts, and other documents. Upload an image to receive clean, structured data, including vendor details, invoice numbers, line items, totals, and other key fields.

Developer

Taher Ali Badnawarwala

Maintained by Community

OCR Data Extractor Actor

An Apify Actor that extracts structured data from invoices, receipts, and documents using AI-powered OCR technology. Simply upload an image or provide a URL, and the Actor will extract key information like vendor details, invoice numbers, line items, and totals.

What This Tool Does

This Actor sends document images to the MultipleWords OCR API and returns the structured data it extracts. It accepts an image file or a URL and returns vendor information, customer details, billing information, and line items.

Key Features:

  • 📄 Extract data from invoices, receipts, and documents
  • 🚀 Fast and automated document processing
  • 📦 Structured output with all key fields extracted
  • 🔄 Reliable error handling and validation
  • 📊 Full extraction details with an overall confidence level
  • 🖼️ Multiple input methods (file upload, file path, base64 string, or image URL)

Purpose & Use Cases

This tool is designed to help businesses, accountants, and developers automate document data extraction:

Accounting & Finance

  • Automate invoice data entry into accounting systems
  • Extract receipt information for expense tracking
  • Process bulk invoices for payment processing
  • Digitize paper documents for archival

Business Operations

  • Streamline accounts payable workflows
  • Automate vendor information extraction
  • Process purchase orders and quotes
  • Extract data from shipping documents

E-commerce & Retail

  • Process supplier invoices automatically
  • Extract product details from purchase orders
  • Automate inventory documentation
  • Handle customer receipt processing

Development & Automation

  • Integrate OCR into automated workflows
  • Batch process documents programmatically
  • Create document processing pipelines
  • Build custom accounting integrations

Document Management

  • Digitize paper document archives
  • Extract searchable data from scanned documents
  • Automate document classification and filing
  • Create structured databases from unstructured documents

Input Parameters

The Actor accepts the following input:

file (Optional)

  • Type: String or File Upload
  • Description: Upload an image file (invoice, receipt, document) to extract data from. You can also provide a file path, URL, or base64 string.
  • Supported Formats: JPG, PNG, PDF (image-based)
  • Example: Upload via file picker in Apify Console

image_url (Optional)

  • Type: String
  • Description: URL of the image to process (alternative to file upload)
  • Example: "https://example.com/invoice.jpg"

Note: Either file or image_url must be provided. The user_id and isPro parameters are handled automatically with default values.
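The rule above (provide either file or image_url) can be checked before any request is sent. The following is a minimal JavaScript sketch, not the Actor's actual code: the function names are hypothetical, it assumes file arrives as a string, and both the URL/base64/path heuristic and the choice to let file win when both fields are set are assumptions.

```javascript
// Hypothetical sketch of input validation for this Actor.
// Assumptions: `file` arrives as a string, and it takes precedence over `image_url`.

const URL_PATTERN = /^https?:\/\//i;

// Guess what kind of value was passed in `file` (heuristic, not the Actor's logic).
function classifyFileValue(value) {
  if (URL_PATTERN.test(value)) return "url";
  if (/^data:[^;,]+;base64,/i.test(value)) return "base64";
  // A long string made only of base64 characters is treated as raw base64.
  if (value.length > 100 && /^[A-Za-z0-9+/]+={0,2}$/.test(value)) return "base64";
  return "path";
}

// Return the input source to use, or throw if neither field is usable.
function validateInput(input) {
  const { file, image_url: imageUrl } = input ?? {};
  if (typeof file === "string" && file.trim() !== "") {
    return { source: classifyFileValue(file.trim()), value: file.trim() };
  }
  if (typeof imageUrl === "string" && URL_PATTERN.test(imageUrl)) {
    return { source: "url", value: imageUrl };
  }
  throw new Error("Provide either `file` or an http(s) `image_url`.");
}

console.log(validateInput({ image_url: "https://example.com/invoice.jpg" }));
// → { source: 'url', value: 'https://example.com/invoice.jpg' }
```

Failing fast here avoids spending an OCR call (and a billed result) on an input that cannot succeed.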

Output Structure

The Actor returns structured data containing the extracted document information:

{
  "status": 1,
  "vendor_company": "Acme Corporation",
  "vendor_email": "billing@acme.com",
  "customer_name": "John Smith",
  "customer_email": "john.smith@email.com",
  "invoice_number": "INV-2024-001",
  "issue_date": "2024-01-15",
  "due_date": "2024-02-15",
  "subtotal": "1000.00",
  "total_tax": "100.00",
  "grand_total": "1100.00",
  "currency": "$",
  "line_items_count": 5,
  "document_type": "invoice",
  "extraction_confidence": "high",
  "line_items": [
    {
      "description": "Product A",
      "quantity": "2",
      "unit_price": "250.00",
      "amount": "500.00"
    }
  ],
  "full_details": {
    "vendor_information": { ... },
    "customer_information": { ... },
    "billing_details": { ... },
    "totals_and_taxes": { ... }
  }
}

Output Fields Explained

  • status: Success indicator (1 = success)
  • vendor_company: Name of the vendor/seller company
  • vendor_email: Vendor's email address
  • customer_name: Customer/buyer name
  • customer_email: Customer's email address
  • invoice_number: Invoice or receipt number
  • issue_date: Date when the invoice was issued
  • due_date: Payment due date
  • subtotal: Subtotal amount before tax
  • total_tax: Total tax amount
  • grand_total: Final total amount
  • currency: Currency symbol or code
  • line_items_count: Number of line items/products
  • document_type: Type of document (invoice, receipt, etc.)
  • extraction_confidence: Overall confidence level of extraction
  • line_items: Array of individual items with quantities and prices
  • full_details: Complete raw data from the OCR API
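In the output example above, amounts such as subtotal and grand_total are strings, so downstream code usually converts them to numbers before checking or storing them. The sketch below is a hypothetical post-processing step, not part of the Actor: it parses the amounts and checks that subtotal + total_tax matches grand_total. It assumes "." is the decimal separator, and the 0.01 tolerance is an arbitrary choice to absorb rounding.

```javascript
// Sketch: parse string amounts from one result and sanity-check the totals.
// Assumes "." is the decimal separator; "1.100,00"-style amounts would misparse.

function parseAmount(value) {
  if (value === undefined || value === null) return null;
  // Strip currency symbols, spaces, and thousands separators such as "$1,100.00".
  const cleaned = String(value).replace(/[^0-9.-]/g, "");
  if (cleaned === "") return null;
  const n = Number(cleaned);
  return Number.isFinite(n) ? n : null;
}

function checkTotals(result, tolerance = 0.01) {
  const subtotal = parseAmount(result.subtotal);
  const tax = parseAmount(result.total_tax) ?? 0; // treat missing tax as zero
  const grandTotal = parseAmount(result.grand_total);
  if (subtotal === null || grandTotal === null) {
    return { consistent: null, reason: "missing subtotal or grand_total" };
  }
  const diff = Math.abs(subtotal + tax - grandTotal);
  return { consistent: diff <= tolerance, difference: Number(diff.toFixed(2)) };
}

const sample = { subtotal: "1000.00", total_tax: "100.00", grand_total: "1100.00" };
console.log(checkTotals(sample)); // → { consistent: true, difference: 0 }
```

A failed check does not necessarily mean the OCR was wrong (discounts or shipping fees can break the simple sum), but it is a cheap signal for flagging a result for manual review.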

How to Use

Running Locally

  1. Install dependencies:

    $ npm install
  2. Create an input file at storage/key_value_stores/default/INPUT.json:

    {
      "image_url": "https://example.com/invoice.jpg"
    }
  3. Run the Actor:

    $ apify run
  4. View results in storage/datasets/default/
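To inspect local results without opening each file, a small Node.js script can read the dataset directory. This is a sketch, assuming the local storage layout stores each dataset item as its own .json file in storage/datasets/default/ and that any metadata files there start with "__" (for example __metadata__.json); the function name is hypothetical.

```javascript
// Sketch: read the results of a local `apify run` and print key fields.
// Assumption: one .json file per dataset item; files starting with "__" are metadata.
const fs = require("node:fs");
const path = require("node:path");

const DEFAULT_DATASET_DIR = path.join("storage", "datasets", "default");

function readLocalDataset(dir = DEFAULT_DATASET_DIR) {
  return fs
    .readdirSync(dir)
    .filter((name) => name.endsWith(".json") && !name.startsWith("__"))
    .sort() // zero-padded file names sort in insertion order
    .map((name) => JSON.parse(fs.readFileSync(path.join(dir, name), "utf8")));
}

// Only run when the local dataset exists (i.e. after `apify run`).
if (fs.existsSync(DEFAULT_DATASET_DIR)) {
  for (const item of readLocalDataset()) {
    console.log(item.invoice_number, item.vendor_company, item.grand_total, item.currency);
  }
}
```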

Deploy to Apify Platform

  1. Log in to Apify:

    $ apify login
  2. Deploy the Actor:

    $ apify push
  3. Run on Apify Console:

    • Go to Actors → My Actors
    • Select your OCR Data Extractor Actor
    • Upload a document image or provide a URL
    • Click "Start" to extract data
    • View results in the Dataset tab

Using via API

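Once deployed, you can call the Actor through the Apify API. The run-sync-get-dataset-items endpoint starts a run, waits for it to finish, and returns the items the Actor wrote to its default dataset:

curl -X POST "https://api.apify.com/v2/acts/YOUR_USERNAME~ocr-data-extractor/run-sync-get-dataset-items" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "image_url": "https://example.com/invoice.jpg"
  }'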
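The same call can be made from Node.js 18+ with the built-in fetch. This sketch uses the Apify API's run-sync-get-dataset-items endpoint, which returns the run's dataset items as a JSON array; YOUR_USERNAME~ocr-data-extractor, the APIFY_TOKEN environment variable, and the function names are placeholders, not names defined by this Actor.

```javascript
// Sketch: run the Actor synchronously via the Apify API and return its dataset items.
// The Actor ID and token are placeholders; requires Node.js 18+ for global fetch.

function buildRunRequest(actorId, token, input) {
  const url = `https://api.apify.com/v2/acts/${encodeURIComponent(actorId)}/run-sync-get-dataset-items`;
  return {
    url,
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(input),
    },
  };
}

async function extractInvoice(actorId, token, imageUrl) {
  const { url, options } = buildRunRequest(actorId, token, { image_url: imageUrl });
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Actor run failed: HTTP ${response.status} ${await response.text()}`);
  }
  return response.json(); // array of dataset items, one per extracted document
}

// Usage (needs a deployed Actor and a real token):
// extractInvoice("YOUR_USERNAME~ocr-data-extractor", process.env.APIFY_TOKEN,
//   "https://example.com/invoice.jpg").then((items) => console.log(items[0]));
```

Keeping request construction separate from the network call makes the request easy to inspect or log without spending an Actor run.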

Integration Examples

With Make (Integromat)

  • Connect the Actor to your Make workflows
  • Automatically extract data when invoices are received via email
  • Send extracted data to accounting software or spreadsheets

With Zapier

  • Trigger OCR extraction from file uploads
  • Automatically add extracted data to Google Sheets or Airtable
  • Send notifications via Slack or email with extracted details

With Custom Applications

  • Integrate via Apify API into your web applications
  • Batch process documents for multiple clients
  • Create automated document processing workflows
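For spreadsheet targets such as Google Sheets or Airtable, it often helps to flatten each result into one row per line item. Apify can already export a whole dataset as CSV; the sketch below instead shows per-line-item flattening in a custom application. The column selection and function names are illustrative choices, not part of the Actor.

```javascript
// Sketch: flatten one extraction result into CSV text, one row per line item.
// Column choice is illustrative; field names follow the documented output.

function csvEscape(value) {
  const s = value === undefined || value === null ? "" : String(value);
  // Quote fields containing commas, quotes, or line breaks; double inner quotes.
  return /[",\n\r]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function lineItemsToCsv(result) {
  const header = ["invoice_number", "vendor_company", "description", "quantity", "unit_price", "amount", "currency"];
  const rows = (result.line_items ?? []).map((item) => [
    result.invoice_number,
    result.vendor_company,
    item.description,
    item.quantity,
    item.unit_price,
    item.amount,
    result.currency,
  ]);
  return [header, ...rows].map((row) => row.map(csvEscape).join(",")).join("\n");
}
```

Repeating invoice-level fields on every row keeps each line item self-describing once it lands in a shared sheet.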

Technical Details

  • Runtime: Node.js 18+
  • Dependencies: Apify SDK v3.5.2+
  • API Endpoint: http://shorts.multiplewords.com/mwvideos/api/image_data_extractor
  • Request Method: POST
  • Content Type: multipart/form-data
  • Mode: Batch processing (non-standby)
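The details above (POST, multipart/form-data) suggest a request shaped roughly like the sketch below, using the FormData, Blob, and fetch globals available in Node.js 18+. This is not the Actor's source: the form field names "file", "user_id", and "isPro" and their default values are assumptions (the README names user_id and isPro but does not document the request schema or their defaults).

```javascript
// Hypothetical sketch of the multipart request to the OCR endpoint.
// Field names and default values are assumptions, not a documented schema.

const OCR_ENDPOINT = "http://shorts.multiplewords.com/mwvideos/api/image_data_extractor";

function buildOcrForm(imageBytes, fileName, mimeType, { userId = "HYPOTHETICAL_USER", isPro = "0" } = {}) {
  const form = new FormData();
  form.append("file", new Blob([imageBytes], { type: mimeType }), fileName);
  form.append("user_id", String(userId));
  form.append("isPro", String(isPro));
  return form;
}

async function callOcr(form) {
  // fetch sets the multipart Content-Type header, including the boundary, itself.
  const response = await fetch(OCR_ENDPOINT, { method: "POST", body: form });
  if (!response.ok) throw new Error(`OCR API returned HTTP ${response.status}`);
  return response.json();
}
```

Letting fetch set the Content-Type header matters: setting it by hand usually omits the boundary parameter and makes the body unparseable on the server side.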

Error Handling

The Actor handles errors as follows:

  • Validates input parameters before processing
  • Handles API errors gracefully with detailed messages
  • Provides informative error logs for debugging
  • Supports multiple input formats with fallback strategies
  • Returns appropriate exit codes for automation workflows

Best Practices

  1. Image Quality: Use high-resolution, clear images for best results
  2. File Formats: JPG and PNG work best; ensure PDFs are image-based
  3. Document Types: Works best with standard invoice/receipt layouts
  4. Batch Processing: For multiple documents, queue multiple Actor runs
  5. Error Recovery: Implement retry logic for failed extractions
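Practices 4 and 5 can be combined on the caller's side: process documents one at a time and retry each failed extraction with exponential backoff. This is a generic sketch, not code from the Actor; extract stands for any async function that handles one document (for example a wrapper around the API call), and the attempt count and delays are illustrative defaults.

```javascript
// Sketch: retry with exponential backoff, plus sequential batch processing.
// `extract` is any async (url) => result function; defaults are illustrative.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(fn, { attempts = 3, baseDelayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Wait 1x, 2x, 4x ... the base delay before the next attempt.
      if (attempt < attempts) await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw lastError;
}

// Process documents one by one so a single failure does not stop the batch.
async function processBatch(imageUrls, extract, options) {
  const results = [];
  for (const url of imageUrls) {
    try {
      results.push({ url, ok: true, data: await withRetry(() => extract(url), options) });
    } catch (err) {
      results.push({ url, ok: false, error: String(err) });
    }
  }
  return results;
}
```

Recording failures instead of throwing lets the batch finish and leaves a list of documents to re-queue or review by hand.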

Support

For issues, questions, or feature requests, please refer to the Apify documentation or community forums.


Built with ❤️ using Apify SDK