Invoice Data Extractor
Pricing
from $40.00 / 1,000 results
AI-powered Actor for extracting structured data from invoices, bills, receipts, and other documents. Upload an image to receive clean, structured data including vendor details, invoice numbers, line items, totals, and other key fields.
Developer: Taher Ali Badnawarwala
OCR Data Extractor Actor
An Apify Actor that extracts structured data from invoices, receipts, and documents using AI-powered OCR technology. Simply upload an image or provide a URL, and the Actor will extract key information like vendor details, invoice numbers, line items, and totals.
What This Tool Does
This Actor connects to the MultipleWords OCR API to extract structured data from document images. It accepts image files or URLs, processes them through advanced OCR technology, and returns comprehensive structured data including vendor information, customer details, billing information, and line items.
Key Features:
- 📄 Extract data from invoices, receipts, and documents
- 🚀 Fast and automated document processing
- 📦 Structured output with all key fields extracted
- 🔄 Reliable error handling and validation
- 📊 Complete extraction details with confidence scores
- 🖼️ Multiple input methods (file upload, local file path, URL, or base64 string)
Purpose & Use Cases
This tool is designed to help businesses, accountants, and developers automate document data extraction:
Accounting & Finance
- Automate invoice data entry into accounting systems
- Extract receipt information for expense tracking
- Process bulk invoices for payment processing
- Digitize paper documents for archival
Business Operations
- Streamline accounts payable workflows
- Automate vendor information extraction
- Process purchase orders and quotes
- Extract data from shipping documents
E-commerce & Retail
- Process supplier invoices automatically
- Extract product details from purchase orders
- Automate inventory documentation
- Handle customer receipt processing
Development & Automation
- Integrate OCR into automated workflows
- Batch process documents programmatically
- Create document processing pipelines
- Build custom accounting integrations
Document Management
- Digitize paper document archives
- Extract searchable data from scanned documents
- Automate document classification and filing
- Create structured databases from unstructured documents
Input Parameters
The Actor accepts the following input:
file (Optional)
- Type: String or File Upload
- Description: Upload an image file (invoice, receipt, document) to extract data from. You can also provide a file path, URL, or base64 string.
- Supported Formats: JPG, PNG, PDF (image-based)
- Example: Upload via file picker in Apify Console
image_url (Optional)
- Type: String
- Description: URL of the image to process (alternative to file upload)
- Example: "https://example.com/invoice.jpg"
Note: Either file or image_url must be provided. The user_id and isPro parameters are handled automatically with default values.
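The rule in the note above can be checked on the client side before a run is started. The following is a minimal sketch, not the Actor's own validation code: the function name `validate_input` is hypothetical, both inputs are assumed to arrive as strings, and the http(s) check on `image_url` is an added assumption.

```python
from urllib.parse import urlparse


def validate_input(actor_input: dict) -> dict:
    """Return a cleaned copy of the input, or raise ValueError.

    Mirrors the documented rule that at least one of `file` or
    `image_url` must be provided. The URL scheme check is an assumption.
    """
    file_value = (actor_input.get("file") or "").strip()
    image_url = (actor_input.get("image_url") or "").strip()

    if not file_value and not image_url:
        raise ValueError("Provide either 'file' or 'image_url'.")

    if image_url:
        parsed = urlparse(image_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"'image_url' is not an http(s) URL: {image_url!r}")

    # Keep only the fields that were actually supplied.
    cleaned = {}
    if file_value:
        cleaned["file"] = file_value
    if image_url:
        cleaned["image_url"] = image_url
    return cleaned
```

Calling `validate_input({"image_url": "https://example.com/invoice.jpg"})` returns the same dictionary, while an empty input raises `ValueError`.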
Output Structure
The Actor returns structured data containing the extracted document information:
{
  "status": 1,
  "vendor_company": "Acme Corporation",
  "vendor_email": "billing@acme.com",
  "customer_name": "John Smith",
  "customer_email": "john.smith@email.com",
  "invoice_number": "INV-2024-001",
  "issue_date": "2024-01-15",
  "due_date": "2024-02-15",
  "subtotal": "1000.00",
  "total_tax": "100.00",
  "grand_total": "1100.00",
  "currency": "$",
  "line_items_count": 5,
  "document_type": "invoice",
  "extraction_confidence": "high",
  "line_items": [
    {
      "description": "Product A",
      "quantity": "2",
      "unit_price": "250.00",
      "amount": "500.00"
    }
  ],
  "full_details": {
    "vendor_information": { ... },
    "customer_information": { ... },
    "billing_details": { ... },
    "totals_and_taxes": { ... }
  }
}
Output Fields Explained
- status: Success indicator (1 = success)
- vendor_company: Name of the vendor/seller company
- vendor_email: Vendor's email address
- customer_name: Customer/buyer name
- customer_email: Customer's email address
- invoice_number: Invoice or receipt number
- issue_date: Date when the invoice was issued
- due_date: Payment due date
- subtotal: Subtotal amount before tax
- total_tax: Total tax amount
- grand_total: Final total amount
- currency: Currency symbol or code
- line_items_count: Number of line items/products
- document_type: Type of document (invoice, receipt, etc.)
- extraction_confidence: Overall confidence level of extraction
- line_items: Array of individual items with quantities and prices
- full_details: Complete raw data from the OCR API
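Because the amounts in the sample output are strings, a consumer will usually want to convert them before doing arithmetic. The sketch below shows one way to do that and to sanity-check that `subtotal + total_tax` matches `grand_total`. The helper names `to_decimal` and `totals_are_consistent` and the 0.01 tolerance are assumptions for illustration, and real documents may use number formats this simple parser does not handle.

```python
from decimal import Decimal, InvalidOperation


def to_decimal(value):
    """Convert an amount such as "1000.00" to Decimal; None if missing or unparsable."""
    if value is None:
        return None
    try:
        amount = Decimal(str(value).strip())
    except InvalidOperation:
        return None
    # Reject NaN and Infinity, which Decimal would otherwise accept.
    return amount if amount.is_finite() else None


def totals_are_consistent(result: dict, tolerance: Decimal = Decimal("0.01")) -> bool:
    """Check that subtotal + total_tax equals grand_total within a tolerance."""
    subtotal = to_decimal(result.get("subtotal"))
    tax = to_decimal(result.get("total_tax"))
    total = to_decimal(result.get("grand_total"))
    if subtotal is None or tax is None or total is None:
        return False
    return abs(subtotal + tax - total) <= tolerance
```

A result that fails this check is a reasonable candidate for manual review, especially when `extraction_confidence` is not "high".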
How to Use
Running Locally
- Install dependencies:
  $ npm install
- Create input file at storage/key_value_stores/default/INPUT.json:
  {"image_url": "https://example.com/invoice.jpg"}
- Run the Actor:
  $ apify run
- View results in storage/datasets/default/
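After a local run, the results can be loaded for inspection with a few lines of code. This sketch assumes that the local dataset directory holds one JSON file per item and that any bookkeeping files start with a double underscore; both are assumptions about the local storage layout, and `load_dataset_items` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_dataset_items(dataset_dir="storage/datasets/default"):
    """Load every JSON item found in a local dataset directory, in file-name order."""
    items = []
    for path in sorted(Path(dataset_dir).glob("*.json")):
        # Skip metadata-style files (assumed to be prefixed with "__").
        if path.name.startswith("__"):
            continue
        with path.open(encoding="utf-8") as handle:
            items.append(json.load(handle))
    return items
```

For example, `[item.get("invoice_number") for item in load_dataset_items()]` lists the invoice numbers extracted during the run.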
Deploy to Apify Platform
- Login to Apify:
  $ apify login
- Deploy the Actor:
  $ apify push
- Run on Apify Console:
  - Go to Actors → My Actors
  - Select your OCR Data Extractor Actor
  - Upload a document image or provide a URL
  - Click "Start" to extract data
  - View results in the Dataset tab
Using via API
Once deployed, you can call the Actor via Apify API:
curl -X POST "https://api.apify.com/v2/acts/YOUR_USERNAME~ocr-data-extractor/run-sync" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"image_url": "https://example.com/invoice.jpg"}'
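The same request can be prepared from a script. The sketch below builds the request with Python's standard library, using the URL, headers, and body from the curl command above. `build_run_request` is a hypothetical helper, and the username and token are placeholders that you supply.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(username: str, token: str, image_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    url = f"{API_BASE}/acts/{username}~ocr-data-extractor/run-sync"
    body = json.dumps({"image_url": image_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Sending the request (requires network access and a valid token):
# request = build_run_request("YOUR_USERNAME", "YOUR_API_TOKEN",
#                             "https://example.com/invoice.jpg")
# with urllib.request.urlopen(request, timeout=120) as response:
#     print(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the call easy to inspect or unit-test without touching the network.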
Integration Examples
With Make (Integromat)
- Connect the Actor to your Make workflows
- Automatically extract data when invoices are received via email
- Send extracted data to accounting software or spreadsheets
With Zapier
- Trigger OCR extraction from file uploads
- Automatically add extracted data to Google Sheets or Airtable
- Send notifications via Slack or email with extracted details
With Custom Applications
- Integrate via Apify API into your web applications
- Batch process documents for multiple clients
- Create automated document processing workflows
Technical Details
- Runtime: Node.js 18+
- Dependencies: Apify SDK v3.5.2+
- API Endpoint: http://shorts.multiplewords.com/mwvideos/api/image_data_extractor
- Request Method: POST
- Content Type: multipart/form-data
- Mode: Batch processing (non-standby)
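To illustrate what a multipart/form-data POST body looks like, here is a generic encoder written with the standard library only. It is a sketch of the encoding, not the Actor's actual request code: the helper name `encode_multipart` is hypothetical, and the form field names the upstream API expects are not documented here, so the `file` field name and any extra fields you pass are assumptions.

```python
import uuid


def encode_multipart(fields: dict, file_name: str, file_bytes: bytes,
                     file_field: str = "file", file_type: str = "image/jpeg"):
    """Return (body, content_type) for a multipart/form-data POST.

    `fields` holds plain text form fields; the file is appended as the last part.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = (f"--{boundary}\r\n"
                  f'Content-Disposition: form-data; name="{name}"\r\n\r\n')
        parts.append((header + f"{value}\r\n").encode("utf-8"))

    file_header = (f"--{boundary}\r\n"
                   f'Content-Disposition: form-data; name="{file_field}"; '
                   f'filename="{file_name}"\r\n'
                   f"Content-Type: {file_type}\r\n\r\n")
    parts.append(file_header.encode("utf-8"))
    parts.append(file_bytes)
    # Closing boundary marks the end of the body.
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned content type must be sent as the `Content-Type` header so the server can find the boundary between parts.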
Error Handling
The Actor includes comprehensive error handling:
- Validates input parameters before processing
- Handles API errors gracefully with detailed messages
- Provides informative error logs for debugging
- Supports multiple input formats with fallback strategies
- Returns appropriate exit codes for automation workflows
Best Practices
- Image Quality: Use high-resolution, clear images for best results
- File Formats: JPG and PNG work best; ensure PDFs are image-based
- Document Types: Works best with standard invoice/receipt layouts
- Batch Processing: For multiple documents, queue multiple Actor runs
- Error Recovery: Implement retry logic for failed extractions
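The retry advice above can be sketched as a small wrapper with exponential backoff. This is one possible approach under stated assumptions: `with_retries` is a hypothetical helper, the attempt count and delays are arbitrary defaults, and in practice you would likely retry only on errors you know to be transient.

```python
import time


def with_retries(operation, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `operation()` until it succeeds or `max_attempts` is reached.

    Waits base_delay, then 2x, 4x, ... between attempts and re-raises
    the last error if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as error:  # narrow this to transient errors in real use
            last_error = error
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    raise last_error
```

A caller would wrap whatever starts an extraction, for example `with_retries(lambda: run_extraction(url))`, where `run_extraction` stands for your own function that starts a run and returns its result.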
Resources
- Apify SDK Documentation
- Apify Platform Documentation
- Apify Academy - Node.js Tutorials
- Join Apify Developer Community
Support
For issues, questions, or feature requests, please refer to the Apify documentation or community forums.
Built with ❤️ using Apify SDK