Receipt Scanner

Pricing

Pay per usage

Developed by

Artur Malev

Maintained by Community

Extract store name, date, total, items and more from receipt images or PDFs using AI-powered OCR. Ideal for expense tracking, finance automation, and data extraction workflows. Handles messy real-world formats with high accuracy.

Receipt Scanner extracts structured data from images of receipts and invoices. Other document types are not supported, and results for them are not guaranteed. Input is accepted only as an image URL; direct file upload is not supported.

Extraction is performed by AI providers (OpenAI and OpenRouter). The service is designed for quick integration into any business process that needs receipt or invoice recognition and analysis.

Purpose and Advantages

  • Automation of routine tasks: Instantly extract information from receipts without manual input.
  • Flexible architecture: Supports multiple AI providers and is easily extendable.
  • High accuracy: Utilizes state-of-the-art models for text and document structure recognition.
  • Easy integration: API and Python interface for rapid embedding.
  • Scalability: Supports both single and batch processing with configurable concurrency.
  • Enhanced data extraction: Extracts specialized receipt data including loyalty program information, return policies, warranty details, gift receipt indicators, and promotional codes.

How It Works

  1. The user provides the URL of a receipt or invoice image, or several URLs for batch processing. Direct file upload is not supported.
  2. Receipt Scanner determines the provider (OpenAI or OpenRouter) and sends a request for analysis.
  3. For batch processing, requests are executed in parallel with a configurable concurrency limit.
  4. The AI model extracts key data: store, date, items, amounts, taxes, and currency.
  5. Results are cached to speed up repeated processing of the same receipts.
  6. The result is returned in a standardized JSON format.
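
The flow above can be sketched in a few lines of Python. This is an illustration under assumptions, not the service's actual code: `analyze_receipt` is a hypothetical stand-in for the provider call, `process_one` and `process_batch` are made-up names, and a thread pool is only one possible way to enforce the concurrency limit.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_receipt(url):
    """Hypothetical stand-in for the AI provider call (steps 2 and 4)."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("Invalid or missing URL provided.")
    return {"store": {"name": "Sample Store"}, "totals": {"total": 1.65}}


def process_one(url, analyze=analyze_receipt):
    """Wrap one URL so a failure becomes an error entry instead of an exception."""
    try:
        return {"url": url, "cached": False, "data": analyze(url)}
    except Exception as exc:
        return {"url": url, "error": {"code": "PROCESSING_ERROR", "message": str(exc)}}


def process_batch(urls, max_concurrent=5, analyze=analyze_receipt):
    """Process URLs in parallel, at most max_concurrent at a time (step 3)."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # map() yields results in input order, whatever the completion order.
        return list(pool.map(lambda u: process_one(u, analyze), urls))


if __name__ == "__main__":
    for entry in process_batch(["https://example.com/receipt1.jpg", "not-a-url"]):
        print(entry)
```

Catching the exception per URL keeps one bad receipt from failing the whole batch, which matches the mixed success and error entries in the batch examples.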

Usage Costs

The service is provided free of charge during the testing period.

In the future, the cost will be determined according to Apify's pricing. Current prices can be found on the Apify pricing page.

Quick Start

1. Install Dependencies

$ pip install -r requirements.txt

2. Configure Environment Variables

export OPENAI_API_KEY=your_key
export OPENROUTER_API_KEY=your_alternative_key
# Optionally for caching
export REDIS_URL=redis://localhost:6379/0
export CACHE_TTL_SECONDS=86400
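
A minimal sketch of how these variables could be read from Python. Only the variable names come from the list above; the `load_settings` helper, the 86400-second fallback, and the rule that one provider key is enough are assumptions made for illustration.

```python
import os


def load_settings(env=None):
    """Collect settings from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    settings = {
        "openai_api_key": env.get("OPENAI_API_KEY"),
        "openrouter_api_key": env.get("OPENROUTER_API_KEY"),
        # Caching is optional: without REDIS_URL it is treated as disabled here.
        "redis_url": env.get("REDIS_URL"),
        # Assumed fallback of one day, mirroring the example value above.
        "cache_ttl_seconds": int(env.get("CACHE_TTL_SECONDS", "86400")),
    }
    # Assumption: a key for at least one provider must be present.
    if not (settings["openai_api_key"] or settings["openrouter_api_key"]):
        raise RuntimeError("Set OPENAI_API_KEY or OPENROUTER_API_KEY")
    return settings


if __name__ == "__main__":
    print(load_settings({"OPENAI_API_KEY": "your_key"}))
```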

3. Use in Python Code

Processing a single URL (backward compatibility)

from src.utils import process_receipt_url
result = process_receipt_url("https://example.com/receipt.jpg")
print(result)

Batch processing of multiple URLs

from src.utils import process_receipt_urls

urls = [
    "https://example.com/receipt1.jpg",
    "https://example.com/receipt2.jpg",
    "https://example.com/receipt3.jpg"
]
results = process_receipt_urls(urls, max_concurrent=5)
for result in results:
    print(f"URL: {result['url']}")
    print(f"Cached: {result.get('cached', False)}")
    if 'data' in result:
        data = result['data']
        print(f"Store: {data.get('store', {}).get('name')}")
        print(f"Total: {data.get('totals', {}).get('total')}")
        # Access enhanced data fields
        if 'loyalty' in data:
            print(f"Loyalty Program: {data['loyalty'].get('program_name')}")
            print(f"Points Earned: {data['loyalty'].get('points_earned')}")
        if 'return_policy' in data:
            print(f"Return Period: {data['return_policy'].get('days_allowed')} days")
        if 'promotions' in data and 'promo_codes' in data['promotions']:
            print(f"Promo Codes: {', '.join(data['promotions']['promo_codes'])}")
        if 'gift_receipt' in data and data['gift_receipt'].get('is_gift_receipt'):
            print("This is a gift receipt")
    elif 'error' in result:
        print(f"Error: {result['error']}")

4. API Usage

curl -X POST http://localhost:3000/api/v1/process \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/receipt.jpg"}'

Batch Processing of Multiple URLs

curl -X POST http://localhost:3000/api/v1/process-batch \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.com/receipt1.jpg", "https://example.com/receipt2.jpg"]}'
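
The same two calls can be made from Python with the standard library. The sketch below takes the endpoints and payload shapes from the curl examples; the `build_request` and `send` helpers are hypothetical names, and `send` assumes a server is running locally and returns JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000/api/v1"  # taken from the curl examples


def build_request(urls):
    """Build a POST request for one URL (/process) or several (/process-batch)."""
    if not urls:
        raise ValueError("at least one receipt URL is required")
    if len(urls) == 1:
        endpoint, payload = "/process", {"url": urls[0]}
    else:
        endpoint, payload = "/process-batch", {"urls": list(urls)}
    return urllib.request.Request(
        BASE_URL + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON body (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect or test without network access.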

API Usage

Input Schema

The Receipt Scanner expects a JSON object as input with the following fields:

  • receiptUrls (array of strings, required): Array of receipt image URLs to be scanned.
  • maxConcurrent (integer, optional): Maximum number of concurrent requests for batch processing. Default is 5. Minimum is 1, maximum is 20.

Example with the default concurrency limit written out:

{
  "receiptUrls": [
    "https://example.com/receipt1.jpg",
    "https://example.com/receipt2.jpg"
  ],
  "maxConcurrent": 5
}
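
The constraints in this schema can be checked before any request is sent. The following is a sketch, not the Actor's own validation: `validate_input` is a hypothetical helper, and rejecting an empty `receiptUrls` array is an assumption that goes slightly beyond "required".

```python
def validate_input(payload):
    """Validate Actor-style input and apply the documented default."""
    urls = payload.get("receiptUrls")
    # Assumption: "required" is read as "must be a non-empty list".
    if not isinstance(urls, list) or not urls:
        raise ValueError("receiptUrls must be a non-empty array of strings")
    if not all(isinstance(u, str) and u for u in urls):
        raise ValueError("every entry in receiptUrls must be a non-empty string")

    max_concurrent = payload.get("maxConcurrent", 5)  # documented default
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_concurrent, bool) or not isinstance(max_concurrent, int):
        raise ValueError("maxConcurrent must be an integer")
    if not 1 <= max_concurrent <= 20:  # documented bounds
        raise ValueError("maxConcurrent must be between 1 and 20")
    return {"receiptUrls": urls, "maxConcurrent": max_concurrent}
```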

Example Input

{
  "receiptUrls": [
    "https://example.com/receipt1.jpg",
    "https://example.com/receipt2.jpg"
  ],
  "maxConcurrent": 3
}

Example Output (Success)

[
  {
    "url": "https://example.com/receipt1.jpg",
    "cached": false,
    "data": {
      "store": "Walmart",
      "date": "2024-06-01",
      "items": [
        {"name": "Milk", "price": 2.99, "qty": 1},
        {"name": "Bread", "price": 1.49, "qty": 2}
      ],
      "total": 5.97,
      "currency": "USD"
    }
  },
  {
    "url": "https://example.com/receipt2.jpg",
    "cached": true,
    "data": {
      "store": "Target",
      "date": "2024-06-02",
      "items": [
        {"name": "Eggs", "price": 3.49, "qty": 1}
      ],
      "total": 3.49,
      "currency": "USD"
    }
  }
]

Example Output (Error)

[
  {
    "url": "https://example.com/invalid.jpg",
    "cached": false,
    "error": {
      "code": "INVALID_URL",
      "message": "Invalid or missing URL provided."
    }
  }
]

Example Response (Single Receipt)

{
  "store": {
    "name": "Sample Store",
    "address": "Moscow, Example St., 1",
    "date": "2024-06-01",
    "time": "14:23",
    "currency": "RUB"
  },
  "items": [
    {
      "name": "Apple",
      "category": "food",
      "quantity": 3,
      "unit_price": 0.5,
      "total_price": 1.5
    }
  ],
  "totals": {
    "subtotal": 1.5,
    "tax": 0.15,
    "total": 1.65,
    "currency": "RUB"
  },
  "loyalty": {
    "program_name": "Store Rewards",
    "member_id": "12345678",
    "points_earned": 15,
    "points_balance": 230,
    "tier_status": "Gold",
    "expiration_date": "2025-12-31"
  },
  "return_policy": {
    "days_allowed": 30,
    "conditions": "Items must be in original packaging",
    "receipt_required": true,
    "restocking_fee": 0,
    "exceptions": "Sale items are final sale"
  },
  "warranty": {
    "duration": "1 year",
    "coverage": "Parts and labor",
    "contact_info": "support@example.com",
    "conditions": "Valid with receipt only",
    "product_registration": "Register at example.com/warranty"
  },
  "gift_receipt": {
    "is_gift_receipt": false
  },
  "promotions": {
    "promo_codes": ["SUMMER10", "NEXT5OFF"],
    "future_discounts": ["10% off next purchase"],
    "special_offers": ["Buy 2 Get 1 Free on all produce next week"],
    "expiration_dates": ["2024-07-15"],
    "conditions": "Cannot be combined with other offers"
  }
}
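
Because the amounts come from a model, a cheap arithmetic check on the response can catch misreads. The sketch below uses the field names from the example response; the `check_totals` helper is hypothetical, and the rule that the total equals subtotal plus tax is an assumption that will not hold for receipts with discounts, tips, or fees.

```python
import math


def check_totals(receipt, tolerance=0.005):
    """Return a list of inconsistencies found in a parsed receipt."""
    problems = []
    items = receipt.get("items", [])
    totals = receipt.get("totals", {})

    # Line items should add up to the subtotal (within rounding tolerance).
    items_sum = sum(item.get("total_price") or 0 for item in items)
    subtotal = totals.get("subtotal")
    if subtotal is not None and not math.isclose(items_sum, subtotal, abs_tol=tolerance):
        problems.append(f"items sum {items_sum} != subtotal {subtotal}")

    # Assumed rule: total = subtotal + tax.
    tax = totals.get("tax") or 0
    total = totals.get("total")
    if subtotal is not None and total is not None:
        if not math.isclose(subtotal + tax, total, abs_tol=tolerance):
            problems.append(f"subtotal + tax {subtotal + tax} != total {total}")
    return problems
```

For the example response, the items sum to 1.5, which matches the subtotal, and 1.5 + 0.15 matches the total of 1.65, so the check reports no problems.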

Example Response (Batch Processing)

[
  {
    "url": "https://example.com/receipt1.jpg",
    "data": {
      "store": { "name": "Store 1", "date": "2024-06-01" },
      "items": [ { "name": "Item 1", "total_price": 10.5 } ],
      "totals": { "total": 10.5 },
      "loyalty": {
        "program_name": "Store Rewards",
        "points_earned": 10
      },
      "return_policy": {
        "days_allowed": 30
      },
      "promotions": {
        "promo_codes": ["SUMMER10"]
      }
    },
    "cached": false
  },
  {
    "url": "https://example.com/receipt2.jpg",
    "data": {
      "store": { "name": "Store 2", "date": "2024-06-02" },
      "items": [ { "name": "Item 2", "total_price": 15.75 } ],
      "totals": { "total": 15.75 },
      "gift_receipt": {
        "is_gift_receipt": true,
        "gift_message": "Happy Birthday!"
      }
    },
    "cached": true
  },
  {
    "url": "https://example.com/invalid.jpg",
    "error": {
      "code": "PROCESSING_ERROR",
      "message": "Failed to process receipt image"
    }
  }
]
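
A batch response mixes successful entries and errors, so callers usually need to split it before using the data. The sketch below shows one way to do that with the field names from the example; `summarize_batch` is a hypothetical helper, and adding the totals together assumes all receipts use the same currency.

```python
def summarize_batch(results):
    """Split batch entries into successes and failures and total the amounts."""
    succeeded = [r for r in results if "data" in r]
    failed = [r for r in results if "error" in r]
    return {
        "succeeded": len(succeeded),
        "failed": len(failed),
        "from_cache": sum(1 for r in succeeded if r.get("cached", False)),
        # Totals may be missing on some receipts; count those as 0.
        "grand_total": sum(
            r["data"].get("totals", {}).get("total") or 0 for r in succeeded
        ),
        "failed_urls": [r["url"] for r in failed],
    }
```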

Features

  • Text recognition on receipt and invoice images (including handwritten and blurry ones).
  • Item categorization.
  • Support for various currencies and languages.
  • Flexible error handling and data validation.
  • Simple integration into third-party services via API.
  • Batch processing — Ability to process multiple receipt URLs simultaneously.
  • Result caching — Storing processed results by image hash to avoid redundant processing.
  • Parallel execution — Configurable concurrency limits for optimal performance.
  • Enhanced data extraction — Extracts specialized receipt data including:
    • Loyalty program information — Program name, member ID, points earned/balance, tier status
    • Return policy details — Return period, conditions, receipt requirements
    • Warranty information — Duration, coverage, contact information
    • Gift receipt identification — Detects if the receipt is a gift receipt
    • Promotional codes — Future discounts, special offers, expiration dates

Limitations

  • Recognition quality depends on the image quality and the selected model.
  • Valid API keys are required for operation.
  • Providers may impose rate and volume limits.
  • Processing is possible only for receipts and invoices. Other document types are not supported or guaranteed.

Testing

$ pytest tests/

License

MIT