Docling

vancura/docling

Developed by Václav Vančura

Maintained by Community

Docling Document Parser & Converter – Convert documents into structured data without complexity. This Actor leverages the powerful Docling library to parse and transform various document formats into clean, structured outputs ready for analysis or integration.

Changelog

All notable changes to the Docling Actor will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[1.1.0] - 2025-03-09

Changed

  • Switched from full Docling CLI to docling-serve API
  • Using the official quay.io/ds4sd/docling-serve-cpu Docker image
  • Reduced Docker image size (from ~6 GB to ~4 GB)
  • Implemented multi-stage Docker build to handle dependencies
  • Improved Docker build process to ensure compatibility with docling-serve-cpu image
  • Added new Python processor script for reliable API communication and content extraction
  • Enhanced response handling with better content extraction logic
  • Fixed ES modules compatibility issue with Apify CLI
  • Added explicit tmpfs volume for temporary files
  • Fixed environment variables format in actor.json
  • Created optimized dependency installation approach
  • Improved API compatibility with docling-serve
    • Updated endpoint from custom /convert to standard /v1alpha/convert/source
    • Revised JSON payload structure to match docling-serve API format
    • Added proper output field parsing based on format
  • Enhanced startup process with health checks
  • Added configurable API host and port through environment variables
  • Improved content-type handling for different output formats
  • Updated error handling to align with API responses
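
The API-related changes above (the /v1alpha/convert/source endpoint, the revised JSON payload, format-based output parsing, startup health checks, and host/port configuration) can be illustrated with a minimal Python sketch. This is not the Actor's actual processor script. The environment variable names DOCLING_API_HOST and DOCLING_API_PORT, the default port, the /health path, and the payload and response field names are assumptions based on the docling-serve v1alpha API and should be checked against the version in use.

```python
import json
import os
import time
import urllib.request

# Assumed mapping from requested format to the response field carrying it.
OUTPUT_FIELDS = {
    "md": "md_content",
    "json": "json_content",
    "html": "html_content",
    "text": "text_content",
    "doctags": "doctags_content",
}


def api_base_url() -> str:
    """Build the service URL from (hypothetical) environment variables."""
    host = os.environ.get("DOCLING_API_HOST", "localhost")
    port = os.environ.get("DOCLING_API_PORT", "5001")  # assumed default port
    return f"http://{host}:{port}"


def build_payload(document_url: str, output_format: str = "md", ocr: bool = True) -> dict:
    """Assemble a request body in the shape assumed for /v1alpha/convert/source."""
    if output_format not in OUTPUT_FIELDS:
        raise ValueError(f"unsupported output format: {output_format}")
    return {
        "options": {"to_formats": [output_format], "do_ocr": ocr},
        "http_sources": [{"url": document_url}],
    }


def extract_content(response: dict, output_format: str):
    """Pick the output field that matches the requested format."""
    document = response.get("document") or {}
    content = document.get(OUTPUT_FIELDS[output_format])
    if content in (None, ""):
        raise ValueError(f"response has no {output_format} content")
    return content


def wait_until_healthy(probe, attempts: int = 30, delay: float = 1.0) -> bool:
    """Call probe() until it returns True or the attempts run out."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False


def health_probe() -> bool:
    """Return True if the (assumed) /health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(api_base_url() + "/health", timeout=5) as reply:
            return reply.status == 200
    except OSError:  # URLError and HTTPError are OSError subclasses
        return False


def convert(document_url: str, output_format: str = "md", timeout: int = 300):
    """POST the document URL to the service and return the converted content."""
    body = json.dumps(build_payload(document_url, output_format)).encode("utf-8")
    request = urllib.request.Request(
        api_base_url() + "/v1alpha/convert/source",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return extract_content(json.load(reply), output_format)
```

With a docling-serve instance running, a caller might first check `wait_until_healthy(health_probe)` and then call `convert("https://example.com/report.pdf", "md")`. Keeping payload construction and response parsing in separate pure functions makes both easy to test without a live service.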

Fixed

  • Fixed actor input file conflict in get_actor_input(): now checks for and removes an existing /tmp/actor-input/INPUT directory if found, ensuring valid JSON input parsing.
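
The logic of this fix can be sketched in Python as follows. The Actor's real get_actor_input() may be written differently (for example as a shell function); the fetch_raw_input callable is a hypothetical stand-in for however the Actor obtains its raw input, and only the path /tmp/actor-input/INPUT comes from the changelog entry.

```python
import json
import shutil
from pathlib import Path

DEFAULT_INPUT_PATH = Path("/tmp/actor-input/INPUT")


def get_actor_input(fetch_raw_input, input_path: Path = DEFAULT_INPUT_PATH) -> dict:
    """Store the raw Actor input at input_path and return it as parsed JSON.

    fetch_raw_input is a hypothetical callable returning the input as a
    JSON string.
    """
    # A directory left at the INPUT path would make the write below fail,
    # so remove it before writing the file.
    if input_path.is_dir():
        shutil.rmtree(input_path)
    input_path.parent.mkdir(parents=True, exist_ok=True)
    input_path.write_text(fetch_raw_input(), encoding="utf-8")
    return json.loads(input_path.read_text(encoding="utf-8"))
```

Without the directory check, writing to a path that is already a directory raises an error and no valid JSON is ever parsed, which matches the conflict described above.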

Technical Details

  • Actor Specification v1
  • Using quay.io/ds4sd/docling-serve-cpu:latest base image
  • Node.js 20.x for Apify CLI
  • Eliminated Python dependencies
  • Simplified Docker build process

[1.0.0] - 2025-02-07

Added

  • Initial release of Docling Actor
  • Support for multiple document formats (PDF, DOCX, images)
  • OCR capabilities for scanned documents
  • Multiple output formats (md, json, html, text, doctags)
  • Comprehensive error handling and logging
  • Dataset records with processing status
  • Memory monitoring and resource optimization
  • Security features including non-root user execution

Technical Details

  • Actor Specification v1
  • Docling v2.17.0
  • Python 3.11
  • Node.js 20.x
  • Comprehensive error codes:
    • 10: Invalid input
    • 11: URL inaccessible
    • 12: Docling processing failed
    • 13: Output file missing
    • 14: Storage operation failed
    • 15: OCR processing failed
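
For callers that inspect the Actor's exit status, the codes listed above could be mirrored in a small lookup like the following sketch. The enum and function names are hypothetical, and treating 0 as success is an assumption based on common convention, not something the changelog states.

```python
from enum import IntEnum


class DoclingErrorCode(IntEnum):
    """Error codes as listed in the 1.0.0 changelog entry."""
    INVALID_INPUT = 10
    URL_INACCESSIBLE = 11
    PROCESSING_FAILED = 12
    OUTPUT_MISSING = 13
    STORAGE_FAILED = 14
    OCR_FAILED = 15


DESCRIPTIONS = {
    DoclingErrorCode.INVALID_INPUT: "Invalid input",
    DoclingErrorCode.URL_INACCESSIBLE: "URL inaccessible",
    DoclingErrorCode.PROCESSING_FAILED: "Docling processing failed",
    DoclingErrorCode.OUTPUT_MISSING: "Output file missing",
    DoclingErrorCode.STORAGE_FAILED: "Storage operation failed",
    DoclingErrorCode.OCR_FAILED: "OCR processing failed",
}


def describe_exit(code: int) -> str:
    """Translate an exit code into a readable message."""
    if code == 0:  # assumed: 0 means success
        return "Success"
    try:
        return DESCRIPTIONS[DoclingErrorCode(code)]
    except ValueError:
        return f"Unknown error code {code}"
```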

Pricing

Pricing model

Pay per usage

This Actor is free to use; you pay only for the Apify platform usage that its runs consume.