Docling document parser & converter – an Apify Actor that uses the Docling library to parse documents in various formats and convert them into clean, structured output ready for analysis or integration.
Switched from full Docling CLI to docling-serve API
Using the official quay.io/ds4sd/docling-serve-cpu Docker image
Reduced Docker image size (from ~6GB to ~4GB)
Implemented multi-stage Docker build to handle dependencies
Improved Docker build process to ensure compatibility with docling-serve-cpu image
Added new Python processor script for reliable API communication and content extraction
Enhanced response handling with better content extraction logic
Fixed ES modules compatibility issue with Apify CLI
Added explicit tmpfs volume for temporary files
Fixed environment variables format in actor.json
Created optimized dependency installation approach
Improved API compatibility with docling-serve
Updated endpoint from custom /convert to standard /v1alpha/convert/source
Revised JSON payload structure to match docling-serve API format
Added proper output field parsing based on format
Enhanced startup process with health checks
Added configurable API host and port through environment variables
Improved content type handling for different output formats
Updated error handling to align with API responses
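As an illustration of the request and response handling described in the entries above, the following Python sketch builds a payload for /v1alpha/convert/source and selects the output field by format. It is not taken from this Actor's code. The payload keys (options.to_formats, http_sources), the *_content field names, and the default base URL http://localhost:5001 are assumptions based on the docling-serve v1alpha API. The helper names (build_payload, extract_content, convert) are hypothetical, and the content types are standard MIME types chosen for illustration.

```python
import json
import urllib.request

# Assumed mapping from requested output format to the response field holding it.
OUTPUT_FIELDS = {
    "md": "md_content",
    "json": "json_content",
    "html": "html_content",
    "text": "text_content",
    "doctags": "doctags_content",
}

# Content types used when storing the result; anything unlisted falls back to text/plain.
CONTENT_TYPES = {
    "md": "text/markdown",
    "json": "application/json",
    "html": "text/html",
    "text": "text/plain",
}


def build_payload(document_url, output_format):
    """Build the JSON body for a convert request (assumed v1alpha structure)."""
    if output_format not in OUTPUT_FIELDS:
        raise ValueError(f"unsupported output format: {output_format}")
    return {
        "options": {"to_formats": [output_format]},
        "http_sources": [{"url": document_url}],
    }


def extract_content(response, output_format):
    """Return (content, content_type) for the requested format."""
    field = OUTPUT_FIELDS[output_format]
    document = response.get("document") or {}
    content = document.get(field)
    if content is None:
        raise ValueError(f"response has no '{field}' field")
    if not isinstance(content, str):
        # JSON output may arrive as a nested object; serialize it for storage.
        content = json.dumps(content)
    return content, CONTENT_TYPES.get(output_format, "text/plain")


def convert(document_url, output_format, base_url="http://localhost:5001", timeout=300):
    """POST the payload to the convert endpoint and extract the requested output."""
    request = urllib.request.Request(
        f"{base_url}/v1alpha/convert/source",
        data=json.dumps(build_payload(document_url, output_format)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_content(json.load(resp), output_format)
```

Keeping build_payload and extract_content free of network calls makes the format handling easy to check without a running docling-serve instance.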
Fixed
Fixed actor input file conflict in get_actor_input(): now checks for and removes an existing /tmp/actor-input/INPUT directory if found, ensuring valid JSON input parsing.
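The directory conflict described in this fix can be sketched in Python as follows. This is a minimal illustration only: the real get_actor_input() may be implemented differently (for example as a shell function), and the helper name prepare_actor_input and its parameters are hypothetical.

```python
import json
import shutil
from pathlib import Path


def prepare_actor_input(raw_json, input_path=Path("/tmp/actor-input/INPUT")):
    """Write the Actor input to input_path and return it parsed as JSON."""
    input_path = Path(input_path)
    input_path.parent.mkdir(parents=True, exist_ok=True)
    if input_path.is_dir():
        # A directory with the same name would make the file write fail,
        # so remove it before writing the input file.
        shutil.rmtree(input_path)
    input_path.write_text(raw_json, encoding="utf-8")
    return json.loads(input_path.read_text(encoding="utf-8"))
```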
Technical Details
Actor Specification v1
Using quay.io/ds4sd/docling-serve-cpu:latest base image
Node.js 20.x for Apify CLI
Eliminated Python dependencies
Simplified Docker build process
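The startup health check and the configurable API host and port listed in this release could look roughly like the Python sketch below. The environment variable names DOCLING_API_HOST and DOCLING_API_PORT, the defaults localhost and 5001, and the /health path are assumptions for illustration, as are the helper names. The Actor's actual variable names and startup logic may differ.

```python
import os
import time
import urllib.error
import urllib.request


def api_base_url(env=None):
    """Build the API base URL from (hypothetical) environment variables."""
    env = os.environ if env is None else env
    host = env.get("DOCLING_API_HOST", "localhost")
    port = env.get("DOCLING_API_PORT", "5001")
    return f"http://{host}:{port}"


def probe_health(base_url, timeout=2.0):
    """Return True if the assumed /health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(base_url, attempts=30, delay=1.0, probe=probe_health, sleep=time.sleep):
    """Poll the health probe until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(base_url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Passing probe and sleep as parameters keeps the retry loop testable without a running server.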
[1.0.0] - 2025-02-07
Added
Initial release of Docling Actor
Support for multiple document formats (PDF, DOCX, images)