Universal Document Format Transformer
Pricing: from $5.00 / 1,000 results
Universal Document Format Transformer: a cloud-based Apify Actor that converts documents (DOCX, PPTX, HTML, TXT) into Markdown, JSON, CSV, HTML, TXT or PDF using Pandoc. Easy REST API for automations (n8n, Zapier, Make), production-ready error handling, and security controls.
Developer: fanio zilla
Apify Actor for cloud-based document format conversion using Pandoc. Convert documents from one format to another via a simple API without local installations.
Features
- Convert between multiple document formats
- Cloud-based - no local installation needed
- Works with URLs from S3, Google Drive, OneDrive, etc.
- Fast processing with 60-second timeout protection
- Structured JSON output with download URLs
- Automatic retry for network failures
- Clear error messages with actionable suggestions
Supported Formats
Input Formats (From)
- DOCX - Microsoft Word documents
- PPTX - Microsoft PowerPoint presentations
- HTML - Web pages and HTML documents
- TXT - Plain text files
Output Formats (To)
- Markdown - Lightweight markup format
- JSON - Structured data format
- CSV - Comma-separated values
- HTML - Web format
- TXT - Plain text
- PDF - Portable Document Format
Important Notes
- PDF cannot be used as an input format (a Pandoc limitation)
- PDF is supported as an output format only
- Format compatibility varies - see table below
Format Compatibility Matrix
| Input \ Output | Markdown | JSON | CSV | HTML | TXT | PDF |
|---|---|---|---|---|---|---|
| DOCX | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| PPTX | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| HTML | ✅ | ⚠️ | ⚠️ | ✅ | ✅ | ✅ |
| TXT | ✅ | ⚠️ | ⚠️ | ✅ | ✅ | ✅ |
Legend:
- ✅ Fully Supported - Good conversion quality
- ⚠️ Limited Support - May lose formatting/structure
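For scripting, the matrix above can be turned into a lookup table. This is a minimal sketch that mirrors this page's table, assuming the unlabeled last column is PDF (the remaining documented output format); support_level is a hypothetical helper, not part of the actor.

```python
# Hypothetical encoding of the compatibility matrix as shown on this page.
# Assumption: the unlabeled last column of the table is PDF.
FULL, LIMITED = "full", "limited"

OUTPUTS = ("markdown", "json", "csv", "html", "txt", "pdf")

COMPATIBILITY = {
    "docx": dict.fromkeys(OUTPUTS, FULL),
    "pptx": dict.fromkeys(OUTPUTS, FULL),
    # HTML and TXT inputs have only limited support for JSON and CSV output.
    "html": {**dict.fromkeys(OUTPUTS, FULL), "json": LIMITED, "csv": LIMITED},
    "txt": {**dict.fromkeys(OUTPUTS, FULL), "json": LIMITED, "csv": LIMITED},
}


def support_level(from_format: str, to_format: str) -> str:
    """Return "full" or "limited" for a conversion pair; raise ValueError otherwise."""
    src, dst = from_format.lower(), to_format.lower()
    if src == "pdf":
        raise ValueError("PDF cannot be used as input format")
    if src not in COMPATIBILITY:
        raise ValueError(f"Unsupported input format: {from_format}")
    if dst not in COMPATIBILITY[src]:
        raise ValueError(f"Unsupported output format: {to_format}")
    return COMPATIBILITY[src][dst]
```

For example, support_level("html", "csv") returns "limited", a cue to expect some loss of structure.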
Input
The actor accepts the following input parameters:
{"fileUrl": "https://example.com/document.docx","fromFormat": "docx","toFormat": "markdown"}
Parameters
- fileUrl (string, required): Public URL to the document (S3, Google Drive, OneDrive, etc.)
  - Must use the HTTP or HTTPS protocol
  - File must be publicly accessible
  - Maximum file size: 50MB
- fromFormat (string, required): Source document format
  - Options: docx, pptx, html, txt (PDF is NOT supported as input)
- toFormat (string, required): Target document format
  - Options: markdown, json, csv, html, txt, pdf
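The parameter rules above can be checked on your side before starting a run. This is a minimal sketch; validate_input is a hypothetical helper, and lowercasing the format names is an assumption (the documented options are all lowercase). The 50MB limit is not checked here, since the file size is only known once the file is fetched.

```python
from urllib.parse import urlparse

INPUT_FORMATS = {"docx", "pptx", "html", "txt"}
OUTPUT_FORMATS = {"markdown", "json", "csv", "html", "txt", "pdf"}


def validate_input(payload: dict) -> dict:
    """Check a payload against the documented parameter rules.

    Hypothetical client-side helper; the actor performs its own validation.
    """
    for key in ("fileUrl", "fromFormat", "toFormat"):
        if not isinstance(payload.get(key), str) or not payload[key]:
            raise ValueError(f"Missing required parameter: {key}")

    # fileUrl must be an absolute HTTP or HTTPS URL.
    url = urlparse(payload["fileUrl"])
    if url.scheme not in ("http", "https") or not url.netloc:
        raise ValueError("Invalid URL format")

    # Assumption: format names are case-insensitive, so normalize to lowercase.
    from_format = payload["fromFormat"].lower()
    to_format = payload["toFormat"].lower()
    if from_format == "pdf":
        raise ValueError("PDF cannot be used as input format")
    if from_format not in INPUT_FORMATS:
        raise ValueError(f"Unsupported input format: {from_format}")
    if to_format not in OUTPUT_FORMATS:
        raise ValueError(f"Unsupported output format: {to_format}")

    return {"fileUrl": payload["fileUrl"], "fromFormat": from_format, "toFormat": to_format}
```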
Output
The actor returns the following result:
{"downloadUrl": "https://api.apify.com/v2/...","inputFormat": "docx","outputFormat": "markdown","fileSize": 12345,"processingTime": 2.5,"status": "success"}
Output Fields
- downloadUrl: URL to download the converted file (valid for 7 days)
- inputFormat: The format of the input file
- outputFormat: The format of the output file
- fileSize: Size of the converted file in bytes
- processingTime: Time taken for conversion in seconds
- status: Either "success" or "error"
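A caller typically checks status before using downloadUrl. This is a minimal sketch, assuming the result has exactly the shape shown above; parse_result and summarize are hypothetical helpers. Because downloadUrl is valid for 7 days, fetch or copy the converted file promptly.

```python
import json


def parse_result(result_json: str) -> dict:
    """Parse a result object and fail loudly unless status is "success"."""
    result = json.loads(result_json)
    if result.get("status") != "success":
        raise RuntimeError(f"Conversion failed: status={result.get('status')!r}")
    return result


def summarize(result: dict) -> str:
    """One-line summary using fileSize (bytes) and processingTime (seconds)."""
    kib = result["fileSize"] / 1024
    return (f"{result['inputFormat']} -> {result['outputFormat']}: "
            f"{kib:.1f} KiB in {result['processingTime']} s")
```

With the sample result above, summarize(parse_result(...)) yields "docx -> markdown: 12.1 KiB in 2.5 s".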
Usage Examples
Example 1: Convert DOCX to Markdown
{"fileUrl": "https://example.com/report.docx","fromFormat": "docx","toFormat": "markdown"}
Example 2: Convert PPTX to PDF
{"fileUrl": "https://example.com/presentation.pptx","fromFormat": "pptx","toFormat": "pdf"}
Example 3: Convert HTML to TXT
{"fileUrl": "https://example.com/page.html","fromFormat": "html","toFormat": "txt"}
Example 4: Convert TXT to JSON
{"fileUrl": "https://example.com/data.txt","fromFormat": "txt","toFormat": "json"}
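In each example above, fromFormat matches the file extension in the URL. When building inputs in bulk, that value can be derived from the URL. This is a sketch with a hypothetical build_input helper; share links from Google Drive or OneDrive often have no extension in the path, so it raises for those instead of guessing.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical mapping from file extensions to the actor's fromFormat values.
EXTENSION_TO_FORMAT = {
    ".docx": "docx",
    ".pptx": "pptx",
    ".html": "html",
    ".htm": "html",
    ".txt": "txt",
}


def build_input(file_url: str, to_format: str) -> dict:
    """Build an input payload, inferring fromFormat from the URL path's extension."""
    # Only the path is inspected, so query strings such as ?download=1 are ignored.
    suffix = PurePosixPath(urlparse(file_url).path).suffix.lower()
    if suffix == ".pdf":
        raise ValueError("PDF cannot be used as input format")
    if suffix not in EXTENSION_TO_FORMAT:
        raise ValueError(f"Cannot infer fromFormat from extension {suffix!r}")
    return {"fileUrl": file_url, "fromFormat": EXTENSION_TO_FORMAT[suffix], "toFormat": to_format}
```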
API Usage
You can run this actor via the Apify API:
curl -X POST "https://api.apify.com/v2/acts/WgRQY2Ta2VKQE5NgO/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"fileUrl": "https://example.com/document.docx","fromFormat": "docx","toFormat": "markdown"}'
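The same request can be built in Python with only the standard library. This sketch mirrors the curl call above and uses the actor ID from that example; build_run_request is a hypothetical name, and the function only builds the request without sending it.

```python
import json
import urllib.parse
import urllib.request

ACTOR_ID = "WgRQY2Ta2VKQE5NgO"  # taken from the curl example


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build the POST request from the curl example (does not send it)."""
    query = urllib.parse.urlencode({"token": token})
    url = f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs?{query}"
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the request to urllib.request.urlopen sends it. This endpoint starts a run and responds with run details rather than the converted file itself; see the Apify API reference for how to retrieve a run's results.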
Error Handling
The actor includes comprehensive error handling with clear, actionable error messages:
Common Errors & Solutions
Invalid URL
- Error: Invalid URL format
- Solution: Check that your fileUrl starts with http:// or https://
Unsupported Input Format
- Error: PDF cannot be used as input format
- Solution: Use docx, pptx, html, or txt as input. PDF is output-only.
File Not Found (404)
- Error: Download failed: File not found
- Solution: Verify the URL is correct and the file exists
Access Denied (403)
- Error: Download failed: Access denied
- Solution: Use a public URL or one with proper access permissions
Timeout
- Error: Pandoc conversion timed out
- Solution: Try a smaller file or a simpler document format
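The 403 and 404 cases can often be caught before starting a run by probing the URL yourself. This is a hypothetical pre-check (preflight and explain_status are illustrative names) that reuses the messages listed above; some servers reject HEAD requests (for example with 405) even when a normal download would work, so treat its result as a hint.

```python
import urllib.error
import urllib.request

# Messages for the documented download failures, keyed by HTTP status.
MESSAGES = {
    403: "Download failed: Access denied. Use a public URL or one with proper access permissions.",
    404: "Download failed: File not found. Verify the URL is correct and the file exists.",
}


def explain_status(status: int) -> str:
    """Map an HTTP status code to the documented error text."""
    if 200 <= status < 300:
        return "OK"
    return MESSAGES.get(status, f"Download failed with HTTP {status}.")


def preflight(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request and explain the outcome.

    Non-HTTP failures (DNS errors, refused connections) are raised as URLError.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return explain_status(response.status)
    except urllib.error.HTTPError as err:
        return explain_status(err.code)
```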
Development
Local Testing
# Install dependencies
npm install
# Run local tests (requires Pandoc for full testing)
node test-local.js
# Test with Docker (recommended for full testing)
docker build -t universal-document-format-transformer .
docker run universal-document-format-transformer
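Since full local testing needs Pandoc, it can help to confirm it is installed first. This small sketch uses a hypothetical pandoc_version helper that looks for pandoc on the PATH and reads the first line of pandoc --version; it does not check for the extra PDF engine that PDF output needs.

```python
import shutil
import subprocess
from typing import Optional


def pandoc_version() -> Optional[str]:
    """Return the first line of `pandoc --version`, or None if Pandoc is not on PATH."""
    if shutil.which("pandoc") is None:
        return None
    completed = subprocess.run(
        ["pandoc", "--version"], capture_output=True, text=True, timeout=10, check=True
    )
    # Guard against empty output before taking the first line.
    return (completed.stdout.splitlines() or [""])[0]
```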
Environment Variables
- MAX_FILE_SIZE: Maximum file size in bytes (default: 52428800 = 50MB)
- DOWNLOAD_TIMEOUT: Download timeout in milliseconds (default: 30000 = 30s)
- MAX_DOWNLOAD_RETRIES: Number of download retry attempts (default: 3)
- PANDOC_TIMEOUT: Pandoc conversion timeout in milliseconds (default: 55000 = 55s)
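To make the defaults concrete, here is a sketch of reading these variables with the documented fallbacks. It is illustrative only: load_limits and Limits are hypothetical names, and the actor's own code may parse the values differently. Note that 52428800 bytes is 50 × 1024 × 1024.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Limits:
    max_file_size: int          # bytes
    download_timeout_ms: int    # milliseconds
    max_download_retries: int
    pandoc_timeout_ms: int      # milliseconds


def load_limits(env: Mapping[str, str] = os.environ) -> Limits:
    """Read the documented variables, falling back to the documented defaults."""
    return Limits(
        max_file_size=int(env.get("MAX_FILE_SIZE", 52_428_800)),  # 50 * 1024 * 1024
        download_timeout_ms=int(env.get("DOWNLOAD_TIMEOUT", 30_000)),
        max_download_retries=int(env.get("MAX_DOWNLOAD_RETRIES", 3)),
        pandoc_timeout_ms=int(env.get("PANDOC_TIMEOUT", 55_000)),
    )
```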
Conversion Quality Notes
- DOCX/PPTX to Markdown: Excellent conversion, preserves most formatting
- PDF Output: High quality, but requires Pandoc with a PDF engine (such as a LaTeX installation) available
- HTML to Structured Formats: May lose CSS styling and complex layouts
- TXT to JSON/CSV: Limited support, best for simple structured text
- Large Files: May timeout - consider splitting or using smaller files
Troubleshooting
Actor fails with timeout
- Reduce file size
- Use a simpler input format (TXT is fastest)
- Avoid complex conversions (e.g., PPTX to CSV)
Download fails repeatedly
- Check that the URL opens in a browser
- Verify file is publicly accessible
- Ensure URL uses HTTP/HTTPS protocol
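Retries only help with transient network failures, which is why a download that fails repeatedly usually points to a URL or permission problem. The actor already retries downloads (MAX_DOWNLOAD_RETRIES); if you script your own checks or API calls, a similar policy might look like this sketch. with_retries is a hypothetical helper, and counting attempts this way is an assumption, not necessarily how the actor counts them.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fetch: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Call fetch(), retrying network errors with exponential backoff.

    HTTP 4xx errors (such as 403 or 404) are raised immediately, because
    retrying them will not help. Server errors (5xx) and other OSErrors
    (timeouts, refused connections) are retried until attempts run out.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except urllib.error.HTTPError as err:  # subclass of OSError, so check first
            if err.code < 500 or attempt == attempts:
                raise
        except OSError:
            if attempt == attempts:
                raise
        time.sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    raise AssertionError("unreachable")
```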
Conversion produces empty output
- Verify the input file is not corrupted
- Check that fromFormat matches the actual file type
- Try a different toFormat value
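To check whether fromFormat matches the actual file type, you can inspect the file's leading bytes. This is a heuristic sketch with a hypothetical sniff_format helper: it assumes DOCX and PPTX packages use the conventional word/document.xml and ppt/presentation.xml part names, recognizes HTML only by a leading doctype or html tag, and returns None when unsure.

```python
import io
import zipfile
from typing import Optional


def sniff_format(data: bytes) -> Optional[str]:
    """Guess the actor's fromFormat from file contents; None if unsure."""
    if data.startswith(b"%PDF-"):
        return "pdf"  # detected, but not accepted as input by the actor
    if data.startswith(b"PK\x03\x04"):  # ZIP container (DOCX and PPTX are ZIPs)
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                names = set(archive.namelist())
        except zipfile.BadZipFile:
            return None
        if "word/document.xml" in names:
            return "docx"
        if "ppt/presentation.xml" in names:
            return "pptx"
        return None
    head = data[:512].lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "html"
    try:
        data.decode("utf-8")  # treat valid UTF-8 as plain text
    except UnicodeDecodeError:
        return None
    return "txt"
```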
Support
For issues, questions, or feature requests:
- Check the troubleshooting section above
- Review the error messages for specific guidance
- Test with the provided examples first
- Contact support with your input parameters and error details