
S3 to Markdown
Pricing
Pay per event

Transform S3 documents into Markdown that is ready to use as AI training data. Converts PDFs, Word and Excel files, images, and audio to clean Markdown that LLMs handle well. Uses Microsoft's markitdown engine. Ideal for RAG systems, AI agents, and machine learning pipelines.
Last modified
3 days ago
S3 File to Markdown Converter
This Apify Actor downloads multiple files from Amazon S3 and converts them to Markdown using markitdown.
Features
- Bulk Processing: Process multiple files in a single run for efficiency
- Downloads files from S3 buckets
- Converts various file formats to Markdown (PDF, Word, PowerPoint, Excel, Images, Audio, HTML, etc.)
- Secure credential management via encrypted input fields
- Robust Error Handling: Individual file failures don't stop the entire batch
- Progress tracking and detailed logging
- Pay-per-conversion: You only pay $0.01 for each successfully converted file
Input Configuration
The actor requires the following input parameters:
- aws_access_key_id (required, secret): Your AWS access key ID for S3 access
- aws_secret_access_key (required, secret): Your AWS secret access key for S3 access
- s3_bucket (required): The name of the S3 bucket containing the files
- s3_keys (required): Array of S3 object keys (paths) of the files to convert in the S3 bucket
- aws_region (required): The AWS region where the S3 bucket is located
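Because all five parameters are required, it can help to check an input locally before starting a run. The following is a minimal sketch, not part of the actor: the helper name `find_input_problems` and the exact checks are assumptions made for illustration, and the actor's own validation may differ.

```python
# Field names come from the Input Configuration list above.
REQUIRED_FIELDS = (
    "aws_access_key_id",
    "aws_secret_access_key",
    "s3_bucket",
    "s3_keys",
    "aws_region",
)


def find_input_problems(run_input: dict) -> list[str]:
    """Return human-readable problems; an empty list means the input looks usable."""
    # A field counts as missing when it is absent or empty.
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if not run_input.get(name)
    ]
    # s3_keys is documented as an array of object keys.
    keys = run_input.get("s3_keys")
    if keys and not (
        isinstance(keys, list) and all(isinstance(k, str) and k for k in keys)
    ):
        problems.append("s3_keys must be a list of non-empty strings")
    return problems
```

Calling `find_input_problems(run_input)` before submitting the run gives a quick way to catch a forgotten field or a single key passed as a string instead of a list.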
AWS Credentials
AWS credentials are provided directly in the actor input as encrypted secret fields. The credentials are automatically encrypted by Apify and only decrypted during actor execution for maximum security.
Pricing
This actor uses pay-per-conversion pricing:
- $0.01 per successfully converted file
- No charge for failed conversions (missing files, conversion errors, etc.)
- Cost-effective for batch processing - process many files efficiently
- Transparent billing - you can see exactly which files were charged in the logs (look for "charged $0.01" messages)
Example: If you process 100 files and 95 succeed, you pay $0.95 (only for the 95 successful conversions).
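The same arithmetic can be written as a small helper. This is only a sketch of the pricing rule stated above; the function name `batch_cost_usd` is hypothetical, and the calculation uses whole cents to avoid floating-point rounding.

```python
PRICE_PER_FILE_CENTS = 1  # $0.01 per successfully converted file


def batch_cost_usd(successful_conversions: int) -> str:
    """Format the charge for a batch; failed conversions are simply not counted."""
    if successful_conversions < 0:
        raise ValueError("successful_conversions must be non-negative")
    cents = successful_conversions * PRICE_PER_FILE_CENTS
    return f"${cents // 100}.{cents % 100:02d}"


# 100 files submitted, 95 converted successfully
print(batch_cost_usd(95))  # prints $0.95
```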
Example Input
    {
      "aws_access_key_id": "YOUR_AWS_ACCESS_KEY_ID",
      "aws_secret_access_key": "YOUR_AWS_SECRET_ACCESS_KEY",
      "s3_bucket": "my-documents-bucket",
      "s3_keys": [
        "documents/report.pdf",
        "documents/invoice.docx",
        "documents/presentation.pptx"
      ],
      "aws_region": "us-west-2"
    }
Note: The AWS credentials will appear as password fields in the Apify Console and will be automatically encrypted.
Output
The actor processes multiple files and saves one record per converted file to the dataset. Each record has the following structure:
- s3_bucket: The source S3 bucket name
- s3_key: The specific S3 object key that was converted
- markdown_content: The converted Markdown content from that file
- file_size_chars: The size of the Markdown content in characters
The output is displayed in a user-friendly table format in the Apify Console's Output tab, with one row per converted file.
Example Output
For the input with multiple files above, you would get multiple records:
    {
      "s3_bucket": "my-documents-bucket",
      "s3_key": "documents/report.pdf",
      "markdown_content": "# Report Title\n\nThis is the converted markdown content...",
      "file_size_chars": 1234
    }

    {
      "s3_bucket": "my-documents-bucket",
      "s3_key": "documents/invoice.docx",
      "markdown_content": "# Invoice\n\nInvoice Number: 12345...",
      "file_size_chars": 856
    }

    {
      "s3_bucket": "my-documents-bucket",
      "s3_key": "documents/presentation.pptx",
      "markdown_content": "# Presentation Title\n\n## Slide 1...",
      "file_size_chars": 2048
    }
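Records in this shape are easy to write back to disk as `.md` files. The sketch below is one possible approach, not something the actor does itself: the function name `save_records` and the choice to mirror the S3 key as a local path are assumptions, and it expects non-empty keys.

```python
from pathlib import Path, PurePosixPath


def save_records(records, out_dir):
    """Write each record's Markdown under out_dir, mirroring the S3 key with a .md suffix."""
    written = []
    for record in records:
        # S3 keys use forward slashes, so parse them as POSIX paths.
        relative = PurePosixPath(record["s3_key"]).with_suffix(".md")
        # Drop a leading "/" and any ".." so files always stay inside out_dir.
        parts = [part for part in relative.parts if part not in ("/", "..")]
        target = Path(out_dir).joinpath(*parts)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(record["markdown_content"], encoding="utf-8")
        written.append(target)
    return written
```

With the example records above, `save_records(records, "converted")` would create `converted/documents/report.md`, `converted/documents/invoice.md`, and `converted/documents/presentation.md`.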
Supported File Formats
Thanks to markitdown, this actor supports:
- PDF documents
- Microsoft Office files (Word, PowerPoint, Excel)
- Images (with OCR)
- Audio files (with transcription)
- HTML files
- Text-based formats (CSV, JSON, XML)
- ZIP archives
- EPub files
- And more!
Error Handling
The actor provides robust error handling for batch processing:
- Batch Resilience: If one file fails, the actor continues processing other files
- Detailed Logging: Each file's processing status is logged individually
- No charges for failures: You're only charged for successfully converted files
- Clear Error Messages: Specific error messages for common issues:
  - Missing AWS credentials
  - Invalid S3 bucket
  - Missing S3 objects (individual files are skipped, not charged)
  - Access denied errors (individual files are skipped, not charged)
  - File conversion failures (individual files are skipped, not charged)
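The batch behavior described above follows a common pattern: wrap each file in its own try/except so that one failure is logged and skipped rather than ending the run. The following sketch illustrates that pattern only; it is not the actor's source code. `convert_batch` and the `convert_one` callable (standing in for "download from S3 and convert with markitdown") are hypothetical names.

```python
import logging

logger = logging.getLogger("s3_to_markdown_sketch")


def convert_batch(s3_keys, convert_one):
    """Convert every key in the list, logging failures and moving on instead of aborting."""
    records, failures = [], {}
    total = len(s3_keys)
    for index, key in enumerate(s3_keys, start=1):
        try:
            markdown = convert_one(key)
        except Exception as exc:  # broad on purpose: one bad file must not end the batch
            logger.warning("[%d/%d] skipped %s: %s", index, total, key, exc)
            failures[key] = str(exc)
            continue
        logger.info("[%d/%d] converted %s", index, total, key)
        records.append({
            "s3_key": key,
            "markdown_content": markdown,
            "file_size_chars": len(markdown),
        })
    return records, failures
```

Only the entries in `records` would correspond to charged conversions; everything in `failures` maps a skipped key to its error message.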
Usage Example
    from apify_client import ApifyClient

    client = ApifyClient("your-api-token")

    # Run the actor and wait for it to finish
    run = client.actor("your-actor-id").call(run_input={
        "aws_access_key_id": "YOUR_AWS_ACCESS_KEY_ID",
        "aws_secret_access_key": "YOUR_AWS_SECRET_ACCESS_KEY",
        "s3_bucket": "my-documents",
        "s3_keys": ["files/document.pdf", "files/report.docx"],
        "aws_region": "us-east-1",
    })

    # Read the Markdown content: the dataset holds one record per converted file
    converted = 0
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        markdown_content = item["markdown_content"]
        s3_key = item["s3_key"]
        print(f"Converted {s3_key}: {len(markdown_content)} characters")
        converted += 1

    # Note: You'll be charged $0.01 for each successfully converted file
    print(f"Total cost: ${converted * 0.01:.2f}")
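Since failed files are skipped and produce no dataset record, one way to find them after a run is to compare the keys you requested with the keys that came back. This is a minimal sketch under that assumption; `find_skipped_keys` is a hypothetical helper, not part of the actor or the Apify client.

```python
def find_skipped_keys(requested_keys, dataset_items):
    """Return the requested keys that have no dataset record, in their original order."""
    converted = {item["s3_key"] for item in dataset_items}
    return [key for key in requested_keys if key not in converted]
```

In the Usage Example, this could be called with the `s3_keys` list from the run input and the items returned by `client.dataset(run["defaultDatasetId"]).iterate_items()`; the actor logs should then explain why each returned key was skipped.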