OCR – PDF to Word (Arabic and all other languages)
Under maintenance
Pricing
from $30.00 / 1,000 pages processed
Convert Arabic PDFs to Word with Google Cloud Vision OCR. Optimized for manuscripts and books, it highlights low-confidence words in red for easy review. Get a clean, editable .docx file ready for publishing. Works for other languages too: fast, accurate, and reliable.
Developer
Mufaddal Shakir
Maintained by Community
Arabic OCR – PDF to Word
Convert Arabic PDF files into fully editable Word documents using Google Cloud Vision OCR — with automatic highlighting of low-confidence words for easy human review.
⬇️ How to Download Your Output
After the Actor finishes running:
- Open the completed run
- Click the Storage tab (top of the run page)
- Click Key-value store
- Find the key named OUTPUT
- Click the Download button next to it
Your .docx file will download immediately. Open it in Microsoft Word or Google Docs.
Note: The "Dataset" tab only shows billing records (one row per page processed). Your actual Word document is always in the Key-value store under the OUTPUT key.
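
The same download can be scripted. The following is a minimal sketch, assuming the Apify API v2 "get record" endpoint for key-value stores and a bearer token for authentication; `store_id`, `token`, and the helper names are placeholders for illustration and are not part of this Actor. The store ID is the default key-value store of your completed run.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

API_BASE = "https://api.apify.com/v2"  # assumed Apify API base URL


def output_record_url(store_id: str, key: str = "OUTPUT") -> str:
    """Build the URL of a record in a key-value store (hypothetical helper)."""
    return (
        f"{API_BASE}/key-value-stores/{quote(store_id, safe='')}"
        f"/records/{quote(key, safe='')}"
    )


def download_output(store_id: str, token: str, dest: str = "output.docx") -> str:
    """Download the OUTPUT record and save it as a .docx file."""
    request = Request(
        output_record_url(store_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urlopen(request) as response, open(dest, "wb") as fh:
        fh.write(response.read())
    return dest
```

Calling `download_output("<your-store-id>", "<your-api-token>")` would write `output.docx` to the current directory. Check the endpoint against the current Apify API reference before relying on it.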
What This Actor Does
- Downloads your PDF from a public URL
- Runs Google Cloud Vision OCR (industry-leading Arabic accuracy)
- Compiles all pages into a single .docx Word file
- Highlights uncertain words in red so you can review them quickly
- Outputs the DOCX file to the key-value store for download
Why Use This Instead of Generic OCR?
- ✅ Arabic-first — built specifically for Arabic, not an afterthought
- ✅ Manuscript support — works on classical Arabic script, not just modern print
- ✅ Confidence highlighting — low-confidence words appear in red in the output document
- ✅ Custom page numbering — set the real-world starting page number of your PDF
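
To illustrate the last two points, here is a small sketch of how confidence highlighting and custom page numbering could work. It assumes each OCR word carries a confidence between 0.0 and 1.0 and that pages are numbered consecutively from the starting page; the `Word` class and both function names are hypothetical and do not come from the Actor's source.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    confidence: float  # 0.0 to 1.0, as reported by the OCR engine


def needs_review(word: Word, threshold: float = 0.9) -> bool:
    """A word is flagged (shown in red) when its confidence is below the threshold."""
    return word.confidence < threshold


def printed_page_number(pdf_page_index: int, starting_page_number: int = 1) -> int:
    """Map a 0-based PDF page index to the real-world page number."""
    return starting_page_number + pdf_page_index


words = [Word("كتاب", 0.98), Word("العلم", 0.72), Word("نور", 0.90)]
flagged = [w.text for w in words if needs_review(w)]  # only the 0.72 word is flagged
```

A word exactly at the threshold (0.90 here) is not flagged, since only words strictly below the threshold are highlighted. With a starting page number of 37, the third PDF page would be labeled 39.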
Input Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| pdfUrl | ✅ Yes | (none) | Direct public URL to your PDF |
| fileLabel | No | document | Short name for the output file (no spaces) |
| startingPageNumber | No | 1 | Real-world page number of the first page |
| confidenceThreshold | No | 0.9 | Words below this confidence are highlighted red (0.0–1.0) |
| gcsBucketName | No | kutub-scanning | GCS bucket (leave as default) |
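
As an illustration of the table above, the sketch below builds an input object with the documented defaults and applies the documented constraints (required `pdfUrl`, no spaces in `fileLabel`, threshold between 0.0 and 1.0). The `build_input` helper and the example URL are hypothetical; the Actor may validate its input differently.

```python
import json

# Defaults copied from the parameter table
DEFAULTS = {
    "fileLabel": "document",
    "startingPageNumber": 1,
    "confidenceThreshold": 0.9,
    "gcsBucketName": "kutub-scanning",
}


def build_input(pdf_url: str, **overrides) -> dict:
    """Merge user overrides with the defaults and check the documented constraints."""
    if not pdf_url:
        raise ValueError("pdfUrl is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    run_input = {"pdfUrl": pdf_url, **DEFAULTS, **overrides}
    if " " in run_input["fileLabel"]:
        raise ValueError("fileLabel must not contain spaces")
    if not 0.0 <= run_input["confidenceThreshold"] <= 1.0:
        raise ValueError("confidenceThreshold must be between 0.0 and 1.0")
    return run_input


# Placeholder URL for illustration only
example = build_input(
    "https://example.com/scan.pdf",
    fileLabel="my-book",
    startingPageNumber=37,
)
print(json.dumps(example, indent=2))
```

The printed JSON can be pasted into the Actor's input editor. Leaving `gcsBucketName` out of the overrides keeps the default, as the table recommends.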
Supported PDF URLs
Your PDF must be publicly accessible. Supported sources:
- Direct URL: any public https:// link ending in .pdf
- Google Drive: share the file as "Anyone with the link", then paste the share URL
- Dropbox: paste the share link (the Actor converts it automatically)
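
The Actor's own link conversion is not documented here. The sketch below shows one common way to turn share links into direct downloads, assuming the usual Dropbox `dl=1` convention and the Google Drive `uc?export=download` pattern; `to_direct_download_url` is a hypothetical helper and not the Actor's implementation.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_direct_download_url(url: str) -> str:
    """Return a direct-download form of a share link, or the URL unchanged."""
    parts = urlsplit(url)
    host = parts.netloc.lower()

    # Dropbox: force dl=1 so the file itself is served instead of a preview page
    if host == "dropbox.com" or host.endswith(".dropbox.com"):
        query = dict(parse_qsl(parts.query))
        query["dl"] = "1"
        return urlunsplit(parts._replace(query=urlencode(query)))

    # Google Drive: extract the file ID from a /file/d/<id>/... share URL
    if host == "drive.google.com":
        match = re.search(r"/file/d/([^/]+)", parts.path)
        if match:
            return f"https://drive.google.com/uc?export=download&id={match.group(1)}"

    # Anything else is assumed to be a direct link already
    return url
```

For example, a Dropbox link ending in `?dl=0` comes back ending in `?dl=1`, while a plain `https://` link to a `.pdf` is returned untouched.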
Pricing
| Pages | Estimated Cost |
|---|---|
| 10 | ~$0.80 |
| 50 | ~$2.00 |
| 100 | ~$3.50 |
| 500 | ~$15.50 |
Pricing = $0.50 Actor start fee + $0.03 per page processed.
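
The formula can be checked against the table with a few lines of Python. This is only a sketch of the stated formula; `estimate_cost` is a hypothetical helper, and actual billing is determined by the platform.

```python
from decimal import Decimal

START_FEE = Decimal("0.50")  # Actor start fee, in USD
PER_PAGE = Decimal("0.03")   # price per page processed, in USD


def estimate_cost(pages: int) -> Decimal:
    """Estimated cost in USD: start fee plus per-page charge."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return START_FEE + PER_PAGE * pages


for pages in (10, 50, 100, 500):
    print(f"{pages} pages: ~${estimate_cost(pages):.2f}")
```

`Decimal` is used instead of floats so the cents add up exactly. The per-page rate also matches the headline price of $30.00 per 1,000 pages.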
Example Use Cases
- Digitizing Islamic manuscript collections
- Converting scanned Arabic books for academic research
- Archiving Arabic legal or historical documents
- Bulk digitization for libraries and institutions
Notes
- Your PDF must be publicly accessible via URL
- Very large PDFs (500+ pages) may take several minutes — this is normal
- Password-protected PDFs are not supported
- For best results, use high-resolution scans (300 DPI or above)
- The red-highlighted words in the output indicate lower OCR confidence — review these manually