OCR – PDF to Word (Arabic and all other languages)

Under maintenance

Pricing

from $30.00 / 1,000 pages processed

Convert Arabic PDFs to Word with Google Cloud Vision OCR. Optimized for manuscripts and books, it highlights low-confidence words in red for easy review. Get a clean, editable .docx file ready for publishing. Works for other languages too—fast, accurate, and reliable.

Rating

0.0 (0)

Developer

Mufaddal Shakir

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 1
Monthly active users: 1
Last modified: 5 days ago

Arabic OCR – PDF to Word

Convert Arabic PDF files into fully editable Word documents using Google Cloud Vision OCR — with automatic highlighting of low-confidence words for easy human review.


⬇️ How to Download Your Output

After the Actor finishes running:

  1. Open the completed run
  2. Click the Storage tab (top of the run page)
  3. Click Key-value store
  4. Find the key named OUTPUT
  5. Click the Download button next to it

Your .docx file will download immediately. Open it in Microsoft Word or Google Docs.

Note: The "Dataset" tab only shows billing records (one row per page processed). Your actual Word document is always in the Key-value store under the OUTPUT key.
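If you prefer to fetch the file from a script instead of the web console, the sketch below shows one way to do it with the Apify API's key-value store record endpoint. The store ID (`abc123` in the usage note) is a placeholder for the ID shown on your run's Storage tab, and the function names are illustrative, not part of this Actor. Whether a token is needed depends on your account's access settings.

```python
import urllib.request
from typing import Optional
from urllib.parse import quote

API_BASE = "https://api.apify.com/v2"


def output_record_url(store_id: str, key: str = "OUTPUT") -> str:
    """Return the API URL of one record in an Apify key-value store."""
    return (
        f"{API_BASE}/key-value-stores/{quote(store_id, safe='')}"
        f"/records/{quote(key, safe='')}"
    )


def download_output(store_id: str, dest_path: str, token: Optional[str] = None) -> None:
    """Save the OUTPUT record (the .docx file) to dest_path."""
    request = urllib.request.Request(output_record_url(store_id))
    if token:
        # Assumption: only needed when the store is not readable by ID alone.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as fh:
        fh.write(response.read())
```

For example, `download_output("abc123", "document.docx")` would write the Word file next to your script.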


What This Actor Does

  1. Downloads your PDF from a public URL
  2. Runs Google Cloud Vision OCR (industry-leading Arabic accuracy)
  3. Compiles all pages into a single .docx Word file
  4. Highlights uncertain words in red so you can review them quickly
  5. Outputs the DOCX file to the key-value store for download
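
To make step 4 concrete, here is a minimal sketch of how words could be flagged for red highlighting. It is not the Actor's source code: the `Word` type and `flag_low_confidence` function are hypothetical, and it assumes each OCR word arrives with a confidence score between 0.0 and 1.0.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    confidence: float  # assumed OCR confidence in the range 0.0-1.0


def flag_low_confidence(words: List[Word], threshold: float = 0.9) -> List[Tuple[str, bool]]:
    """Pair each word with True if it falls strictly below the threshold."""
    return [(w.text, w.confidence < threshold) for w in words]
```

With the default threshold of 0.9, a word scored 0.62 would be flagged for review, while a word scored exactly 0.9 would not.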

Why Use This Instead of Generic OCR?

  • Arabic-first — built specifically for Arabic, not an afterthought
  • Manuscript support — works on classical Arabic script, not just modern print
  • Confidence highlighting — low-confidence words appear in red in the output document
  • Custom page numbering — set the real-world starting page number of your PDF
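
As an illustration of custom page numbering, the sketch below maps a position in the PDF to its real-world page number. How the Actor labels pages internally is not documented here, so treat the function name and the zero-based index as assumptions.

```python
def real_page_number(pdf_index: int, starting_page_number: int = 1) -> int:
    """Map a zero-based PDF page index to its real-world page number."""
    if pdf_index < 0:
        raise ValueError("pdf_index must be zero or greater")
    return starting_page_number + pdf_index
```

For a scan that begins at page 45 of a printed book, the third page of the PDF (`pdf_index=2`) corresponds to page 47.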

Input Parameters

Parameter | Required | Default | Description
pdfUrl | ✅ Yes | (none) | Direct public URL to your PDF
fileLabel | No | document | Short name for the output file (no spaces)
startingPageNumber | No | 1 | Real-world page number of the first page
confidenceThreshold | No | 0.9 | Words below this confidence are highlighted red (0.0–1.0)
gcsBucketName | No | kutub-scanning | GCS bucket (leave as default)
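
The sketch below assembles an input object from the parameters above and applies the documented constraints before you start a run. The `build_input` helper is hypothetical, and the Actor may validate its input differently.

```python
import json

# Defaults as listed in the Input Parameters table.
DEFAULTS = {
    "fileLabel": "document",
    "startingPageNumber": 1,
    "confidenceThreshold": 0.9,
    "gcsBucketName": "kutub-scanning",
}


def build_input(pdf_url: str, **overrides) -> dict:
    """Merge overrides into the defaults and check the documented limits."""
    if not pdf_url:
        raise ValueError("pdfUrl is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    run_input = {"pdfUrl": pdf_url, **DEFAULTS, **overrides}
    if " " in run_input["fileLabel"]:
        raise ValueError("fileLabel must not contain spaces")
    if not 0.0 <= run_input["confidenceThreshold"] <= 1.0:
        raise ValueError("confidenceThreshold must be between 0.0 and 1.0")
    return run_input


if __name__ == "__main__":
    # example.com is a placeholder URL, not a real PDF.
    example = build_input("https://example.com/book.pdf", fileLabel="my_book")
    print(json.dumps(example, indent=2))
```

The printed JSON can be pasted into the Actor's input editor.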

Supported PDF URLs

Your PDF must be publicly accessible. Supported sources:

  • Direct URL — any public https:// link ending in .pdf
  • Google Drive — share the file as "Anyone with the link", then paste the share URL
  • Dropbox — paste the share link (the Actor converts it automatically)
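
For reference, share links are usually turned into direct downloads along the lines sketched below. This is an assumption based on commonly used URL patterns for Dropbox (`dl=1`) and Google Drive (`uc?export=download`), not the Actor's actual conversion code, and the function name is illustrative.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def to_direct_download_url(share_url: str) -> str:
    """Best-effort rewrite of a share link into a direct-download link."""
    parsed = urlparse(share_url)
    host = parsed.netloc.lower()

    if host == "dropbox.com" or host.endswith(".dropbox.com"):
        # Dropbox serves the file itself when dl=1 is set.
        query = dict(parse_qsl(parsed.query))
        query["dl"] = "1"
        return urlunparse(parsed._replace(query=urlencode(query)))

    if host == "drive.google.com":
        # Share links look like /file/d/<FILE_ID>/view
        match = re.search(r"/file/d/([^/]+)", parsed.path)
        if match:
            return f"https://drive.google.com/uc?export=download&id={match.group(1)}"

    # Anything else is assumed to be a direct link already.
    return share_url
```

Very large Google Drive files may still show a confirmation page instead of the file, so a direct `.pdf` link is the safest choice.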

Pricing

Pages | Estimated Cost
10 | ~$0.80
50 | ~$2.00
100 | ~$3.50
500 | ~$15.50

Pricing = $0.50 Actor start fee + $0.03 per page processed.
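
The formula can be checked with a short calculation. The sketch below works in whole cents to avoid floating-point rounding; the function name is illustrative.

```python
START_FEE_CENTS = 50  # $0.50 Actor start fee
PER_PAGE_CENTS = 3    # $0.03 per page processed


def estimate_cost_usd(pages: int) -> float:
    """Estimate the cost of one run in US dollars."""
    if pages < 0:
        raise ValueError("pages must be zero or greater")
    return (START_FEE_CENTS + PER_PAGE_CENTS * pages) / 100
```

For 100 pages this gives 0.50 + 100 × 0.03 = $3.50, matching the estimate above.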


Example Use Cases

  • Digitizing Islamic manuscript collections
  • Converting scanned Arabic books for academic research
  • Archiving Arabic legal or historical documents
  • Bulk digitization for libraries and institutions

Notes

  • Your PDF must be publicly accessible via URL
  • Very large PDFs (500+ pages) may take several minutes — this is normal
  • Password-protected PDFs are not supported
  • For best results, use high-resolution scans (300 DPI or above)
  • The red-highlighted words in the output indicate lower OCR confidence — review these manually