Image to Text OCR
This Actor is unavailable because the developer has decided to deprecate it.
valehprague/image-to-text
Extract machine-readable textual data from image documents
Actor - Image to Text
The actor takes an input image in a specified format (base64 or url) and, using the requested Optical Character Recognition (OCR) model (PaddleOCR or Tesseract), extracts textual data in the required language (see the OCR model documentation for available languages). The result is saved into the Key-Value store in one of the output formats (pdf, txt, or bbox).
INPUT
The input of this actor should be a JSON file with the following fields:
Field | Type | Description | Allowed values |
---|---|---|---|
input_type | String | Input image format | base64 or url |
input_image | String | Image | Any valid string value |
language | String | Text language | See the OCR model documentation (e.g. en) |
ocr | String | Specific OCR model | paddle or tesseract |
output_format | String | Desired output format | bbox / pdf for PaddleOCR, or txt / pdf for Tesseract |
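The constraints in the table can be checked before a run is started. The following is a minimal sketch, not part of the actor: the helper name `validate_actor_input` is hypothetical, and it deliberately skips the language field because the valid codes depend on the chosen OCR model.

```python
# Allowed values copied from the field table; the helper itself is hypothetical.
ALLOWED_INPUT_TYPES = {"base64", "url"}
ALLOWED_FORMATS = {
    "paddle": {"bbox", "pdf"},
    "tesseract": {"txt", "pdf"},
}


def validate_actor_input(actor_input: dict) -> list:
    """Return a list of problems; an empty list means the input matches the table."""
    problems = []

    input_type = actor_input.get("input_type")
    if input_type not in ALLOWED_INPUT_TYPES:
        problems.append(f"input_type must be 'base64' or 'url', got {input_type!r}")

    if not isinstance(actor_input.get("input_image"), str):
        problems.append("input_image must be a string")

    ocr = actor_input.get("ocr")
    if ocr not in ALLOWED_FORMATS:
        problems.append(f"ocr must be 'paddle' or 'tesseract', got {ocr!r}")
    else:
        # The permitted output formats depend on the selected OCR model.
        output_format = actor_input.get("output_format")
        if output_format not in ALLOWED_FORMATS[ocr]:
            allowed = sorted(ALLOWED_FORMATS[ocr])
            problems.append(f"output_format for {ocr} must be one of {allowed}, got {output_format!r}")

    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report every mismatch at once.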
Sample Input
    {
      "input_type": "url",
      "input_image": "https://images4.programmersought.com/934/e8/e89758ae0ed991f1c8aba947addec9e6.png",
      "lang": "eng",
      "ocr": "tesseract",
      "output_format": "txt"
    }
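The sample input uses a URL. For the base64 input type, an equivalent payload could be built roughly as follows. This is a sketch under assumptions: the helper `build_base64_input` is hypothetical, the image is assumed to be sent as plain Base64 without a data-URI prefix, and the language key is written as `lang` to match the sample input, although the field table names it `language`.

```python
import base64
import json


def build_base64_input(image_bytes: bytes, lang: str = "eng",
                       ocr: str = "tesseract", output_format: str = "txt") -> dict:
    """Build an actor input dict that carries the image as a Base64 string."""
    # Assumption: plain Base64 text, no "data:image/...;base64," prefix.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "input_type": "base64",
        "input_image": encoded,
        "lang": lang,  # key name follows the sample input
        "ocr": ocr,
        "output_format": output_format,
    }


if __name__ == "__main__":
    # Placeholder bytes only; in practice read them from an image file.
    payload = build_base64_input(b"\x89PNG\r\n\x1a\n")
    print(json.dumps(payload, indent=2))
```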
OUTPUT
Once the actor finishes, it outputs the extracted textual data in the specified format:
- bbox : list of bounding boxes and text inside
- pdf : Base64-encoded PDF file
- txt : String text
Sample Output
    {
      "response": "Sample PDF Document\n\nRobert Maron\nGrzegorz. Grudziriski\n\nFebruary 20, 1999\n\x0c",
      "error": None
    }
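A result record shaped like the sample output could be post-processed along these lines. This is only a sketch: `decode_actor_output` is a hypothetical helper, it assumes the record has already been loaded into a Python dict with the `response` and `error` keys shown in the sample, and it passes bbox results through unchanged because their exact structure is not documented here.

```python
import base64


def decode_actor_output(record: dict, output_format: str):
    """Turn a result record into bytes (pdf), cleaned text (txt), or raw data (bbox)."""
    error = record.get("error")
    if error is not None:
        raise RuntimeError(f"OCR actor reported an error: {error}")

    response = record["response"]
    if output_format == "pdf":
        # The pdf format is described as a Base64-encoded PDF file.
        return base64.b64decode(response)
    if output_format == "txt":
        # Design choice: drop the trailing newline and form feed ("\x0c")
        # seen at the end of the sample output.
        return response.rstrip("\n\x0c")
    # bbox: returned as-is, since its layout is not specified.
    return response


if __name__ == "__main__":
    sample = {
        "response": "Sample PDF Document\n\nRobert Maron\nGrzegorz. Grudziriski\n\nFebruary 20, 1999\n\x0c",
        "error": None,
    }
    print(decode_actor_output(sample, "txt"))
```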
Developer
Maintained by Community