This Actor is deprecated

This Actor is unavailable because the developer has decided to deprecate it.

Image to Text OCR

valehprague/image-to-text

Extract machine-readable text from image documents

Actor - Image to Text

The actor takes an input image, supplied either as a base64 string or as a URL, and runs the requested Optical Character Recognition (OCR) model (PaddleOCR or Tesseract) to extract text in the required language (see each OCR model's documentation for the available languages). The result is saved to the key-value store in one of the output formats (pdf, txt, or bbox).

INPUT

The input of this actor should be a JSON file with the following fields:

Field	Type	Description	Allowed values
input_type	String	Input image format	`base64` or `url`
input_image	String	Image, as a base64 string or a URL	Any valid string value
language	String	Text language	See the OCR model's documentation (e.g. `en`)
ocr	String	Specific OCR model	`paddle` or `tesseract`
output_format	String	Desired output format	`bbox`/`pdf` for PaddleOCR or `txt`/`pdf` for Tesseract
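
The allowed values in the table above can be checked before a run is started. The following is a minimal sketch, not part of the actor itself: `validate_input` is a hypothetical helper, and the non-empty check on `input_image` is an assumption. Because the table names the language field `language` while the sample input below uses `lang`, the sketch accepts either key rather than guessing which one the actor reads.

```python
# Allowed values, taken from the input table.
ALLOWED_INPUT_TYPES = {"base64", "url"}
ALLOWED_OUTPUT_FORMATS = {
    "paddle": {"bbox", "pdf"},
    "tesseract": {"txt", "pdf"},
}


def validate_input(actor_input):
    """Return a list of problems found in the input dict (empty if none)."""
    errors = []

    if actor_input.get("input_type") not in ALLOWED_INPUT_TYPES:
        errors.append("input_type must be 'base64' or 'url'")

    # Assumption: an empty string is not a usable image reference.
    image = actor_input.get("input_image")
    if not isinstance(image, str) or not image:
        errors.append("input_image must be a non-empty string")

    ocr = actor_input.get("ocr")
    if ocr not in ALLOWED_OUTPUT_FORMATS:
        errors.append("ocr must be 'paddle' or 'tesseract'")
    elif actor_input.get("output_format") not in ALLOWED_OUTPUT_FORMATS[ocr]:
        allowed = sorted(ALLOWED_OUTPUT_FORMATS[ocr])
        errors.append(f"output_format for {ocr} must be one of {allowed}")

    # The table says `language`, the sample input says `lang`; accept both.
    if not (actor_input.get("language") or actor_input.get("lang")):
        errors.append("a language code is required")

    return errors


if __name__ == "__main__":
    example = {
        "input_type": "url",
        "input_image": "https://example.com/scan.png",  # placeholder URL
        "lang": "eng",
        "ocr": "paddle",
        "output_format": "txt",  # not offered for PaddleOCR per the table
    }
    for problem in validate_input(example):
        print(problem)
```

Returning a list of messages instead of raising on the first problem lets a caller report every invalid field at once.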

Sample Input

{
    "input_type": "url",
    "input_image": "https://images4.programmersought.com/934/e8/e89758ae0ed991f1c8aba947addec9e6.png",
    "lang": "eng",
    "ocr": "tesseract",
    "output_format": "txt"
}
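
The sample covers the `url` input type only. For the `base64` input type, an input could be assembled along the lines below. This is a sketch under assumptions: `build_base64_input` is a hypothetical helper, the field names follow the sample input, and it is assumed that the actor expects a plain base64 string without a `data:` URI prefix, which the description does not confirm.

```python
import base64
import json


def build_base64_input(image_bytes, lang="eng", ocr="tesseract", output_format="txt"):
    """Build an actor input dict that carries the image inline as base64."""
    # Assumption: plain base64 text, no "data:image/...;base64," prefix.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "input_type": "base64",
        "input_image": encoded,
        "lang": lang,
        "ocr": ocr,
        "output_format": output_format,
    }


if __name__ == "__main__":
    # Stand-in bytes (the PNG file signature); in practice, read a real
    # image file with open(path, "rb").read().
    payload = build_base64_input(b"\x89PNG\r\n\x1a\n")
    print(json.dumps(payload, indent=4))
```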

OUTPUT

Once the actor finishes, it outputs the extracted text in the specified format:

bbox : list of bounding boxes and the text inside each
pdf : Base64-encoded PDF file
txt : plain text string

Sample Output

{
    "response": "Sample PDF Document\n\nRobert Maron\nGrzegorz. Grudziriski\n\nFebruary 20, 1999\n\x0c",
    "error": None
}
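
A consumer of this output might unpack it as sketched below. The `response` and `error` keys come from the sample output; `decode_result` is a hypothetical helper, and treating any non-`None` `error` as a failure is an assumption. Only the `pdf` format needs decoding, since it is described as a Base64-encoded file. The internal layout of `bbox` results is not documented here, so they are returned unchanged.

```python
import base64


def decode_result(result, output_format):
    """Turn the actor's result dict into usable data.

    Returns bytes for "pdf" and the raw response for "txt" and "bbox".
    """
    # Assumption: a non-None "error" means the OCR run failed.
    if result.get("error") is not None:
        raise RuntimeError(f"OCR run failed: {result['error']}")

    response = result["response"]
    if output_format == "pdf":
        # The PDF is delivered as Base64 text; decode it to raw bytes.
        return base64.b64decode(response)
    return response


if __name__ == "__main__":
    sample = {"response": "Sample PDF Document\n", "error": None}
    print(decode_result(sample, "txt"))
```

The bytes returned for `pdf` can be written to disk with `open("result.pdf", "wb")`.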

Developer

Valeh Farzaliyev

Categories

Developer tools
