
OCR - Extract Text from Images
Under maintenance
Extract text from images using OCR. Plug in your Apify dataset ID, indicate which column contains the image URL (the grey API name), and get a new dataset with the text extracted from every image.
Rating: 5.0 (1)
Pricing: $5.00 / 1,000 results
Total users: 2
Monthly users: 2
Runs succeeded: 50%
Last modified: 3 hours ago
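To make the description above concrete, here is a hedged sketch of the kind of item this Actor expects in the source dataset and the input you would pass to it. Apart from displayUrl, which is the default Image URL Field Name in the input schema further down, the field names and IDs are purely illustrative placeholders.

# A hypothetical item in the source dataset (illustrative field names):
source_item = {
    "caption": "Sunset over the bay",
    "displayUrl": "https://example.com/images/photo-123.jpg",  # direct URL to the image
}

# The matching Actor input: point it at that dataset and name the URL field.
actor_input = {
    "datasetId": "<SOURCE_DATASET_ID>",   # placeholder for your dataset ID or name
    "imageUrlFieldName": "displayUrl",    # the grey API name of the image-URL column
    "lang": "eng",                        # Tesseract language code(s), e.g. "eng+deu"
}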
You can access OCR - Extract Text from Images programmatically from your own applications using the Apify API. To use the Apify API, you'll need an Apify account and your API token, which you can find under Integrations settings in Apify Console.
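As a rough sketch (not the official code sample), the snippet below calls the run-sync-get-dataset-items endpoint described in the OpenAPI definition that follows. It assumes the Python requests package is installed; the token and dataset ID are placeholders.

import requests

# Synchronous run: start the Actor, wait for it to finish, and receive the
# resulting dataset items in a single request (see the OpenAPI paths below).
ACTOR_ENDPOINT = (
    "https://api.apify.com/v2/acts/"
    "wesleyyyb~ocr---extract-text-from-images/run-sync-get-dataset-items"
)

response = requests.post(
    ACTOR_ENDPOINT,
    params={"token": "<APIFY_TOKEN>"},        # your Apify API token (placeholder)
    json={
        "datasetId": "<SOURCE_DATASET_ID>",   # dataset holding the image URLs
        "imageUrlFieldName": "displayUrl",
        "lang": "eng",
    },
    timeout=300,
)
response.raise_for_status()

# The response body is a JSON array of the run's dataset items.
for item in response.json():
    print(item)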
{ "openapi": "3.0.1", "info": { "version": "1.1", "x-build-id": "JX17WPJxtswS87v1g" }, "servers": [ { "url": "https://api.apify.com/v2" } ], "paths": { "/acts/wesleyyyb~ocr---extract-text-from-images/run-sync-get-dataset-items": { "post": { "operationId": "run-sync-get-dataset-items-wesleyyyb-ocr---extract-text-from-images", "x-openai-isConsequential": false, "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.", "tags": [ "Run Actor" ], "requestBody": { "required": true, "content": { "application/json": { "schema": { "$ref": "#/components/schemas/inputSchema" } } } }, "parameters": [ { "name": "token", "in": "query", "required": true, "schema": { "type": "string" }, "description": "Enter your Apify token here" } ], "responses": { "200": { "description": "OK" } } } }, "/acts/wesleyyyb~ocr---extract-text-from-images/runs": { "post": { "operationId": "runs-sync-wesleyyyb-ocr---extract-text-from-images", "x-openai-isConsequential": false, "summary": "Executes an Actor and returns information about the initiated run in response.", "tags": [ "Run Actor" ], "requestBody": { "required": true, "content": { "application/json": { "schema": { "$ref": "#/components/schemas/inputSchema" } } } }, "parameters": [ { "name": "token", "in": "query", "required": true, "schema": { "type": "string" }, "description": "Enter your Apify token here" } ], "responses": { "200": { "description": "OK", "content": { "application/json": { "schema": { "$ref": "#/components/schemas/runsResponseSchema" } } } } } } }, "/acts/wesleyyyb~ocr---extract-text-from-images/run-sync": { "post": { "operationId": "run-sync-wesleyyyb-ocr---extract-text-from-images", "x-openai-isConsequential": false, "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.", "tags": [ "Run Actor" ], "requestBody": { "required": true, "content": { "application/json": { "schema": { "$ref": "#/components/schemas/inputSchema" } } } }, "parameters": [ { "name": "token", "in": "query", "required": true, "schema": { "type": "string" }, "description": "Enter your Apify token here" } ], "responses": { "200": { "description": "OK" } } } } }, "components": { "schemas": { "inputSchema": { "type": "object", "required": [ "datasetId" ], "properties": { "datasetId": { "title": "Source Dataset ID", "type": "string", "description": "The ID or name of the dataset containing items with image URLs to process." }, "imageUrlFieldName": { "title": "Image URL Field Name", "type": "string", "description": "The name of the field in your dataset that contains the direct URL to the image.", "default": "displayUrl" }, "lang": { "title": "Tesseract Languages", "type": "string", "description": "Language codes for Tesseract OCR (e.g., 'eng', 'spa', 'eng+deu'). 
Use '+' for multiple.", "default": "eng" } } }, "runsResponseSchema": { "type": "object", "properties": { "data": { "type": "object", "properties": { "id": { "type": "string" }, "actId": { "type": "string" }, "userId": { "type": "string" }, "startedAt": { "type": "string", "format": "date-time", "example": "2025-01-08T00:00:00.000Z" }, "finishedAt": { "type": "string", "format": "date-time", "example": "2025-01-08T00:00:00.000Z" }, "status": { "type": "string", "example": "READY" }, "meta": { "type": "object", "properties": { "origin": { "type": "string", "example": "API" }, "userAgent": { "type": "string" } } }, "stats": { "type": "object", "properties": { "inputBodyLen": { "type": "integer", "example": 2000 }, "rebootCount": { "type": "integer", "example": 0 }, "restartCount": { "type": "integer", "example": 0 }, "resurrectCount": { "type": "integer", "example": 0 }, "computeUnits": { "type": "integer", "example": 0 } } }, "options": { "type": "object", "properties": { "build": { "type": "string", "example": "latest" }, "timeoutSecs": { "type": "integer", "example": 300 }, "memoryMbytes": { "type": "integer", "example": 1024 }, "diskMbytes": { "type": "integer", "example": 2048 } } }, "buildId": { "type": "string" }, "defaultKeyValueStoreId": { "type": "string" }, "defaultDatasetId": { "type": "string" }, "defaultRequestQueueId": { "type": "string" }, "buildNumber": { "type": "string", "example": "1.0.0" }, "containerUrl": { "type": "string" }, "usage": { "type": "object", "properties": { "ACTOR_COMPUTE_UNITS": { "type": "integer", "example": 0 }, "DATASET_READS": { "type": "integer", "example": 0 }, "DATASET_WRITES": { "type": "integer", "example": 0 }, "KEY_VALUE_STORE_READS": { "type": "integer", "example": 0 }, "KEY_VALUE_STORE_WRITES": { "type": "integer", "example": 1 }, "KEY_VALUE_STORE_LISTS": { "type": "integer", "example": 0 }, "REQUEST_QUEUE_READS": { "type": "integer", "example": 0 }, "REQUEST_QUEUE_WRITES": { "type": "integer", "example": 0 }, "DATA_TRANSFER_INTERNAL_GBYTES": { "type": "integer", "example": 0 }, "DATA_TRANSFER_EXTERNAL_GBYTES": { "type": "integer", "example": 0 }, "PROXY_RESIDENTIAL_TRANSFER_GBYTES": { "type": "integer", "example": 0 }, "PROXY_SERPS": { "type": "integer", "example": 0 } } }, "usageTotalUsd": { "type": "number", "example": 0.00005 }, "usageUsd": { "type": "object", "properties": { "ACTOR_COMPUTE_UNITS": { "type": "integer", "example": 0 }, "DATASET_READS": { "type": "integer", "example": 0 }, "DATASET_WRITES": { "type": "integer", "example": 0 }, "KEY_VALUE_STORE_READS": { "type": "integer", "example": 0 }, "KEY_VALUE_STORE_WRITES": { "type": "number", "example": 0.00005 }, "KEY_VALUE_STORE_LISTS": { "type": "integer", "example": 0 }, "REQUEST_QUEUE_READS": { "type": "integer", "example": 0 }, "REQUEST_QUEUE_WRITES": { "type": "integer", "example": 0 }, "DATA_TRANSFER_INTERNAL_GBYTES": { "type": "integer", "example": 0 }, "DATA_TRANSFER_EXTERNAL_GBYTES": { "type": "integer", "example": 0 }, "PROXY_RESIDENTIAL_TRANSFER_GBYTES": { "type": "integer", "example": 0 }, "PROXY_SERPS": { "type": "integer", "example": 0 } } } } } } } } }}
OCR - Extract Text from Images OpenAPI definition
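If you prefer not to call the HTTP API directly, Apify's official API clients (listed at the end of this page) wrap the same endpoints. Below is a minimal sketch using the Python client (pip install apify-client), with the input fields from the schema above and placeholder values.

from apify_client import ApifyClient

client = ApifyClient("<APIFY_TOKEN>")  # your Apify API token (placeholder)

# Start the Actor and wait for the run to finish.
run = client.actor("wesleyyyb/ocr---extract-text-from-images").call(
    run_input={
        "datasetId": "<SOURCE_DATASET_ID>",
        "imageUrlFieldName": "displayUrl",
        "lang": "eng",
    }
)

# Iterate over the OCR results stored in the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)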
OpenAPI is a standard for designing and describing RESTful APIs, allowing developers to define API structure, endpoints, and data formats in a machine-readable way. It simplifies API development, integration, and documentation.
OpenAPI works well with AI agents and GPTs because it standardizes how these systems interact with APIs, enabling reliable integrations and efficient communication.
By defining machine-readable API specifications, OpenAPI allows AI models such as GPTs to understand and use varied data sources, improving accuracy. This accelerates development, reduces errors, and enables context-aware responses, making OpenAPI a core component for AI applications.
You can download the OpenAPI definitions for OCR - Extract Text from Images from the options below:
If you’d like to learn more about how OpenAPI powers GPTs, read our blog post.
You can also check out our other API clients: