Pricing

from $12.00 / 1,000 results

AI Image Captioner

Generate accurate text descriptions for any image using AI — bulk caption product photos, screenshots, or any image URL for SEO, accessibility, and content tagging.

Developer

Andrew

Last modified: 21 days ago

What you get

Natural-language captions generated by Molmo 2, trained on 712,000+ human-described images
Three detail levels: brief one-liner, balanced description, or full detailed paragraph
Optional focus directive to target specific aspects (text, background, faces, objects, etc.)
One output record per image with the caption and source URL
Supports bulk processing - pass up to 50 image URLs and get all captions in a single run
Export to JSON or CSV directly from the Apify console

Use cases

E-commerce SEO - generate alt text for thousands of product images automatically
Accessibility compliance - add descriptive alt text to images on websites and apps
Content moderation - understand what's in user-uploaded images before publishing
Dataset labeling - annotate image datasets for machine learning pipelines
Digital asset management - auto-tag and describe photos in large media libraries
Social media monitoring - caption scraped images to make them searchable by content

Examples

| Image | Detail Level | Caption |
| --- | --- | --- |
| Example 1 | High | A vibrant red Nike sneaker takes center stage in this striking advertisement, set against a bold red background that creates a visually cohesive and eye-catching composition. The shoe is positioned at an angle, giving the impression of motion and energy. The sneaker features a white Nike swoosh, darker red laces, and "Nike Free" branding on the white sole. The lighting is bright and even, highlighting the shoe's textures and details. |
| Example 1 | Medium | A vibrant red Nike sneaker is displayed against a matching red background. The shoe features a white Nike swoosh and "Nike Free" branding on the sole. The laces are a darker shade of red, complementing the overall design. |
| Example 1 | Low | A red Nike sneaker with white accents. |
| Example 2 | High | A top-down view of a rustic wooden table with three round bowls arranged in a triangular formation. The central bowl features slices of medium-rare steak garnished with fresh green leaves and a red chili pepper. The left bowl holds crispy fried fish topped with a creamy sauce and herbs. The right bowl contains a meat dish garnished with thinly sliced red onions and nuts. Scattered around the bowls are whole chili peppers, cashews, and a small bowl of brown dipping sauce. |
| Example 2 | Medium | Three bowls of food are arranged on a gray wooden table, creating a rustic dining scene. The central bowl contains sliced steak, while the left bowl holds fried fish topped with sauce and herbs. The right bowl features a meat dish garnished with onions and nuts. |
| Example 2 | Low | Three bowls of food on a wooden table with garnishes. |

How to use

Paste one or more image URLs into the Images field (or upload files directly)
Choose a Detail Level - High gives the most descriptive output (recommended for SEO and accessibility)
Optionally add a Focus hint to direct the model's attention (e.g. "describe only the text visible")
Click Run - captions appear in the Dataset tab when complete
Export results as JSON or CSV, or connect to downstream actors via the Apify API

Output format

Each dataset record:

{
  "inputImageUrl": "https://example.com/product.jpg",
  "caption": "A white ceramic coffee mug sitting on a wooden table next to an open laptop. The mug has a minimalist logo on the front and steam rising from the top, suggesting the coffee is hot.",
  "detailLevel": "high",
  "status": "success",
  "error": null
}
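As a rough sketch of how exported records might be post-processed, the snippet below splits records into successes and failures and builds a URL-to-caption CSV for alt text. It relies only on the field names shown in the record above. Treating every `status` other than `"success"` as a failure is an assumption, and the `"failed"` record in the sample data is hypothetical.

```python
import csv
import io


def split_records(records):
    """Separate usable captions from everything else.

    Assumption: any status other than "success" counts as a failure.
    """
    ok, failed = [], []
    for record in records:
        if record.get("status") == "success" and record.get("caption"):
            ok.append(record)
        else:
            failed.append(record)
    return ok, failed


def alt_text_csv(records):
    """Return CSV text that maps each image URL to its caption."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["inputImageUrl", "caption"])
    for record in records:
        writer.writerow([record["inputImageUrl"], record["caption"]])
    return buffer.getvalue()


if __name__ == "__main__":
    # Sample data: the second record is a hypothetical failure case.
    exported = [
        {
            "inputImageUrl": "https://example.com/product.jpg",
            "caption": "A white ceramic coffee mug sitting on a wooden table.",
            "detailLevel": "high",
            "status": "success",
            "error": None,
        },
        {
            "inputImageUrl": "https://example.com/missing.jpg",
            "caption": None,
            "detailLevel": "high",
            "status": "failed",
            "error": "hypothetical error message",
        },
    ]
    ok, failed = split_records(exported)
    print(alt_text_csv(ok))
    for record in failed:
        print("Needs retry:", record["inputImageUrl"], "-", record["error"])
```

Keeping failed records in their own list makes it easy to resubmit only those URLs in a later run.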

Input options

| Field | Type | Description |
| --- | --- | --- |
| Images | URL list | One or more `http/https` image URLs or base64 data URIs |
| Upload Images | File upload | Upload images directly from your computer |
| Detail Level | Select | `Low` (one-liner), `Medium` (balanced), `High` (detailed paragraph) - default: High |
| Focus | Text | Optional directive to focus the caption on a specific aspect of the image |
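When starting runs programmatically, it can help to validate the input before submitting it. The sketch below checks the options listed above. The JSON keys `images`, `detailLevel`, and `focus` are hypothetical: this page only shows the console field labels, so check the actor's input schema for the real key names and accepted values.

```python
from urllib.parse import urlparse

DETAIL_LEVELS = ("low", "medium", "high")
MAX_IMAGES_PER_RUN = 50  # limit stated on this page


def is_supported_image_ref(value):
    """Accept http/https URLs and base64 image data URIs."""
    if value.startswith("data:image/") and ";base64," in value:
        return True
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def build_run_input(images, detail_level="high", focus=None):
    """Build a run input dict; key names are assumptions, not the documented schema."""
    level = detail_level.lower()
    if level not in DETAIL_LEVELS:
        raise ValueError(f"detail level must be one of {DETAIL_LEVELS}")
    if not images:
        raise ValueError("at least one image is required")
    if len(images) > MAX_IMAGES_PER_RUN:
        raise ValueError(f"at most {MAX_IMAGES_PER_RUN} images per run")
    unsupported = [ref for ref in images if not is_supported_image_ref(ref)]
    if unsupported:
        raise ValueError(f"unsupported image references: {unsupported}")

    run_input = {"images": list(images), "detailLevel": level}
    if focus:
        run_input["focus"] = focus  # optional directive, omitted when empty
    return run_input


if __name__ == "__main__":
    print(build_run_input(
        ["https://example.com/product.jpg"],
        detail_level="High",
        focus="describe only the text visible",
    ))
```

Rejecting bad input locally avoids spending a run on URLs the actor cannot fetch.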

Limits

Maximum 50 images per run
Each image must be a publicly accessible URL, a base64 data URI, or a file uploaded through the Upload Images field
Processing time is typically 5-15 seconds per image
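For libraries larger than one run allows, the limits above suggest splitting the URL list into batches of 50. The sketch below does that and derives a rough time range and a minimum cost from the figures on this page. It assumes images are processed one after another, which this page does not confirm, and it treats the "from $12.00 / 1,000 results" price as a lower bound only.

```python
MAX_IMAGES_PER_RUN = 50          # limit stated on this page
SECONDS_PER_IMAGE = (5, 15)      # typical range stated on this page
MIN_USD_PER_1000_RESULTS = 12.0  # "from" price, so a lower bound


def chunk(items, size=MAX_IMAGES_PER_RUN):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def estimate_seconds(image_count):
    """Return (low, high) seconds, assuming sequential processing."""
    low, high = SECONDS_PER_IMAGE
    return image_count * low, image_count * high


def estimate_min_cost_usd(image_count):
    """Lower-bound cost, assuming one result per image."""
    return image_count * MIN_USD_PER_1000_RESULTS / 1000


if __name__ == "__main__":
    urls = [f"https://example.com/img-{n}.jpg" for n in range(120)]
    batches = chunk(urls)
    low, high = estimate_seconds(len(urls))
    print(f"{len(batches)} runs, sizes {[len(b) for b in batches]}")
    print(f"roughly {low}-{high} seconds in total")
    print(f"at least ${estimate_min_cost_usd(len(urls)):.2f}")
```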

Part of a complete AI image toolkit - explore the rest of the suite:

AI Image Background Remover - Remove backgrounds to clean transparent PNGs
AI Image Upscaler - Batch-upscale images to 4K or 8K
AI Image Watermark Remover - Remove text and logo watermarks from images
Image OCR Scraper - Extract text from images in 109 languages
Photo Location Finder - Find where a photo was taken - no EXIF needed
