AI Image Captioner

Pricing: from $12.00 / 1,000 results

Generate accurate text descriptions for any image using AI — bulk caption product photos, screenshots, or any image URL for SEO, accessibility, and content tagging.

Rating: 0.0 (0)

Developer: Andrew (maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 2 days ago

Generate accurate, detailed text descriptions for any image using AI — bulk caption product photos, screenshots, or any image URL for SEO alt text, accessibility compliance, and content tagging.

What you get

  • Natural-language captions generated by Molmo 2, trained on 712,000+ human-described images
  • Three detail levels: brief one-liner, balanced description, or full detailed paragraph
  • Optional focus directive to target specific aspects (text, background, faces, objects, etc.)
  • One output record per image containing the source URL, caption, detail level, and status (plus an error message if captioning failed)
  • Supports bulk processing — pass up to 50 image URLs and get all captions in a single run
  • Export to JSON or CSV directly from the Apify console

Use cases

  • E-commerce SEO — generate alt text for thousands of product images automatically
  • Accessibility compliance — add descriptive alt text to images on websites and apps
  • Content moderation — understand what's in user-uploaded images before publishing
  • Dataset labeling — annotate image datasets for machine learning pipelines
  • Digital asset management — auto-tag and describe photos in large media libraries
  • Social media monitoring — caption scraped images to make them searchable by content
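
For the e-commerce SEO and accessibility use cases, a caption usually ends up in an HTML alt attribute. The Python sketch below shows one way to do that safely. The function name img_tag_with_alt and the 125-character default (a commonly cited alt-text length guideline) are assumptions for illustration, not part of the Actor.

```python
import html


def img_tag_with_alt(src: str, caption: str, max_len: int = 125) -> str:
    """Build an <img> tag whose alt attribute holds an AI-generated caption.

    Hypothetical helper: the 125-character cap is an assumed guideline, not a
    limit imposed by the captioner. Longer captions are cut back to a word
    boundary and marked with an ellipsis.
    """
    text = " ".join(caption.split())  # collapse newlines and repeated spaces
    if len(text) > max_len:
        cut = text[: max_len - 1]  # leave room for the ellipsis character
        if " " in cut:
            cut = cut.rsplit(" ", 1)[0]  # drop a possibly partial last word
        text = cut.rstrip(" ,;:") + "…"
    # quote=True escapes double and single quotes so the caption cannot break the attribute.
    return f'<img src="{html.escape(src, quote=True)}" alt="{html.escape(text, quote=True)}">'
```

A Low or Medium caption usually fits as-is; High captions are better suited to longer descriptions elsewhere on the page.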

Examples

Captions produced for two sample images at each detail level:

Nike sneaker on red background

  • High: A vibrant red Nike sneaker takes center stage in this striking advertisement, set against a bold red background that creates a visually cohesive and eye-catching composition. The shoe is positioned at an angle, giving the impression of motion and energy. The sneaker features a white Nike swoosh, darker red laces, and "Nike Free" branding on the white sole. The lighting is bright and even, highlighting the shoe's textures and details.
  • Medium: A vibrant red Nike sneaker is displayed against a matching red background. The shoe features a white Nike swoosh and "Nike Free" branding on the sole. The laces are a darker shade of red, complementing the overall design.
  • Low: A red Nike sneaker with white accents.

Food flatlay with three dishes

  • High: A top-down view of a rustic wooden table with three round bowls arranged in a triangular formation. The central bowl features slices of medium-rare steak garnished with fresh green leaves and a red chili pepper. The left bowl holds crispy fried fish topped with a creamy sauce and herbs. The right bowl contains a meat dish garnished with thinly sliced red onions and nuts. Scattered around the bowls are whole chili peppers, cashews, and a small bowl of brown dipping sauce.
  • Medium: Three bowls of food are arranged on a gray wooden table, creating a rustic dining scene. The central bowl contains sliced steak, while the left bowl holds fried fish topped with sauce and herbs. The right bowl features a meat dish garnished with onions and nuts.
  • Low: Three bowls of food on a wooden table with garnishes.

How to use

  1. Paste one or more image URLs into the Images field (or upload files through the Upload Images field)
  2. Choose a Detail Level — High gives the most descriptive output (recommended for SEO and accessibility)
  3. Optionally add a Focus hint to direct the model's attention (e.g. "describe only the text visible")
  4. Click Run — captions appear in the Dataset tab when complete
  5. Export results as JSON or CSV, or pass them to downstream Actors via the Apify API
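
The same steps can be scripted with the apify-client Python package. This is a sketch under stated assumptions: the Actor ID and the input keys "images", "detailLevel" and "focus" are hypothetical placeholders (check the Actor's input schema for the real names), and the code assumes the dict-returning 1.x client API.

```python
from typing import Any, Dict, List, Optional

# Hypothetical Actor ID; copy the real one from the Actor's Apify Store page.
ACTOR_ID = "andrew/ai-image-captioner"

DETAIL_LEVELS = ("low", "medium", "high")


def build_run_input(
    image_urls: List[str], detail_level: str = "high", focus: Optional[str] = None
) -> Dict[str, Any]:
    """Build the run input.

    The keys "images", "detailLevel" and "focus" are hypothetical; check the
    Actor's input schema (Input tab, JSON view) for the real field names.
    """
    level = detail_level.lower()
    if level not in DETAIL_LEVELS:
        raise ValueError(f"detail_level must be one of {DETAIL_LEVELS}, got {detail_level!r}")
    run_input: Dict[str, Any] = {"images": list(image_urls), "detailLevel": level}
    if focus:  # the Focus hint is optional, so omit the key when it is empty
        run_input["focus"] = focus
    return run_input


def caption_images(
    token: str, image_urls: List[str], detail_level: str = "high", focus: Optional[str] = None
) -> List[Dict[str, Any]]:
    """Start a run, wait for it to finish, and return its dataset records."""
    # Imported here so build_run_input() works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    # .call() starts the run and blocks until it finishes (1.x client API).
    run = client.actor(ACTOR_ID).call(run_input=build_run_input(image_urls, detail_level, focus))
    if run is None:
        raise RuntimeError("Actor run did not return a run object")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

For example, caption_images(os.environ["APIFY_TOKEN"], ["https://example.com/product.jpg"]) should return a list of records shaped like the one under Output format.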

Output format

Each dataset record:

{
  "inputImageUrl": "https://example.com/product.jpg",
  "caption": "A white ceramic coffee mug sitting on a wooden table next to an open laptop. The mug has a minimalist logo on the front and steam rising from the top, suggesting the coffee is hot.",
  "detailLevel": "high",
  "status": "success",
  "error": null
}
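
Because each record carries a status and an error field, failed images can be separated from successful ones before export. A minimal sketch, assuming any status other than "success" indicates a failure (the page documents only the "success" value); split_results and captions_to_csv are hypothetical helpers:

```python
import csv
import io
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def split_results(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Separate successful records from failed ones.

    Assumption: any status other than "success" means the image failed.
    """
    ok: List[Record] = []
    failed: List[Record] = []
    for record in records:
        (ok if record.get("status") == "success" else failed).append(record)
    return ok, failed


def captions_to_csv(records: Iterable[Record]) -> str:
    """Write URL, detail level and caption columns; other keys are ignored."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["inputImageUrl", "detailLevel", "caption"], extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(records)  # captions containing commas or quotes are quoted automatically
    return buf.getvalue()
```

The failed list keeps each inputImageUrl and error, so those images can be checked and resubmitted in a later run.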

Input options

  • Images (URL list): One or more http/https image URLs or base64 data URIs
  • Upload Images (file upload): Upload images directly from your computer
  • Detail Level (select): Low (one-liner), Medium (balanced), or High (detailed paragraph); default: High
  • Focus (text): Optional directive that focuses the caption on a specific aspect of the image
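
Since the Images field accepts only http/https URLs or base64 data URIs, checking inputs locally first can avoid wasted runs. This Python sketch assumes data URIs must declare an image/* MIME type; is_valid_image_ref and partition_image_refs are hypothetical helpers, not part of the Actor.

```python
import base64
import re
from typing import List, Tuple
from urllib.parse import urlparse

# Assumption: data URIs must declare an image/* MIME type and use base64 encoding.
_DATA_URI_HEADER = re.compile(r"^data:image/[A-Za-z0-9.+-]+;base64$")


def is_valid_image_ref(ref: str) -> bool:
    """Return True for an http(s) URL with a host, or a well-formed base64 image data URI."""
    ref = ref.strip()
    if ref.startswith("data:"):
        header, sep, payload = ref.partition(",")
        if not sep or not _DATA_URI_HEADER.match(header):
            return False
        try:
            # validate=True rejects non-base64 characters; bad padding also raises.
            base64.b64decode(payload, validate=True)
        except ValueError:  # binascii.Error is a subclass of ValueError
            return False
        return bool(payload)
    parsed = urlparse(ref)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def partition_image_refs(refs: List[str]) -> Tuple[List[str], List[str]]:
    """Split inputs into (valid, invalid) lists, preserving order."""
    valid: List[str] = []
    invalid: List[str] = []
    for ref in refs:
        (valid if is_valid_image_ref(ref) else invalid).append(ref)
    return valid, invalid
```

This only checks the format; it cannot tell whether a URL is publicly reachable, which the Limits section also requires.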

Limits

  • Maximum 50 images per run
  • Each image must be a publicly accessible URL or a base64 data URI
  • Processing time is typically 5–15 seconds per image
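
To caption more than 50 images, the list can be split across several runs. The sketch below also gives a rough estimate from the figures on this page; it assumes one billed result per image, treats $12.00 per 1,000 results as a minimum price, and multiplies the 5 to 15 second range by the image count as if images were processed one at a time, which the page does not confirm. The names chunk and rough_estimate are hypothetical.

```python
import math
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")

MAX_IMAGES_PER_RUN = 50          # from the Limits section
MIN_PRICE_PER_1000_USD = 12.00   # "from $12.00 / 1,000 results": a floor, not a quote
SECONDS_PER_IMAGE = (5, 15)      # "typically 5 to 15 seconds per image"


def chunk(items: Sequence[T], size: int = MAX_IMAGES_PER_RUN) -> List[List[T]]:
    """Split items into consecutive batches of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(items[i : i + size]) for i in range(0, len(items), size)]


def rough_estimate(n_images: int) -> Dict[str, object]:
    """Runs needed, minimum cost (one result per image) and a sequential time range."""
    runs = math.ceil(n_images / MAX_IMAGES_PER_RUN)
    min_cost = round(n_images * MIN_PRICE_PER_1000_USD / 1000, 2)
    low, high = (n_images * s for s in SECONDS_PER_IMAGE)
    return {"runs": runs, "min_cost_usd": min_cost, "seconds": (low, high)}
```

For example, 120 images need three runs (50, 50 and 20 images) and cost at least $1.44.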