OpenVerse Image Scraper

Pricing: Pay per usage

Collect Creative Commons images from OpenVerse in bulk. Extract photo URLs, metadata, licenses, and attribution automatically. Suited to content creation, web design, research projects, and AI training datasets. Output is structured, and every record carries its license and attribution details.

Developer: Shahid Irfan (maintained by Community)

Extract image records from the OpenVerse database. Collect openly licensed and public domain images at scale for building AI training datasets, sourcing stock photos, and research.

Features

  • Keyword Search — Find specific images matching your query.
  • License Filtering — Narrow down results to specific Creative Commons licenses or public domain status.
  • Source Selection — Target specific providers like Flickr, Wikimedia, Rawpixel, and more.
  • High Performance — Retrieves results quickly through direct API access.
  • Clean Data — Removes empty and null fields from every record, so the dataset contains only populated values.
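
The Clean Data behavior can be pictured with a short sketch. This is an illustration only, not the actor's own code: the helper name `drop_empty` is hypothetical, and treating None, empty strings, empty lists, and empty dicts as "empty" is an assumption.

```python
from typing import Any


def _is_empty(value: Any) -> bool:
    # Assumed definition of "empty"; False and 0 are real values and are kept.
    return value is None or value == "" or value == [] or value == {}


def drop_empty(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the record without keys whose values are empty."""
    return {key: value for key, value in record.items() if not _is_empty(value)}


raw = {"id": "abc", "title": "Computer Test", "category": None, "filesize": None, "mature": False}
print(drop_empty(raw))
```

Because empty fields are dropped per record, two items in the same dataset may not share exactly the same set of keys.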

Use Cases

Machine Learning and AI Training

Build datasets of openly licensed images for training computer vision models, with the license and attribution of each image recorded alongside its URL.

Content Creation & Curation

Source public domain and CC-licensed stock photos for websites, blogs, and marketing materials from the OpenVerse catalog.

Academic Research

Gather metadata about image attribution, creator statistics, and license distribution across various providers for large-scale analysis.


Input Parameters

Parameter       Type     Required  Default  Description
keyword         String   Yes       -        Image search keyword(s)
license_type    String   No        -        Comma-separated list of licenses (e.g., by, cc0)
source          String   No        -        Comma-separated list of sources (e.g., flickr)
sort            String   No        -        Sort order (e.g., relevance or newest)
results_wanted  Integer  No        20       Maximum results to collect
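
As a rough illustration of how these parameters fit together, the sketch below assembles an input object from Python values. The helper name `build_input` and its validation rules are assumptions made for this example; only the parameter names and the default of 20 come from the table.

```python
from typing import Any, Optional, Sequence


def build_input(
    keyword: str,
    licenses: Optional[Sequence[str]] = None,
    sources: Optional[Sequence[str]] = None,
    sort: Optional[str] = None,
    results_wanted: int = 20,
) -> dict[str, Any]:
    """Assemble an actor input using the documented parameter names."""
    if not keyword.strip():
        raise ValueError("keyword is required")
    if results_wanted < 1:
        raise ValueError("results_wanted must be a positive integer")

    run_input: dict[str, Any] = {"keyword": keyword, "results_wanted": results_wanted}
    # license_type and source are comma-separated strings, so join the lists.
    if licenses:
        run_input["license_type"] = ",".join(licenses)
    if sources:
        run_input["source"] = ",".join(sources)
    if sort:
        run_input["sort"] = sort
    return run_input


print(build_input("technology", licenses=["cc0", "pdm"], sources=["flickr"], results_wanted=100))
```

Optional parameters are left out of the object entirely when unset, rather than being sent as empty strings.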

Output Data

Each item in the dataset describes one image. The main fields are:

Field                Type    Description
id                   String  Unique OpenVerse identifier
title                String  Image title
url                  String  Direct URL to the image file
foreign_landing_url  String  URL to the image on the original provider's website
creator              String  Name of the image creator
license              String  License code (e.g., by-nc, cc0)
provider             String  Name of the provider (e.g., flickr)
attribution          String  Complete attribution string, ready to reuse

Items can also include creator_url, license_version, source, mature, height, and width, as shown under Sample Output.

Usage Examples

Extract 50 nature images:

{
  "keyword": "nature",
  "results_wanted": 50
}

Specific License Extraction

Extract public domain (CC0) technology images from Flickr:

{
  "keyword": "technology",
  "license_type": "cc0",
  "source": "flickr",
  "results_wanted": 100
}
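
Inputs like these can also be submitted from a script. The following is a minimal sketch, assuming the apify-client package for Python, whose `ApifyClient` exposes `actor(...).call(run_input=...)` and `dataset(...).iterate_items()`. The helper name `run_and_fetch` is hypothetical, and the actor ID and API token are placeholders that you must supply yourself.

```python
from typing import Any, Iterator


def run_and_fetch(client: Any, actor_id: str, run_input: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Start the actor, wait for the run to finish, and yield its dataset items.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the actor run did not return a result")
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()


# Usage (requires `pip install apify-client` and a real token; placeholders shown):
#
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   for item in run_and_fetch(client, "<ACTOR_ID>", {"keyword": "nature", "results_wanted": 50}):
#       print(item.get("title"), item.get("url"))
```

Passing the client in as an argument keeps the helper easy to test with a stub, without network access.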

Sample Output

{
  "id": "260c0ca9-35a4-41a4-b1e0-e13339a2b31d",
  "title": "Computer Test",
  "foreign_landing_url": "https://www.flickr.com/photos/74362028@N00/2099250160",
  "url": "https://live.staticflickr.com/2287/2099250160_e1ceb65c97_b.jpg",
  "creator": "flatiron32",
  "creator_url": "https://www.flickr.com/photos/74362028@N00",
  "license": "by-nc",
  "license_version": "2.0",
  "provider": "flickr",
  "source": "flickr",
  "attribution": "\"Computer Test\" by flatiron32 is licensed under CC BY-NC 2.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc/2.0/.",
  "mature": false,
  "height": 768,
  "width": 1024
}
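
The attribution field in this sample follows a regular pattern built from title, creator, license, and license_version. The sketch below reproduces that pattern for CC licenses, which can help when a record lacks a ready-made string. The function name `build_attribution` is hypothetical, and the URL pattern is inferred from this one sample; cc0 and pdm use a different URL scheme and are deliberately rejected here.

```python
def build_attribution(title: str, creator: str, license_code: str, license_version: str) -> str:
    """Format an attribution line in the same shape as the sample record's `attribution` field."""
    if license_code in {"cc0", "pdm"}:
        # Public domain tools live under a different URL path; not handled in this sketch.
        raise ValueError("public domain codes are not covered by this sketch")
    license_name = f"CC {license_code.upper()} {license_version}"
    license_url = f"https://creativecommons.org/licenses/{license_code}/{license_version}/"
    return (
        f'"{title}" by {creator} is licensed under {license_name}. '
        f"To view a copy of this license, visit {license_url}."
    )


print(build_attribution("Computer Test", "flatiron32", "by-nc", "2.0"))
```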

Tips for Best Results

Optimize Collection Size

  • Start with a small results_wanted (like 20) to preview the output data structure.
  • Increase the limit once you confirm the parameters accurately target your needs.

Precise Filtering

  • Use specific license codes like pdm (Public Domain Mark), cc0, by, by-sa.
  • Providing a specific source (like wikimedia or rawpixel) yields more homogeneous data.

Integrations

Connect your extracted image data with:

  • Google Sheets — Export metadata for quick review
  • Airtable — Build searchable image databases
  • Make — Create automated data enrichment workflows

Export Formats

Download your extracted data in multiple formats:

  • JSON — For developers and APIs
  • CSV — For spreadsheet analysis
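
If you start from a JSON export and need CSV locally, the conversion takes a few lines with the standard library. This is a sketch under one assumption drawn from the Clean Data feature: because null fields are removed per item, records may not all share the same keys, so the header is the union of every key seen. The function name `items_to_csv` is hypothetical.

```python
import csv
import io
from typing import Any


def items_to_csv(items: list[dict[str, Any]]) -> str:
    """Serialize dataset items to CSV, leaving cells blank where a field is absent."""
    # Collect column names in first-seen order across all items.
    fieldnames: list[str] = []
    for item in items:
        for key in item:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


print(items_to_csv([{"id": "1", "title": "A"}, {"id": "2", "creator": "c"}]))
```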

Frequently Asked Questions

Can I scrape multiple pages?

Yes. The actor handles pagination automatically and keeps requesting pages until it reaches your results_wanted count.
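
Conceptually, pagination toward a target count looks like the loop below. It is a generic sketch, not the actor's implementation: `fetch_page` is a hypothetical stand-in for whatever request retrieves one page of results, and the page size of 20 in the demo is arbitrary.

```python
from typing import Any, Callable

Page = list[dict[str, Any]]


def collect(fetch_page: Callable[[int], Page], results_wanted: int) -> Page:
    """Request pages 1, 2, 3, ... until enough results are collected or a page comes back empty."""
    results: Page = []
    page = 1
    while len(results) < results_wanted:
        batch = fetch_page(page)
        if not batch:  # the source has no more results
            break
        results.extend(batch)
        page += 1
    # The last page may overshoot the target, so trim to the requested count.
    return results[:results_wanted]


def fake_fetch(page: int) -> Page:
    # Demo data source with 45 records served 20 per page.
    data = [{"id": i} for i in range(45)]
    return data[(page - 1) * 20 : page * 20]


print(len(collect(fake_fetch, 30)))  # 30
print(len(collect(fake_fetch, 50)))  # 45, because the source runs out first
```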

Are these images free to use?

OpenVerse aggregates openly licensed and public domain works, but you must respect the provided license and attribution requirements for each image.
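
A first-pass check on what a license code permits can be automated from the license field. The sketch below reads the standard Creative Commons elements (by, nc, nd, sa) out of the code; the function name `license_terms` is hypothetical, and the result is a rough reading only, since the license deed for each image remains the authoritative source.

```python
def license_terms(license_code: str) -> dict[str, bool]:
    """Derive basic usage terms from a CC license code such as "by-nc" or "cc0"."""
    parts = license_code.lower().split("-")
    return {
        "attribution_required": "by" in parts,
        "commercial_use_allowed": "nc" not in parts,
        "derivatives_allowed": "nd" not in parts,
        "share_alike_required": "sa" in parts,
    }


print(license_terms("by-nc"))
print(license_terms("cc0"))
```

For example, the record shown under Sample Output carries the code by-nc, so it requires attribution and does not allow commercial use.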

What if data is missing?

Some fields (like category or filesize) might be missing if the original provider doesn't supply them. The actor automatically removes null fields to keep your dataset clean.


Support

For issues or feature requests, contact support through the Apify Console.


This actor is designed for legitimate data collection purposes. Users are responsible for ensuring compliance with website terms of service, honoring the specified image licenses, and adhering to applicable laws. Use data responsibly.