Openverse Open-License Media Scraper

Pricing

from $13.00 / 1,000 result items

Search 800M+ openly licensed images, audio clips and graphics across Flickr, Wikimedia, Europeana, Smithsonian, NASA and 50+ CC and public-domain providers. Returns title, creator, license, attribution, source URL, file size, dimensions, tags and direct media URL. Filter by license or source.

Rating: 0.0 (0)

Developer: ParseForge

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: a day ago


🎨 Openverse Media Scraper

🚀 Search 800M+ openly licensed images, audio, and graphics across 50+ providers.

🕒 Last updated: 2026-05-06 · 📊 23 fields per record · 800M+ media records · CC and public-domain providers (Flickr, Wikimedia, Smithsonian, NASA, Europeana)

The Openverse Media Scraper searches WordPress.org's Openverse index of openly licensed media and returns structured records for images, audio clips, illustrations, and graphics. Every result is licensed under Creative Commons or in the public domain, with full attribution metadata.

The catalog aggregates 800M+ items across 50+ providers, including Flickr, Wikimedia Commons, Europeana, Smithsonian, NASA, Biodiversity Heritage Library, and Rawpixel. Filters run server-side, so a single run can return only CC0 sunsets, only Smithsonian sketches, or only NASA imagery.

| 🎯 Target Audience | 💡 Primary Use Cases |
| --- | --- |
| Content creators, designers, educators, marketing teams, journalists, app developers, AI training pipelines | Content libraries, blog illustrations, social media assets, AI training datasets, educational materials |

📋 What the Openverse Media Scraper does

Five filtering workflows in a single run:

  • 🔍 Keyword search. Match titles, descriptions, tags, and creator names across the catalog.
  • 🏷️ License filter. Restrict by CC license (CC0, CC-BY, CC-BY-SA) or public domain.
  • 📁 Source filter. Restrict to one provider.
  • 📐 Aspect ratio. Tall, wide, or square (images only).
  • 🎵 Media type toggle. Switch between images and audio.

💡 Why it matters: clean, server-side filtering removes the parser-and-pagination work from your team and keeps your dataset fresh on every run.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.


⚙️ Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan up to 1,000,000. |
| query | string | "sunset" | Free-text keyword search. |
| mediaType | string | "images" | `images` or `audio`. |
| license | string | "" | License filter (cc0, by, by-sa, by-nc). Empty = any. |
| source | string | "" | Provider filter. Empty = all. |
| aspectRatio | string | "" | tall, wide, square (images only). |

Example: 100 CC0 sunset images.

{
  "maxItems": 100,
  "query": "sunset",
  "mediaType": "images",
  "license": "cc0"
}

Example: 500 NASA-sourced images.

{
  "maxItems": 500,
  "mediaType": "images",
  "source": "nasa"
}
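
The input table above can also be assembled in code before a run is started. The following is a minimal sketch, not part of the Actor itself: `build_input` is a hypothetical helper name, the allowed `mediaType` and `aspectRatio` values are copied from the table, and the license value is passed through unchecked because the Actor may accept more codes than the four listed.

```python
from typing import Any

# Allowed values copied from the input table above.
MEDIA_TYPES = {"images", "audio"}
ASPECT_RATIOS = {"", "tall", "wide", "square"}


def build_input(
    query: str = "sunset",
    max_items: int = 10,
    media_type: str = "images",
    license_code: str = "",
    source: str = "",
    aspect_ratio: str = "",
) -> dict[str, Any]:
    """Assemble a run-input dict, leaving out filters that are empty."""
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")
    if media_type not in MEDIA_TYPES:
        raise ValueError(f"mediaType must be one of {sorted(MEDIA_TYPES)}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unexpected aspectRatio: {aspect_ratio!r}")
    if aspect_ratio and media_type != "images":
        raise ValueError("aspectRatio applies to images only")

    run_input: dict[str, Any] = {"maxItems": max_items, "mediaType": media_type}
    # An empty string means "no filter", so those keys are omitted entirely.
    optional = (
        ("query", query),
        ("license", license_code),
        ("source", source),
        ("aspectRatio", aspect_ratio),
    )
    for key, value in optional:
        if value:
            run_input[key] = value
    return run_input
```

Calling `build_input(max_items=100, license_code="cc0")` produces the same keys and values as the first example, and `build_input(query="", max_items=500, source="nasa")` matches the second.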

📊 Output

Each record contains 23 fields; the schema below lists 14 of them. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

| Field | Type | Example |
| --- | --- | --- |
| 🖼️ thumbnailUrl | string | "https://api.openverse.org/v1/images/.../thumb/" |
| 🆔 id | string | "1e97a259-..." |
| 📛 title | string | null |
| 👤 creator | string | null |
| 🌐 url | string | "https://live.staticflickr.com/.../b.jpg" |
| 🌐 sourceUrl | string | "https://www.flickr.com/photos/.../4994679" |
| ⚖️ license | string | "cc-by-nc-sa" |
| ⚖️ licenseVersion | string | null |
| 📁 source | string | "flickr" |
| 📐 width | number | null |
| 📐 height | number | null |
| 🎵 duration | number | null |
| 🏷️ tags | array | ["sunset","nature"] |
| 📋 attribution | string | "Sunset by X (CC BY-NC-SA 2.0)" |
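
Several of the fields above can be null, so downstream code should not assume they are present. The sketch below types the listed fields and shows one way to fall back when `attribution`, `title`, or `creator` is missing. `MediaRecord` and `credit_line` are hypothetical names, and the fallback wording is an assumption rather than anything the Actor produces.

```python
from typing import Optional, TypedDict


class MediaRecord(TypedDict, total=False):
    """The output fields listed in the schema table (not the full record)."""

    thumbnailUrl: str
    id: str
    title: Optional[str]
    creator: Optional[str]
    url: str
    sourceUrl: str
    license: str
    licenseVersion: Optional[str]
    source: str
    width: Optional[int]
    height: Optional[int]
    duration: Optional[float]
    tags: list[str]
    attribution: str


def credit_line(record: MediaRecord) -> str:
    """Return the record's attribution, or a fallback built from nullable fields."""
    attribution = record.get("attribution")
    if attribution:
        return attribution
    title = record.get("title") or "Untitled"
    creator = record.get("creator") or "unknown creator"
    license_code = record.get("license") or "unknown license"
    return f"{title} by {creator} ({license_code})"
```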


✨ Why choose this Actor

  • ⚖️ Verified open licenses. Every record carries explicit license + attribution; no copyright guessing.
  • 🌐 50+ providers in one index. Flickr, Wikimedia, Europeana, Smithsonian, NASA in a single search.
  • 🎵 Audio + images. Switch media type with one input flag.
  • Fast. 100 records in under 30 seconds.
  • 🔄 Always fresh. Each run hits the live Openverse index.

📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
| --- | --- | --- | --- | --- | --- |
| ⭐ This Actor | $5 free credit | 800M+ items | Live per run | license, source, type, aspect | ⚡ 2 min |
| Unsplash/Pexels APIs | Free tier | Smaller curated | Live | Limited | ⏳ Hours |
| Manual provider scraping | Free | Per-provider | Live | DIY | 🐢 Days |
| Stock photo libraries | $30+/month | Curated | Live | Yes | 🐢 Account setup |

Pick this Actor when you want broad coverage, server-side filtering, and no pipeline maintenance.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the Openverse Media Scraper page on the Apify Store.
  3. 🎯 Set input. Pick your filters and maxItems.
  4. 🚀 Run it. Click Start and let the Actor collect your data.
  5. 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


💼 Business use cases

📰 Content & Editorial

  • Blog post imagery with proper attribution
  • Newsletter and social media graphics
  • Article hero images by topic
  • Author headshots and brand visuals

🎓 Education & Research

  • Lecture slides with verified attribution
  • Open educational resources (OER)
  • Research paper figures
  • Public-domain audio for narration

🤖 AI & ML

  • Training image classifiers with safe licenses
  • Captioning model datasets
  • Image embedding search corpora
  • Audio dataset generation

🎨 Design & Marketing

  • Mood boards and creative briefs
  • Marketing campaign assets
  • Brand collateral with clean licensing
  • Product placeholder imagery

🔌 Automating Openverse Media Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Hourly, daily, or weekly refreshes keep downstream databases in sync automatically.
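
A programmatic run with the Python client might look like the sketch below. It assumes the `apify-client` package is installed and that `call()` returns the run as a dict, as in the 1.x client. `ACTOR_ID` is a placeholder to replace with the ID shown on the Actor's Store page, and `APIFY_TOKEN` is an assumed name for the environment variable holding your API token.

```python
import os
from typing import Any, Iterator

# Placeholder, not the real ID: copy the Actor ID from its Apify Store page.
ACTOR_ID = "<username>/<actor-name>"


def fetch_records(token: str, run_input: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Start a run, wait for it to finish, and yield each dataset item."""
    # Imported lazily so this module loads even where apify-client is missing.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()


if __name__ == "__main__":
    records = fetch_records(
        os.environ["APIFY_TOKEN"],
        {"maxItems": 100, "query": "sunset", "mediaType": "images", "license": "cc0"},
    )
    for record in records:
        print(record.get("title"), record.get("url"))
```

Because `fetch_records` is a generator, nothing is sent to Apify until the first item is requested, which makes it easy to wire into a scheduled job or a larger pipeline.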


🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Reproducible research figures
  • Open-license media audits
  • Cultural heritage dataset construction
  • Course material with attribution

🎨 Personal and creative

  • Personal blogs and portfolios
  • Indie game and app assets
  • DIY documentation
  • Newsletter and social-media content

🤝 Non-profit and civic

  • Public service campaign visuals
  • Civic literacy materials
  • OSM and open-data illustrations
  • Journalism with documented attribution

🧪 Experimentation

  • Train captioning models on safe data
  • Prototype attribution-aware UIs
  • Build licensed-only stock libraries
  • Test moderation pipelines


❓ Frequently Asked Questions

🧩 How does it work?

Provide a query, license, source, or aspect-ratio filter. The Actor queries the Openverse index and emits one record per media item.

⚖️ Is everything free to use commercially?

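Not all of it. Many records are CC0 or CC-BY, which permit commercial use (CC-BY requires attribution; CC0 does not). Records under NonCommercial licenses such as by-nc do not permit it, so always verify the license on each record.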
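
Verifying the license can be partly automated. The sketch below flags any license code that contains a NonCommercial (`nc`) element, assuming codes are hyphen-separated as in the schema example `cc-by-nc-sa`. `permits_commercial_use` and `commercial_only` are hypothetical helper names, and this check is no substitute for reading the license terms, which may impose other conditions such as attribution or no-derivatives.

```python
from typing import Any, Iterable


def permits_commercial_use(license_code: str) -> bool:
    """Return False for a missing license or one with a NonCommercial element."""
    if not license_code:
        # Treat an unknown license as not cleared for commercial use.
        return False
    parts = license_code.lower().replace("_", "-").split("-")
    return "nc" not in parts


def commercial_only(records: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only the records whose license field passes the check above."""
    return [r for r in records if permits_commercial_use(r.get("license") or "")]
```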

📊 How many fields per record?

23, including title, creator, license, source URL, dimensions, tags, attribution, and direct media URL.

🎵 Does it include audio?

Yes. Set mediaType to audio to search music, sound effects, and spoken-word recordings.

🔁 Can I schedule recurring runs?

Yes. Use Apify Schedules for content-pipeline refreshes.

🌐 Which providers are covered?

50+, including Flickr, Wikimedia Commons, Europeana, Smithsonian, NASA, Rawpixel, and Biodiversity Heritage Library.

🔄 How fresh is the index?

Openverse re-indexes providers continuously. Each run hits the latest snapshot.

💳 Do I need a paid Apify plan?

No. The free plan covers preview runs of up to 10 items. A paid plan unlocks larger downloads and scheduling.

🆘 What if a run fails?

Apify retries transient errors. Inspect logs in the Runs tab; partial datasets are preserved.

📐 Can I filter by image dimensions?

Only by aspect ratio (tall, wide, or square). For exact dimensions, filter the downloaded records client-side using the width and height fields.
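
A client-side dimension filter can be a few lines. This is a sketch that assumes records are plain dicts with the `width` and `height` fields from the schema; `filter_by_min_size` is a hypothetical helper name. Records with null dimensions are dropped, since their size cannot be checked.

```python
from typing import Any, Iterable


def filter_by_min_size(
    records: Iterable[dict[str, Any]], min_width: int, min_height: int
) -> list[dict[str, Any]]:
    """Keep records at least min_width x min_height; skip unknown sizes."""
    kept = []
    for record in records:
        width, height = record.get("width"), record.get("height")
        if width is None or height is None:
            continue  # size unknown, so it cannot be verified
        if width >= min_width and height >= min_height:
            kept.append(record)
    return kept
```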


🔌 Integrate with any app

Openverse Media Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe data into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes.
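
A webhook receiver usually needs only the dataset ID from the notification. The sketch below assumes Apify's default webhook payload template, with an `eventType` string and a `resource` object describing the run. `dataset_id_from_webhook` is a hypothetical helper name, and a customized payload template would need different keys.

```python
from typing import Any, Optional


def dataset_id_from_webhook(payload: dict[str, Any]) -> Optional[str]:
    """Return the run's default dataset ID for a succeeded run, else None."""
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    # Assumption: "resource" holds the run object, as in the default template.
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")
```

The returned ID can then be used to download the finished dataset through the API client.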


💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by WordPress.org, Openverse, or any of the upstream content providers. All trademarks mentioned are the property of their respective owners. Only publicly available open data is collected.