Openverse Open-License Media Scraper
Pricing
from $13.00 / 1,000 result items
Search 800M+ openly licensed images, audio clips and graphics across Flickr, Wikimedia, Europeana, Smithsonian, NASA and 50+ CC and public-domain providers. Returns title, creator, license, attribution, source URL, file size, dimensions, tags and direct media URL. Filter by license or source.
Developer
ParseForge
a day ago
Last modified

🎨 Openverse Media Scraper
🚀 Search 800M+ openly licensed images, audio, and graphics across 50+ providers.
🕒 Last updated: 2026-05-06 · 📊 23 fields per record · 800M+ media records · CC and public-domain providers (Flickr, Wikimedia, Smithsonian, NASA, Europeana)
The Openverse Media Scraper searches WordPress.org's Openverse index of openly licensed media and returns structured records for images, audio clips, illustrations, and graphics. Every result is licensed under Creative Commons or in the public domain, with full attribution metadata.
The catalog aggregates 800M+ items across 50+ providers, including Flickr, Wikimedia Commons, Europeana, Smithsonian, NASA, Biodiversity Heritage Library, and Rawpixel. Filters run server-side, so a single run can return only CC0 sunsets, only Smithsonian sketches, or only NASA imagery.
| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Content creators, designers, educators, marketing teams, journalists, app developers, AI training pipelines | Content libraries, blog illustrations, social media assets, AI training datasets, educational materials |
📋 What the Openverse Media Scraper does
Five filtering workflows in a single run:
- 🔍 Keyword search. Match titles, descriptions, tags, and creator names across the catalog.
- 🏷️ License filter. Restrict by CC license (CC0, CC-BY, CC-BY-SA) or public domain.
- 📁 Source filter. Restrict to one provider.
- 📐 Aspect ratio. Tall, wide, or square (images only).
- 🎵 Media type toggle. Switch between images and audio.
💡 Why it matters: clean, server-side filtering removes the parser-and-pagination work from your team and keeps your dataset fresh on every run.
🎬 Full Demo
🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.
⚙️ Input
| Input | Type | Default | Behavior |
|---|---|---|---|
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan up to 1,000,000. |
| query | string | "sunset" | Free-text keyword search. |
| mediaType | string | "images" | `images` or `audio`. |
| license | string | "" | License filter (`cc0`, `by`, `by-sa`, `by-nc`). Empty = any. |
| source | string | "" | Provider filter. Empty = all. |
| aspectRatio | string | "" | `tall`, `wide`, or `square` (images only). Empty = any. |
Example: 100 CC0 sunset images.
`{"maxItems": 100, "query": "sunset", "mediaType": "images", "license": "cc0"}`
Example: 500 NASA-sourced images.
`{"maxItems": 500, "mediaType": "images", "source": "nasa"}`
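
The two examples above can also be generated from code. The following is a minimal sketch, not part of the Actor: `build_input` is a hypothetical helper, and it assumes that leaving a filter key out behaves the same as passing an empty string (the examples above omit unused keys, which suggests this is acceptable).

```python
from typing import Any, Dict, Optional

# Allowed values are taken from the input table above.
MEDIA_TYPES = {"images", "audio"}
ASPECT_RATIOS = {"", "tall", "wide", "square"}


def build_input(
    query: Optional[str] = None,
    max_items: int = 10,
    media_type: str = "images",
    license: str = "",
    source: str = "",
    aspect_ratio: str = "",
) -> Dict[str, Any]:
    """Build a run-input dict and reject combinations the table rules out."""
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")
    if media_type not in MEDIA_TYPES:
        raise ValueError(f"mediaType must be one of {sorted(MEDIA_TYPES)}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError("aspectRatio must be tall, wide, square, or empty")
    if aspect_ratio and media_type != "images":
        raise ValueError("aspectRatio applies to images only")

    run_input: Dict[str, Any] = {"maxItems": max_items, "mediaType": media_type}
    # Assumption: omitted keys fall back to the Actor's defaults.
    optional = {
        "query": query,
        "license": license,
        "source": source,
        "aspectRatio": aspect_ratio,
    }
    for key, value in optional.items():
        if value:
            run_input[key] = value
    return run_input
```

Calling `build_input(query="sunset", max_items=100, license="cc0")` produces the same keys and values as the first example, and `build_input(max_items=500, source="nasa")` matches the second.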
📊 Output
Each record contains 23 fields. Download the dataset as CSV, Excel, JSON, or XML.
🧾 Schema
| Field | Type | Example |
|---|---|---|
| 🖼️ thumbnailUrl | string | "https://api.openverse.org/v1/images/.../thumb/" |
| 🆔 id | string | "1e97a259-..." |
| 📛 title | string | null |
| 👤 creator | string | null |
| 🌐 url | string | "https://live.staticflickr.com/.../b.jpg" |
| 🌐 sourceUrl | string | "https://www.flickr.com/photos/.../4994679" |
| ⚖️ license | string | "cc-by-nc-sa" |
| ⚖️ licenseVersion | string | null |
| 📁 source | string | "flickr" |
| 📐 width | number | null |
| 📐 height | number | null |
| 🎵 duration | number | null |
| 🏷️ tags | array | ["sunset","nature"] |
| 📋 attribution | string | "Sunset by X (CC BY-NC-SA 2.0)" |
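
Several example values in the schema are `null`, so downstream code should not assume that `title`, `creator`, or `licenseVersion` are present. The sketch below shows one way to produce a credit line from a record. `credit_line` is a hypothetical helper, and the fallback wording ("Untitled", "unknown creator") is an illustrative choice, not something the Actor emits.

```python
from typing import Any, Dict


def credit_line(record: Dict[str, Any]) -> str:
    """Return the record's attribution, or build a fallback from other fields."""
    attribution = record.get("attribution")
    if attribution:
        return attribution

    # Fall back to whatever metadata is available; every field may be null.
    title = record.get("title") or "Untitled"
    creator = record.get("creator") or "unknown creator"
    license_code = record.get("license") or "unknown license"
    version = record.get("licenseVersion")
    license_label = f"{license_code} {version}" if version else license_code
    return f"{title} by {creator} ({license_label})"
```

Preferring the ready-made `attribution` string keeps the provider's wording intact; the fallback only runs when that field is missing or empty.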
📦 Sample records
✨ Why choose this Actor
| | Capability |
|---|---|
| ⚖️ | Verified open licenses. Every record carries explicit license + attribution; no copyright guessing. |
| 🌐 | 50+ providers in one index. Flickr, Wikimedia, Europeana, Smithsonian, NASA in a single search. |
| 🎵 | Audio + images. Switch media type with one input flag. |
| ⚡ | Fast. 100 records in under 30 seconds. |
| 🔄 | Always fresh. Each run hits the live Openverse index. |
📈 How it compares to alternatives
| Approach | Cost | Coverage | Refresh | Filters | Setup |
|---|---|---|---|---|---|
| ⭐ This Actor | From $13.00 / 1,000 results ($5 free credit) | 800M+ items | Live per run | license, source, type, aspect | ⚡ 2 min |
| Unsplash/Pexels APIs | Free tier | Smaller curated | Live | Limited | ⏳ Hours |
| Manual provider scraping | Free | Per-provider | Live | DIY | 🐢 Days |
| Stock photo libraries | $30+/month | Curated | Live | Yes | 🐢 Account setup |
Pick this Actor when you want broad coverage, server-side filtering, and no pipeline maintenance.
🚀 How to use
- 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
- 🌐 Open the Actor. Go to the Openverse Media Scraper page on the Apify Store.
- 🎯 Set input. Pick your filters and `maxItems`.
- 🚀 Run it. Click Start and let the Actor collect your data.
- 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.
⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.
💼 Business use cases
🔌 Automating Openverse Media Scraper
Control the scraper programmatically for scheduled runs and pipeline integrations:
- 🟢 Node.js. Install the `apify-client` NPM package.
- 🐍 Python. Use the `apify-client` PyPI package.
- 📚 See the Apify API documentation for full details.
The Apify Schedules feature lets you trigger this Actor on any cron interval. Hourly, daily, or weekly refreshes keep downstream databases in sync automatically.
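
As a rough illustration of the Python route, the sketch below starts a run and streams the resulting records. It uses the `apify-client` calls `ApifyClient(...).actor(...).call(run_input=...)` and `dataset(...).iterate_items()`. The `ACTOR_ID` value is a placeholder (this page does not state the Actor's ID), the `APIFY_TOKEN` environment variable name is an assumption of this example, and `fetch_records` is a hypothetical helper.

```python
from typing import Any, Dict, Iterator

# Placeholder: copy the real "<username>/<actor-name>" ID from the Actor's page.
ACTOR_ID = "<username>/<actor-name>"


def fetch_records(client: Any, actor_id: str, run_input: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Start a run, wait for it to finish, then yield each dataset record."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("The Actor run did not return run details")
    # Each run writes its results to a default dataset.
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()


def main() -> None:
    import os

    from apify_client import ApifyClient  # pip install apify-client

    # Assumption: the API token is stored in the APIFY_TOKEN environment variable.
    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run_input = {"maxItems": 100, "query": "sunset", "mediaType": "images", "license": "cc0"}
    for record in fetch_records(client, ACTOR_ID, run_input):
        print(record.get("title"), record.get("license"), record.get("url"))
```

Call `main()` once the token is set and `ACTOR_ID` is filled in. Passing the client into `fetch_records` keeps the function easy to exercise with a stub before spending credits on a real run.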
🌟 Beyond business use cases
Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.
🤖 Ask an AI assistant about this scraper
Open a ready-to-send prompt about this ParseForge actor in the AI of your choice:
- 💬 ChatGPT
- 🧠 Claude
- 🔍 Perplexity
- 🅒 Copilot
❓ Frequently Asked Questions
🧩 How does it work?
Provide a query, license, source, or aspect-ratio filter. The Actor queries the Openverse index and emits one record per media item.
⚖️ Is everything free to use commercially?
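Not necessarily. CC0 and public-domain items can be used commercially without attribution, and CC BY and CC BY-SA items permit commercial use with attribution. Licenses with an NC clause (for example `by-nc` or `cc-by-nc-sa`) do not permit commercial use. Always check the `license` field on each record before publishing.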
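
A license check like this can be automated on the `license` field. The sketch below is an assumption-laden helper, not legal advice and not part of the Actor: it treats any code containing an `nc` segment as non-commercial and any code containing a `by` segment as requiring attribution, which matches how Creative Commons license codes such as `by-nc` and `cc-by-nc-sa` are composed. Missing license values are treated conservatively.

```python
from typing import List, Optional


def _segments(license_code: Optional[str]) -> List[str]:
    """Split a code such as 'cc-by-nc-sa' into ['cc', 'by', 'nc', 'sa']."""
    if not license_code:
        return []
    return license_code.lower().replace("_", "-").split("-")


def allows_commercial_use(license_code: Optional[str]) -> bool:
    """False for NC licenses and for missing license values."""
    parts = _segments(license_code)
    return bool(parts) and "nc" not in parts


def requires_attribution(license_code: Optional[str]) -> bool:
    """True when the license code includes the BY (attribution) element."""
    return "by" in _segments(license_code)
```

For example, `allows_commercial_use("cc-by-nc-sa")` is `False`, while `allows_commercial_use("cc0")` is `True` and `requires_attribution("cc0")` is `False`.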
📊 How many fields per record?
23, including title, creator, license, source URL, dimensions, tags, attribution, and direct media URL.
🎵 Does it include audio?
Yes. Set mediaType to audio to search music, sound effects, and spoken-word recordings.
🔁 Can I schedule recurring runs?
Yes. Use Apify Schedules for content-pipeline refreshes.
🌐 Which providers are covered?
50+, including Flickr, Wikimedia Commons, Europeana, Smithsonian, NASA, Rawpixel, and Biodiversity Heritage Library.
🔄 How fresh is the index?
Openverse re-indexes providers continuously. Each run hits the latest snapshot.
💳 Do I need a paid Apify plan?
No. The free plan covers preview runs. A paid plan unlocks larger downloads and scheduling.
🆘 What if a run fails?
Apify retries transient errors. Inspect logs in the Runs tab; partial datasets are preserved.
📐 Can I filter by image dimensions?
Only by aspect ratio (tall, wide, or square). Exact dimensions are not an input, so filter on the `width` and `height` fields yourself after downloading the dataset.
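
A client-side dimension filter can be sketched as follows. `at_least` is a hypothetical helper, and it skips records whose `width` or `height` is `null`, since the schema shows both fields can be empty.

```python
from typing import Any, Dict, Iterable, List


def at_least(
    records: Iterable[Dict[str, Any]],
    min_width: int = 0,
    min_height: int = 0,
) -> List[Dict[str, Any]]:
    """Keep records whose width and height are known and meet both minimums."""
    kept = []
    for record in records:
        width = record.get("width")
        height = record.get("height")
        if width is None or height is None:
            continue  # Dimensions unknown, so the record cannot be verified.
        if width >= min_width and height >= min_height:
            kept.append(record)
    return kept
```

Run it over the downloaded JSON records, for instance `at_least(records, min_width=1920, min_height=1080)` to keep only images that are at least Full HD.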
🔌 Integrate with any app
Openverse Media Scraper connects to any cloud service via Apify integrations:
- Make - Automate multi-step workflows
- Zapier - Connect with 5,000+ apps
- Slack - Get run notifications in your channels
- Airbyte - Pipe data into your warehouse
- GitHub - Trigger runs from commits and releases
- Google Drive - Export datasets straight to Sheets
You can also use webhooks to trigger downstream actions when a run finishes.
🔗 Recommended Actors
- 📚 Project Gutenberg Books - 75,000+ free public-domain books
- 📖 Open Library Books - 30M+ books and editions
- 🎨 Met Museum Scraper - Metropolitan Museum public-domain artworks
- 🌐 Wikidata Entity Search - 100M+ open knowledge-graph entities
- 🎬 TVMaze TV Shows - TV show metadata and episodes
💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.
🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.
⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by WordPress.org, Openverse, or any of the upstream content providers. All trademarks mentioned are the property of their respective owners. Only publicly available open data is collected.