OpenVerse Image Scraper
Pricing
Pay per usage
Bulk-download Creative Commons images from OpenVerse instantly. Extract photo URLs, metadata, licenses & attribution automatically. Ideal for content creation, web design, research projects & AI training datasets. Fully structured, legally compliant sourcing.
Developer
Shahid Irfan
Maintained by Community
Extract image data from the OpenVerse catalog. Collect openly licensed and public domain images at scale for building AI training datasets, sourcing stock photos, and conducting research.
Features
- Keyword Search — Find specific images matching your query.
- License Filtering — Narrow down results to specific Creative Commons licenses or public domain status.
- Source Selection — Target specific providers like Flickr, Wikimedia, Rawpixel, and more.
- High Performance — Extract data swiftly without limitations using direct API access.
- Clean Data — Automatically removes empty or null values from the extracted datasets to keep your data structured and pristine.
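The Clean Data behavior can be pictured with a minimal sketch. The actor's actual cleaning rules are not documented here, so the function name `strip_empty` and the choice of what counts as "empty" (None, empty strings, empty lists, empty dicts) are assumptions for illustration only.

```python
from typing import Any


def strip_empty(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of record without None values or empty strings/containers."""
    cleaned: dict[str, Any] = {}
    for key, value in record.items():
        if value is None:
            continue
        # Assumption: empty strings, lists, and dicts count as "empty".
        # False and 0 are kept because they carry meaning ("mature": false).
        if isinstance(value, (str, list, dict)) and len(value) == 0:
            continue
        cleaned[key] = value
    return cleaned


# Hypothetical raw record, before cleaning.
raw = {"title": "Computer Test", "creator": "", "filesize": None, "mature": False, "tags": []}
print(strip_empty(raw))
```

Because empty fields are dropped rather than set to null, downstream code should read optional fields with `item.get("field")` instead of `item["field"]`.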
Use Cases
Machine Learning and AI Training
Build datasets of openly licensed images for training computer vision models, with license and attribution metadata recorded for each image.
Content Creation & Curation
Source public domain and CC-licensed stock photos for websites, blogs, and marketing materials directly from an enormous catalog.
Academic Research
Gather metadata about image attribution, creator statistics, and license distribution across various providers for large-scale analysis.
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `keyword` | String | Yes | - | Image search keyword(s) |
| `license_type` | String | No | - | Comma-separated list of licenses (e.g., by, cc0) |
| `source` | String | No | - | Comma-separated list of sources (e.g., flickr) |
| `sort` | String | No | - | Sort order (e.g., relevance or newest) |
| `results_wanted` | Integer | No | 20 | Maximum results to collect |
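If you build inputs programmatically, a small helper can keep them consistent with the table above. This is a sketch: `build_input` is a hypothetical helper, not part of the actor, and the validation rules (non-empty keyword, positive `results_wanted`) are assumptions inferred from the parameter descriptions.

```python
from typing import Optional, Sequence, Union

StrOrList = Union[str, Sequence[str]]


def _join(values: StrOrList) -> str:
    """Accept either a ready-made comma-separated string or a list of codes."""
    return values if isinstance(values, str) else ",".join(values)


def build_input(
    keyword: str,
    license_type: Optional[StrOrList] = None,
    source: Optional[StrOrList] = None,
    sort: Optional[str] = None,
    results_wanted: int = 20,
) -> dict:
    """Build an input object using only the parameters listed in the table."""
    if not keyword.strip():
        raise ValueError("keyword is required")
    if results_wanted < 1:
        raise ValueError("results_wanted must be a positive integer")
    run_input: dict = {"keyword": keyword, "results_wanted": results_wanted}
    # Optional parameters are only included when set.
    if license_type:
        run_input["license_type"] = _join(license_type)
    if source:
        run_input["source"] = _join(source)
    if sort:
        run_input["sort"] = sort
    return run_input


print(build_input("technology", license_type=["cc0", "pdm"], source="flickr", results_wanted=100))
```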
Output Data
Each extracted item in the dataset contains comprehensive image metadata:
| Field | Type | Description |
|---|---|---|
| `id` | String | Unique OpenVerse identifier |
| `title` | String | Image title |
| `url` | String | Direct URL to the high-resolution image |
| `foreign_landing_url` | String | URL to the image on the original provider's website |
| `creator` | String | Name of the image creator |
| `license` | String | Creative Commons license type |
| `provider` | String | Name of the provider (e.g., flickr) |
| `attribution` | String | Complete attribution string for easy use |
Usage Examples
Basic Image Search
Extract 50 nature images:
{"keyword": "nature", "results_wanted": 50}
Specific License Extraction
Extract public domain (CC0) technology images from Flickr:
{"keyword": "technology", "license_type": "cc0", "source": "flickr", "results_wanted": 100}
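The inputs above can also be submitted from code. The sketch below assumes the official `apify-client` Python package, whose `ApifyClient` exposes `actor(...).call(run_input=...)` and `dataset(...).iterate_items()`. The function takes the client as an argument so it can be tried with a stub; the API token and actor ID shown in the comments are placeholders, since this page does not state the actor's ID.

```python
from typing import Any, Iterator


def run_and_collect(client: Any, actor_id: str, run_input: dict) -> Iterator[dict]:
    """Start one actor run, wait for it to finish, and yield its dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return run details")
    # Each finished run stores its results in a default dataset.
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()


# Usage with the real client (placeholders must be filled in by you):
#
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_API_TOKEN>")
#   run_input = {"keyword": "nature", "results_wanted": 50}
#   for item in run_and_collect(client, "<ACTOR_ID>", run_input):
#       print(item.get("title"), item.get("url"))
```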
Sample Output
{
  "id": "260c0ca9-35a4-41a4-b1e0-e13339a2b31d",
  "title": "Computer Test",
  "foreign_landing_url": "https://www.flickr.com/photos/74362028@N00/2099250160",
  "url": "https://live.staticflickr.com/2287/2099250160_e1ceb65c97_b.jpg",
  "creator": "flatiron32",
  "creator_url": "https://www.flickr.com/photos/74362028@N00",
  "license": "by-nc",
  "license_version": "2.0",
  "provider": "flickr",
  "source": "flickr",
  "attribution": "\"Computer Test\" by flatiron32 is licensed under CC BY-NC 2.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc/2.0/.",
  "mature": false,
  "height": 768,
  "width": 1024
}
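Records shaped like the sample lend themselves to simple aggregate analysis, such as the license distribution per provider mentioned under Academic Research. A minimal sketch follows; `license_distribution` is a hypothetical helper, and apart from the first entry (taken from the sample output) the items are made-up placeholders.

```python
from collections import Counter
from typing import Iterable


def license_distribution(items: Iterable[dict]) -> Counter:
    """Count items per (provider, license) pair, tolerating missing fields."""
    return Counter(
        (item.get("provider", "unknown"), item.get("license", "unknown"))
        for item in items
    )


items = [
    {"id": "260c0ca9-35a4-41a4-b1e0-e13339a2b31d", "provider": "flickr", "license": "by-nc"},
    {"id": "example-2", "provider": "flickr", "license": "by-nc"},  # made-up item
    {"id": "example-3", "provider": "wikimedia", "license": "cc0"},  # made-up item
    {"id": "example-4", "provider": "rawpixel"},  # made-up item without a license field
]

for (provider, license_code), count in license_distribution(items).most_common():
    print(provider, license_code, count)
```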
Tips for Best Results
Optimize Collection Size
- Start with a small `results_wanted` (like 20) to preview the output data structure.
- Increase the limit once you confirm the parameters accurately target your needs.
Precise Filtering
- Use specific license codes like `pdm` (Public Domain Mark), `cc0`, `by`, `by-sa`.
- Providing a specific `source` (like `wikimedia` or `rawpixel`) yields more homogeneous data.
Integrations
Connect your extracted image data with:
- Google Sheets — Export metadata for quick review
- Airtable — Build searchable image databases
- Make — Create automated data enrichment workflows
Export Formats
Download your extracted data in multiple formats:
- JSON — For developers and APIs
- CSV — For spreadsheet analysis
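If you download JSON and convert it to CSV yourself, keep in mind that items may not all share the same keys, because empty fields are removed. The sketch below uses only the Python standard library. `items_to_csv` is a hypothetical helper, not part of the actor or of the platform's built-in export.

```python
import csv
import io
from typing import Iterable


def items_to_csv(items: Iterable[dict]) -> str:
    """Serialize items to CSV, using the union of all keys as the header."""
    rows = list(items)
    # Collect column names in first-seen order so output is deterministic.
    fieldnames: list[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    # restval fills columns that a given item does not have.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample = [
    {"id": "1", "title": "A"},
    {"id": "2", "creator": "bob"},
]
print(items_to_csv(sample))
```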
Frequently Asked Questions
Can I scrape multiple pages?
Yes, the actor automatically handles pagination to reach your desired result count.
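For readers curious what "handles pagination" means in practice, here is a generic sketch of the pattern: request page after page until enough items are collected or the pages run out. It is not the actor's actual implementation; `fetch_page` is a hypothetical callable that you would back with a real request.

```python
from typing import Callable, Iterator, List


def paginate(fetch_page: Callable[[int], List[dict]], results_wanted: int) -> Iterator[dict]:
    """Yield up to results_wanted items, requesting pages 1, 2, 3, ... as needed."""
    collected = 0
    page = 1
    while collected < results_wanted:
        results = fetch_page(page)
        if not results:
            # An empty page means there is nothing more to collect.
            break
        for item in results:
            yield item
            collected += 1
            if collected >= results_wanted:
                return
        page += 1


# Stand-in for a real page request: three pages holding seven items in total.
fake_pages = {1: [{"n": 1}, {"n": 2}, {"n": 3}], 2: [{"n": 4}, {"n": 5}, {"n": 6}], 3: [{"n": 7}]}
print(len(list(paginate(lambda page: fake_pages.get(page, []), 5))))
```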
Are these images free to use?
OpenVerse aggregates openly licensed and public domain works, but you must respect the provided license and attribution requirements for each image.
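As a rough first-pass filter, the `license` code on each item can be checked for the NonCommercial (`nc`) element before an image is considered for commercial use. This sketch assumes license codes follow the hyphenated form seen in the sample output (for example `by-nc`); `allows_commercial_use` is a hypothetical helper, and the license text itself remains the authority.

```python
def allows_commercial_use(license_code: str) -> bool:
    """Return False when the license code contains the NonCommercial element."""
    return "nc" not in license_code.lower().split("-")


# Made-up items illustrating the check.
items = [
    {"title": "Computer Test", "license": "by-nc"},
    {"title": "Example A", "license": "cc0"},
    {"title": "Example B", "license": "by-sa"},
]
usable = [item["title"] for item in items if allows_commercial_use(item["license"])]
print(usable)
```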
What if data is missing?
Some fields (like category or filesize) might be missing if the original provider doesn't supply them. The actor automatically removes null fields to keep your dataset clean.
Support
For issues or feature requests, contact support through the Apify Console.
Legal Notice
This actor is designed for legitimate data collection purposes. Users are responsible for ensuring compliance with website terms of service, honoring the specified image licenses, and adhering to applicable laws. Use data responsibly.