Wikipedia Article Scraper - Search & Extract Content
Pricing
Pay per usage
Search and extract Wikipedia article metadata, summaries, and content via the official MediaWiki API. No scraping overhead — pure API integration with high reliability.
Developer: Pierrick McD0nald
Maintained by Community
Actor stats: 0 bookmarked, 2 total users, 1 monthly active user
Last modified: 3 days ago
Wikipedia Article Scraper — Search & Extract Content
Extract Wikipedia article metadata, summaries, and content via the official MediaWiki API. This Actor searches Wikipedia by keyword and returns structured data for every matching article — no browser overhead, no scraping complexity, just clean API integration.
Use Cases
- Content Research — Gather article summaries and metadata for academic research, content marketing, or knowledge base building.
- SEO & Topic Analysis — Extract word counts, article sizes, and publication dates to analyze content depth and freshness across topics.
- Data Enrichment — Augment datasets with Wikipedia summaries, thumbnail images, and canonical URLs for entity linking and NLP pipelines.
- Multilingual Content — Search across 300+ Wikipedia language editions to build localized content collections.
Input
| Field | Type | Required | Description |
|---|---|---|---|
| `searchQuery` | String | Yes | Search term to find Wikipedia articles (e.g., "machine learning", "quantum computing") |
| `maxResults` | Number | No | Maximum articles to extract, 1–500 (default: 25) |
| `includeExtract` | Boolean | No | Fetch article introduction/summary text (default: true) |
| `includeImages` | Boolean | No | Fetch thumbnail image URLs (default: false) |
| `language` | String | No | Wikipedia language code: en, es, fr, de, ja, etc. (default: "en") |
| `proxyConfiguration` | Object | No | Proxy settings (optional; the Wikipedia API does not require a proxy) |
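
As an illustration of the table above, the following Python sketch assembles a run input with the documented defaults and rejects an out-of-range `maxResults`. The helper name `build_input` is hypothetical and not part of the Actor; only the field names, defaults, and the 1–500 range are taken from the table. `proxyConfiguration` is left out because it is optional.

```python
import json


def build_input(search_query, max_results=25, include_extract=True,
                include_images=False, language="en"):
    """Return an input dict using the field names and defaults from the Input table.

    Hypothetical helper for illustration; the Actor itself does not ship it.
    """
    if not isinstance(search_query, str) or not search_query.strip():
        raise ValueError("searchQuery is required and must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_results, bool) or not isinstance(max_results, int):
        raise ValueError("maxResults must be an integer")
    if not 1 <= max_results <= 500:
        raise ValueError("maxResults must be between 1 and 500")
    return {
        "searchQuery": search_query.strip(),
        "maxResults": max_results,
        "includeExtract": include_extract,
        "includeImages": include_images,
        "language": language,
    }


if __name__ == "__main__":
    # Print an input document that could be pasted into the Actor's JSON input editor.
    print(json.dumps(build_input("machine learning", max_results=10), indent=2))
```

Validating on the client side is a design choice of this sketch: it surfaces a bad value before a run is started rather than after.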
Output
The Actor outputs a dataset with the following fields:
    {
      "pageId": 233488,
      "title": "Machine learning",
      "url": "https://en.wikipedia.org/wiki/Machine_learning",
      "snippet": "Machine learning (ML) is a field of study in artificial intelligence...",
      "extract": "Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms...",
      "wordCount": 15287,
      "size": 141291,
      "thumbnail": "https://upload.wikimedia.org/wikipedia/commons/thumb/...",
      "timestamp": "2026-05-15T10:30:00Z",
      "language": "en"
    }
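
Downstream code may want typed records rather than raw dictionaries. The sketch below maps one dataset item onto a Python dataclass. The names `Article` and `parse_item` are hypothetical, and treating `extract` and `thumbnail` as possibly absent is an assumption based on the `includeExtract` and `includeImages` switches, not something this README states.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Sample item copied from the Output section (long strings elided as in the original).
SAMPLE_ITEM = {
    "pageId": 233488,
    "title": "Machine learning",
    "url": "https://en.wikipedia.org/wiki/Machine_learning",
    "snippet": "Machine learning (ML) is a field of study in artificial intelligence...",
    "extract": "Machine learning (ML) is a field of study in artificial intelligence "
               "concerned with the development and study of statistical algorithms...",
    "wordCount": 15287,
    "size": 141291,
    "thumbnail": "https://upload.wikimedia.org/wikipedia/commons/thumb/...",
    "timestamp": "2026-05-15T10:30:00Z",
    "language": "en",
}


@dataclass
class Article:
    page_id: int
    title: str
    url: str
    snippet: str
    word_count: int
    size: int
    timestamp: datetime
    language: str
    extract: Optional[str] = None    # assumed missing when includeExtract is false
    thumbnail: Optional[str] = None  # assumed missing when no image is returned


def parse_item(item: dict) -> Article:
    """Convert one dataset item into an Article."""
    return Article(
        page_id=int(item["pageId"]),
        title=item["title"],
        url=item["url"],
        snippet=item["snippet"],
        word_count=int(item["wordCount"]),
        size=int(item["size"]),
        # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
        timestamp=datetime.fromisoformat(item["timestamp"].replace("Z", "+00:00")),
        language=item["language"],
        extract=item.get("extract"),
        thumbnail=item.get("thumbnail"),
    )


if __name__ == "__main__":
    article = parse_item(SAMPLE_ITEM)
    print(article.title, article.word_count, article.timestamp.isoformat())
```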
Pricing
Pay per event: $0.001 per article extracted.
No minimums, no subscriptions. You only pay for the results you receive. The Wikipedia MediaWiki API is free and public, so compute costs are minimal and margins stay high.
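
To make the per-article price concrete, here is a small Python sketch that estimates the cost of a run. It assumes the $0.001 per-article event is the only charge; the function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# USD per extracted article, taken from the Pricing section.
PRICE_PER_ARTICLE = Decimal("0.001")


def estimate_cost(articles_extracted: int) -> Decimal:
    """Estimate the run cost in USD, assuming no charges besides the per-article event."""
    if articles_extracted < 0:
        raise ValueError("articles_extracted must not be negative")
    return PRICE_PER_ARTICLE * articles_extracted


if __name__ == "__main__":
    for count in (25, 500):
        print(f"{count} articles: ${estimate_cost(count)}")
```

Under that assumption, a run with the default of 25 articles costs $0.025, and a full 500-article run costs $0.50. `Decimal` is used instead of `float` so that small per-item prices add up exactly.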
Limitations
- Maximum 500 results per run (Wikipedia API limit)
- Article extracts are limited to the introduction/summary section
- Thumbnail images are only available when `includeImages` is enabled and the article has an image
- Rate limits apply per Wikipedia language edition (handled automatically with retries)
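
This README does not describe the retry policy behind "handled automatically with retries". For readers who call the MediaWiki API themselves, the following is a generic sketch of retrying with exponential backoff. It is not the Actor's actual implementation; `RateLimited`, `with_retries`, the attempt count, and the delays are all assumptions made for illustration.

```python
import time


class RateLimited(Exception):
    """Hypothetical error raised by a fetch function when the API asks the client to slow down."""


def with_retries(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on RateLimited, wait base_delay * 2**attempt seconds and try again.

    The sleep function is injectable so the policy can be tested without waiting.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_fetch():
        # Simulated request that is rate limited twice before succeeding.
        state["calls"] += 1
        if state["calls"] < 3:
            raise RateLimited()
        return {"status": "ok"}

    print(with_retries(flaky_fetch, base_delay=0.1))
```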
FAQ
Q: Do I need a Wikipedia API key?
A: No. This Actor uses the public MediaWiki API with no authentication required.
Q: Can I search in languages other than English?
A: Yes. Set the language field to any valid Wikipedia language code (e.g., "es" for Spanish, "ja" for Japanese).
Q: What happens if my search returns thousands of results?
A: The Actor respects the maxResults limit and paginates through the API automatically. You only pay for the number of articles actually extracted.
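
The two answers above (per-language search and pagination up to `maxResults`) can be sketched in Python as follows. This shows one way to plan paginated requests against the MediaWiki `list=search` module using its `srsearch`, `srlimit`, and `sroffset` parameters. It is not the Actor's source code: the helper names are hypothetical, and the page size of 50 is an arbitrary choice for the sketch. No network request is made; the code only builds the URLs.

```python
from urllib.parse import urlencode


def api_endpoint(language: str = "en") -> str:
    """Each Wikipedia language edition exposes its own API endpoint."""
    return f"https://{language}.wikipedia.org/w/api.php"


def plan_pages(max_results: int, page_size: int = 50):
    """Return (offset, limit) pairs that together cover max_results items."""
    pages = []
    offset = 0
    while offset < max_results:
        limit = min(page_size, max_results - offset)
        pages.append((offset, limit))
        offset += limit
    return pages


def search_url(query: str, offset: int, limit: int, language: str = "en") -> str:
    """Build one search request URL for the given result window."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "sroffset": offset,
        "format": "json",
    }
    return f"{api_endpoint(language)}?{urlencode(params)}"


if __name__ == "__main__":
    # Plan the requests needed for 120 Spanish-language results.
    for offset, limit in plan_pages(120):
        print(search_url("machine learning", offset, limit, language="es"))
```

A real client would also stop early when a response contains fewer results than requested or no continuation data, since a query may match fewer articles than `maxResults`.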
Changelog
- v1.0.0 — Initial release