Wikipedia Article Scraper
Extract structured data from Wikipedia articles. Get summaries, categories, images, metadata, and descriptions using Wikipedia's official API. Supports 300+ languages.
Pricing: from $0.50 / 1,000 results
Rating: 5.0 (14 reviews)
Developer: Crawler Bros
Actor stats: 15 bookmarks, 3 total users, 2 monthly active users
Last modified: 13 days ago
Extract structured data from Wikipedia articles using the official MediaWiki API. Get article summaries, categories, images, metadata, and descriptions. Supports 300+ languages.
Features
- Extract article titles, summaries, and descriptions
- Get categories, images, and thumbnails
- Support for 300+ Wikipedia languages
- Two modes: scrape by URL or search by keyword
- Uses official Wikipedia REST + MediaWiki APIs
- No proxy or cookies required
- Lightweight HTTP-only (no browser)
- Proper rate limiting and User-Agent identification
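The REST endpoint behind the summary extraction can be called directly without authentication. A minimal sketch, assuming the standard `page/summary` path of Wikipedia's REST API; the function names and the User-Agent string are illustrative placeholders, not part of this actor:

```python
import json
import urllib.parse
import urllib.request

def build_summary_url(title: str, language: str = "en") -> str:
    """Build the REST summary endpoint URL for an article title."""
    return (
        f"https://{language}.wikipedia.org/api/rest_v1/page/summary/"
        + urllib.parse.quote(title.replace(" ", "_"), safe="")
    )

def fetch_summary(title: str, language: str = "en") -> dict:
    """Fetch the lead-section summary as a JSON object."""
    # Wikipedia asks clients to identify themselves with a descriptive User-Agent.
    req = urllib.request.Request(
        build_summary_url(title, language),
        headers={"User-Agent": "ExampleScraper/1.0 (contact@example.com)"},  # placeholder
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Because the endpoint is plain HTTPS, no browser, proxy, or cookie handling is needed, which is what keeps the actor lightweight.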
Input
| Field | Type | Default | Description |
|---|---|---|---|
| articleUrls | Array | — | Wikipedia article URLs to scrape |
| searchQueries | Array | — | Search terms to find articles |
| maxArticlesPerQuery | Integer | 5 | Max articles per search query (1–50) |
| language | String | "en" | Wikipedia language code |
Example: Scrape by URL
```json
{
  "articleUrls": [
    "https://en.wikipedia.org/wiki/Python_(programming_language)",
    "https://en.wikipedia.org/wiki/Artificial_intelligence"
  ]
}
```
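In URL mode, each input URL must be split into a language edition and an article title before the API can be queried. A sketch of that parsing step (the function name is illustrative, not the actor's internal API):

```python
import urllib.parse

def parse_article_url(url: str) -> tuple[str, str]:
    """Split a Wikipedia article URL into (language, title)."""
    parts = urllib.parse.urlsplit(url)
    language = parts.netloc.split(".")[0]  # "en" from "en.wikipedia.org"
    # Strip the "/wiki/" prefix and decode percent-escapes in the title.
    title = urllib.parse.unquote(parts.path.removeprefix("/wiki/"))
    return language, title
```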
Example: Search by Keyword
```json
{
  "searchQueries": ["machine learning", "quantum computing"],
  "maxArticlesPerQuery": 3,
  "language": "en"
}
```
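Keyword mode maps naturally onto the MediaWiki action API's `list=search` module. A hedged sketch of how such a search URL could be built; the helper name is illustrative and not necessarily how this actor implements it:

```python
import urllib.parse

def build_search_url(query: str, limit: int = 5, language: str = "en") -> str:
    """Build a MediaWiki action-API full-text search URL (list=search)."""
    params = urllib.parse.urlencode({
        "action": "query",
        "list": "search",
        "srsearch": query,   # the search terms
        "srlimit": limit,    # maps to maxArticlesPerQuery
        "format": "json",
    })
    return f"https://{language}.wikipedia.org/w/api.php?{params}"
```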
Output
| Field | Type | Description |
|---|---|---|
| title | String | Article title |
| url | String | Full Wikipedia URL |
| summary | String | Lead section extract (first few paragraphs) |
| description | String | Wikidata short description |
| categories | Array | Article categories |
| thumbnail | Object | Thumbnail image with source, width, height |
| images | Array | Image filenames from the article |
| lastModified | String | Last edit timestamp |
| language | String | Language code |
| pageId | Integer | Wikipedia page ID |
| scrapedAt | String | ISO timestamp when scraped |
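A single result following the schema above might look like this. The record is illustrative (all values, including the page ID and URLs, are invented for the example), together with a small completeness check:

```python
# Every field the output table above promises.
REQUIRED_FIELDS = {
    "title", "url", "summary", "description", "categories",
    "thumbnail", "images", "lastModified", "language", "pageId", "scrapedAt",
}

def is_complete(record: dict) -> bool:
    """Check that a result record carries every field from the output schema."""
    return REQUIRED_FIELDS <= record.keys()

# Illustrative record -- values are invented, not real actor output.
sample = {
    "title": "Python (programming language)",
    "url": "https://en.wikipedia.org/wiki/Python_(programming_language)",
    "summary": "Python is a high-level, general-purpose programming language...",
    "description": "General-purpose programming language",
    "categories": ["Programming languages"],
    "thumbnail": {"source": "https://upload.wikimedia.org/...", "width": 320, "height": 180},
    "images": ["Python-logo-notext.svg"],
    "lastModified": "2025-01-01T12:00:00Z",
    "language": "en",
    "pageId": 12345,  # placeholder, not the real page ID
    "scrapedAt": "2025-06-01T00:00:00Z",
}
```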
Use Cases
- Research — collect structured article data for academic or business research
- Content enrichment — augment your database with Wikipedia descriptions and metadata
- Knowledge graphs — build knowledge bases from Wikipedia's categorized data
- Education — gather article summaries for educational content
- SEO — analyze Wikipedia's coverage of topics in your niche
- Data science — use Wikipedia data for NLP training and analysis
FAQ
Is a proxy required?
No. Wikipedia's API is freely accessible. No proxy, cookies, or authentication needed.
What languages are supported?
All 300+ Wikipedia language editions. Set the language parameter to any valid code: en, fr, de, es, ja, zh, ru, pt, it, ar, ko, nl, pl, etc.
Are there rate limits?
Wikipedia asks for polite access with proper User-Agent headers. The scraper includes built-in delays (0.3-0.5s between requests) to respect Wikipedia's guidelines.
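The built-in delay described above can be pictured as a small pacing wrapper. A sketch under the stated 0.3–0.5 s guideline; the helper name is illustrative:

```python
import random
import time

def polite_iter(items, min_delay: float = 0.3, max_delay: float = 0.5):
    """Yield items with a randomized pause between them, per the 0.3-0.5s guideline."""
    for i, item in enumerate(items):
        if i > 0:
            # Sleep only between requests, not before the first one.
            time.sleep(random.uniform(min_delay, max_delay))
        yield item

# Usage: for url in polite_iter(article_urls): fetch(url)
```

Randomizing within the window, rather than sleeping a fixed interval, spreads requests slightly and is a common politeness pattern for public APIs.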
Can I scrape article content (full text)?
Not in full. This scraper extracts only the lead section: the summary field contains a clean text extract of the article's opening paragraphs, which is sufficient for most summarization and enrichment use cases.
