Wikipedia Article Scraper

Extract structured data from Wikipedia articles. Get summaries, categories, images, metadata, and descriptions using Wikipedia's official API. Supports 300+ languages.

Pricing: from $0.50 / 1,000 results
Rating: 5.0 (10)
Developer: Crawler Bros
Actor stats: 11 bookmarked · 1 total user · 0 monthly active users · last modified 2 days ago
Extract structured data from Wikipedia articles using the official MediaWiki API. Get article summaries, categories, images, metadata, and descriptions. Supports 300+ languages.
Features
- Extract article titles, summaries, and descriptions
- Get categories, images, and thumbnails
- Support for 300+ Wikipedia languages
- Two modes: scrape by URL or search by keyword
- Uses official Wikipedia REST + MediaWiki APIs
- No proxy or cookies required
- Lightweight HTTP-only (no browser)
- Proper rate limiting and User-Agent identification
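The features above rest on Wikipedia's documented REST API and on polite User-Agent identification. A minimal sketch of how such a request could be built (the endpoint path is Wikipedia's documented REST summary route; the helper name and the User-Agent string are illustrative, not the actor's actual code):

```python
from urllib.parse import quote

def summary_endpoint(lang: str, title: str) -> str:
    """Build the Wikipedia REST API URL for an article's lead-section summary."""
    # Titles use underscores; keep parentheses and underscores unescaped.
    return f"https://{lang}.wikipedia.org/api/rest_v1/page/summary/{quote(title, safe='()_')}"

# Wikimedia's API etiquette asks for a descriptive User-Agent with contact info.
HEADERS = {"User-Agent": "MyResearchBot/1.0 (contact@example.com)"}
```

Calling `summary_endpoint("en", "Python_(programming_language)")` yields the REST URL for that article's summary; no proxy or authentication is needed, matching the feature list.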
Input
| Field | Type | Default | Description |
|---|---|---|---|
| articleUrls | Array | — | Wikipedia article URLs to scrape |
| searchQueries | Array | — | Search terms to find articles |
| maxArticlesPerQuery | Integer | 5 | Max articles per search query (1-50) |
| language | String | "en" | Wikipedia language code |
Example: Scrape by URL
```json
{
  "articleUrls": [
    "https://en.wikipedia.org/wiki/Python_(programming_language)",
    "https://en.wikipedia.org/wiki/Artificial_intelligence"
  ]
}
```
Example: Search by Keyword
```json
{
  "searchQueries": ["machine learning", "quantum computing"],
  "maxArticlesPerQuery": 3,
  "language": "en"
}
```
Output
| Field | Type | Description |
|---|---|---|
| title | String | Article title |
| url | String | Full Wikipedia URL |
| summary | String | Lead section extract (first few paragraphs) |
| description | String | Wikidata short description |
| categories | Array | Article categories |
| thumbnail | Object | Thumbnail image with source, width, height |
| images | Array | Image filenames from the article |
| lastModified | String | Last edit timestamp |
| language | String | Language code |
| pageId | Integer | Wikipedia page ID |
| scrapedAt | String | ISO timestamp when scraped |
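Most of the fields above map directly onto Wikipedia's documented REST summary response (`title`, `extract`, `description`, `thumbnail`, `timestamp`, `pageid`, `content_urls`); `categories` and `images` require separate MediaWiki Action API calls and are omitted here. A hedged sketch of that mapping (the function name is illustrative, not the actor's actual code):

```python
from datetime import datetime, timezone

def to_output_record(summary: dict, lang: str) -> dict:
    """Map a Wikipedia REST /page/summary response onto the output schema."""
    return {
        "title": summary.get("title"),
        "url": summary.get("content_urls", {}).get("desktop", {}).get("page"),
        "summary": summary.get("extract"),          # lead-section plain-text extract
        "description": summary.get("description"),  # Wikidata short description
        "thumbnail": summary.get("thumbnail"),      # {source, width, height}
        "lastModified": summary.get("timestamp"),
        "language": lang,
        "pageId": summary.get("pageid"),
        "scrapedAt": datetime.now(timezone.utc).isoformat(),
    }
```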
Use Cases
- Research — collect structured article data for academic or business research
- Content enrichment — augment your database with Wikipedia descriptions and metadata
- Knowledge graphs — build knowledge bases from Wikipedia's categorized data
- Education — gather article summaries for educational content
- SEO — analyze Wikipedia's coverage of topics in your niche
- Data science — use Wikipedia data for NLP training and analysis
FAQ
Is a proxy required?
No. Wikipedia's API is freely accessible. No proxy, cookies, or authentication needed.
What languages are supported?
All 300+ Wikipedia language editions. Set the language parameter to any valid code: en, fr, de, es, ja, zh, ru, pt, it, ar, ko, nl, pl, etc.
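The language code simply selects the Wikipedia subdomain, which also applies to keyword search. A sketch of how the search mode's request URL could be built against the MediaWiki Action API's `list=search` module (the helper name is illustrative):

```python
from urllib.parse import urlencode

def search_endpoint(lang: str, query: str, limit: int = 5) -> str:
    """Build a MediaWiki Action API search URL for the keyword mode."""
    params = urlencode({
        "action": "query",
        "list": "search",
        "srsearch": query,   # the search term
        "srlimit": limit,    # maps to maxArticlesPerQuery
        "format": "json",
    })
    return f"https://{lang}.wikipedia.org/w/api.php?{params}"
```

For example, `search_endpoint("ja", "machine learning", 3)` queries the Japanese edition; any valid language code works the same way.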
Are there rate limits?
Wikipedia asks for polite access with proper User-Agent headers. The scraper includes built-in delays (0.3-0.5s between requests) to respect Wikipedia's guidelines.
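The built-in delay described above can be sketched in a few lines (the function name is illustrative; the 0.3-0.5 s range is taken from the answer above):

```python
import random
import time

def polite_delay() -> float:
    """Sleep 0.3-0.5 s between requests, per Wikipedia's polite-access guidelines."""
    pause = random.uniform(0.3, 0.5)  # jitter avoids a fixed request cadence
    time.sleep(pause)
    return pause
```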
Can I scrape article content (full text)?
No. This scraper extracts only the lead section: the summary field contains a clean text extract of the article's opening paragraphs, which is suitable for most use cases. It does not return the full article body.