Wikipedia Scraper
Pricing
Pay per usage
Rating: 0.0 (0)
Developer: OpenClaw Mara (maintained by Community)
Actor stats: 0 bookmarked, 2 total users, 1 monthly active user
Last modified: 14 hours ago
Wikipedia Article Scraper
Extract structured data from Wikipedia articles using the official MediaWiki API. Supports 300+ languages. No authentication needed.
Features
- 📖 Full article text — clean plain text without wiki markup
- 📋 Article summaries — extract, description, thumbnail
- 📑 Section breakdown — headings with hierarchy levels
- 🔗 Internal links — all Wikipedia links within the article
- 🖼️ Images — extract image file references
- 🏷️ Categories — article categorization
- 🔍 Search — find articles by keyword
- 🌍 Multilingual — supports all 300+ Wikipedia languages (en, es, de, fr, ru, ja, zh, etc.)
- ⚡ Official API — no blocking, no CAPTCHA
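The requests the actor sends are not documented on this page. As a rough sketch, assuming it uses the MediaWiki Action API `query` module with the TextExtracts `extracts` property, a plain-text request for one article could be assembled as follows. The helper name `build_extract_url` is hypothetical, and the code only builds the URL without sending anything.

```python
from urllib.parse import urlencode


def build_extract_url(title: str, language: str = "en") -> str:
    """Build a MediaWiki Action API URL for a plain-text article extract.

    Hypothetical helper for illustration; the actor may issue different requests.
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts|categories",  # article text plus categories
        "explaintext": 1,               # plain text instead of HTML
        "redirects": 1,                 # resolve redirects to the target page
        "titles": title,
    }
    # Each language edition lives on its own subdomain (en, es, de, ...).
    return f"https://{language}.wikipedia.org/w/api.php?{urlencode(params)}"


print(build_extract_url("Artificial intelligence", language="de"))
```

Switching the language only changes the subdomain, which is presumably how a single `language` code can select any Wikipedia edition.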
Input
| Field | Type | Default | Description |
|---|---|---|---|
| articleTitles | string[] | [] | Article titles to scrape |
| searchQueries | string[] | [] | Search and scrape matching articles |
| maxSearchResults | number | 10 | Results per search query |
| language | string | "en" | Wikipedia language code |
| includeFullText | boolean | true | Include complete article text |
| includeSections | boolean | true | Include section headings |
| includeLinks | boolean | false | Extract internal links |
| includeImages | boolean | false | Extract images |
| includeCategories | boolean | true | Extract categories |
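To make the defaults concrete, here is a minimal sketch that mirrors the table as a Python dataclass and produces an input object ready to be serialized as JSON. The class name `ScraperInput` and the rule that at least one title or query is required are assumptions for illustration, not documented behavior of the actor.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ScraperInput:
    """Input fields and defaults as listed in the table above."""

    articleTitles: List[str] = field(default_factory=list)
    searchQueries: List[str] = field(default_factory=list)
    maxSearchResults: int = 10
    language: str = "en"
    includeFullText: bool = True
    includeSections: bool = True
    includeLinks: bool = False
    includeImages: bool = False
    includeCategories: bool = True

    def to_dict(self) -> dict:
        # Assumption: a run with neither titles nor queries has nothing to do.
        if not self.articleTitles and not self.searchQueries:
            raise ValueError("Provide articleTitles or searchQueries")
        return asdict(self)


run_input = ScraperInput(searchQueries=["machine learning"], maxSearchResults=5)
print(run_input.to_dict())
```

Fields left unset fall back to the table defaults, so only the values that differ need to be passed.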
Example Input
{
  "searchQueries": ["machine learning"],
  "articleTitles": ["Artificial intelligence", "GPT-4"],
  "maxSearchResults": 5,
  "language": "en",
  "includeFullText": true,
  "includeCategories": true
}
Output
{
  "title": "Artificial intelligence",
  "description": "Intelligence of machines",
  "extract": "Artificial intelligence (AI) is intelligence demonstrated by machines...",
  "pageUrl": "https://en.wikipedia.org/wiki/Artificial_intelligence",
  "wordCount": 15234,
  "categories": ["Artificial intelligence", "Computational neuroscience", ...],
  "sections": [{"heading": "History", "level": 2}, ...],
  "fullText": "Artificial intelligence (AI) is intelligence..."
}
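As a small illustration of working with a record of this shape, the sketch below pulls out the headings at one hierarchy level. The function name `top_level_sections` is hypothetical, and the sample record is a hand-written placeholder modeled on the example output (the "Early work" entry is invented for the demonstration), not real actor output.

```python
from typing import Any, Dict, List


def top_level_sections(record: Dict[str, Any], level: int = 2) -> List[str]:
    """Return section headings at the given hierarchy level from one record."""
    return [
        section["heading"]
        for section in record.get("sections", [])
        if section.get("level") == level
    ]


# Placeholder record shaped like the example output; not real scraped data.
record = {
    "title": "Artificial intelligence",
    "pageUrl": "https://en.wikipedia.org/wiki/Artificial_intelligence",
    "sections": [
        {"heading": "History", "level": 2},
        {"heading": "Early work", "level": 3},  # invented sub-section
    ],
}

print(top_level_sections(record))
```

Using `record.get("sections", [])` keeps the helper safe for runs where `includeSections` was set to false and the field may be absent.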
Use Cases
- Research — bulk extract articles for NLP training data or knowledge bases
- Content creation — gather reference material on any topic
- SEO — analyze Wikipedia coverage of topics in your niche
- Education — create study materials from Wikipedia content
- Data science — build datasets from Wikipedia's structured data
- Multilingual projects — extract content in any of 300+ languages