Wikipedia Scraper - Article Content Extractor
Pricing
from $10.00 / 1,000 results

Scrape Wikipedia articles. Search by topic and extract full structured content: summaries, sections, infobox data, categories, references, images, and edit history for any article.
Rating: 0.0 (0 ratings)
Developer: lulz bot (maintained by Community)
Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 3 days ago
Wikipedia Scraper
Extract structured content from Wikipedia articles. Search by topic or provide direct article URLs to get summaries, full text, infobox data, categories, references, images, and metadata.
Features
- Search by topic — find the most relevant Wikipedia articles for any query
- Direct URL scraping — provide specific article URLs for targeted extraction
- Structured content — articles are parsed into sections with headers
- Infobox extraction — key-value data from article infoboxes (e.g., programming language details, country stats)
- Multi-language — supports all Wikipedia language editions (en, es, fr, de, ja, etc.)
- References — extracted reference list from each article
- Categories & images — article classification and image file names
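As a rough illustration of how topic search and language editions map onto Wikipedia's public endpoints, the sketch below builds request URLs for the MediaWiki Action API search module (`action=query&list=search`) and canonical article URLs of the kind accepted by `articleUrls`. It is an assumption that the actor uses this particular endpoint; the helper names `api_base`, `search_url`, and `article_url` are hypothetical.

```python
from urllib.parse import quote, urlencode


def api_base(language: str = "en") -> str:
    """MediaWiki Action API endpoint for one language edition.

    Each edition lives on its own subdomain, e.g. es.wikipedia.org.
    """
    return f"https://{language}.wikipedia.org/w/api.php"


def search_url(query: str, language: str = "en", limit: int = 5) -> str:
    """Full-text search request for one topic.

    Assumption: the actor may use a different search endpoint; this only
    shows the public list=search module. limit mirrors maxArticles.
    """
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    return f"{api_base(language)}?{urlencode(params)}"


def article_url(title: str, language: str = "en") -> str:
    """Canonical article URL; article paths use underscores instead of spaces."""
    return f"https://{language}.wikipedia.org/wiki/{quote(title.replace(' ', '_'))}"


if __name__ == "__main__":
    print(search_url("artificial intelligence"))
    print(article_url("JavaScript", language="de"))
```

Switching `language` only changes the subdomain, which is why a single language code is enough to target any edition.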
Input
| Field | Type | Description |
|---|---|---|
| searchQueries | string[] | Topics to search (e.g., "artificial intelligence", "JavaScript") |
| articleUrls | string[] | Direct Wikipedia article URLs |
| maxArticles | number | Max articles per search query (default: 5) |
| maxResults | number | Max total results (default: 25) |
| language | string | Wikipedia language code (default: "en") |
| extractSections | boolean | Extract full section content (default: true) |
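A minimal sketch of assembling an input object from the fields in the Input table, filling in the documented defaults so the effective settings are visible. The helper `build_run_input` is hypothetical, and the rule that a run needs at least one query or URL is an assumption, not something this page states.

```python
import json
from typing import List, Optional

# Defaults as listed in the Input table.
DEFAULTS = {
    "maxArticles": 5,
    "maxResults": 25,
    "language": "en",
    "extractSections": True,
}


def build_run_input(search_queries: Optional[List[str]] = None,
                    article_urls: Optional[List[str]] = None,
                    **overrides) -> dict:
    """Hypothetical helper: build an actor input object with the documented defaults.

    Passing the defaults explicitly is optional; they are filled in here only
    to make the effective settings visible.
    """
    if not search_queries and not article_urls:
        # Assumption: a run needs at least one topic or one article URL.
        raise ValueError("provide searchQueries and/or articleUrls")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Catch typos such as "maxArticle" before starting a run.
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    run_input = {**DEFAULTS, **overrides}
    if search_queries:
        run_input["searchQueries"] = list(search_queries)
    if article_urls:
        run_input["articleUrls"] = list(article_urls)
    return run_input


if __name__ == "__main__":
    run_input = build_run_input(
        search_queries=["artificial intelligence", "JavaScript"],
        article_urls=["https://en.wikipedia.org/wiki/Python_(programming_language)"],
        maxArticles=3,
    )
    print(json.dumps(run_input, indent=2))
```

With the official `apify-client` Python package, a dict like this is what you pass as `run_input`, e.g. `ApifyClient(token).actor(actor_id).call(run_input=run_input)`; the actor ID is shown on the Store page and is not reproduced here.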
Output
Each result is one article object like the following (long text values abbreviated):
```json
{
  "title": "JavaScript",
  "pageId": 9845,
  "summary": "JavaScript is a programming language...",
  "content": "Full article text...",
  "sections": [
    { "title": "Introduction", "content": "..." },
    { "title": "History", "content": "..." }
  ],
  "infobox": {
    "Paradigm": "Multi-paradigm",
    "Designed by": "Brendan Eich",
    "First appeared": "December 4, 1995"
  },
  "categories": ["Programming languages", "Web development"],
  "images": ["File:JavaScript_code.png"],
  "thumbnail": "https://upload.wikimedia.org/...",
  "references": ["..."],
  "lastEdited": "2026-04-20T12:00:00Z",
  "url": "https://en.wikipedia.org/wiki/JavaScript",
  "language": "en"
}
```
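To show how the output shape above might be consumed, here is a sketch that types an article item with `TypedDict`, looks up a section by title, and flattens an article into a compact text record (for example, one JSONL line of NLP training data). The key names come from the example output; `find_section` and `to_text_record` are hypothetical helpers, and treating every key as optional is an assumption.

```python
import json
from typing import Dict, List, Optional, TypedDict


class Section(TypedDict):
    title: str
    content: str


class Article(TypedDict, total=False):
    # Keys mirror the example output; total=False because it is an
    # assumption that every key is present for every article.
    title: str
    pageId: int
    summary: str
    content: str
    sections: List[Section]
    infobox: Dict[str, str]
    categories: List[str]
    images: List[str]
    thumbnail: str
    references: List[str]
    lastEdited: str
    url: str
    language: str


def find_section(article: Article, title: str) -> Optional[str]:
    """Return the content of the first section whose title matches (case-insensitive)."""
    for section in article.get("sections", []):
        if section["title"].lower() == title.lower():
            return section["content"]
    return None


def to_text_record(article: Article) -> dict:
    """Flatten an article into a small record, e.g. one JSONL line of training text."""
    return {
        "title": article.get("title", ""),
        "url": article.get("url", ""),
        "text": article.get("content", ""),
        "categories": article.get("categories", []),
    }


if __name__ == "__main__":
    item: Article = json.loads(
        '{"title": "JavaScript", "url": "https://en.wikipedia.org/wiki/JavaScript",'
        ' "content": "Full article text...",'
        ' "sections": [{"title": "History", "content": "..."}],'
        ' "categories": ["Programming languages"]}'
    )
    print(find_section(item, "history"))
    print(json.dumps(to_text_record(item)))
```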
Use Cases
- Research — gather structured data on any topic
- Knowledge bases — build datasets from Wikipedia's encyclopedia
- NLP training data — extract clean text with metadata
- Fact-checking — cross-reference claims with Wikipedia sources
- Content enrichment — add Wikipedia context to your applications
Pricing
This actor uses pay-per-event pricing at $0.005 per article scraped.
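At the stated $0.005 per article, 1,000 articles come to $5.00. The sketch below estimates the upper bound for one run, keeping the price as a parameter because the Store listing at the top of this page quotes a different headline figure ($10.00 per 1,000 results). How `maxArticles` and `maxResults` combine with direct URLs is an assumption here, as are the helper names.

```python
from decimal import Decimal

# USD per article, the pay-per-event rate stated in this section.
PRICE_PER_ARTICLE = Decimal("0.005")


def max_articles_for_run(num_queries: int, num_urls: int,
                         max_articles: int = 5, max_results: int = 25) -> int:
    """Upper bound on articles one run can return.

    Assumption: maxArticles caps each search query, and maxResults caps the
    combined total of search hits and direct URLs.
    """
    return min(num_queries * max_articles + num_urls, max_results)


def estimate_cost(num_articles: int, price: Decimal = PRICE_PER_ARTICLE) -> Decimal:
    """Charge for a number of scraped articles; Decimal avoids float rounding."""
    return price * num_articles


if __name__ == "__main__":
    n = max_articles_for_run(num_queries=2, num_urls=0)
    print(n, estimate_cost(n))
```

With the default limits, two search queries yield at most 10 articles, so a run would cost at most $0.05 at this rate.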
Notes
- Uses the official Wikipedia API (free, no auth required)
- HTML is fetched for infobox and reference extraction
- Rate-limited to be respectful of Wikipedia servers
- Article content is capped at 50,000 characters per article
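The note that HTML is fetched for infobox extraction can be illustrated with a standard-library sketch that reads label/value rows from the first table whose class list contains `infobox`, producing a dict shaped like the `infobox` field in the output. This is not the actor's code: it assumes the usual `<th>` label / `<td>` value row layout of Wikipedia infoboxes, and it does not strip reference markers such as `[1]` from values.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class InfoboxParser(HTMLParser):
    """Collect <th>/<td> label/value pairs from the first infobox table."""

    def __init__(self) -> None:
        super().__init__()
        self.data: Dict[str, str] = {}
        self._table_depth = 0          # > 0 while inside the infobox table
        self._done = False             # set after the first infobox closes
        self._cell: Optional[str] = None  # "th" or "td" while inside a cell
        self._buf: List[str] = []
        self._label = ""
        self._value = ""

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self._table_depth:
                self._table_depth += 1  # table nested inside the infobox
            elif not self._done and "infobox" in (dict(attrs).get("class") or "").split():
                self._table_depth = 1
        elif self._table_depth and self._cell and tag == "br":
            self._buf.append(" ")  # keep line-broken values readable
        elif self._table_depth == 1:
            if tag == "tr":
                self._label, self._value = "", ""
            elif tag in ("th", "td"):
                self._cell, self._buf = tag, []

    def handle_endtag(self, tag):
        if tag == "table" and self._table_depth:
            self._table_depth -= 1
            if self._table_depth == 0:
                self._done = True
        elif self._table_depth == 1:
            if tag == self._cell:
                text = " ".join("".join(self._buf).split())  # collapse whitespace
                if tag == "th":
                    self._label = text
                else:
                    self._value = text
                self._cell = None
            elif tag == "tr" and self._label and self._value:
                # Rows without both a label and a value (titles, images) are skipped.
                self.data[self._label] = self._value

    def handle_data(self, data):
        if self._table_depth and self._cell:
            self._buf.append(data)


def parse_infobox(html: str) -> Dict[str, str]:
    parser = InfoboxParser()
    parser.feed(html)
    parser.close()
    return parser.data
```

A row such as `<tr><th>Designed by</th><td><a href="/wiki/Brendan_Eich">Brendan Eich</a></td></tr>` becomes `"Designed by": "Brendan Eich"`, because text inside links is collected into the enclosing cell.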