Wikipedia Scraper

Scrape Wikipedia articles. Search by keyword and get titles, summaries, URLs, word counts, and last edit dates.

Pricing: Pay per event
Developer: Stas Persiianenko
Last modified: 2 days ago
Extract Wikipedia articles by keyword search. Get titles, full summaries, URLs, word counts, thumbnails, and last edit dates from any of Wikipedia's 300+ language editions.
What does Wikipedia Scraper do?
Wikipedia Scraper searches Wikipedia using the official MediaWiki API and extracts structured data from matching articles. For each search keyword, it returns article metadata including the introductory extract (summary), word count, page size, thumbnail image, and direct URL.
The scraper uses Wikipedia's built-in search API, so results match what you'd find searching on Wikipedia itself — ranked by relevance with support for all Wikipedia languages.
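The search corresponds to MediaWiki's `list=search` query module. As an illustration of the underlying endpoint (a sketch of the kind of request the scraper wraps, not its exact internals), a search URL can be built with only the standard library:

```python
from urllib.parse import urlencode

def build_search_url(keyword: str, lang: str = "en", limit: int = 10) -> str:
    """Build a MediaWiki full-text search URL like the one the scraper wraps."""
    params = {
        "action": "query",
        "list": "search",        # MediaWiki full-text search module
        "srsearch": keyword,     # the search keyword
        "srlimit": limit,        # number of results to return
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?" + urlencode(params)

url = build_search_url("quantum computing", limit=5)
print(url)
# To actually fetch this URL, send it with a descriptive User-Agent header,
# as Wikipedia's API etiquette requires.
```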
Why scrape Wikipedia?
Wikipedia is the world's largest free encyclopedia with over 60 million articles across 300+ languages. It's a primary source for:
- Knowledge base construction — build reference datasets for AI training, chatbots, or research databases
- Content enrichment — add Wikipedia summaries to product catalogs, educational platforms, or content management systems
- Research and analysis — analyze article coverage, word counts, and edit patterns across topics
- Multilingual data — gather information in any language Wikipedia supports
- SEO and content strategy — understand topic coverage and find content gaps
How much does it cost to scrape Wikipedia?
Wikipedia Scraper uses pay-per-event pricing:
| Event | Price |
|---|---|
| Run started | $0.001 |
| Article extracted | $0.001 per article |
Example costs:
- 10 articles on "machine learning": ~$0.011
- 100 articles on "history": ~$0.101
- 500 articles across 5 keywords: ~$0.506
Platform costs are minimal — a typical run uses under $0.001 in compute. Wikipedia's API is fast and does not require proxies.
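The examples above follow directly from the event table. A tiny cost helper, assuming only the two listed events (the third example hints at extra per-keyword overhead, which this sketch ignores):

```python
RUN_STARTED = 0.001   # flat fee per run, from the pricing table
PER_ARTICLE = 0.001   # fee per extracted article

def estimate_cost(total_articles: int) -> float:
    """Estimate the pay-per-event cost of one run, in USD."""
    return round(RUN_STARTED + PER_ARTICLE * total_articles, 3)

print(estimate_cost(10))   # 10 articles -> 0.011
print(estimate_cost(100))  # 100 articles -> 0.101
```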
Input parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| `searchQueries` | string[] | Keywords to search on Wikipedia. Each keyword runs a separate search. | Required |
| `language` | string | Wikipedia language code (e.g., `en`, `de`, `fr`, `es`, `ja`, `zh`) | `"en"` |
| `maxResultsPerSearch` | integer | Maximum articles per keyword (1–500) | 50 |
Input example
```json
{
  "searchQueries": ["artificial intelligence", "quantum computing"],
  "language": "en",
  "maxResultsPerSearch": 20
}
```
Output example
Each article is returned as a JSON object:
```json
{
  "pageId": 1164,
  "title": "Artificial intelligence",
  "extract": "Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making...",
  "url": "https://en.wikipedia.org/wiki/Artificial_intelligence",
  "wordCount": 26473,
  "size": 266568,
  "lastEdited": "2026-03-02T11:28:15Z",
  "thumbnail": "https://upload.wikimedia.org/wikipedia/commons/thumb/...",
  "scrapedAt": "2026-03-03T04:08:23.785Z"
}
```
Output fields
| Field | Type | Description |
|---|---|---|
| `pageId` | number | Wikipedia internal page identifier |
| `title` | string | Article title |
| `extract` | string | Introductory summary (plain text, no HTML) |
| `url` | string | Direct link to the Wikipedia article |
| `wordCount` | number | Total word count of the article |
| `size` | number | Article size in bytes |
| `lastEdited` | string | ISO 8601 timestamp of the last edit |
| `thumbnail` | string | URL of the article thumbnail image (if available) |
| `scrapedAt` | string | ISO 8601 timestamp when the data was extracted |
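Given records in the shape above, a quick sanity check of the fields after download might look like this (the record here is a trimmed copy of the output example):

```python
import json

# Trimmed copy of the output example above
raw = """{
  "pageId": 1164,
  "title": "Artificial intelligence",
  "url": "https://en.wikipedia.org/wiki/Artificial_intelligence",
  "wordCount": 26473,
  "size": 266568,
  "lastEdited": "2026-03-02T11:28:15Z"
}"""

article = json.loads(raw)
assert isinstance(article["pageId"], int)     # numeric identifier
assert article["url"].startswith("https://")  # direct article link
print(f"{article['title']}: {article['wordCount']:,} words")
```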
Supported languages
Wikipedia Scraper supports all 300+ Wikipedia language editions. Use the standard language code:
| Code | Language | Articles |
|---|---|---|
| `en` | English | 6.9M+ |
| `de` | German | 2.9M+ |
| `fr` | French | 2.6M+ |
| `es` | Spanish | 2.0M+ |
| `ja` | Japanese | 1.4M+ |
| `ru` | Russian | 1.9M+ |
| `zh` | Chinese | 1.4M+ |
| `pt` | Portuguese | 1.1M+ |
| `it` | Italian | 1.8M+ |
| `ar` | Arabic | 1.2M+ |
Any valid Wikipedia language code works — see Wikipedia's List of Wikipedias for the full set of editions.
How to scrape Wikipedia with the API
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("automation-lab/wikipedia-scraper").call(run_input={
    "searchQueries": ["climate change", "renewable energy"],
    "language": "en",
    "maxResultsPerSearch": 20,
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{item['title']} — {item['wordCount']} words")
    print(f"  {item['url']}")
    print(f"  {item['extract'][:200]}...")
```
Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('automation-lab/wikipedia-scraper').call({
    searchQueries: ['climate change', 'renewable energy'],
    language: 'en',
    maxResultsPerSearch: 20,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach(item => {
    console.log(`${item.title} — ${item.wordCount} words`);
    console.log(`  ${item.url}`);
});
```
REST API
```shell
curl -X POST "https://api.apify.com/v2/acts/automation-lab~wikipedia-scraper/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "searchQueries": ["artificial intelligence"],
    "language": "en",
    "maxResultsPerSearch": 10
  }'
```
Integrations
Connect Wikipedia Scraper to hundreds of apps using built-in integrations:
- Google Sheets — export article data to spreadsheets
- Slack / Microsoft Teams — get notifications when scraping completes
- Zapier / Make — trigger workflows with scraped Wikipedia data
- Amazon S3 / Google Cloud Storage — store large datasets in cloud storage
- Webhook — send results to your own API endpoint
Tips and best practices
- Use specific keywords — more specific searches return more relevant results. "Quantum entanglement" is better than "quantum".
- Batch keywords efficiently — combine related keywords in one run to save on startup costs.
- Language parameter — set the language code to search non-English Wikipedias. Results, summaries, and URLs will all be in the selected language.
- Word count filtering — use the `wordCount` field to filter out stub articles (typically under 500 words).
- Rate limits — Wikipedia's API is generous but has rate limits. The scraper handles pagination and batching automatically.
- Extracts are summaries — the `extract` field contains only the article's introduction, not the full text. For full articles, follow the `url` link.
- Max 500 results per keyword — this is a Wikipedia API limit. For broader coverage, use multiple related keywords.
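The word-count tip can be applied after download. A sketch over stand-in records (real ones come from the run's dataset):

```python
# Hypothetical downloaded items; real records come from the run's dataset.
items = [
    {"title": "Quantum entanglement", "wordCount": 8200},
    {"title": "Some stub topic", "wordCount": 140},
    {"title": "Quantum teleportation", "wordCount": 4100},
]

MIN_WORDS = 500  # stub threshold suggested in the tip above

substantial = [a for a in items if a["wordCount"] >= MIN_WORDS]
print([a["title"] for a in substantial])
```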
FAQ
Q: Does this scraper get the full article text?
A: The extract field contains the article's introductory section in plain text. For complete article content, you can use the url to access the full page.
Q: How fast is it?
A: Very fast. Wikipedia's API is highly optimized. A typical run extracting 50 articles completes in under 5 seconds.

Q: Does it need proxies?
A: No. Wikipedia's API is open and does not block automated requests. The scraper identifies itself with a proper User-Agent header.

Q: Can I search in multiple languages at once?
A: Each run uses one language. To search multiple languages, run the scraper once per language.
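Since each run is single-language, searching several languages means one run input per language. A sketch that builds those inputs (`build_inputs` is a hypothetical helper, not part of the Actor; each dict matches the input schema above and would be passed to a separate `.call()`):

```python
def build_inputs(keywords, languages, max_results=20):
    """One run input dict per language, matching the Actor's input schema."""
    return [
        {
            "searchQueries": list(keywords),
            "language": lang,
            "maxResultsPerSearch": max_results,
        }
        for lang in languages
    ]

inputs = build_inputs(["climate change"], ["en", "de", "ja"])
print(len(inputs))  # one run input per language -> 3
```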