Wikipedia Scraper
Scrape Wikipedia articles. Search by keyword and get titles, summaries, URLs, word counts, and last edit dates.

Pricing: Pay per event

Developer: Stas Persiianenko (maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 2 days ago

Extract Wikipedia articles by keyword search. Get titles, full summaries, URLs, word counts, thumbnails, and last edit dates from any of Wikipedia's 300+ language editions.

What does Wikipedia Scraper do?

Wikipedia Scraper searches Wikipedia using the official MediaWiki API and extracts structured data from matching articles. For each search keyword, it returns article metadata including the introductory extract (summary), word count, page size, thumbnail image, and direct URL.

The scraper uses Wikipedia's built-in search API, so results match what you'd find searching on Wikipedia itself — ranked by relevance with support for all Wikipedia languages.
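For reference, the underlying search call can be reproduced directly against the MediaWiki API. This sketch is independent of the Actor and uses only the Python standard library to build the kind of query URL involved (the helper name is illustrative):

```python
from urllib.parse import urlencode

def build_search_url(query, language="en", limit=10):
    """Build a MediaWiki search API URL equivalent to what the scraper calls."""
    params = {
        "action": "query",
        "list": "search",       # full-text search, relevance-ranked
        "srsearch": query,
        "srlimit": limit,       # MediaWiki caps this at 500 per request
        "format": "json",
    }
    return f"https://{language}.wikipedia.org/w/api.php?" + urlencode(params)

url = build_search_url("artificial intelligence", "en", 20)
```

Swapping the language subdomain (en, de, fr, ...) is all it takes to target a different edition, which is why the Actor exposes a single language parameter.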

Why scrape Wikipedia?

Wikipedia is the world's largest free encyclopedia with over 60 million articles across 300+ languages. It's a primary source for:

  • Knowledge base construction — build reference datasets for AI training, chatbots, or research databases
  • Content enrichment — add Wikipedia summaries to product catalogs, educational platforms, or content management systems
  • Research and analysis — analyze article coverage, word counts, and edit patterns across topics
  • Multilingual data — gather information in any language Wikipedia supports
  • SEO and content strategy — understand topic coverage and find content gaps

How much does it cost to scrape Wikipedia?

Wikipedia Scraper uses pay-per-event pricing:

| Event | Price |
|-------|-------|
| Run started | $0.001 |
| Article extracted | $0.001 per article |

Example costs:

  • 10 articles on "machine learning": ~$0.011
  • 100 articles on "history": ~$0.101
  • 500 articles across 5 keywords: ~$0.506

Platform costs are minimal — a typical run uses under $0.001 in compute. Wikipedia's API is fast and does not require proxies.
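Given those two event prices, a back-of-the-envelope estimate is straightforward. This is an illustrative helper, not part of the Actor:

```python
RUN_STARTED = 0.001   # charged once per run
PER_ARTICLE = 0.001   # charged per extracted article

def estimate_cost(total_articles, runs=1):
    """Estimate pay-per-event cost in USD for a scraping job."""
    return round(runs * RUN_STARTED + total_articles * PER_ARTICLE, 3)

estimate_cost(10)   # 0.011, matching the "10 articles" example above
```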

Input parameters

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| searchQueries | string[] | Keywords to search on Wikipedia. Each keyword runs a separate search. | Required |
| language | string | Wikipedia language code (e.g., en, de, fr, es, ja, zh) | "en" |
| maxResultsPerSearch | integer | Maximum articles per keyword (1–500) | 50 |
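As a sketch of how these constraints combine, an input object can be validated before calling the Actor. The helper below is illustrative (not part of the Actor's API), assuming the defaults and the 1–500 limit from the table above:

```python
def normalize_input(search_queries, language="en", max_results=50):
    """Build a run input, clamping maxResultsPerSearch to the 1-500 API limit."""
    if not search_queries:
        raise ValueError("searchQueries is required")
    return {
        "searchQueries": search_queries,
        "language": language,
        "maxResultsPerSearch": max(1, min(500, max_results)),
    }

normalize_input(["quantum computing"], max_results=1000)
# maxResultsPerSearch is clamped to 500
```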

Input example

```json
{
  "searchQueries": ["artificial intelligence", "quantum computing"],
  "language": "en",
  "maxResultsPerSearch": 20
}
```

Output example

Each article is returned as a JSON object:

```json
{
  "pageId": 1164,
  "title": "Artificial intelligence",
  "extract": "Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making...",
  "url": "https://en.wikipedia.org/wiki/Artificial_intelligence",
  "wordCount": 26473,
  "size": 266568,
  "lastEdited": "2026-03-02T11:28:15Z",
  "thumbnail": "https://upload.wikimedia.org/wikipedia/commons/thumb/...",
  "scrapedAt": "2026-03-03T04:08:23.785Z"
}
```

Output fields

| Field | Type | Description |
|-------|------|-------------|
| pageId | number | Wikipedia internal page identifier |
| title | string | Article title |
| extract | string | Introductory summary (plain text, no HTML) |
| url | string | Direct link to the Wikipedia article |
| wordCount | number | Total word count of the article |
| size | number | Article size in bytes |
| lastEdited | string | ISO timestamp of the last edit |
| thumbnail | string | URL to article thumbnail image (if available) |
| scrapedAt | string | ISO timestamp when the data was extracted |
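The lastEdited and scrapedAt fields are ISO 8601 strings. A minimal way to parse lastEdited in Python, assuming the second-precision "Z" format shown in the output example:

```python
from datetime import datetime, timezone

def parse_last_edited(ts):
    """Parse a lastEdited value such as '2026-03-02T11:28:15Z' into an aware datetime."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

dt = parse_last_edited("2026-03-02T11:28:15Z")
# dt is timezone-aware (UTC), so articles can be sorted or filtered by edit date
```

Note that scrapedAt carries millisecond precision, so it would need a different format string (or datetime.fromisoformat on Python 3.11+).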

Supported languages

Wikipedia Scraper supports all 300+ Wikipedia language editions. Use the standard language code:

CodeLanguageArticles
enEnglish6.9M+
deGerman2.9M+
frFrench2.6M+
esSpanish2.0M+
jaJapanese1.4M+
ruRussian1.9M+
zhChinese1.4M+
ptPortuguese1.1M+
itItalian1.8M+
arArabic1.2M+

Any valid Wikipedia language code works — see Wikipedia's complete list of language editions.

How to scrape Wikipedia with the API

Python

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("automation-lab/wikipedia-scraper").call(run_input={
    "searchQueries": ["climate change", "renewable energy"],
    "language": "en",
    "maxResultsPerSearch": 20,
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{item['title']} - {item['wordCount']} words")
    print(f"  {item['url']}")
    print(f"  {item['extract'][:200]}...")
```

Node.js

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('automation-lab/wikipedia-scraper').call({
    searchQueries: ['climate change', 'renewable energy'],
    language: 'en',
    maxResultsPerSearch: 20,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.log(`${item.title} - ${item.wordCount} words`);
    console.log(`  ${item.url}`);
});
```

REST API

```shell
curl -X POST "https://api.apify.com/v2/acts/automation-lab~wikipedia-scraper/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "searchQueries": ["artificial intelligence"],
    "language": "en",
    "maxResultsPerSearch": 10
  }'
```

Integrations

Connect Wikipedia Scraper to hundreds of apps using built-in integrations:

  • Google Sheets — export article data to spreadsheets
  • Slack / Microsoft Teams — get notifications when scraping completes
  • Zapier / Make — trigger workflows with scraped Wikipedia data
  • Amazon S3 / Google Cloud Storage — store large datasets in cloud storage
  • Webhook — send results to your own API endpoint

Tips and best practices

  1. Use specific keywords — more specific searches return more relevant results. "Quantum entanglement" is better than "quantum".
  2. Batch keywords efficiently — combine related keywords in one run to save on startup costs.
  3. Language parameter — set the language code to search non-English Wikipedias. Results, summaries, and URLs will all be in the selected language.
  4. Word count filtering — use the wordCount field to filter out stub articles (typically < 500 words).
  5. Rate limits — Wikipedia's API is generous but has rate limits. The scraper handles pagination and batching automatically.
  6. Extracts are summaries — the extract field contains only the article's introduction, not the full text. For full articles, follow the url link.
  7. Max 500 results per keyword — this is a Wikipedia API limit. For broader coverage, use multiple related keywords.
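Tip 4 above can be sketched as a simple filter over the dataset items. This hypothetical helper relies only on the wordCount field from the output schema:

```python
def drop_stubs(articles, min_words=500):
    """Filter out stub articles using the wordCount output field."""
    return [a for a in articles if a.get("wordCount", 0) >= min_words]

articles = [
    {"title": "Artificial intelligence", "wordCount": 26473},
    {"title": "Some stub", "wordCount": 120},
]
drop_stubs(articles)  # keeps only "Artificial intelligence"
```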

FAQ

Q: Does this scraper get the full article text? A: The extract field contains the article's introductory section in plain text. For complete article content, you can use the url to access the full page.

Q: How fast is it? A: Very fast. Wikipedia's API is highly optimized. A typical run extracting 50 articles completes in under 5 seconds.

Q: Does it need proxies? A: No. Wikipedia's API is open and does not block automated requests. The scraper identifies itself with a proper User-Agent header.

Q: Can I search in multiple languages at once? A: Each run uses one language. To search multiple languages, run the scraper once per language.
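The last answer amounts to preparing one run input per language. A small illustrative helper (not part of the Actor's API) that builds those inputs, ready to be passed to .call() in a loop:

```python
def build_inputs(queries, languages, max_results=20):
    """One Actor input per language, since each run searches a single edition."""
    return [
        {"searchQueries": queries, "language": lang, "maxResultsPerSearch": max_results}
        for lang in languages
    ]

build_inputs(["climate change"], ["en", "de", "fr"])
# three inputs, identical except for the language code
```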