Wikipedia Scraper
Pricing: Pay per event
Search and extract Wikipedia articles — titles, summaries, URLs, word counts, and thumbnail images. Uses the free MediaWiki API.
Developer: Stas Persiianenko
Extract Wikipedia articles by keyword search. Get titles, full summaries, URLs, word counts, thumbnails, and last edit dates from any of Wikipedia's 300+ language editions.
What does Wikipedia Scraper do?
Wikipedia Scraper searches Wikipedia using the official MediaWiki API and extracts structured data from matching articles. For each search keyword, it returns article metadata including the introductory extract (summary), word count, page size, thumbnail image, and direct URL.
The scraper uses Wikipedia's built-in search API, so results match what you'd find searching on Wikipedia itself — ranked by relevance with support for all Wikipedia languages.
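The scraper's internal requests aren't published, but a standard MediaWiki search call can be sketched as follows. The endpoint and query parameters (`action=query`, `list=search`) are standard MediaWiki API; the helper name is ours:

```python
from urllib.parse import urlencode

def build_search_url(query: str, language: str = "en", limit: int = 50) -> str:
    """Build a MediaWiki search API URL of the kind the scraper presumably issues."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,   # the search keyword
        "srlimit": limit,    # max results per request
        "format": "json",
    }
    return f"https://{language}.wikipedia.org/w/api.php?{urlencode(params)}"

url = build_search_url("artificial intelligence", language="en", limit=20)
```

Because this is Wikipedia's own search endpoint, relevance ranking matches on-site search results.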
Who is it for?
- 🎓 Academic researchers — extracting structured knowledge from Wikipedia articles at scale
- 🤖 NLP engineers — building training datasets from Wikipedia text and metadata
- 📊 Data analysts — collecting factual data and statistics from Wikipedia pages
- 💻 App developers — enriching applications with Wikipedia content and summaries
- 📝 Content creators — gathering reference material and structured facts for writing
Why scrape Wikipedia?
Wikipedia is the world's largest free encyclopedia with over 60 million articles across 300+ languages. It's a primary source for:
- Knowledge base construction — build reference datasets for AI training, chatbots, or research databases
- LLM and RAG pipelines — feed clean, structured article text into retrieval-augmented generation systems, fine-tuning datasets, or AI agent knowledge bases
- Content enrichment — add Wikipedia summaries to product catalogs, educational platforms, or content management systems
- Research and analysis — analyze article coverage, word counts, and edit patterns across topics
- Multilingual data — gather information in any language Wikipedia supports
- SEO and content strategy — understand topic coverage and find content gaps
How much does it cost to scrape Wikipedia?
Wikipedia Scraper uses pay-per-event pricing:
| Event | Price |
|---|---|
| Run started | $0.001 |
| Article extracted | $0.001 per article |
Example costs:
- 10 articles on "machine learning": ~$0.011
- 100 articles on "history": ~$0.101
- 500 articles across 5 keywords (one run): ~$0.501
Platform costs are minimal — a typical run uses under $0.001 in compute. Wikipedia's API is fast and does not require proxies.
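The example costs follow from the two event prices: one Run started charge per run plus one Article extracted charge per article. A hypothetical helper (names are ours) to estimate a run's cost:

```python
# Event prices from the pricing table above
RUN_STARTED_USD = 0.001
PER_ARTICLE_USD = 0.001

def estimate_cost(total_articles: int, runs: int = 1) -> float:
    """Estimated event cost: one run-start fee per run plus a per-article fee."""
    return round(runs * RUN_STARTED_USD + total_articles * PER_ARTICLE_USD, 6)

print(estimate_cost(10))   # → 0.011
print(estimate_cost(100))  # → 0.101
print(estimate_cost(500))  # → 0.501  (500 articles in a single run)
```

Batching keywords into a single run, as suggested in the tips below, avoids paying the run-start fee multiple times.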
Input parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| searchQueries | string[] | Keywords to search on Wikipedia. Each keyword runs a separate search. | Required |
| language | string | Wikipedia language code (e.g., en, de, fr, es, ja, zh) | "en" |
| maxResultsPerSearch | integer | Maximum articles per keyword (1–500) | 50 |
Input example
```json
{
  "searchQueries": ["artificial intelligence", "quantum computing"],
  "language": "en",
  "maxResultsPerSearch": 20
}
```
Output example
Each article is returned as a JSON object:
```json
{
  "pageId": 1164,
  "title": "Artificial intelligence",
  "extract": "Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making...",
  "url": "https://en.wikipedia.org/wiki/Artificial_intelligence",
  "wordCount": 26473,
  "size": 266568,
  "lastEdited": "2026-03-02T11:28:15Z",
  "thumbnail": "https://upload.wikimedia.org/wikipedia/commons/thumb/...",
  "scrapedAt": "2026-03-03T04:08:23.785Z"
}
```
Output fields
| Field | Type | Description |
|---|---|---|
| pageId | number | Wikipedia internal page identifier |
| title | string | Article title |
| extract | string | Introductory summary (plain text, no HTML) |
| url | string | Direct link to the Wikipedia article |
| wordCount | number | Total word count of the article |
| size | number | Article size in bytes |
| lastEdited | string | ISO timestamp of the last edit |
| thumbnail | string | URL to article thumbnail image (if available) |
| scrapedAt | string | ISO timestamp when the data was extracted |
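The lastEdited and scrapedAt fields are ISO 8601 strings and parse directly with Python's standard library. A small sketch (the sample items are hypothetical) that sorts results by recency:

```python
from datetime import datetime

def parse_ts(ts: str) -> datetime:
    """Parse the scraper's ISO 8601 timestamps (trailing 'Z' means UTC)."""
    # datetime.fromisoformat accepts 'Z' directly on Python 3.11+;
    # the replace() keeps this working on older versions too.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

# Hypothetical items shaped like the output fields above
items = [
    {"title": "A", "lastEdited": "2026-03-02T11:28:15Z"},
    {"title": "B", "lastEdited": "2025-01-10T09:00:00Z"},
]
newest_first = sorted(items, key=lambda i: parse_ts(i["lastEdited"]), reverse=True)
```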
Supported languages
Wikipedia Scraper supports all 300+ Wikipedia language editions. Use the standard language code:
| Code | Language | Articles |
|---|---|---|
| en | English | 6.9M+ |
| de | German | 2.9M+ |
| fr | French | 2.6M+ |
| es | Spanish | 2.0M+ |
| ru | Russian | 1.9M+ |
| it | Italian | 1.8M+ |
| ja | Japanese | 1.4M+ |
| zh | Chinese | 1.4M+ |
| ar | Arabic | 1.2M+ |
| pt | Portuguese | 1.1M+ |
Any valid Wikipedia language code works — see the full list.
How to scrape Wikipedia articles
- Open Wikipedia Scraper on Apify.
- Enter one or more search keywords in the searchQueries field.
- Set the language code (e.g., en, de, fr) for the Wikipedia edition you want.
- Adjust maxResultsPerSearch to control how many articles per keyword (default: 50).
- Click Start and wait for the scrape to finish.
- Download articles as JSON, CSV, or Excel from the Dataset tab.
Using the Apify API
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("automation-lab/wikipedia-scraper").call(
    run_input={
        "searchQueries": ["climate change", "renewable energy"],
        "language": "en",
        "maxResultsPerSearch": 20,
    }
)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{item['title']} — {item['wordCount']} words")
    print(f"  {item['url']}")
    print(f"  {item['extract'][:200]}...")
```
Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('automation-lab/wikipedia-scraper').call({
    searchQueries: ['climate change', 'renewable energy'],
    language: 'en',
    maxResultsPerSearch: 20,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach(item => {
    console.log(`${item.title} — ${item.wordCount} words`);
    console.log(`  ${item.url}`);
});
```
REST API
```shell
curl -X POST "https://api.apify.com/v2/acts/automation-lab/wikipedia-scraper/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "searchQueries": ["artificial intelligence"],
    "language": "en",
    "maxResultsPerSearch": 10
  }'
```
Integrations
Connect Wikipedia Scraper to hundreds of apps using built-in integrations:
- Google Sheets — export article data to spreadsheets
- Slack / Microsoft Teams — get notifications when scraping completes
- Zapier / Make — trigger workflows with scraped Wikipedia data
- Amazon S3 / Google Cloud Storage — store large datasets in cloud storage
- Webhook — send results to your own API endpoint
Tips and best practices
- Use specific keywords — more specific searches return more relevant results. "Quantum entanglement" is better than "quantum".
- Batch keywords efficiently — combine related keywords in one run to save on startup costs.
- Language parameter — set the language code to search non-English Wikipedias. Results, summaries, and URLs will all be in the selected language.
- Word count filtering — use the wordCount field to filter out stub articles (typically < 500 words).
- Rate limits — Wikipedia's API is generous but has rate limits. The scraper handles pagination and batching automatically.
- Extracts are summaries — the extract field contains only the article's introduction, not the full text. For full articles, follow the url link.
- Max 500 results per keyword — this is a Wikipedia API limit. For broader coverage, use multiple related keywords.
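The word-count tip is a one-line filter over the dataset items. A minimal sketch, with hypothetical sample data:

```python
# Hypothetical dataset items; only the wordCount field matters here
articles = [
    {"title": "Artificial intelligence", "wordCount": 26473},
    {"title": "Obscure stub", "wordCount": 120},
]

MIN_WORDS = 500  # stub threshold suggested in the tip above
substantive = [a for a in articles if a["wordCount"] >= MIN_WORDS]
```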
Legality
Scraping publicly available data is generally legal according to the US Court of Appeals ruling (HiQ Labs v. LinkedIn). This actor only accesses publicly available information and does not require authentication. Always review and comply with the target website's Terms of Service before scraping. For personal data, ensure compliance with GDPR, CCPA, and other applicable privacy regulations.
FAQ
Q: Does this scraper get the full article text?
A: The extract field contains the article's introductory section in plain text. For complete article content, you can use the url to access the full page.
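One way to go beyond the introduction without scraping HTML is Wikipedia's standard TextExtracts API (prop=extracts with explaintext). This is our suggestion, not part of the actor; the helper name is ours, and note that TextExtracts may truncate very long pages:

```python
from urllib.parse import urlencode

def full_text_url(title: str, language: str = "en") -> str:
    """Build a MediaWiki TextExtracts URL for an article's plain-text body."""
    params = {
        "action": "query",
        "prop": "extracts",
        "explaintext": 1,   # strip HTML, return plain text
        "titles": title,
        "format": "json",
    }
    return f"https://{language}.wikipedia.org/w/api.php?{urlencode(params)}"

url = full_text_url("Artificial intelligence")
```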
Q: How fast is it?
A: Very fast. Wikipedia's API is highly optimized. A typical run extracting 50 articles completes in under 5 seconds.
Q: Does it need proxies?
A: No. Wikipedia's API is open and does not block automated requests. The scraper identifies itself with a proper User-Agent header.
Q: Can I search in multiple languages at once?
A: Each run uses one language. To search multiple languages, run the scraper once per language.
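The once-per-language pattern can be scripted by generating one run input per edition. A minimal sketch (the query and language list are examples):

```python
# Build one actor run input per Wikipedia language edition
query = "machine learning"
languages = ["en", "de", "fr"]

run_inputs = [
    {"searchQueries": [query], "language": lang, "maxResultsPerSearch": 10}
    for lang in languages
]
# Each input would then be passed to a separate actor .call(...)
```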
Use with Claude AI (MCP)
This actor is available as a tool in Claude AI through the Model Context Protocol (MCP). Add it to Claude Desktop, Cursor, Windsurf, or any MCP-compatible client.
Setup for Claude Code
```shell
claude mcp add --transport http apify "https://mcp.apify.com"
```
Setup for Claude Desktop, Cursor, or VS Code
Add this to your MCP config file:
```json
{
  "mcpServers": {
    "apify": {
      "url": "https://mcp.apify.com"
    }
  }
}
```
Example prompts
- "Search Wikipedia for articles about quantum computing and give me the summaries"
- "Fetch Wikipedia articles on these 5 historical events and compare their word counts"
- "Look up Wikipedia articles on machine learning in both English and German and extract the introductions"
Learn more in the Apify MCP documentation.
Troubleshooting
The extract is truncated or too short.
The extract field contains only the article's introductory section, not the full text. This is by design to keep responses fast and costs low. Use the url field to access the complete article.
I'm getting irrelevant results for my search query.
Wikipedia's search API ranks by relevance, which may include loosely related articles. Use more specific keywords (e.g., "quantum entanglement" instead of "quantum") and reduce maxResultsPerSearch to get only the top matches.
Other research and news scrapers on Apify
- ArXiv Scraper -- search and extract academic papers from ArXiv
- CrossRef Scraper -- extract scholarly article metadata from CrossRef
- OpenAlex Scraper -- search and extract academic research data from OpenAlex
