Wikipedia Scraper - Articles, Summaries, Metadata
Pricing
from $2.00 / 1,000 articles scraped
Extract Wikipedia articles including full content, summary, thumbnails, categories, external links, coordinates, and Wikidata IDs. Multi-language support for 12+ languages. Export data, run via API, schedule and monitor runs, or integrate with other tools.
Developer
Alessandro Santamaria
Scrape Wikipedia articles at scale — full content, summaries, images, categories, and Wikidata links.
Build AI training datasets, knowledge graphs, research corpora, or enrich your app with encyclopedic facts. Powered by the official MediaWiki REST API for clean, reliable, respectful data extraction.
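To illustrate what "powered by the MediaWiki REST API" means in practice, here is a minimal sketch (not the actor's internal code) that builds the public summary-endpoint URL for a given article title and language edition:

```python
from urllib.parse import quote

def summary_endpoint(title: str, language: str = "en") -> str:
    """Build the MediaWiki REST API summary URL for an article title.

    Spaces become underscores, and the title is percent-encoded so that
    slashes and non-ASCII characters survive as a single path segment.
    """
    slug = quote(title.replace(" ", "_"), safe="")
    return f"https://{language}.wikipedia.org/api/rest_v1/page/summary/{slug}"

print(summary_endpoint("Albert Einstein"))
# https://en.wikipedia.org/api/rest_v1/page/summary/Albert_Einstein
```

A plain GET on that URL returns the article's one-line description and first-paragraph extract as JSON, with no authentication required.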
Features
- 12+ Languages — English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Russian, Japanese, Chinese, Arabic
- Full content extraction — plain text, cleaned HTML, and per-section breakdown with titles and heading levels
- Summaries & descriptions — one-line descriptions and first-paragraph extracts
- Images — thumbnails, main image, and all article images
- Structured metadata — categories, external links, references/citations
- Wikidata linking — every article comes with its Q-ID for entity resolution
- Geo coordinates — lat/lng for places, landmarks, and geographic entities
- Pageviews — 30-day view counts from the Wikimedia pageviews API
- Disambiguation detection — flag ambiguous pages before ingesting
- Search — find articles by keyword, not just by title
- No auth, no anti-bot — uses the public MediaWiki API; no tokens, no captchas
Input
```json
{
  "titles": ["Berlin", "Albert_Einstein", "Machine_learning"],
  "searchQuery": "quantum physics",
  "urls": ["https://en.wikipedia.org/wiki/Quantum_computing"],
  "language": "en",
  "includeFullContent": true,
  "includeImages": true,
  "includeReferences": false,
  "maxSearchResults": 10
}
```
| Field | Type | Description |
|---|---|---|
| titles | array | Direct Wikipedia article titles |
| searchQuery | string | Keyword search (returns top N matches) |
| urls | array | Wikipedia URLs — title is auto-extracted |
| language | enum | Wiki edition: en, de, fr, es, it, pt, nl, pl, ru, ja, zh, ar |
| includeFullContent | bool | Fetch full article body + sections (default true) |
| includeImages | bool | Include all image URLs (default true) |
| includeReferences | bool | Include citations (default false) |
| maxSearchResults | int | Cap on search results (default 10) |
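The `urls` field above auto-extracts the article title. A minimal sketch of that extraction step (an assumption about the mechanics, not the actor's own code) looks like this:

```python
from urllib.parse import urlparse, unquote

def title_from_url(url: str) -> str:
    """Extract the article title from a Wikipedia article URL.

    Takes the path segment after '/wiki/' and percent-decodes it,
    e.g. '.../wiki/Quantum_computing' -> 'Quantum_computing'.
    """
    path = urlparse(url).path
    prefix = "/wiki/"
    if not path.startswith(prefix):
        raise ValueError(f"Not a Wikipedia article URL: {url}")
    return unquote(path[len(prefix):])

print(title_from_url("https://en.wikipedia.org/wiki/Quantum_computing"))
# Quantum_computing
```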
Output Example
Real output for Berlin (English Wikipedia):
```json
{
  "title": "Berlin",
  "url": "https://en.wikipedia.org/wiki/Berlin",
  "language": "en",
  "page_id": 3354,
  "revision_id": 1234567890,
  "extract": "Berlin is the capital and largest city of Germany by both area and population...",
  "description": "Capital and largest city of Germany",
  "content_full": "Berlin is the capital and largest city of Germany...",
  "content_html": "<section>...</section>",
  "thumbnail_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/.../Berlin.jpg",
  "main_image_url": "https://upload.wikimedia.org/wikipedia/commons/.../Berlin.jpg",
  "images": ["https://upload.wikimedia.org/..."],
  "sections": [
    { "title": "History", "level": 2, "text": "The earliest evidence of settlements..." },
    { "title": "Geography", "level": 2, "text": "Berlin is in northeastern Germany..." }
  ],
  "categories": ["Berlin", "Capitals in Europe", "Cities in Germany"],
  "external_links": ["https://www.berlin.de/", "..."],
  "coordinates": { "lat": 52.52, "lng": 13.405 },
  "wikidata_id": "Q64",
  "last_modified": "2026-04-01T12:34:56Z",
  "word_count": 15842,
  "view_count_30d": 1482391,
  "is_disambiguation": false,
  "scraped_at": "2026-04-07T10:00:00Z"
}
```
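For downstream use, each record already carries everything needed for entity resolution. A hedged sketch of a post-processing step (the `entity_ref` helper is hypothetical, not part of the actor) that keeps only the linking fields and skips disambiguation pages:

```python
import json

# Trimmed sample of one output record (see the full example above).
record = json.loads("""
{"title": "Berlin", "wikidata_id": "Q64",
 "coordinates": {"lat": 52.52, "lng": 13.405},
 "is_disambiguation": false}
""")

def entity_ref(rec: dict):
    """Reduce a scraped record to the fields needed for entity linking.

    Returns None for disambiguation pages so they never enter the graph.
    """
    if rec.get("is_disambiguation"):
        return None
    coords = rec.get("coordinates") or {}
    return {
        "label": rec["title"],
        "qid": rec["wikidata_id"],
        "lat": coords.get("lat"),
        "lng": coords.get("lng"),
    }

print(entity_ref(record))
# {'label': 'Berlin', 'qid': 'Q64', 'lat': 52.52, 'lng': 13.405}
```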
Use Cases
- AI/LLM training data — Build high-quality, well-structured datasets for fine-tuning language models. Wikipedia is the gold standard for encyclopedic corpora.
- Knowledge graphs — Link entities in your database to Wikidata Q-IDs. Every article comes with its canonical identifier, coordinates, and categories.
- Academic research — Extract literature review material, cross-reference citations, and build topic-specific corpora across languages.
- Content generation — Enrich articles, product pages, and blog posts with verified encyclopedia facts. Add "Did you know" boxes and related topic links.
- Fact-checking pipelines — Verify claims against Wikipedia extracts and last-modified timestamps. Flag disambiguation pages automatically.
- Travel content — Pull city, landmark, and attraction data with coordinates for travel blogs, booking sites, and map overlays.
- Biographies — Scrape person articles for journalism, CRM enrichment, or historical datasets. Link people to their Wikidata records.
Pricing
Pay-per-event: you only pay for articles you actually extract.
| Event | Price |
|---|---|
| enrichment-start | $0.001 |
| enrichment-result | $0.002 per article |
Example costs:
- 100 articles — ~$0.20
- 1,000 articles — ~$2.00
- 10,000 articles — ~$20.00
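The example costs above follow directly from the two event prices: one start fee per run plus a per-article result fee. A small estimator, assuming one run per batch:

```python
START_FEE = 0.001     # enrichment-start, charged once per run
PER_ARTICLE = 0.002   # enrichment-result, charged per extracted article

def estimated_cost(articles: int, runs: int = 1) -> float:
    """Estimate the pay-per-event cost in USD for a batch of articles."""
    return round(runs * START_FEE + articles * PER_ARTICLE, 3)

print(estimated_cost(1_000))   # 2.001, i.e. ~$2.00
```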
No proxy costs — Wikipedia is a public API.
Issues & Feedback
Found a bug or want a feature? Open an issue.
Related Actors
- HTML to Markdown — Convert scraped HTML into LLM-ready Markdown
- RSS Feed Reader — Bulk parse RSS, Atom and JSON feeds
- Website Content Crawler — Crawl full websites and extract text
- Google Maps Scraper — Business listings, reviews, and geo data