Wikipedia Scraper - Articles, Summaries, Metadata


Pricing

from $2.00 / 1,000 articles scraped


Extract Wikipedia articles including full content, summary, thumbnails, categories, external links, coordinates, and Wikidata IDs. Multi-language support for 12+ languages. Export data, run via API, schedule and monitor runs, or integrate with other tools.



Developer

Alessandro Santamaria (Maintained by Community)


Scrape Wikipedia articles at scale — full content, summaries, images, categories, and Wikidata links.

Build AI training datasets, knowledge graphs, research corpora, or enrich your app with encyclopedic facts. Powered by the official MediaWiki REST API for clean, reliable, respectful data extraction.

Features

  • 12+ Languages — English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Russian, Japanese, Chinese, Arabic
  • Full content extraction — plain text, cleaned HTML, and per-section breakdown with titles and heading levels
  • Summaries & descriptions — one-line descriptions and first-paragraph extracts
  • Images — thumbnails, main image, and all article images
  • Structured metadata — categories, external links, references/citations
  • Wikidata linking — every article comes with its Q-ID for entity resolution
  • Geo coordinates — lat/lng for places, landmarks, and geographic entities
  • Pageviews — 30-day view counts from the Wikimedia pageviews API
  • Disambiguation detection — flag ambiguous pages before ingesting
  • Search — find articles by keyword, not just by title
  • No auth, no anti-bot — uses the public MediaWiki API; no tokens, no captchas
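
The listing says extraction is powered by the public MediaWiki REST API. As a minimal illustration of what that means in practice (not the actor's internal code; `summary_url` and `fetch_summary` are hypothetical helper names), a summary can be fetched directly from any wiki edition:

```python
import json
import urllib.request

def summary_url(language: str, title: str) -> str:
    # Wikimedia REST API summary endpoint; each language lives on
    # its own subdomain (en.wikipedia.org, de.wikipedia.org, ...).
    return f"https://{language}.wikipedia.org/api/rest_v1/page/summary/{title}"

def fetch_summary(language: str, title: str) -> dict:
    # Plain HTTP GET, no tokens or captchas needed (network access required).
    with urllib.request.urlopen(summary_url(language, title)) as resp:
        return json.load(resp)
```

Because the endpoint is public, this is also why the actor needs no proxies or authentication.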

Input

```json
{
  "titles": ["Berlin", "Albert_Einstein", "Machine_learning"],
  "searchQuery": "quantum physics",
  "urls": ["https://en.wikipedia.org/wiki/Quantum_computing"],
  "language": "en",
  "includeFullContent": true,
  "includeImages": true,
  "includeReferences": false,
  "maxSearchResults": 10
}
```
| Field | Type | Description |
| --- | --- | --- |
| `titles` | array | Direct Wikipedia article titles |
| `searchQuery` | string | Keyword search (returns top N matches) |
| `urls` | array | Wikipedia URLs — title is auto-extracted |
| `language` | enum | Wiki edition: en, de, fr, es, it, pt, nl, pl, ru, ja, zh, ar |
| `includeFullContent` | bool | Fetch full article body + sections (default true) |
| `includeImages` | bool | Include all image URLs (default true) |
| `includeReferences` | bool | Include citations (default false) |
| `maxSearchResults` | int | Cap on search results (default 10) |
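
Since the `urls` input has its article title auto-extracted, the mapping from URL to title can be sketched as follows (`title_from_url` is an illustrative helper, not the actor's internal code):

```python
from urllib.parse import unquote, urlparse

def title_from_url(url: str) -> str:
    # Wikipedia article URLs carry the title after /wiki/; percent-decoding
    # restores non-ASCII characters, and underscores stand in for spaces.
    path = urlparse(url).path
    prefix = "/wiki/"
    if not path.startswith(prefix):
        raise ValueError(f"not a Wikipedia article URL: {url}")
    return unquote(path[len(prefix):])
```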

Output Example

Real output for Berlin (English Wikipedia):

```json
{
  "title": "Berlin",
  "url": "https://en.wikipedia.org/wiki/Berlin",
  "language": "en",
  "page_id": 3354,
  "revision_id": 1234567890,
  "extract": "Berlin is the capital and largest city of Germany by both area and population...",
  "description": "Capital and largest city of Germany",
  "content_full": "Berlin is the capital and largest city of Germany...",
  "content_html": "<section>...</section>",
  "thumbnail_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/.../Berlin.jpg",
  "main_image_url": "https://upload.wikimedia.org/wikipedia/commons/.../Berlin.jpg",
  "images": ["https://upload.wikimedia.org/..."],
  "sections": [
    { "title": "History", "level": 2, "text": "The earliest evidence of settlements..." },
    { "title": "Geography", "level": 2, "text": "Berlin is in northeastern Germany..." }
  ],
  "categories": ["Berlin", "Capitals in Europe", "Cities in Germany"],
  "external_links": ["https://www.berlin.de/", "..."],
  "coordinates": { "lat": 52.52, "lng": 13.405 },
  "wikidata_id": "Q64",
  "last_modified": "2026-04-01T12:34:56Z",
  "word_count": 15842,
  "view_count_30d": 1482391,
  "is_disambiguation": false,
  "scraped_at": "2026-04-07T10:00:00Z"
}
```
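
For entity-resolution workflows, records like the one above can be turned into a title-to-Q-ID index, skipping disambiguation pages. A minimal sketch, assuming the field names shown in the output example (`wikidata_index` is a hypothetical helper):

```python
def wikidata_index(records: list[dict]) -> dict[str, str]:
    # Map article titles to Wikidata Q-IDs; drop disambiguation pages
    # and records without a Q-ID, per the is_disambiguation flag.
    return {
        r["title"]: r["wikidata_id"]
        for r in records
        if not r.get("is_disambiguation") and r.get("wikidata_id")
    }
```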

Use Cases

  • AI/LLM training data — Build high-quality, well-structured datasets for fine-tuning language models. Wikipedia is the gold standard for encyclopedic corpora.
  • Knowledge graphs — Link entities in your database to Wikidata Q-IDs. Every article comes with its canonical identifier, coordinates, and categories.
  • Academic research — Extract literature review material, cross-reference citations, and build topic-specific corpora across languages.
  • Content generation — Enrich articles, product pages, and blog posts with verified encyclopedia facts. Add "Did you know" boxes and related topic links.
  • Fact-checking pipelines — Verify claims against Wikipedia extracts and last-modified timestamps. Flag disambiguation pages automatically.
  • Travel content — Pull city, landmark, and attraction data with coordinates for travel blogs, booking sites, and map overlays.
  • Biographies — Scrape person articles for journalism, CRM enrichment, or historical datasets. Link people to their Wikidata records.

Pricing

Pay-per-event: you only pay for articles you actually extract.

| Event | Price |
| --- | --- |
| enrichment-start | $0.001 |
| enrichment-result | $0.002 per article |

Example costs:

  • 100 articles — ~$0.20
  • 1,000 articles — ~$2.00
  • 10,000 articles — ~$20.00
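
The example costs above follow directly from the event table, assuming one enrichment-start event per run and one enrichment-result event per article (a reading of the table, not an official billing formula; `estimate_cost` is a hypothetical helper):

```python
def estimate_cost(articles: int, runs: int = 1) -> float:
    # One enrichment-start per run ($0.001) plus one
    # enrichment-result per article ($0.002).
    return runs * 0.001 + articles * 0.002
```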

No proxy costs — Wikipedia is a public API.

Issues & Feedback

Found a bug or want a feature? Open an issue.