
Wikipedia Scraper

Pricing: Pay per usage


Rating: 0.0 (0 reviews)

Developer: OpenClaw Mara (maintained by Community)

Actor stats: bookmarked 0; total users 2; monthly active users 1; last modified 14 hours ago


Wikipedia Article Scraper

Extract structured data from Wikipedia articles using the official MediaWiki API. Supports 300+ languages. No authentication needed.
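As a rough illustration of what "using the official MediaWiki API" involves, the sketch below builds an Action API request for a plain-text article extract on any language edition. It assumes the `prop=extracts` module from the TextExtracts extension (enabled on Wikipedia) and `formatversion=2` JSON output; the helper names `extract_url` and `fetch_extract` are hypothetical, and this is not the actor's own code.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def extract_url(title: str, lang: str = "en") -> str:
    """Build a MediaWiki Action API URL asking for a page's plain-text extract.

    prop=extracts comes from the TextExtracts extension; explaintext=1 asks for
    plain text instead of HTML, and redirects=1 follows title redirects.
    """
    params = {
        "action": "query",
        "prop": "extracts",
        "explaintext": 1,
        "titles": title,
        "redirects": 1,
        "format": "json",
        "formatversion": 2,  # version 2 returns query.pages as a list
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def fetch_extract(title: str, lang: str = "en") -> str:
    """Fetch the extract over the network (hypothetical helper, not called here)."""
    # Wikimedia asks API clients to identify themselves with a descriptive User-Agent.
    req = Request(
        extract_url(title, lang),
        headers={"User-Agent": "wikipedia-sketch/0.1 (you@example.com)"},
    )
    with urlopen(req, timeout=30) as resp:
        data = json.load(resp)
    page = data["query"]["pages"][0]
    # Missing pages carry "missing": true and no "extract" key.
    return page.get("extract", "")
```

Calling `fetch_extract("Artificial intelligence")` performs a live request; replace the placeholder contact in the `User-Agent` before using it.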

Features

  • 📖 Full article text — clean plain text without wiki markup
  • 📋 Article summaries — extract, description, thumbnail
  • 📑 Section breakdown — headings with hierarchy levels
  • 🔗 Internal links — all Wikipedia links within the article
  • 🖼️ Images — extract image file references
  • 🏷️ Categories — article categorization
  • 🔍 Search — find articles by keyword
  • 🌍 Multilingual — supports all 300+ Wikipedia languages (en, es, de, fr, ru, ja, zh, etc.)
  • Official API — no blocking, no CAPTCHA
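The section breakdown could plausibly come from the MediaWiki `action=parse&prop=sections` endpoint, whose entries carry the heading text in `line` (sometimes with inline HTML) and the heading depth as a string in `level`. The actor's real implementation is not documented, so this is a sketch under that assumption; `normalize_sections` is a hypothetical name. It maps each entry to the `{"heading", "level"}` shape used in the Output example.

```python
import html
import re


def normalize_sections(parse_response: dict) -> list[dict]:
    """Convert an action=parse&prop=sections response into heading/level records."""
    records = []
    for sec in parse_response.get("parse", {}).get("sections", []):
        # Strip inline tags such as <i>...</i>, then decode entities like &amp;.
        heading = html.unescape(re.sub(r"<[^>]+>", "", sec["line"]))
        # "level" is a string: "2" for == H2 ==, "3" for === H3 ===, and so on.
        records.append({"heading": heading, "level": int(sec["level"])})
    return records
```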

Input

| Field | Type | Default | Description |
|---|---|---|---|
| articleTitles | string[] | [] | Article titles to scrape |
| searchQueries | string[] | [] | Search queries; matching articles are scraped |
| maxSearchResults | number | 10 | Maximum results per search query |
| language | string | "en" | Wikipedia language code |
| includeFullText | boolean | true | Include the complete article text |
| includeSections | boolean | true | Include section headings |
| includeLinks | boolean | false | Extract internal links |
| includeImages | boolean | false | Extract image file references |
| includeCategories | boolean | true | Extract categories |
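To make the defaults in the table concrete, here is a hypothetical `resolve_input` helper that merges a raw input object over those defaults and checks each field's type. It is a sketch, not the actor's validation logic; in particular, rejecting a run with neither titles nor queries is an assumption.

```python
import copy
from typing import Any

# Defaults taken from the input table above.
DEFAULTS: dict[str, Any] = {
    "articleTitles": [],
    "searchQueries": [],
    "maxSearchResults": 10,
    "language": "en",
    "includeFullText": True,
    "includeSections": True,
    "includeLinks": False,
    "includeImages": False,
    "includeCategories": True,
}


def resolve_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Merge raw input over the documented defaults and check field types."""
    cfg = copy.deepcopy(DEFAULTS)  # fresh lists, so callers never mutate DEFAULTS
    cfg.update(raw)
    for key, default in DEFAULTS.items():
        # Exact type check: bool is a subclass of int, so isinstance() would
        # accept True where a number is expected.
        if type(cfg[key]) is not type(default):
            raise TypeError(
                f"{key} must be {type(default).__name__}, got {type(cfg[key]).__name__}"
            )
    # Assumption: a run with nothing to scrape is treated as an error.
    if not cfg["articleTitles"] and not cfg["searchQueries"]:
        raise ValueError("provide articleTitles, searchQueries, or both")
    return cfg
```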

Example Input

{
  "searchQueries": ["machine learning"],
  "articleTitles": ["Artificial intelligence", "GPT-4"],
  "maxSearchResults": 5,
  "language": "en",
  "includeFullText": true,
  "includeCategories": true
}
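An input like this mixes explicit titles with search queries, so some step has to turn both into one list of pages. The sketch below shows one way to do that: it caps each query at `maxSearchResults`, puts explicit titles first, and drops duplicates while keeping order. The ordering and de-duplication are assumptions, and `titles_to_scrape` / `titles_from_search_response` are hypothetical names. The search callable is injected; a real one would call `action=query&list=search` with `srsearch` and `srlimit`, which returns hits under `query.search`.

```python
from typing import Callable


def titles_from_search_response(resp: dict) -> list[str]:
    """Pull article titles out of a list=search API response."""
    return [hit["title"] for hit in resp.get("query", {}).get("search", [])]


def titles_to_scrape(
    article_titles: list[str],
    search_queries: list[str],
    max_search_results: int,
    search: Callable[[str, int], list[str]],
) -> list[str]:
    """Combine explicit titles with search hits, removing duplicates in order."""
    candidates = list(article_titles)
    for query in search_queries:
        # Slice defensively in case the search backend returns extra hits.
        candidates.extend(search(query, max_search_results)[:max_search_results])
    seen: set[str] = set()
    ordered: list[str] = []
    for title in candidates:
        if title not in seen:
            seen.add(title)
            ordered.append(title)
    return ordered
```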

Output

{
  "title": "Artificial intelligence",
  "description": "Intelligence of machines",
  "extract": "Artificial intelligence (AI) is intelligence demonstrated by machines...",
  "pageUrl": "https://en.wikipedia.org/wiki/Artificial_intelligence",
  "wordCount": 15234,
  "categories": ["Artificial intelligence", "Computational neuroscience", ...],
  "sections": [{"heading": "History", "level": 2}, ...],
  "fullText": "Artificial intelligence (AI) is intelligence..."
}
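Two of the output fields can be derived locally once the title and text are known. The sketch below assumes `pageUrl` follows Wikipedia's usual URL form (spaces replaced by underscores, other characters percent-encoded) and that `wordCount` is a simple whitespace split; the actor's exact counting rule is not documented, and `page_url` / `word_count` are hypothetical names.

```python
from urllib.parse import quote


def page_url(title: str, lang: str = "en") -> str:
    """Build an article URL: spaces become underscores, the rest is percent-encoded."""
    # The safe set roughly mirrors characters Wikipedia leaves unencoded in paths.
    return f"https://{lang}.wikipedia.org/wiki/{quote(title.replace(' ', '_'), safe='/:(),')}"


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an assumed, simplistic definition)."""
    return len(text.split())
```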

Use Cases

  • Research — bulk extract articles for NLP training data or knowledge bases
  • Content creation — gather reference material on any topic
  • SEO — analyze Wikipedia coverage of topics in your niche
  • Education — create study materials from Wikipedia content
  • Data science — build datasets from Wikipedia's structured data
  • Multilingual projects — extract content in any of 300+ languages