
Wikipedia Article Scraper

Pricing

from $0.50 / 1,000 results


Extract structured data from Wikipedia articles. Get summaries, categories, images, metadata, and descriptions using Wikipedia's official API. Supports 300+ languages.


Rating

5.0

(10)

Developer

Crawler Bros

Maintained by Community

Actor stats

Bookmarked: 11
Total users: 1
Monthly active users: 0
Last modified: 2 days ago


Extract structured data from Wikipedia articles using the official MediaWiki API. Get article summaries, categories, images, metadata, and descriptions. Supports 300+ languages.

Features

  • Extract article titles, summaries, and descriptions
  • Get categories, images, and thumbnails
  • Support for 300+ Wikipedia languages
  • Two modes: scrape by URL or search by keyword
  • Uses official Wikipedia REST + MediaWiki APIs
  • No proxy or cookies required
  • Lightweight HTTP-only (no browser)
  • Proper rate limiting and User-Agent identification
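The REST endpoint behind the summary extraction can be illustrated with a short sketch. This is the public Wikipedia REST API layout, not the actor's internal code; the mapping below is an assumption about how any client would derive the endpoint from an article URL:

```python
from urllib.parse import urlsplit, quote

def summary_endpoint(article_url: str) -> str:
    """Map a Wikipedia article URL to its REST API page-summary endpoint."""
    parts = urlsplit(article_url)
    # Article URLs take the form https://<lang>.wikipedia.org/wiki/<Title>
    title = parts.path.removeprefix("/wiki/")
    # Percent-encode the title but keep parentheses, which are common in titles
    return f"https://{parts.netloc}/api/rest_v1/page/summary/{quote(title, safe='()')}"
```

For example, `summary_endpoint("https://en.wikipedia.org/wiki/Artificial_intelligence")` yields the en-wiki summary endpoint for that article, and the same function works for any language edition because the host is taken from the input URL.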

Input

Field | Type | Default | Description
articleUrls | Array | | Wikipedia article URLs to scrape
searchQueries | Array | | Search terms to find articles
maxArticlesPerQuery | Integer | 5 | Max articles per search query (1-50)
language | String | "en" | Wikipedia language code

Example: Scrape by URL

{
  "articleUrls": [
    "https://en.wikipedia.org/wiki/Python_(programming_language)",
    "https://en.wikipedia.org/wiki/Artificial_intelligence"
  ]
}

Example: Search by Keyword

{
  "searchQueries": ["machine learning", "quantum computing"],
  "maxArticlesPerQuery": 3,
  "language": "en"
}
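The search-mode input above can be assembled programmatically before starting a run. A minimal sketch: the clamping mirrors the documented 1-50 range, and the actor ID in the commented apify-client call is a guess that may differ from the actual ID on the Apify Store.

```python
def build_search_input(queries, max_per_query=5, language="en"):
    """Assemble a search-mode run input, clamping maxArticlesPerQuery to 1-50."""
    return {
        "searchQueries": list(queries),
        "maxArticlesPerQuery": max(1, min(50, max_per_query)),
        "language": language,
    }

# Running the actor (requires an Apify API token; actor ID is a guess):
#
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_API_TOKEN>")
#   run = client.actor("crawler-bros/wikipedia-article-scraper").call(
#       run_input=build_search_input(["machine learning"], max_per_query=3)
#   )
#   items = client.dataset(run["defaultDatasetId"]).list_items().items
```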

Output

Field | Type | Description
title | String | Article title
url | String | Full Wikipedia URL
summary | String | Lead section extract (first few paragraphs)
description | String | Wikidata short description
categories | Array | Article categories
thumbnail | Object | Thumbnail image with source, width, height
images | Array | Image filenames from the article
lastModified | String | Last edit timestamp
language | String | Language code
pageId | Integer | Wikipedia page ID
scrapedAt | String | ISO timestamp when scraped
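For orientation, a dataset item with these fields might look like the following illustrative record (all values below are invented examples, not real output):

```json
{
  "title": "Python (programming language)",
  "url": "https://en.wikipedia.org/wiki/Python_(programming_language)",
  "summary": "Python is a high-level, general-purpose programming language...",
  "description": "General-purpose programming language",
  "categories": ["Programming languages"],
  "thumbnail": {"source": "https://upload.wikimedia.org/...", "width": 320, "height": 320},
  "images": ["Python-logo-notext.svg"],
  "lastModified": "2025-01-01T00:00:00Z",
  "language": "en",
  "pageId": 12345,
  "scrapedAt": "2025-01-02T12:00:00Z"
}
```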

Use Cases

  • Research — collect structured article data for academic or business research
  • Content enrichment — augment your database with Wikipedia descriptions and metadata
  • Knowledge graphs — build knowledge bases from Wikipedia's categorized data
  • Education — gather article summaries for educational content
  • SEO — analyze Wikipedia's coverage of topics in your niche
  • Data science — use Wikipedia data for NLP training and analysis

FAQ

Is a proxy required?

No. Wikipedia's API is freely accessible. No proxy, cookies, or authentication needed.

What languages are supported?

All 300+ Wikipedia language editions. Set the language parameter to any valid code: en, fr, de, es, ja, zh, ru, pt, it, ar, ko, nl, pl, etc.

Are there rate limits?

Wikipedia asks for polite access with proper User-Agent headers. The scraper includes built-in delays (0.3-0.5s between requests) to respect Wikipedia's guidelines.
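The same polite-access pattern can be reproduced in your own client code. A minimal sketch using only the Python standard library; the User-Agent string and contact address are placeholders you should replace with your own:

```python
import time
import urllib.request

USER_AGENT = "MyWikipediaBot/1.0 (contact@example.com)"  # placeholder; use your own
MIN_DELAY = 0.3  # seconds between requests, matching the scraper's stated range

_last_request = float("-inf")

def throttle(min_delay=MIN_DELAY):
    """Sleep just long enough so consecutive calls are at least min_delay apart."""
    global _last_request
    wait = min_delay - (time.monotonic() - _last_request)
    if wait > 0:
        time.sleep(wait)
    _last_request = time.monotonic()

def polite_get(url):
    """Fetch a URL with a descriptive User-Agent, respecting the minimum delay."""
    throttle()
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Separating `throttle` from the fetch keeps the pacing logic testable and lets the same delay guard any sequence of API calls.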

Can I scrape article content (full text)?

No. This scraper extracts the lead-section summary only: the summary field contains a clean text extract of the opening paragraphs, which is sufficient for most use cases.