Wikipedia Scraper - Article Content Extractor

Pricing: from $10.00 / 1,000 results

Scrape Wikipedia articles. Search by topic and extract full structured content: summaries, sections, infobox data, categories, references, images, and edit history for any article.

Rating: 0.0 (0)

Developer: lulz bot

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 3 days ago

Wikipedia Scraper

Extract structured content from Wikipedia articles. Search by topic or provide direct article URLs to get summaries, full text, infobox data, categories, references, images, and metadata.

Features

  • Search by topic — find the most relevant Wikipedia articles for any query
  • Direct URL scraping — provide specific article URLs for targeted extraction
  • Structured content — articles are parsed into sections with headers
  • Infobox extraction — key-value data from article infoboxes (e.g., programming language details, country stats)
  • Multi-language — supports all Wikipedia language editions (en, es, fr, de, ja, etc.)
  • References — extracted reference list from each article
  • Categories & images — article classification and image file names
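The direct-URL and multi-language features both depend on knowing which language edition and which article a URL points to. The sketch below shows one way to split an article URL into those two parts using only the standard library. The function name `parse_article_url` is hypothetical, and this is not necessarily how the actor handles URLs internally.

```python
from urllib.parse import unquote, urlparse


def parse_article_url(url: str) -> tuple[str, str]:
    """Split a Wikipedia article URL into (language code, article title)."""
    parsed = urlparse(url)
    labels = (parsed.hostname or "").split(".")
    # Expect hosts shaped like "<lang>.wikipedia.org".
    if len(labels) < 3 or labels[-2:] != ["wikipedia", "org"]:
        raise ValueError(f"not a Wikipedia URL: {url!r}")
    if not parsed.path.startswith("/wiki/"):
        raise ValueError(f"not an article path: {url!r}")
    # Titles are percent-encoded and use underscores in place of spaces.
    title = unquote(parsed.path[len("/wiki/"):]).replace("_", " ")
    if not title:
        raise ValueError(f"missing article title: {url!r}")
    return labels[0], title


print(parse_article_url("https://en.wikipedia.org/wiki/JavaScript"))
# ('en', 'JavaScript')
```

A check like this can be run over `articleUrls` before starting a run, so malformed URLs are caught early.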

Input

| Field | Type | Description |
|---|---|---|
| searchQueries | string[] | Topics to search (e.g., "artificial intelligence", "JavaScript") |
| articleUrls | string[] | Direct Wikipedia article URLs |
| maxArticles | number | Max articles per search query (default: 5) |
| maxResults | number | Max total results (default: 25) |
| language | string | Wikipedia language code (default: "en") |
| extractSections | boolean | Extract full section content (default: true) |
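As an illustration of the input fields listed above, here is a minimal sketch that assembles a run input with the documented defaults and, optionally, starts a run through the `apify-client` Python package. The helper names `build_run_input` and `run_actor` are hypothetical. The actor ID is not given on this page, so it is left as a parameter. The rule that at least one query or URL must be supplied is an assumption, not something the listing states.

```python
from typing import Any, Optional


def build_run_input(
    search_queries: Optional[list[str]] = None,
    article_urls: Optional[list[str]] = None,
    max_articles: int = 5,
    max_results: int = 25,
    language: str = "en",
    extract_sections: bool = True,
) -> dict[str, Any]:
    """Assemble an input object using the documented field names and defaults."""
    # Assumption: a run needs at least one search query or article URL.
    if not search_queries and not article_urls:
        raise ValueError("provide at least one search query or article URL")
    if max_articles < 1 or max_results < 1:
        raise ValueError("maxArticles and maxResults must be positive")
    return {
        "searchQueries": list(search_queries or []),
        "articleUrls": list(article_urls or []),
        "maxArticles": max_articles,
        "maxResults": max_results,
        "language": language,
        "extractSections": extract_sections,
    }


def run_actor(token: str, actor_id: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Run the actor and return its dataset items (needs `pip install apify-client`)."""
    from apify_client import ApifyClient  # third-party import kept local to this helper

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_run_input(search_queries=["artificial intelligence"], max_articles=3)
print(run_input["maxResults"])  # 25
# items = run_actor("<APIFY_TOKEN>", "<ACTOR_ID>", run_input)  # placeholders, not real values
```

Keeping input construction separate from the network call means the input can be inspected or unit-tested without an Apify token.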

Output

Each article includes:

{
  "title": "JavaScript",
  "pageId": 9845,
  "summary": "JavaScript is a programming language...",
  "content": "Full article text...",
  "sections": [
    { "title": "Introduction", "content": "..." },
    { "title": "History", "content": "..." }
  ],
  "infobox": {
    "Paradigm": "Multi-paradigm",
    "Designed by": "Brendan Eich",
    "First appeared": "December 4, 1995"
  },
  "categories": ["Programming languages", "Web development"],
  "images": ["File:JavaScript_code.png"],
  "thumbnail": "https://upload.wikimedia.org/...",
  "references": ["..."],
  "lastEdited": "2026-04-20T12:00:00Z",
  "url": "https://en.wikipedia.org/wiki/JavaScript",
  "language": "en"
}
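Nested fields such as `sections` and `infobox` often need flattening before export to CSV or a spreadsheet. The following sketch shows one possible approach, using an abbreviated copy of the example record above. The function name `to_flat_row` and the `infobox.` key prefix are illustrative choices, not part of the actor's output.

```python
from typing import Any


def to_flat_row(record: dict[str, Any]) -> dict[str, Any]:
    """Flatten one article record into a single-level dict, e.g. for CSV export."""
    row: dict[str, Any] = {
        "title": record.get("title"),
        "url": record.get("url"),
        "language": record.get("language"),
        "lastEdited": record.get("lastEdited"),
        "sectionTitles": "; ".join(s.get("title", "") for s in record.get("sections") or []),
        "categories": "; ".join(record.get("categories") or []),
    }
    # Prefix infobox keys so they cannot collide with the top-level fields.
    for key, value in (record.get("infobox") or {}).items():
        row[f"infobox.{key}"] = value
    return row


# Abbreviated copy of the example record shown above.
sample = {
    "title": "JavaScript",
    "sections": [
        {"title": "Introduction", "content": "..."},
        {"title": "History", "content": "..."},
    ],
    "infobox": {"Paradigm": "Multi-paradigm", "Designed by": "Brendan Eich"},
    "categories": ["Programming languages", "Web development"],
    "url": "https://en.wikipedia.org/wiki/JavaScript",
    "language": "en",
}

row = to_flat_row(sample)
print(row["sectionTitles"])        # Introduction; History
print(row["infobox.Designed by"])  # Brendan Eich
```

The `or []` and `or {}` fallbacks are defensive: they assume a field may be missing or null for articles without an infobox or sections.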

Use Cases

  • Research — gather structured data on any topic
  • Knowledge bases — build datasets from Wikipedia's encyclopedia
  • NLP training data — extract clean text with metadata
  • Fact-checking — cross-reference claims with Wikipedia sources
  • Content enrichment — add Wikipedia context to your applications

Pricing

This actor uses pay-per-event pricing at $0.005 per article scraped.
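For budgeting, the per-article rate can be turned into a rough estimate. This is a minimal sketch that assumes each scraped article is billed as exactly one event at $0.005 and that no other charges apply. `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

PRICE_PER_ARTICLE = Decimal("0.005")  # USD per article, from the pricing line above


def estimate_cost(article_count: int) -> Decimal:
    """Estimate the cost in USD of scraping `article_count` articles."""
    if article_count < 0:
        raise ValueError("article_count must not be negative")
    return PRICE_PER_ARTICLE * article_count


print(estimate_cost(25))  # 0.125
```

`Decimal` is used instead of `float` so that small per-event prices multiply without binary rounding error.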

Notes

  • Uses the official Wikipedia API (free, no auth required)
  • HTML is fetched for infobox and reference extraction
  • Rate-limited to be respectful of Wikipedia servers
  • Article content is capped at 50,000 characters per article
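To make the first and last notes concrete, the sketch below builds a search request URL for the public MediaWiki Action API (`action=query`, `list=search`) on a given language edition, and applies a 50,000-character cap to article text. Which API modules the actor actually calls and how it truncates content are not stated here, so both functions are assumptions for illustration only. Their names are hypothetical.

```python
from urllib.parse import urlencode

MAX_CONTENT_CHARS = 50_000  # cap stated in the notes above


def search_api_url(query: str, language: str = "en", limit: int = 5) -> str:
    """Build a MediaWiki Action API search URL for one Wikipedia language edition."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    return f"https://{language}.wikipedia.org/w/api.php?{urlencode(params)}"


def cap_content(text: str, limit: int = MAX_CONTENT_CHARS) -> str:
    """Truncate article text to at most `limit` characters (assumed: a plain cut)."""
    return text[:limit]


print(search_api_url("artificial intelligence"))
# https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=artificial+intelligence&srlimit=5&format=json
print(len(cap_content("x" * 60_000)))  # 50000
```

Downstream code that needs complete articles can compare the length of `content` against the cap to detect records that were likely truncated.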