Wikipedia Scraper
Pricing: from $1.00 per 1,000 pages returned
Search Wikipedia by keyword or fetch clean, structured page data (plain-text extract, thumbnail, categories, URL) for given titles, via the public MediaWiki API. No API key and no anti-bot hurdles. Search results are paginated automatically; titles are fetched in batches of 50.
Developer: Dami's Studio
Maintained by Community
Search Wikipedia by keyword, or fetch clean, structured page data for exact titles — straight from the official MediaWiki Action API. No API key, no login, no anti-bot.
Two modes
1. Search — set searchQuery. Returns matching articles with title, pageid, url, a plain-text snippet (the API's HTML is stripped for you), wordcount, size, and timestamp. The actor paginates automatically (50 per request) up to maxItems.
2. Page data — set pageTitles (a list of exact article titles). Returns title, pageid, url, the plain-text extract, a thumbnail image URL, and categories. Titles are batched 50 at a time. Turn on fullText to get the whole article instead of just the intro.
(If both are provided, search mode wins. Provide one or the other.)
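Search mode maps onto the Action API's list=search module. The following is a minimal Python sketch of the pagination described above. It assumes the actor requests the snippet, wordcount, size and timestamp properties and follows the sroffset value from the response's continue object. The function names and the injected fetch callable are hypothetical, and no network call is made here:

```python
import html
import re
from typing import Callable, Iterator
from urllib.parse import quote

PAGE_SIZE = 50  # hits requested per list=search call, as described above


def search_params(query: str, offset: int, limit: int) -> dict:
    """Parameters for a single MediaWiki list=search request."""
    return {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "sroffset": offset,
        "srprop": "snippet|wordcount|size|timestamp",
        "format": "json",
        "formatversion": 2,
    }


def strip_html(fragment: str) -> str:
    """Remove tags like <span class="searchmatch"> and decode entities."""
    return html.unescape(re.sub(r"<[^>]+>", "", fragment))


def search_rows(query: str, max_items: int, language: str,
                fetch: Callable[[dict], dict]) -> Iterator[dict]:
    """Yield up to max_items rows, following sroffset continuation."""
    offset, emitted = 0, 0
    while emitted < max_items:
        limit = min(PAGE_SIZE, max_items - emitted)
        data = fetch(search_params(query, offset, limit))
        hits = data.get("query", {}).get("search", [])
        for hit in hits:
            title = hit["title"]
            yield {
                "title": title,
                "pageid": hit["pageid"],
                # Article URLs use underscores in place of spaces.
                "url": f"https://{language}.wikipedia.org/wiki/"
                       + quote(title.replace(" ", "_")),
                "snippet": strip_html(hit.get("snippet", "")),
                "wordcount": hit.get("wordcount"),
                "size": hit.get("size"),
                "timestamp": hit.get("timestamp"),
            }
            emitted += 1
            if emitted >= max_items:
                return
        cont = data.get("continue")
        if not hits or not cont:
            return  # the API reports no further results
        offset = cont["sroffset"]
```

To run it for real, fetch could wrap urllib.request.urlopen and json.load. Injecting it keeps the paging logic testable offline.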
What you get per row
| Field | Mode | Notes |
|---|---|---|
| title | both | Article title. |
| pageid | both | Stable Wikipedia page ID (used to dedupe). |
| url | both | Canonical article URL. |
| snippet | search | Plain-text match snippet (HTML stripped). |
| wordcount, size, timestamp | search | Article word count, size in bytes, and last-edit time. |
| extract | page | Plain-text article text (intro only, or the full body with fullText). |
| thumbnail | page | Lead image URL (up to 400px), if the page has one. |
| categories | page | Visible category names (hidden categories excluded). |
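The page-mode fields in this table correspond to standard query modules: prop=extracts (TextExtracts), prop=pageimages, prop=categories and prop=info. The sketch below assumes that combination of modules, which the actor may not use exactly. It only builds the request for each 50-title batch and shapes one page of a formatversion=2 response into a row. All helper names are hypothetical:

```python
from typing import Iterator

BATCH_SIZE = 50  # titles per request, as described above


def batches(titles: list[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    """Split the requested titles into consecutive chunks of at most `size`."""
    for start in range(0, len(titles), size):
        yield titles[start:start + size]


def page_params(batch: list[str], full_text: bool) -> dict:
    """Assumed query for one batch: extract, thumbnail, categories, URL."""
    params = {
        "action": "query",
        "titles": "|".join(batch),
        "prop": "extracts|pageimages|categories|info",
        "explaintext": 1,        # plain text instead of HTML
        "piprop": "thumbnail",
        "pithumbsize": 400,      # matches the "up to 400px" thumbnail
        "clshow": "!hidden",     # visible categories only
        "cllimit": "max",
        "inprop": "url",         # adds fullurl / canonicalurl
        "format": "json",
        "formatversion": 2,
    }
    if not full_text:
        params["exintro"] = 1    # intro section only
    return params


def page_row(page: dict) -> dict:
    """Shape one entry of query.pages into an output row."""
    return {
        "title": page["title"],
        "pageid": page["pageid"],
        "url": page.get("canonicalurl") or page.get("fullurl"),
        "extract": page.get("extract", ""),
        "thumbnail": page.get("thumbnail", {}).get("source"),
        # "Category:Foo" -> "Foo"; splitting on the first colon also
        # strips localized namespace prefixes such as "Catégorie:".
        "categories": [c["title"].split(":", 1)[-1]
                       for c in page.get("categories", [])],
    }
```

TextExtracts may cap how many extracts one request returns, particularly when exintro is off. A real client therefore has to follow the response's continue parameter. Check the extension's documentation for the current limits.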
Input
| Field | Notes |
|---|---|
| searchQuery | Keywords, e.g. "machine learning". Leave empty when using pageTitles. |
| pageTitles | List of exact titles, e.g. ["Apify", "Web scraping"]. |
| fullText | Page mode only. Return the full article text instead of just the intro. Default: off. |
| language | Wikipedia edition: en, fr, de, es, ja, … Default: en. |
| maxItems | Maximum number of pages returned. Default: 50. |
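The input rules (the defaults in this table, and search mode taking priority when both fields are set) fit in a few lines. This is a hypothetical sketch. The dataclass mirrors the JSON field names, and raising an error when neither field is set is an assumption, since the text only says to provide one or the other:

```python
from dataclasses import dataclass, field


@dataclass
class ActorInput:
    # Field names mirror the JSON input keys, hence camelCase.
    searchQuery: str = ""
    pageTitles: list[str] = field(default_factory=list)
    fullText: bool = False   # page mode only
    language: str = "en"
    maxItems: int = 50


def resolve_mode(inp: ActorInput) -> str:
    """Return "search" or "pages"; search wins when both are provided."""
    if inp.searchQuery.strip():
        return "search"
    if any(title.strip() for title in inp.pageTitles):
        return "pages"
    # Assumption: the listing only says to provide one or the other.
    raise ValueError("Provide either searchQuery or pageTitles.")
```

Because the field names match the JSON keys, an input object can be unpacked directly. For example, ActorInput(**{"searchQuery": "machine learning", "maxItems": 30}) keeps language at its default of "en".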
Output
Each returned page becomes one dataset row with ok: true, and each such row is charged as one page. An empty search or an unknown title produces a non-charged diagnostic row with an errorCode and a human-readable reason, instead of silently returning nothing.
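The function below sketches one way to produce the charged and non-charged rows. It assumes a formatversion=2 response in which unknown titles come back flagged with missing (or invalid). The listing does not give the actual errorCode values, so PAGE_NOT_FOUND and NO_RESULTS are placeholders, and the ok rows are trimmed to two fields for brevity:

```python
def build_output(pages: list[dict]) -> tuple[list[dict], int]:
    """Return (dataset rows, number of charged pages)."""
    rows: list[dict] = []
    charged = 0
    for page in pages:
        if "missing" in page or "invalid" in page:
            # Unknown or invalid title: diagnostic row, not charged.
            rows.append({
                "ok": False,
                "errorCode": "PAGE_NOT_FOUND",  # placeholder code
                "reason": f"No article titled {page.get('title')!r}.",
                "title": page.get("title"),
            })
        else:
            # Other output fields omitted for brevity.
            rows.append({"ok": True, "title": page["title"],
                         "pageid": page["pageid"]})
            charged += 1
    if not rows:
        # Empty search: still emit one explanatory, non-charged row.
        rows.append({"ok": False, "errorCode": "NO_RESULTS",  # placeholder
                     "reason": "The search matched no articles."})
    return rows, charged
```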
Example
{ "searchQuery": "machine learning", "language": "en", "maxItems": 30 }
{ "pageTitles": ["Apify", "Web scraping"], "fullText": false, "language": "en" }
Notes
Requests go to https://{language}.wikipedia.org/w/api.php. Per Wikimedia's User-Agent policy, the actor always sends a descriptive User-Agent that includes contact information. Results are deduplicated by pageid.
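The sketch below covers the request plumbing mentioned in these notes: the per-language endpoint, a descriptive User-Agent, and deduplication by pageid. The User-Agent string and contact address are placeholders, not the actor's real values. Passing the Request to urllib.request.urlopen would actually send it:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder identity: Wikimedia asks for a descriptive agent plus contact info.
USER_AGENT = "WikipediaScraperSketch/0.1 (https://example.org/contact; ops@example.org)"


def api_request(language: str, params: dict) -> Request:
    """Build (but do not send) a GET request against one language edition."""
    url = f"https://{language}.wikipedia.org/w/api.php?{urlencode(params)}"
    return Request(url, headers={"User-Agent": USER_AGENT})


def dedupe_by_pageid(rows: list[dict]) -> list[dict]:
    """Keep the first row per pageid; rows without a pageid pass through."""
    seen: set[int] = set()
    out: list[dict] = []
    for row in rows:
        pageid = row.get("pageid")
        if pageid is not None:
            if pageid in seen:
                continue
            seen.add(pageid)
        out.append(row)
    return out
```

Letting rows without a pageid pass through keeps diagnostic rows from being collapsed into one another.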