Wikipedia Scraper

Pricing

from $1.00 / 1,000 pages returned

Search Wikipedia by keyword or fetch clean structured page data (plain-text extract, thumbnail, categories, URL) for given titles, via the public MediaWiki API. No API key and no anti-bot measures to work around. Search results paginate; titles are fetched in batches of 50.

Rating

0.0

(0)

Developer

Dami's Studio

Maintained by Community

Actor stats

Bookmarked: 0

Total users: 2

Monthly active users: 1

Last modified: 10 hours ago

Search Wikipedia by keyword, or fetch clean, structured page data for exact titles, straight from the official MediaWiki Action API. No API key, no login, and no anti-bot measures to work around.

Two modes

1. Search — set searchQuery. Returns matching articles with title, pageid, url, a plain-text snippet (the API's HTML is stripped for you), wordcount, size, and timestamp. The actor paginates automatically (50 per request) up to maxItems.

2. Page data — set pageTitles (a list of exact article titles). Returns title, pageid, url, the plain-text extract, a thumbnail image URL, and categories. Titles are batched 50 at a time. Turn on fullText to get the whole article instead of just the intro.

(If both are provided, search mode wins. Provide one or the other.)
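As a rough sketch of how the two modes could map onto MediaWiki Action API queries, the function below picks a mode (search wins when both inputs are set) and yields one parameter dictionary per request, 50 items at a time. The parameter names (list=search, srsearch, sroffset, prop=extracts, exintro, explaintext, pithumbsize, clshow) exist in the MediaWiki API, but the exact set this actor sends is an assumption, and build_requests is a hypothetical helper rather than the actor's own code.

```python
from typing import Dict, Iterator, List, Optional

BATCH_SIZE = 50  # request size stated in the description above


def build_requests(
    search_query: str = "",
    page_titles: Optional[List[str]] = None,
    full_text: bool = False,
    max_items: int = 50,
) -> Iterator[Dict[str, str]]:
    """Yield one query-parameter dict per HTTP request (hypothetical helper)."""
    base = {"action": "query", "format": "json"}

    query = search_query.strip()
    if query:
        # Search mode wins when both inputs are provided.
        for offset in range(0, max_items, BATCH_SIZE):
            limit = min(BATCH_SIZE, max_items - offset)
            yield {
                **base,
                "list": "search",
                "srsearch": query,
                "srlimit": str(limit),
                "sroffset": str(offset),
            }
        return

    # Page-data mode: assume maxItems also caps the number of titles fetched.
    titles = (page_titles or [])[:max_items]
    for start in range(0, len(titles), BATCH_SIZE):
        params = {
            **base,
            "prop": "extracts|pageimages|categories|info",
            "titles": "|".join(titles[start:start + BATCH_SIZE]),
            "explaintext": "1",       # plain text instead of HTML
            "piprop": "thumbnail",
            "pithumbsize": "400",     # thumbnail width cap from the field table
            "clshow": "!hidden",      # skip hidden categories
            "inprop": "url",          # include the canonical page URL
        }
        if not full_text:
            params["exintro"] = "1"   # intro only unless fullText is on
        yield params
```

Precomputing offsets keeps the sketch short. A real client would more likely follow the continue values returned by the API, since individual modules (extracts in particular) apply their own per-request limits.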

What you get per row

| Field | Mode | Notes |
| --- | --- | --- |
| title | both | Article title. |
| pageid | both | Stable Wikipedia page id (used to dedupe). |
| url | both | Canonical article URL. |
| snippet | search | Plain-text match snippet (HTML stripped). |
| wordcount, size, timestamp | search | Article word count, byte size, last-edit time. |
| extract | page | Plain-text article text (intro, or full body with fullText). |
| thumbnail | page | Lead image URL (up to 400px), if the page has one. |
| categories | page | Visible category names (hidden categories excluded). |
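The snippet field arrives from the search API as HTML, with matched terms wrapped in span tags and special characters entity-encoded. A minimal sketch of the kind of cleanup described above might look like this. The function name strip_snippet and the regex-based approach are assumptions for illustration, not the actor's actual implementation.

```python
import html
import re

# Matches any HTML tag, e.g. <span class="searchmatch"> or </span>.
_TAG_RE = re.compile(r"<[^>]+>")


def strip_snippet(raw: str) -> str:
    """Turn an HTML search snippet into plain text (hypothetical helper)."""
    no_tags = _TAG_RE.sub("", raw)      # drop the highlight markup
    text = html.unescape(no_tags)       # decode entities such as &amp; and &quot;
    return " ".join(text.split())       # collapse runs of whitespace


example = (
    '<span class="searchmatch">Machine</span> '
    '<span class="searchmatch">learning</span> is a field of study &amp; research'
)
print(strip_snippet(example))
```

A regex is enough here because snippets carry only simple inline markup. Arbitrary HTML would call for a real parser.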

Input

| Field | Notes |
| --- | --- |
| searchQuery | Keywords, e.g. machine learning. Leave empty if using titles. |
| pageTitles | List of exact titles, e.g. ["Apify", "Web scraping"]. |
| fullText | Page mode only. Full article text vs. just the intro. Default off. |
| language | Wikipedia edition: en, fr, de, es, ja, … Default en. |
| maxItems | Cap on returned pages. Default 50. |
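To make the defaults concrete, here is a small sketch that normalizes a raw input object into typed fields. The input keys and defaults (language en, maxItems 50, fullText off) come from the table above. The ActorInput class and parse_input function are hypothetical names, and the whitespace trimming is an assumption about reasonable behavior rather than something the actor documents.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ActorInput:
    search_query: str = ""
    page_titles: List[str] = field(default_factory=list)
    full_text: bool = False
    language: str = "en"
    max_items: int = 50


def parse_input(raw: Dict[str, Any]) -> ActorInput:
    """Apply the documented defaults to a raw input dict (hypothetical helper)."""
    titles = [t.strip() for t in (raw.get("pageTitles") or []) if t and t.strip()]
    return ActorInput(
        search_query=str(raw.get("searchQuery") or "").strip(),
        page_titles=titles,                                   # blank titles dropped
        full_text=bool(raw.get("fullText", False)),
        language=str(raw.get("language") or "en").strip().lower(),
        max_items=int(raw.get("maxItems") or 50),             # missing or 0 falls back to 50
    )
```

For example, parse_input({"searchQuery": "machine learning", "maxItems": 30}) leaves language at en and fullText off.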

Output

One dataset row per page (ok: true). Charged per page. Empty searches or unknown titles return a non-charged diagnostic row with an errorCode and a human-readable reason instead of silently returning nothing.
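When consuming the dataset, it can help to separate charged page rows from diagnostic rows. The sketch below assumes that diagnostic rows carry ok: false and that the human-readable message lives in a field called reason. Neither detail is confirmed by the description, and the errorCode value shown is invented for illustration. The price constant is the listing's "from" rate, so the real rate may differ.

```python
from typing import Any, Dict, List

PRICE_PER_1000_PAGES = 1.00  # "from" price in the listing; actual rate may differ


def billable_rows(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only rows that represent a returned page (ok: true)."""
    return [row for row in rows if row.get("ok") is True]


def estimate_cost(rows: List[Dict[str, Any]],
                  price_per_1000: float = PRICE_PER_1000_PAGES) -> float:
    """Estimate the charge in dollars for the billable rows only."""
    return len(billable_rows(rows)) * price_per_1000 / 1000


# Illustrative rows, not real actor output; pageid values are placeholders.
sample = [
    {"ok": True, "title": "Apify", "pageid": 1},
    {"ok": True, "title": "Web scraping", "pageid": 2},
    # Hypothetical diagnostic row: the errorCode value and "reason" key are assumed.
    {"ok": False, "errorCode": "TITLE_NOT_FOUND", "reason": "No page with that title."},
]

print(len(billable_rows(sample)), "billable rows")
```

Under these assumptions a run capped at maxItems 30 would cost at most 30 × $1.00 / 1,000 = $0.03.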

Example

Search mode:

{ "searchQuery": "machine learning", "language": "en", "maxItems": 30 }

Page data mode:

{ "pageTitles": ["Apify", "Web scraping"], "fullText": false, "language": "en" }

Notes

The actor calls https://{language}.wikipedia.org/w/api.php. In line with Wikimedia's User-Agent policy, it always sends a descriptive User-Agent that includes contact information. Results are deduplicated by pageid.
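The three points in these notes can be sketched as small helpers: building the per-language endpoint, attaching a descriptive User-Agent, and dropping repeated pages by pageid. The User-Agent string format and all function names here are assumptions, since the header the actor actually sends is not documented above.

```python
from typing import Any, Dict, Iterable, List


def api_endpoint(language: str = "en") -> str:
    """Endpoint pattern from the notes above, filled in for one Wikipedia edition."""
    return f"https://{language}.wikipedia.org/w/api.php"


def request_headers(contact: str) -> Dict[str, str]:
    """Descriptive User-Agent with contact details (hypothetical format)."""
    return {"User-Agent": f"wikipedia-scraper-sketch/0.1 ({contact})"}


def dedupe_by_pageid(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first row seen for each pageid, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        pageid = row.get("pageid")
        if pageid is not None:
            if pageid in seen:
                continue          # repeated page, skip it
            seen.add(pageid)
        unique.append(row)        # rows without a pageid pass through untouched
    return unique
```

Rows without a pageid are passed through so that diagnostic rows are not collapsed into one another.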