Pricing

from $0.60 / 1,000 results

Wikipedia Scraper

[💰 $0.6 / 1K] Search Wikipedia or fetch exact articles by URL or title, and extract clean structured data — summaries, full plain text, categories, 30-day pageviews, thumbnails, coordinates, and language counts — across 300+ language editions.

Developer

SolidCode

Why This Scraper?

300+ language editions — the picker surfaces 40 of the largest editions (English, German, French, Spanish, Japanese, Chinese, Arabic, and more); paste any non-English Wikipedia URL and the actor scrapes that edition automatically.
Two ways in: keyword search or exact fetch — run a keyword search across an edition, or pull specific articles by full URL (https://en.wikipedia.org/wiki/Alan_Turing) or bare title ("Alan Turing"). Mix both in a single run.
30-day pageview totals on every row — a real popularity signal, not a guess: the trailing ~30-day view count for each article, ideal for ranking topics by actual reader demand.
Sort by relevance or popularity — order search hits by best textual match, or by most-referenced (incoming-link-weighted) to surface the canonical, authoritative article first.
Full plain-text article body on demand — toggle fullText to capture the entire article as clean plain text (not just the intro), ready for NLP, summarization, or training corpora.
Wikidata one-line descriptions — the short canonical descriptor (e.g. "English mathematician and computer scientist") pulled straight from Wikidata, perfect for tooltips and entity labels.
Category taxonomy per article — the full list of categories each article belongs to ("British computer scientists", "1912 births") for topic mapping and classification.
Geo-coordinates for places & landmarks — latitude/longitude on every article that has a location, so cities, monuments, and venues drop straight onto a map.
Cross-edition reach signal — langCount tells you in how many language editions an article exists, a quick indicator of global notability.

Use Cases

Research & Academia

Build structured corpora of articles on a topic for literature reviews and citation context
Compare how the same subject is covered across language editions
Track article size and last-edited dates to study how topics evolve
Rank subjects by 30-day readership to find what audiences actually care about

SEO & Content

Pull authoritative summaries and Wikidata descriptions for entity-rich content
Identify high-traffic Wikipedia topics worth targeting in articles and FAQs
Map category taxonomies to plan topic clusters and internal linking
Surface the most-referenced canonical article for any keyword

Data Enrichment

Enrich CRM and product records with one-line Wikidata descriptions
Add geo-coordinates to place names for mapping and location intelligence
Attach thumbnails and canonical URLs to people, companies, and landmarks
Resolve ambiguous names to the most popular matching article

Machine Learning & NLP

Build full-text training datasets with clean plain-text article bodies
Generate summary/full-text pairs for summarization model fine-tuning
Create multilingual datasets by pulling the same topics across editions
Label corpora with category tags and word/byte-size metadata

Market & Competitive Intelligence

Monitor pageview trends for brands, products, and public figures by comparing 30-day totals across repeated runs
Track which companies and technologies are gaining reader attention
Benchmark notability across markets using cross-edition coverage counts

Getting Started

Simple Keyword Search

Search one edition and return the best-matching articles:

{
    "searchQueries": ["artificial intelligence"],
    "maxResultsPerSearch": 50
}

Fetch Exact Articles

Pull specific articles by full URL or bare title — including non-English editions:

{
    "articleUrls": [
        "https://en.wikipedia.org/wiki/Alan_Turing",
        "Marie Curie",
        "https://de.wikipedia.org/wiki/Albert_Einstein"
    ],
    "includeCategories": true
}
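
To illustrate how a mixed `articleUrls` list can be interpreted, here is a minimal Python sketch that splits each entry into a language code and a title. It is not the actor's own code: the function name `resolve_article` and the fallback to a default language for bare titles are assumptions based on the behaviour described above.

```python
from urllib.parse import urlparse, unquote


def resolve_article(entry: str, default_language: str = "en") -> tuple[str, str]:
    """Return (language_code, title) for a full Wikipedia URL or a bare title.

    Hypothetical helper: mirrors the documented rule that full URLs carry
    their own language while bare titles use the selected edition.
    """
    parsed = urlparse(entry)
    host = parsed.hostname or ""
    is_wiki_url = (
        parsed.scheme in ("http", "https")
        and host.endswith(".wikipedia.org")
        and parsed.path.startswith("/wiki/")
    )
    if is_wiki_url:
        # The first host label is the edition code, e.g. "de" in de.wikipedia.org.
        language = host.split(".")[0]
        # Decode percent-escapes and turn underscores back into spaces.
        title = unquote(parsed.path[len("/wiki/"):]).replace("_", " ")
        return language, title
    # Anything else is treated as a bare title in the default edition.
    return default_language, entry.strip()


for item in [
    "https://en.wikipedia.org/wiki/Alan_Turing",
    "Marie Curie",
    "https://de.wikipedia.org/wiki/Albert_Einstein",
]:
    print(resolve_article(item))
```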

Advanced — Popularity-Sorted Full-Text Dataset

Search several topics, sort by popularity, and capture the full article body:

{
    "searchQueries": ["machine learning", "neural network", "deep learning"],
    "language": "en",
    "maxResultsPerSearch": 200,
    "sortBy": "popularity",
    "fullText": true,
    "includeCategories": true
}

Input Reference

What to Scrape

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `searchQueries` | string[] | `[]` | Keywords to look up on Wikipedia (e.g. "climate change"). Each query runs its own search and returns the best-matching articles. Leave empty if you only want exact articles. |
| `articleUrls` | string[] | `[]` | Exact articles to fetch. Paste full Wikipedia URLs (e.g. `https://en.wikipedia.org/wiki/Alan_Turing`) or just a page title (e.g. "Alan Turing"). Full URLs set their own language automatically. |
| `language` | select | `English` | Which Wikipedia edition to use for searches and bare titles. 40 common editions are listed; full URLs override this. In JSON input the edition is given as a language code (e.g. `"en"`). |

Results

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `maxResultsPerSearch` | integer | `50` | Maximum articles to return per search query (1–500). Articles fetched directly by URL or title are added on top. Recommended 50–200 for fast, affordable runs. |
| `sortBy` | select | `Relevance (best match)` | Order of search results. Options: "Relevance (best match)" or "Popularity (most referenced)". In JSON input the popularity option is written `"popularity"`. |

Content

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `fullText` | boolean | `false` | Return the complete plain-text article body instead of just the intro summary. Richer data, larger dataset. |
| `includeCategories` | boolean | `true` | Include the list of categories each article belongs to (e.g. "1912 births"). Helpful for classification and topic mapping. |
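
The parameters above can be assembled and sanity-checked before a run. The following Python sketch is one way to do that, not part of the actor: `build_run_input` is a hypothetical helper, and the rule that at least one query or article must be supplied is an assumption. `sortBy` is only included when set, because only the `"popularity"` value appears in the examples.

```python
from typing import Optional


def build_run_input(
    search_queries: Optional[list[str]] = None,
    article_urls: Optional[list[str]] = None,
    language: str = "en",
    max_results_per_search: int = 50,
    sort_by: Optional[str] = None,
    full_text: bool = False,
    include_categories: bool = True,
) -> dict:
    """Build an input object using the documented parameter names."""
    queries = [q.strip() for q in (search_queries or []) if q.strip()]
    articles = [a.strip() for a in (article_urls or []) if a.strip()]
    # Assumption: a run with neither queries nor articles has nothing to do.
    if not queries and not articles:
        raise ValueError("provide at least one search query or article")
    # Documented range for maxResultsPerSearch is 1 to 500.
    if not 1 <= max_results_per_search <= 500:
        raise ValueError("maxResultsPerSearch must be between 1 and 500")
    run_input = {
        "searchQueries": queries,
        "articleUrls": articles,
        "language": language,
        "maxResultsPerSearch": max_results_per_search,
        "fullText": full_text,
        "includeCategories": include_categories,
    }
    if sort_by is not None:
        run_input["sortBy"] = sort_by
    return run_input


print(build_run_input(search_queries=["machine learning"], sort_by="popularity"))
```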

Output

Each article is one flat row. Here is a representative result:

{
    "pageId": 1208,
    "title": "Alan Turing",
    "language": "en",
    "url": "https://en.wikipedia.org/wiki/Alan_Turing",
    "summary": "Alan Mathison Turing was an English mathematician, computer scientist, logician, and cryptanalyst...",
    "fullText": "Alan Mathison Turing was an English mathematician... (full plain-text body when fullText is enabled)",
    "wikidataDescription": "English mathematician and computer scientist (1912–1954)",
    "categories": ["British computer scientists", "1912 births", "Alumni of King's College, Cambridge"],
    "thumbnail": "https://upload.wikimedia.org/wikipedia/commons/thumb/.../300px-Alan_Turing.jpg",
    "wordCount": 12840,
    "size": 198342,
    "pageviews": 415203,
    "langCount": 142,
    "coordinates": null,
    "lastEdited": "2026-06-10T08:14:32Z",
    "matchedQuery": "artificial intelligence",
    "scrapedAt": "2026-06-13T14:30:00Z"
}

Core Fields

| Field | Type | Description |
| --- | --- | --- |
| `pageId` | integer | Wikipedia page identifier |
| `title` | string | Article title |
| `language` | string | Language code of the edition this article came from |
| `url` | string | Canonical article URL |
| `matchedQuery` | string\|null | The search query that surfaced this article (null for direct URL/title fetches) |
| `scrapedAt` | string | ISO timestamp when the row was produced |

Content

| Field | Type | Description |
| --- | --- | --- |
| `summary` | string | Intro extract as clean plain text |
| `fullText` | string\|null | Full plain-text article body; populated only when `fullText` is enabled |
| `wikidataDescription` | string\|null | One-line canonical description from Wikidata |
| `categories` | string[] | Categories the article belongs to; populated when `includeCategories` is on |
| `thumbnail` | string\|null | Lead image URL (null when the article has no lead image) |
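
For the summarization use case, the `summary` and `fullText` fields can be paired row by row. The sketch below assumes the rows are already loaded as Python dictionaries; the helper names and the output keys `id`, `document`, and `summary` are illustrative choices, not a format the actor produces.

```python
import json


def summary_pairs(rows):
    """Yield one training pair per row that has both a full text and a summary."""
    for row in rows:
        full_text = row.get("fullText")  # null unless fullText was enabled
        summary = row.get("summary")
        if full_text and summary:
            yield {"id": row.get("pageId"), "document": full_text, "summary": summary}


def to_jsonl(pairs) -> str:
    """Serialize pairs as JSON Lines, keeping non-ASCII text readable."""
    return "\n".join(json.dumps(pair, ensure_ascii=False) for pair in pairs)


# Placeholder rows for illustration only.
rows = [
    {"pageId": 1, "summary": "Short intro.", "fullText": "Short intro. Longer body."},
    {"pageId": 2, "summary": "Intro only.", "fullText": None},
]
print(to_jsonl(summary_pairs(rows)))
```

Rows scraped with `fullText` off are skipped rather than emitted with an empty document.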

Popularity & Metadata

| Field | Type | Description |
| --- | --- | --- |
| `pageviews` | integer\|null | Total reader views over the trailing ~30 days |
| `langCount` | integer\|null | Number of language editions this article exists in |
| `wordCount` | integer | Word count of the returned text |
| `size` | integer | Article size in bytes |
| `lastEdited` | string | ISO timestamp of the most recent revision |

Geo

| Field | Type | Description |
| --- | --- | --- |
| `coordinates` | object\|null | `{ "lat": …, "lon": … }` for articles with a location; null otherwise |
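
To drop located articles onto a map, rows with non-null `coordinates` can be converted to GeoJSON. This is a sketch under the assumption that `coordinates` has the `lat`/`lon` shape shown above; `rows_to_geojson` is a hypothetical helper and the sample values are placeholders. Note that GeoJSON orders positions as longitude, then latitude.

```python
def rows_to_geojson(rows) -> dict:
    """Build a GeoJSON FeatureCollection from rows that have coordinates."""
    features = []
    for row in rows:
        coords = row.get("coordinates")
        if not coords:
            continue  # articles without a location carry null here
        features.append(
            {
                "type": "Feature",
                # GeoJSON positions are [longitude, latitude].
                "geometry": {"type": "Point", "coordinates": [coords["lon"], coords["lat"]]},
                "properties": {"title": row.get("title"), "url": row.get("url")},
            }
        )
    return {"type": "FeatureCollection", "features": features}


# Placeholder rows for illustration only.
rows = [
    {"title": "Some place", "url": "https://example.org/a", "coordinates": {"lat": 10.5, "lon": -3.25}},
    {"title": "Some person", "url": "https://example.org/b", "coordinates": None},
]
print(rows_to_geojson(rows))
```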

Tips for Best Results

Paste a non-English URL to reach any of the 300+ editions — the language picker lists 40 common editions, but pasting https://ja.wikipedia.org/wiki/... or https://fi.wikipedia.org/wiki/... into articleUrls scrapes that edition directly, no picker needed.
Leave fullText off for fast, cheap summary runs — it pulls the entire article body and grows your dataset substantially. Turn it on only when you need full text for analysis or training.
Sort by popularity to find the canonical article — for ambiguous keywords, "Popularity (most referenced)" puts the authoritative, most-linked article first, ahead of niche or disambiguation pages.
Use pageviews to rank topics by real demand — it reflects actual 30-day readership, a far stronger signal than search rank for prioritizing content or research.
Mix search and exact fetch in one run — combine broad searchQueries with a hand-picked list of articleUrls to cover both discovery and known must-have articles.
Start with 50 results per query to test — confirm the data fits your needs, then raise maxResultsPerSearch (up to 500) for the full pull.
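
Ranking by `pageviews` after a run can be done in a few lines. The sketch below assumes the dataset rows are loaded as dictionaries and treats a null `pageviews` value as "unknown", sorting those rows last; `rank_by_pageviews` is a hypothetical helper name.

```python
def rank_by_pageviews(rows, top_n=None):
    """Return rows sorted by pageviews, highest first, with null values last."""
    ranked = sorted(
        rows,
        key=lambda r: (r.get("pageviews") is None, -(r.get("pageviews") or 0)),
    )
    return ranked if top_n is None else ranked[:top_n]


# Placeholder rows for illustration only.
rows = [
    {"title": "A", "pageviews": 10},
    {"title": "B", "pageviews": None},
    {"title": "C", "pageviews": 500},
]
print([r["title"] for r in rank_by_pageviews(rows)])
```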

Pricing

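From $0.60 per 1,000 results. Pricing is pay-per-result, matching the lowest tier in this category while shipping more fields per article. The $0.60 rate applies at the Gold tier; without a subscription discount the rate is $0.72 per 1,000 results, with Bronze and Silver in between. The table below shows the per-result cost at each discount tier, excluding the per-run start fee.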

| Results | No discount | Bronze | Silver | Gold |
| --- | --- | --- | --- | --- |
| 100 | $0.072 | $0.068 | $0.064 | $0.060 |
| 1,000 | $0.72 | $0.68 | $0.64 | $0.60 |
| 10,000 | $7.20 | $6.80 | $6.40 | $6.00 |
| 100,000 | $72.00 | $68.00 | $64.00 | $60.00 |

A "result" is any article row in the output dataset. No compute or time-based charges — you pay per result, plus a small fixed per-run start fee.
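
The table follows a simple linear formula: results divided by 1,000, times the tier rate. A small Python sketch for estimating other volumes is shown here. The per-1,000 rates are taken from the table; the tier keys and the `start_fee` parameter are assumptions, and the start fee defaults to zero because its amount is not stated.

```python
from decimal import Decimal

# Rates per 1,000 results, read from the pricing table.
PRICE_PER_1000 = {
    "none": Decimal("0.72"),
    "bronze": Decimal("0.68"),
    "silver": Decimal("0.64"),
    "gold": Decimal("0.60"),
}


def estimate_cost(results: int, tier: str = "none", start_fee: Decimal = Decimal("0")) -> Decimal:
    """Estimate run cost in USD: per-result charge plus an optional start fee."""
    if results < 0:
        raise ValueError("results must be non-negative")
    if tier not in PRICE_PER_1000:
        raise ValueError(f"unknown tier: {tier}")
    return PRICE_PER_1000[tier] * results / 1000 + start_fee


for tier in PRICE_PER_1000:
    print(tier, estimate_cost(10_000, tier))
```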

Integrations

Export data in JSON, CSV, Excel, XML, or RSS. Connect to 1,500+ apps via:

Zapier / Make / n8n — Workflow automation
Google Sheets — Direct spreadsheet export
Slack / Email — Notifications on new results
Webhooks — Trigger custom APIs on run completion
Apify API — Full programmatic access
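
For programmatic access, a run can be started over HTTP. The sketch below only builds the request with the Python standard library, using the Apify API's synchronous "run and return dataset items" endpoint. The actor ID is a placeholder, since this page does not state the actor's slug, and `build_sync_run_request` is a hypothetical helper.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_sync_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST request that runs an actor and returns its dataset items."""
    # In API URLs the "username/actor-name" form is written with a tilde.
    url = f"{API_BASE}/acts/{actor_id.replace('/', '~')}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Placeholder actor ID and token; replace with real values before sending.
request = build_sync_run_request(
    "example-user/example-actor",
    "YOUR_APIFY_TOKEN",
    {"searchQueries": ["artificial intelligence"], "maxResultsPerSearch": 50},
)
print(request.get_method(), request.full_url)
# To send it: with urllib.request.urlopen(request) as resp: rows = json.load(resp)
```

The synchronous endpoint suits small test runs; for large pulls, starting a run and reading its dataset afterwards avoids holding one long HTTP connection open.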

Legal & Ethical Use

This actor collects publicly available content from Wikipedia for legitimate research, analysis, and data enrichment. Wikipedia article text is published under the Creative Commons Attribution-ShareAlike (CC BY-SA) license — when you reuse or republish it, provide proper attribution and share derivative text under the same license. Users are responsible for complying with applicable laws and the Wikimedia Foundation's Terms of Use. Be respectful of the volunteer-run platform and use the data responsibly.
