Wikipedia Page Summaries Scraper
Pricing
from $8.00 / 1,000 result items
Pull Wikipedia article summaries via REST API. Returns title, description, extract (plain + HTML), thumbnail, lang, page ID, content URLs (desktop + mobile + edit), coordinates, page type, timestamps. Look up specific titles or get search results.
Developer: ParseForge · Maintained by Community

📚 Wikipedia Article Summary Scraper
🚀 Pull Wikipedia article summaries with thumbnail, extract, coordinates, Wikidata link, revision ID, and language. Lookup or search modes.
🕒 Last updated: 2026-05-08 · 📊 25 fields per record · 60M+ Wikipedia pages · 300+ languages · summary extract, thumbnail, coords, Wikidata, revision · lookup or search
The Wikipedia Article Summary Scraper pulls structured summaries from Wikipedia's REST API. Output includes thumbnail and original-image URLs (with widths and heights), page ID, title and display title, normalized + canonical title, description and description source, summary extract (plain text + HTML), page type, namespace, Wikibase item ID, language code and direction, last-modified timestamp, revision ID, geographic coordinates, and desktop / mobile / edit / revisions URLs.
Two modes in one Actor: lookup by title (one per line), and search (using Wikipedia's opensearch). The dataset covers Wikipedia in any of 300+ languages. Set the language input to es, fr, de, etc.
| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Knowledge-graph builders, content marketers, ML researchers, journalists, encyclopedia apps, education platforms | Knowledge-graph extraction, encyclopedic-content displays, summary embeddings, fact-card UIs, education content |
📋 What the Wikipedia Article Summary Scraper does
Five capabilities in a single run:
- 🔍 Lookup mode. One title per line, returns rich summary per page.
- 🔍 Search mode. Wikipedia's opensearch with ranked matches.
- 🌐 300+ languages. Switch language with a single input.
- 🗺️ Coordinates included. Lat / lng for places, when the page is geo-tagged.
- 🔗 Wikidata link. Direct Wikibase item ID for cross-language joins.
💡 Why it matters: clean, server-side filtering and fresh data on every run.
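Under the hood, the two modes correspond to two public Wikipedia endpoints: the REST summary endpoint for lookups and the classic opensearch action API for search. A minimal sketch of building the request URLs (the helper names are illustrative; the Actor's internals may differ):

```python
from urllib.parse import quote, urlencode

def summary_url(title: str, lang: str = "en") -> str:
    # Lookup mode: one REST summary call per title.
    return f"https://{lang}.wikipedia.org/api/rest_v1/page/summary/{quote(title.replace(' ', '_'))}"

def opensearch_url(query: str, lang: str = "en", limit: int = 10) -> str:
    # Search mode: opensearch returns ranked title matches.
    params = urlencode({"action": "opensearch", "search": query,
                        "limit": limit, "format": "json"})
    return f"https://{lang}.wikipedia.org/w/api.php?{params}"

print(summary_url("Albert Einstein"))
# https://en.wikipedia.org/api/rest_v1/page/summary/Albert_Einstein
```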
🎬 Full Demo
🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.
⚙️ Input
| Input | Type | Default | Behavior |
|---|---|---|---|
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan up to 1,000,000. |
| mode | string | "lookup" | `lookup` or `search`. |
| titles | string | "" | Newline-separated titles (lookup mode). |
| query | string | "" | Search term (search mode). |
| language | string | "en" | Wikipedia language code (e.g. `en`, `es`, `fr`, `de`, `ja`). |
Example: look up famous scientists.
{"maxItems": 50,"mode": "lookup","titles": "Albert Einstein\nMarie Curie\nIsaac Newton\nCharles Darwin\nGalileo Galilei","language": "en"}
Example: search topic in Spanish.
{"maxItems": 20,"mode": "search","query": "fotografía","language": "es"}
📊 Output
Each record contains 25 fields. Download as CSV, Excel, JSON, or XML.
🧾 Schema
| Field | Type | Example |
|---|---|---|
| 🖼️ thumbnailUrl | string | "https://upload.wikimedia.org/.../thumb.jpg" |
| 🆔 pageId | number | 736 |
| 📛 title | string | "Albert Einstein" |
| 📛 displayTitle | string | "Albert Einstein" |
| 📛 normalizedTitle | string | "Albert Einstein" |
| 📛 canonicalTitle | string | "Albert_Einstein" |
| 📜 description | string | "German-born theoretical physicist (1879-1955)" |
| 📝 extract | string | "Albert Einstein was a German-born theoretical physicist..." |
| 📝 extractHtml | string | "&lt;p&gt;&lt;b&gt;Albert Einstein&lt;/b&gt; was a German-born theoretical physicist..." |
| 🏷️ type | string | "standard" |
| 🌐 language | string | "en" |
| 🔤 languageCode | string | "en" |
| 🔗 wikibaseItem | string | "Q937" |
| 🗺️ coordinatesLat | number | 52.5 |
| 🗺️ coordinatesLng | number | 13.4 |
| 📅 timestamp | string | "2026-04-29T14:32:11Z" |
| 🆔 revisionId | number | 1238762345 |
| 🌐 desktopUrl | string | "https://en.wikipedia.org/wiki/Albert_Einstein" |
| 📱 mobileUrl | string | "https://en.m.wikipedia.org/wiki/Albert_Einstein" |
| ✏️ editDesktopUrl | string | "https://en.wikipedia.org/wiki/Albert_Einstein?action=edit" |
📦 Sample records
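No sample dataset is bundled in this listing, so here is a hedged sketch of consuming downloaded records, using field names from the schema above. The two records are illustrative stand-ins, not live data:

```python
def geo_tagged(records):
    # Keep only records that carry coordinates (places, landmarks).
    return [r for r in records if r.get("coordinatesLat") is not None]

records = [
    {"title": "Berlin", "coordinatesLat": 52.5, "coordinatesLng": 13.4},
    {"title": "Photography", "coordinatesLat": None, "coordinatesLng": None},
]
print([r["title"] for r in geo_tagged(records)])  # → ['Berlin']
```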
✨ Why choose this Actor
| Capability | |
|---|---|
| 🔗 | Wikidata link. Wikibase item ID lets you join across languages and other knowledge graphs. |
| 🌍 | 300+ languages. Same Actor pulls summaries from any Wikipedia language. |
| 🗺️ | Geo coordinates. Lat / lng exposed when the page is geo-tagged. |
| 📜 | Plain + HTML extracts. Both formats per record. |
| 🆓 | No auth. Wikipedia REST is open. |
📈 How it compares to alternatives
| Approach | Cost | Coverage | Refresh | Filters | Setup |
|---|---|---|---|---|---|
| ⭐ This Actor | $5 free credit | 60M+ pages | Live per run | 2 modes | ⚡ 2 min |
| Wikipedia REST direct | Free | Same | Live | DIY | 🐢 Code |
| DBpedia / Wikidata | Free | Triplestore | Live | SPARQL | 🐢 Hours |
| Manual scraping | Free | All | Live | DIY | 🐢 Days |
🚀 How to use
- 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
- 🌐 Open the Actor. Find the Wikipedia Article Summary Scraper on the Apify Store.
- 🎯 Set input. Pick filters and `maxItems`.
- 🚀 Run it. Click Start.
- 📥 Download. Grab results in the Dataset tab as CSV, Excel, JSON, or XML.
⏱️ Total time from signup to dataset: 3-5 minutes. No coding required.
💼 Business use cases
🔌 Automating Wikipedia Article Summary Scraper
Control the scraper programmatically:
- 🟢 Node.js. Install the `apify-client` NPM package.
- 🐍 Python. Use the `apify-client` PyPI package.
- 📚 See the Apify API documentation for full details.
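A minimal Python sketch using the `apify-client` PyPI package. The Actor ID `"parseforge/wikipedia-page-summaries"` is a placeholder — use the actual ID shown on the Store page — and `fetch_summaries` is an illustrative helper name:

```python
# pip install apify-client
RUN_INPUT = {
    "maxItems": 50,
    "mode": "lookup",
    "titles": "Albert Einstein\nMarie Curie",
    "language": "en",
}

def fetch_summaries(token: str, actor_id: str = "parseforge/wikipedia-page-summaries"):
    # Lazy import so the sketch stays readable even without the SDK installed.
    from apify_client import ApifyClient
    client = ApifyClient(token)
    # Start the Actor, wait for it to finish, then stream the dataset items.
    run = client.actor(actor_id).call(run_input=RUN_INPUT)
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()

# for item in fetch_summaries("<YOUR_APIFY_TOKEN>"):
#     print(item["title"], item.get("wikibaseItem"))
```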
The Apify Schedules feature lets you trigger this Actor on any cron interval.
🌟 Beyond business use cases
Data like this powers more than commercial workflows.
🤖 Ask an AI assistant about this scraper
Open a ready-to-send prompt in the AI of your choice:
- 💬 ChatGPT
- 🧠 Claude
- 🔍 Perplexity
- 🅒 Copilot
❓ Frequently Asked Questions
🧩 How does it work?
Pick lookup or search mode and supply titles or a query. The Actor calls the Wikipedia REST API and returns one record per page.
📊 How many fields per record?
25, including thumbnail, page ID, title fields, description, extract (plain + HTML), type, language metadata, Wikibase item, coordinates, timestamps, and URLs.
🌍 Which languages are supported?
Any Wikipedia language. Set the language input to the standard code (en, es, fr, de, ja, zh, ar, etc).
🗺️ Are geo-coordinates always present?
Only on geo-tagged pages (places, landmarks). Other pages have null.
🔗 What's the wikibaseItem field?
The Wikidata Q-number for the entity. Lets you join across languages and other knowledge bases.
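For a concrete cross-language join, Wikidata serves full entity JSON (including per-language sitelinks) at its Special:EntityData endpoint. A hedged sketch, with the payload trimmed to the parts used (Q937 is Albert Einstein; the `sample` dict is illustrative, not a live response):

```python
def entity_url(qid: str) -> str:
    # Wikidata entity JSON: labels, descriptions, and sitelinks per wiki.
    return f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"

def sitelink_title(entity_json: dict, qid: str, wiki: str = "eswiki"):
    # Look up the article title on another language's Wikipedia.
    links = entity_json["entities"][qid].get("sitelinks", {})
    return links.get(wiki, {}).get("title")

sample = {"entities": {"Q937": {"sitelinks": {"eswiki": {"title": "Albert Einstein"}}}}}
print(sitelink_title(sample, "Q937"))  # → Albert Einstein
```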
📜 Is the full article body returned?
No, only the summary extract (lead section). Use Wikipedia's full-content API for the entire body.
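For reference, Wikipedia's REST API also exposes the full rendered article body as HTML at a sibling endpoint. A sketch of the URL (the helper name is illustrative):

```python
from urllib.parse import quote

def full_html_url(title: str, lang: str = "en") -> str:
    # Full rendered article body, as opposed to the summary (lead section)
    # endpoint this Actor uses.
    return f"https://{lang}.wikipedia.org/api/rest_v1/page/html/{quote(title.replace(' ', '_'))}"

print(full_html_url("Albert Einstein"))
# https://en.wikipedia.org/api/rest_v1/page/html/Albert_Einstein
```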
🆓 Do I need an API key?
No. Wikipedia REST is open.
🔁 Can I schedule runs?
Yes. Schedule daily to refresh summaries.
⚖️ Is this data free to use?
Yes. Wikipedia text is licensed under CC BY-SA; attribution and share-alike terms apply when you redistribute.
💳 Do I need a paid Apify plan?
No. The free plan covers preview runs (10 records).
🔌 Integrate with any app
Wikipedia Article Summary Scraper connects to any cloud service via Apify integrations:
- Make - Automate multi-step workflows
- Zapier - Connect with 5,000+ apps
- Slack - Get run notifications
- Airbyte - Pipe data into your warehouse
- GitHub - Trigger runs from commits
- Google Drive - Export datasets to Sheets
🔗 Recommended Actors
- 🌐 Wikidata Entity Search - 100M+ open knowledge-graph entities
- 🌍 Wikivoyage Travel Articles - Wikivoyage city and country articles with image, geo
- 🌍 REST Countries Reference Data - Every country with flag, capital, currency, languages
- 📊 Stack Exchange Questions - Search 170+ Stack Exchange Q&A sites
- 🌍 GeoNames Places + Postal Codes - 12M+ places with admin hierarchy, lat/lng, alternate names
💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.
🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.
⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the Wikimedia Foundation, Wikipedia editors, or any cited reference work. All trademarks mentioned are the property of their respective owners. Only publicly available open data is collected.