Wikipedia Page Summaries Scraper

Pull Wikipedia article summaries via REST API. Returns title, description, extract (plain + HTML), thumbnail, language, page ID, content URLs (desktop + mobile + edit), coordinates, page type, and timestamps. Look up specific titles or get search results.

Pricing: from $8.00 / 1,000 result items
Rating: 0.0 (0 reviews)
Developer: ParseForge (Maintained by Community)
Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 3 days ago

📚 Wikipedia Article Summary Scraper

🚀 Pull Wikipedia article summaries with thumbnail, extract, coordinates, Wikidata link, revision ID, and language. Lookup or search modes.

🕒 Last updated: 2026-05-08 · 📊 25 fields per record · 60M+ Wikipedia pages · 300+ languages · summary extract, thumbnail, coords, Wikidata, revision · lookup or search

The Wikipedia Article Summary Scraper pulls structured summaries from Wikipedia's REST API. Output includes thumbnail and original-image URLs (with widths and heights), page ID, title and display title, normalized + canonical title, description and description source, summary extract (plain text + HTML), page type, namespace, Wikibase item ID, language code and direction, last-modified timestamp, revision ID, geographic coordinates, and desktop / mobile / edit / revisions URLs.

Two modes in one Actor: lookup by title (one per line) and search (via Wikipedia's opensearch endpoint). The Actor covers Wikipedia in any of its 300+ language editions; set the language input to es, fr, de, etc.
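The two modes correspond to two public Wikipedia endpoints. A hedged, stdlib-only sketch of the URLs involved (illustrative helpers, not the Actor's actual source code):

```python
from urllib.parse import quote

def summary_url(title: str, language: str = "en") -> str:
    """Lookup mode: Wikipedia's REST summary endpoint for one page."""
    # Canonical Wikipedia titles use underscores instead of spaces.
    return (f"https://{language}.wikipedia.org/api/rest_v1/page/summary/"
            f"{quote(title.replace(' ', '_'))}")

def opensearch_url(query: str, language: str = "en", limit: int = 10) -> str:
    """Search mode: the classic opensearch action with ranked matches."""
    return (f"https://{language}.wikipedia.org/w/api.php?action=opensearch"
            f"&search={quote(query)}&limit={limit}&format=json")
```

Both endpoints are open and need no API key, which is what makes fresh data on every run possible.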

🎯 Target audience: knowledge-graph builders, content marketers, ML researchers, journalists, encyclopedia apps, education platforms.
💡 Primary use cases: knowledge-graph extraction, encyclopedic-content displays, summary embeddings, fact-card UIs, education content.

📋 What the Wikipedia Article Summary Scraper does

Five capabilities in a single Actor:

  • 🔍 Lookup mode. One title per line, returns rich summary per page.
  • 🔍 Search mode. Wikipedia's opensearch with ranked matches.
  • 🌐 300+ languages. Switch language with a single input.
  • 🗺️ Coordinates included. Lat / lng for places, when the page is geo-tagged.
  • 🔗 Wikidata link. Direct Wikibase item ID for cross-language joins.

💡 Why it matters: clean, server-side filtering and fresh data on every run.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.


⚙️ Input

  • maxItems (integer, default 10). Records to return. Free plan caps at 10; paid plans allow up to 1,000,000.
  • mode (string, default "lookup"). Either lookup or search.
  • titles (string, default ""). Newline-separated titles (lookup mode).
  • query (string, default ""). Search term (search mode).
  • language (string, default "en"). Wikipedia language code (e.g. `en`, `es`, `fr`, `de`, `ja`).

Example: look up famous scientists.

{
  "maxItems": 50,
  "mode": "lookup",
  "titles": "Albert Einstein\nMarie Curie\nIsaac Newton\nCharles Darwin\nGalileo Galilei",
  "language": "en"
}

Example: search topic in Spanish.

{
  "maxItems": 20,
  "mode": "search",
  "query": "fotografía",
  "language": "es"
}

📊 Output

Each record contains 25 fields; the key fields are listed below. Download as CSV, Excel, JSON, or XML.

🧾 Schema

Field · Type · Example

  🖼️ thumbnailUrl · string · "https://upload.wikimedia.org/.../thumb.jpg"
  🆔 pageId · number · 736
  📛 title · string · "Albert Einstein"
  📛 displayTitle · string · "Albert Einstein"
  📛 normalizedTitle · string · "Albert Einstein"
  📛 canonicalTitle · string · "Albert_Einstein"
  📜 description · string · "German-born theoretical physicist (1879-1955)"
  📝 extract · string · "Albert Einstein was a German-born theoretical physicist..."
  📝 extractHtml · string · "<p><b>Albert Einstein</b> was a German-born theoretical physicist..."
  🏷️ type · string · "standard"
  🌐 language · string · "en"
  🔤 languageCode · string · "en"
  🔗 wikibaseItem · string · "Q937"
  🗺️ coordinatesLat · number · 52.5
  🗺️ coordinatesLng · number · 13.4
  📅 timestamp · string · "2026-04-29T14:32:11Z"
  🆔 revisionId · number · 1238762345
  🌐 desktopUrl · string · "https://en.wikipedia.org/wiki/Albert_Einstein"
  📱 mobileUrl · string · "https://en.m.wikipedia.org/wiki/Albert_Einstein"
  ✏️ editDesktopUrl · string · "https://en.wikipedia.org/wiki/Albert_Einstein?action=edit"
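For readers curious how the flat schema relates to the raw data, here is a hedged sketch of deriving it from a Wikipedia REST /page/summary response. The raw keys are Wikipedia's; the mapping itself is an assumption for illustration, not the Actor's actual source:

```python
def flatten_summary(raw: dict) -> dict:
    """Map a raw /page/summary response onto the flat record schema above."""
    thumb = raw.get("thumbnail") or {}
    coords = raw.get("coordinates") or {}
    urls = raw.get("content_urls") or {}
    titles = raw.get("titles") or {}
    return {
        "thumbnailUrl": thumb.get("source"),
        "pageId": raw.get("pageid"),
        "title": raw.get("title"),
        "displayTitle": titles.get("display"),
        "normalizedTitle": titles.get("normalized"),
        "canonicalTitle": titles.get("canonical"),
        "description": raw.get("description"),
        "extract": raw.get("extract"),
        "extractHtml": raw.get("extract_html"),
        "type": raw.get("type"),
        "language": raw.get("lang"),
        "wikibaseItem": raw.get("wikibase_item"),
        "coordinatesLat": coords.get("lat"),
        "coordinatesLng": coords.get("lon"),
        "timestamp": raw.get("timestamp"),
        "revisionId": raw.get("revision"),
        "desktopUrl": (urls.get("desktop") or {}).get("page"),
        "mobileUrl": (urls.get("mobile") or {}).get("page"),
    }
```

Fields absent from a given page (thumbnail, coordinates) come through as null rather than raising errors.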

✨ Why choose this Actor

  • 🔗 Wikidata link. Wikibase item ID lets you join across languages and other knowledge graphs.
  • 🌍 300+ languages. The same Actor pulls summaries from any Wikipedia language edition.
  • 🗺️ Geo coordinates. Lat / lng exposed when the page is geo-tagged.
  • 📜 Plain + HTML extracts. Both formats in every record.
  • 🆓 No auth. Wikipedia's REST API is open; no key needed.

📈 How it compares to alternatives

Approach (cost · coverage · refresh · filters · setup):

  • ⭐ This Actor: $5 free credit · 60M+ pages · live per run · 2 modes · ⚡ ~2 min
  • Wikipedia REST direct: free · same coverage · live · DIY filtering · 🐢 requires code
  • DBpedia / Wikidata: free · triplestore · live · SPARQL queries · 🐢 hours
  • Manual scraping: free · all pages · live · DIY filtering · 🐢 days

🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Find the Wikipedia Article Summary Scraper on the Apify Store.
  3. 🎯 Set input. Pick filters and maxItems.
  4. 🚀 Run it. Click Start.
  5. 📥 Download. Grab results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to dataset: 3-5 minutes. No coding required.


💼 Business use cases

📚 Knowledge + Education

  • Encyclopedia-app summaries
  • Quiz / trivia content
  • Reference card UIs
  • Reading-comprehension exercises
  • Summary embeddings
  • Fine-tune QA models
  • Knowledge-graph seeding
  • Multilingual training data

📰 Journalism + Content

  • Background fact cards
  • Multi-language coverage
  • Topic-page generation
  • Subject-A vs subject-B comparisons

🌐 Localization

  • Multi-language content seeding
  • Cross-language entity matching
  • Translated summaries
  • Localized SEO content

🔌 Automating Wikipedia Article Summary Scraper

Control the scraper programmatically:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.
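A stdlib-only sketch of driving the Actor over Apify's HTTP API (the Actor ID below is a placeholder; copy the real one from the Actor's API tab; run-sync-get-dataset-items is Apify's standard synchronous run endpoint):

```python
import json
import os
import urllib.request

# Placeholder Actor ID -- replace with the real one from the Actor's API tab.
ACTOR_ID = "parseforge~wikipedia-page-summaries-scraper"

def run_sync_url(actor_id: str, token: str) -> str:
    # run-sync-get-dataset-items starts a run and returns its dataset items.
    return (f"https://api.apify.com/v2/acts/{actor_id}"
            f"/run-sync-get-dataset-items?token={token}")

def fetch_summaries(titles, language="en", token=None):
    """Run lookup mode for a list of titles and return the dataset items."""
    token = token or os.environ["APIFY_TOKEN"]
    payload = json.dumps({
        "maxItems": len(titles),
        "mode": "lookup",
        "titles": "\n".join(titles),
        "language": language,
    }).encode()
    req = urllib.request.Request(
        run_sync_url(ACTOR_ID, token), data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (needs a real APIFY_TOKEN in the environment):
#   items = fetch_summaries(["Albert Einstein", "Marie Curie"])
```

The official apify-client packages wrap this same API with retries and pagination, so prefer them for production use.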

The Apify Schedules feature lets you trigger this Actor on any cron interval.


🌟 Beyond business use cases

Data like this powers more than commercial workflows.

🎓 Research and academia

  • Reproducible Wikipedia corpora
  • Cross-language entity studies
  • Educational fact-card content
  • Knowledge-graph research

🎨 Personal and creative

  • Personal study aids
  • Trivia apps
  • Side projects with summary data
  • Reading-list backbones

🤝 Non-profit and civic

  • Free encyclopedic content
  • Library reference tools
  • Heritage knowledge preservation
  • Educational outreach

🧪 Experimentation

  • Train summarization models
  • Prototype QA systems
  • Build entity-disambiguation tools
  • Test multilingual pipelines


❓ Frequently Asked Questions

🧩 How does it work?

Pick lookup or search mode and supply titles or a query. The Actor calls the Wikipedia REST API and returns one record per page.

📊 How many fields per record?

25, including thumbnail, page ID, title fields, description, extract (plain + HTML), type, language metadata, Wikibase item, coordinates, timestamps, and URLs.

🌍 Which languages are supported?

Any Wikipedia language. Set the language input to the standard code (en, es, fr, de, ja, zh, ar, etc).

🗺️ Are geo-coordinates always present?

Only on geo-tagged pages (places, landmarks). On other pages the coordinate fields are null.

🔗 What's the wikibaseItem field?

The Wikidata Q-number for the entity. Lets you join across languages and other knowledge bases.
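Because the Q-number is language-independent, a cross-language join reduces to a dictionary merge. A minimal sketch with illustrative records (shapes follow the output schema; values are examples, not scraped data):

```python
# Records from two hypothetical runs of the Actor, one per language.
en = [{"wikibaseItem": "Q937", "title": "Albert Einstein", "language": "en"},
      {"wikibaseItem": "Q7186", "title": "Marie Curie", "language": "en"}]
es = [{"wikibaseItem": "Q937", "title": "Albert Einstein", "language": "es"},
      {"wikibaseItem": "Q7186", "title": "Marie Curie", "language": "es"}]

def join_on_wikidata(a, b):
    """Pair records from two language runs by their shared Q-number."""
    by_q = {rec["wikibaseItem"]: rec for rec in b}
    return {rec["wikibaseItem"]: (rec, by_q[rec["wikibaseItem"]])
            for rec in a if rec["wikibaseItem"] in by_q}

pairs = join_on_wikidata(en, es)
```

The same key also joins against Wikidata dumps or any other knowledge base that carries Q-numbers.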

📜 Is the full article body returned?

No, only the summary extract (lead section). Use Wikipedia's full-content API for the entire body.

🆓 Do I need an API key?

No. Wikipedia REST is open.

🔁 Can I schedule runs?

Yes. Schedule daily to refresh summaries.

⚖️ Is this data free to use?

Yes. Wikipedia content is licensed CC-BY-SA. Attribution required for redistribution.

💳 Do I need a paid Apify plan?

No. The free plan covers preview runs (10 records).


🔌 Integrate with any app

Wikipedia Article Summary Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications
  • Airbyte - Pipe data into your warehouse
  • GitHub - Trigger runs from commits
  • Google Drive - Export datasets to Sheets

💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the Wikimedia Foundation, Wikipedia editors, or any cited reference work. All trademarks mentioned are the property of their respective owners. Only publicly available open data is collected.