Wikipedia Article Search
Pricing
from $1.00 / 1,000 article fetches
Search and extract Wikipedia article data in 15 languages. Get titles, summaries, word counts, Wikidata IDs, thumbnails, and URLs. No API key needed. Perfect for research, content enrichment, knowledge graph building, and SEO.
Developer: ryan clinton
Search and retrieve structured data from Wikipedia articles across 15 language editions. This Apify actor queries the MediaWiki Search API to find relevant articles, then enriches each result with plain-text summaries, descriptions, Wikidata IDs, and thumbnail images via the Wikipedia REST API. Built for researchers, content creators, SEO professionals, and data analysts who need clean, structured Wikipedia data without writing custom API code or managing infrastructure.
Why use Wikipedia Article Search?
- No API key required -- Wikipedia's API is completely free and open. You only need an Apify account to run this actor.
- Structured output -- Returns clean, flat JSON records ready for export to CSV, Google Sheets, databases, or downstream processing pipelines.
- Two-step enrichment -- Goes beyond basic search results by fetching human-readable article extracts, short descriptions, Wikidata entity IDs, and thumbnail images for every result.
- Multilingual -- Search across 15 Wikipedia language editions including English, German, French, Spanish, Japanese, Chinese, and more.
- Scheduled runs -- Track how Wikipedia content evolves over time by scheduling recurring searches through Apify's built-in scheduler.
- Zero infrastructure -- No servers to manage, no API wrappers to maintain, no rate-limit logic to implement. Configure, run, and get results.
Key features
- Full-text search across all Wikipedia articles using the MediaWiki Action API with relevance-ranked results
- Summary enrichment via the Wikipedia REST API adds plain-text extracts, descriptions, Wikidata IDs, and thumbnails for each result
- 15 language editions -- English, German, French, Spanish, Japanese, Chinese, Russian, Portuguese, Italian, Arabic, Dutch, Korean, Polish, Swedish, and Vietnamese
- Up to 500 results per search query in a single run, matching the Wikipedia API's maximum capacity
- Rate-limit compliant with a built-in 200ms delay between enrichment requests to respect Wikimedia's server policies
- HTML-cleaned snippets -- search snippets are stripped of HTML tags and decoded entities for clean, readable text output
- Wikidata cross-referencing -- use the returned `wikidataId` to link articles across language editions and external knowledge bases like DBpedia
- Flat JSON output -- every record is a self-contained object with no nested structures, making export and analysis straightforward
- Graceful error handling -- if an individual summary enrichment request fails, the actor skips it and continues processing remaining results without crashing
How to use Wikipedia Article Search
Using the Apify Console
- Go to the Wikipedia Article Search actor page on Apify and click Try for free.
- Enter your Search Query -- the topic or keywords you want to find articles about (e.g., "machine learning", "Roman Empire", "climate change").
- Select a Language from the dropdown. The default is English (`en`). All 15 supported language editions are available in the selector.
- Choose whether to Include Article Summary. Enabling this adds a plain-text extract, description, Wikidata ID, and thumbnail for each result. Disabling it makes runs faster but returns less data per article.
- Set the Max Results to control how many articles to return (1 to 500, default 20).
- Click Start to run the actor.
- When the run finishes, view your results in the Dataset tab. Export as JSON, CSV, Excel, or access programmatically via the Apify API.
- Optionally, set up a Schedule to run the same search on a recurring basis (daily, weekly, monthly) to track content changes over time.
Using the API
You can also start the actor programmatically by sending a POST request with your input as JSON. See the API & Integration section below for complete Python, JavaScript, and cURL examples. The API supports both synchronous execution (wait for results) and asynchronous execution (start the run and poll for results later).
Input parameters
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `query` | String | Yes | -- | Search query for Wikipedia articles. Supports natural language phrases and specific topic names. |
| `language` | String | No | `en` | Wikipedia language code. Options: `en`, `de`, `fr`, `es`, `ja`, `zh`, `ru`, `pt`, `it`, `ar`, `nl`, `ko`, `pl`, `sv`, `vi`. |
| `includeSummary` | Boolean | No | `true` | Fetch article summary/extract for each result via the REST API. Richer data but slower runs. |
| `maxResults` | Integer | No | `20` | Maximum number of articles to return. Range: 1--500 (Wikipedia API limit). |
Input example
```json
{
  "query": "quantum computing",
  "language": "en",
  "includeSummary": true,
  "maxResults": 25
}
```
Tips for best results
- Use specific, multi-word queries -- "machine learning neural networks" returns more targeted results than "computers". Wikipedia's search engine ranks by relevance, so precise phrasing matters.
- Disable summaries when speed matters -- If you only need titles, URLs, word counts, and snippets, set `includeSummary` to `false` to skip per-article REST API calls and finish much faster.
- Search in the relevant language -- An article may exist in one Wikipedia edition but not another. For regionally significant topics, use the appropriate language code (e.g., `ja` for Japanese history topics, `de` for German political figures).
- Leverage Wikidata IDs -- The `wikidataId` field (e.g., `Q3552`) uniquely identifies a concept across all Wikipedia language editions and links to external structured knowledge bases like DBpedia, Google Knowledge Graph, and Freebase.
- Combine with scheduling -- Use Apify's scheduler to run the same query weekly or daily and track how article word counts and content evolve over time.
- Start with fewer results -- Begin with 10-20 results to verify your query returns relevant articles before scaling up to 500.
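The Wikidata tip above can be put into practice with Wikidata's public `wbgetentities` endpoint, which returns the article titles linked to one entity across every language edition. A minimal sketch using only the standard library (the helper names are illustrative, not part of the actor):

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def sitelinks_url(wikidata_id: str) -> str:
    """Build a wbgetentities request for one entity's sitelinks."""
    params = {
        "action": "wbgetentities",
        "ids": wikidata_id,
        "props": "sitelinks",
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urllib.parse.urlencode(params)}"

def fetch_sitelinks(wikidata_id: str) -> dict:
    """Return {site: title}, e.g. {'enwiki': 'Quantum computing', 'dewiki': ...}."""
    req = urllib.request.Request(
        sitelinks_url(wikidata_id),
        headers={"User-Agent": "example-script/0.1"},  # Wikimedia requires a UA
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    links = data["entities"][wikidata_id]["sitelinks"]
    return {site: entry["title"] for site, entry in links.items()}
```

Feeding a `wikidataId` such as `Q3552` into `fetch_sitelinks` yields the matching article title in each edition, which you can then pass back into this actor under a different `language` setting.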
Output
Output example (with summary enrichment)
Each article is returned as a flat JSON object in the Apify dataset:
```json
{
  "title": "Quantum computing",
  "pageId": 25202,
  "description": "Computation using quantum-mechanical phenomena",
  "extract": "A quantum computer is a computer that exploits quantum mechanical phenomena. On small scales, physical matter exhibits properties of both particles and waves, and quantum computing leverages this behavior using specialized hardware. Classical physics cannot explain the operation of these quantum devices, and a scalable quantum computer could perform some calculations exponentially faster than any modern classical computer.",
  "snippet": "A quantum computer is a computer that exploits quantum mechanical phenomena. On small scales, physical matter exhibits properties of both particles and waves...",
  "wordCount": 12485,
  "sizeBytes": 198234,
  "timestamp": "2025-12-14T08:22:35Z",
  "wikidataId": "Q3552",
  "articleUrl": "https://en.wikipedia.org/wiki/Quantum_computing",
  "thumbnailUrl": "https://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Bloch_sphere.svg/320px-Bloch_sphere.svg.png",
  "extractedAt": "2026-02-17T14:30:12.456Z"
}
```
Output example (without summary enrichment)
When `includeSummary` is set to `false`, the enrichment fields are `null` but the core search metadata is always present:

```json
{
  "title": "Quantum computing",
  "pageId": 25202,
  "description": null,
  "extract": null,
  "snippet": "A quantum computer is a computer that exploits quantum mechanical phenomena...",
  "wordCount": 12485,
  "sizeBytes": 198234,
  "timestamp": "2025-12-14T08:22:35Z",
  "wikidataId": null,
  "articleUrl": "https://en.wikipedia.org/wiki/Quantum_computing",
  "thumbnailUrl": null,
  "extractedAt": "2026-02-17T14:30:12.456Z"
}
```
Output fields
| Field | Type | Description |
|---|---|---|
| `title` | String | Article title as displayed on Wikipedia |
| `pageId` | Integer | Unique Wikipedia page identifier |
| `description` | String or null | Short article description from Wikidata (requires `includeSummary`) |
| `extract` | String or null | Plain-text summary of the article introduction (requires `includeSummary`) |
| `snippet` | String | Search-relevant excerpt with query term context, HTML stripped |
| `wordCount` | Integer | Total word count of the full article |
| `sizeBytes` | Integer | Article page size in bytes |
| `timestamp` | String | ISO 8601 timestamp of the article's last edit |
| `wikidataId` | String or null | Wikidata entity ID for cross-language linking (requires `includeSummary`) |
| `articleUrl` | String | Canonical URL to the Wikipedia article |
| `thumbnailUrl` | String or null | URL to the article's lead thumbnail image (requires `includeSummary`) |
| `extractedAt` | String | ISO 8601 timestamp of when the data was extracted by this actor |
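Because every record is a flat object with these twelve fields, post-processing an exported dataset takes only a few lines of plain Python. A sketch of one possible filter (the helper name and thresholds are illustrative, not part of the actor):

```python
def filter_articles(items, min_words=500, drop_disambiguation=True):
    """Keep substantial articles, optionally dropping disambiguation pages.

    Uses only fields present in every output record: wordCount is always
    set, while description may be None when includeSummary was disabled.
    """
    kept = []
    for item in items:
        if item["wordCount"] < min_words:
            continue  # likely a stub article
        desc = (item.get("description") or "").lower()
        if drop_disambiguation and "disambiguation" in desc:
            continue  # e.g. "Wikimedia disambiguation page"
        kept.append(item)
    return kept
```

Run it over the JSON export to strip stubs and disambiguation entries before loading the results into a spreadsheet or database.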
Use cases
- Academic research -- Quickly gather structured metadata on hundreds of Wikipedia articles for a literature survey, content analysis, or knowledge mapping project. Export word counts and timestamps to analyze article maturity and editorial activity across topics.
- SEO and content strategy -- Identify high-authority Wikipedia pages related to your target keywords. Use word counts and article sizes to gauge topic depth and find content gaps that your own articles could fill. Compare article coverage across multiple keywords to prioritize content creation.
- Knowledge base construction -- Use Wikidata IDs and article extracts as a foundation for building structured knowledge graphs, chatbot training data, FAQ databases, or reference systems. The Wikidata IDs enable linking to external datasets like DBpedia and Google Knowledge Graph.
- Multilingual content analysis -- Compare how topics are covered across different Wikipedia language editions. Track which articles exist in which languages and measure their relative depth by word count. Useful for localization teams and international content strategies.
- Trend monitoring -- Schedule recurring searches on current event topics to track when new articles appear, how quickly they grow in word count, and when they stabilize after initial creation. Combine with the Website Change Monitor actor for deeper tracking.
- Data enrichment -- Feed Wikipedia summaries, descriptions, and thumbnails into existing datasets to add context to company names, scientific terms, geographic locations, or historical events in your applications. The flat JSON output makes it easy to join with existing data.
API & Integration
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("CkESJHQpPf1x2RL68").call(run_input={
    "query": "renewable energy",
    "language": "en",
    "includeSummary": True,
    "maxResults": 50,
})

# Iterate over results and process each article
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{item['title']} — {item['wordCount']} words")
    print(f"  URL: {item['articleUrl']}")
    print(f"  Description: {item['description']}")
    print(f"  Wikidata: {item['wikidataId']}")
    print()
```
JavaScript
```javascript
import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const run = await client.actor("CkESJHQpPf1x2RL68").call({
  query: "renewable energy",
  language: "en",
  includeSummary: true,
  maxResults: 50,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();

items.forEach((item) => {
  console.log(`${item.title} — ${item.wordCount} words`);
  console.log(`  URL: ${item.articleUrl}`);
  console.log(`  Description: ${item.description}`);
  console.log(`  Wikidata: ${item.wikidataId}`);
  console.log();
});
```
cURL
```bash
# Start the actor run
curl "https://api.apify.com/v2/acts/CkESJHQpPf1x2RL68/runs" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -d '{
    "query": "renewable energy",
    "language": "en",
    "includeSummary": true,
    "maxResults": 50
  }'

# Retrieve results from the dataset (after the run completes)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?format=json" \
  -H "Authorization: Bearer YOUR_API_TOKEN"
```
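For asynchronous execution, start the run as shown above and then poll its status until it reaches a terminal state. A sketch against Apify's run-status endpoint (`wait_for_run` is an illustrative helper, not part of the client library):

```python
import json
import time
import urllib.request

API_BASE = "https://api.apify.com/v2"

def run_status_url(run_id: str) -> str:
    """Endpoint that reports a run's current status (RUNNING, SUCCEEDED, ...)."""
    return f"{API_BASE}/actor-runs/{run_id}"

def wait_for_run(run_id: str, token: str, interval: float = 5.0) -> dict:
    """Poll the run until it finishes, then return its details,
    including defaultDatasetId for fetching the results."""
    terminal = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}
    while True:
        req = urllib.request.Request(
            run_status_url(run_id),
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(req) as resp:
            run = json.load(resp)["data"]
        if run["status"] in terminal:
            return run
        time.sleep(interval)
```

The returned run object carries the dataset ID, which plugs into the dataset-items request shown in the cURL example.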
Integrations
Wikipedia Article Search outputs clean, structured JSON that integrates seamlessly with Apify's ecosystem and third-party tools:
- Zapier -- Trigger workflows in 5,000+ apps whenever a Wikipedia search run completes. Automatically route new results to your CRM, project management tool, or notification system.
- Make (Integromat) -- Build multi-step automations that process Wikipedia data alongside other sources. Combine with other Apify actors for enriched research pipelines.
- Google Sheets -- Export results directly to a spreadsheet for collaborative review and filtering using Apify's built-in integration. Ideal for team research projects.
- Webhooks -- Receive HTTP POST notifications when runs finish, enabling real-time data pipelines and event-driven architectures.
- Slack / Email -- Set up alerts to notify your team when new results match specific criteria or when scheduled searches detect changes.
- Apify API -- Access results programmatically in any language via the REST API or official client libraries for JavaScript, Python, and PHP. Supports both synchronous and asynchronous execution patterns.
How it works
The actor uses a two-step architecture to combine search relevance with rich article metadata:
```
Input Query --> MediaWiki Search API --> Result List --> REST API Enrichment --> Structured Output
                (/w/api.php)             (up to 500)     (/api/rest_v1/)         (Apify Dataset)
```
Step 1: Search phase
The actor sends your query to the MediaWiki Action API endpoint (/w/api.php?action=query&list=search) on the selected language edition of Wikipedia. The API performs a full-text search across all articles in that edition and returns a ranked list of matching results. Each result includes basic metadata: article title, page ID, word count, page size in bytes, last-edit timestamp, and a search snippet showing where your query terms appear.
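The search phase boils down to a single GET request. A sketch of the equivalent call using the standard MediaWiki parameters described above (the helper names are illustrative):

```python
import json
import urllib.parse
import urllib.request

def search_url(query: str, language: str = "en", limit: int = 20) -> str:
    """Build the full-text search request for one language edition."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,   # the API caps this at 500
        "format": "json",
    }
    return f"https://{language}.wikipedia.org/w/api.php?{urllib.parse.urlencode(params)}"

def search(query: str, language: str = "en", limit: int = 20) -> list:
    """Return the ranked result list; each entry carries title, pageid,
    wordcount, size, timestamp, and an HTML snippet."""
    req = urllib.request.Request(
        search_url(query, language, limit),
        headers={"User-Agent": "example-script/0.1"},  # Wikimedia requires a UA
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["query"]["search"]
```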
Step 2: Enrichment phase (optional)
If includeSummary is enabled, the actor iterates through each search result and calls the Wikipedia REST API (/api/rest_v1/page/summary/{title}) to fetch additional data for that article. This enrichment adds: a plain-text extract of the article's introductory section, a short Wikidata description, the Wikidata entity ID, a thumbnail image URL, and the canonical article URL. A 200ms delay between requests ensures compliance with Wikipedia's rate-limit policies.
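The enrichment phase maps each result title onto the REST summary endpoint, pausing 200 ms between calls. A sketch of that loop, including the graceful-skip behavior described above (helper names are illustrative):

```python
import json
import time
import urllib.parse
import urllib.request

def summary_url(title: str, language: str = "en") -> str:
    """REST summary endpoint; the title is URL-encoded into the path."""
    return (f"https://{language}.wikipedia.org/api/rest_v1/page/summary/"
            f"{urllib.parse.quote(title.replace(' ', '_'))}")

def enrich(titles, language="en", delay=0.2):
    """Fetch summaries one by one with a 200 ms throttle; a failed
    request is skipped so one bad page never stops the batch."""
    out = []
    for title in titles:
        try:
            req = urllib.request.Request(
                summary_url(title, language),
                headers={"User-Agent": "example-script/0.1"},
            )
            with urllib.request.urlopen(req) as resp:
                out.append(json.load(resp))
        except Exception:
            pass  # graceful skip, mirroring the actor's error handling
        time.sleep(delay)
    return out
```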
Step 3: Data cleaning
Search snippets from the MediaWiki API contain raw HTML tags and encoded entities. The actor cleans these using a built-in `stripHtml()` function that removes all HTML tags and decodes common entities such as `&quot;`, `&amp;`, `&lt;`, `&gt;`, and `&#039;`. This ensures that every snippet field in the output contains clean, readable plain text.
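The same cleaning can be reproduced in a few lines of standard-library Python; this mirrors the behavior described above, though the actor's actual `stripHtml()` implementation may differ:

```python
import html
import re

def strip_html(snippet: str) -> str:
    """Strip tags and decode entities from a MediaWiki search snippet."""
    # Remove markup such as <span class="searchmatch">...</span>
    text = re.sub(r"<[^>]+>", "", snippet)
    # Decode entities like &quot; &amp; &lt; &gt; &#039;
    return html.unescape(text)
```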
Step 4: Output
Each article is pushed to the Apify dataset as a flat JSON object with all 12 fields. The dataset is immediately available for export in JSON, CSV, Excel, or XML format through the Apify Console. You can also access it programmatically via the Apify API or stream it into connected integrations.
The actor identifies itself with the User-Agent string ApifyWikipediaSearch/1.0 as required by the Wikimedia API policy. If any individual summary enrichment request fails (e.g., due to a network timeout or a redirect page), the actor gracefully skips that enrichment and continues processing remaining results.
Performance & cost
Wikipedia Article Search uses minimal compute resources (256 MB memory). The primary factor affecting run time is whether summary enrichment is enabled, since each article requires an additional API call with a 200ms throttle delay.
| Scenario | Results | Est. time | Est. cost |
|---|---|---|---|
| Quick search, no summaries | 20 | ~3 seconds | ~$0.001 |
| Standard search with summaries | 20 | ~10 seconds | ~$0.001 |
| Medium batch with summaries | 100 | ~30 seconds | ~$0.002 |
| Large batch with summaries | 500 | ~2 minutes | ~$0.005 |
| Large batch, no summaries | 500 | ~5 seconds | ~$0.001 |
The Wikipedia API itself is completely free with no usage fees or rate-limit charges. Your only cost is Apify platform compute time, making this actor one of the most cost-efficient data extraction tools available. Free-tier Apify accounts include $5 of monthly platform credits, which is enough for thousands of Wikipedia searches. Even at maximum scale (500 articles with summaries), a single run costs less than one cent.
Limitations
- 500 results per run -- The Wikipedia search API returns a maximum of 500 results per query. This is a hard limit imposed by the MediaWiki API. For broader coverage, run multiple searches with different query variations or narrower subtopics.
- No full article text -- The actor returns article summaries (introduction paragraphs), not the complete article body. The `extract` field typically contains the first 1-3 paragraphs. For full content extraction, consider a dedicated web scraping approach.
- Search relevance depends on Wikipedia -- Result ranking is controlled by Wikipedia's internal CirrusSearch algorithm. The actor does not re-rank, deduplicate, or filter results beyond what the API returns.
- Language editions vary in coverage -- Smaller language Wikipedias may have fewer articles or less detailed content on certain topics compared to the English edition, which has over 6.8 million articles.
- Summary enrichment adds latency -- The 200ms throttle delay per article means 500 results with summaries takes approximately 2 minutes. Disable summaries if speed is critical for your workflow.
- No disambiguation handling -- If your query matches a disambiguation page, it will appear in results like any other article. Check the `description` field to identify disambiguation pages (they typically say "Wikimedia disambiguation page").
- Thumbnails not always available -- Not every Wikipedia article has a lead image. Stub articles, lists, and newly created pages often return `null` for `thumbnailUrl`.
- Single query per run -- Each run processes one search query. To search for multiple different topics, you need to start separate runs for each query.
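The single-query and 500-result limits can be worked around by fanning a broad topic out into several runs. A sketch of preparing the inputs (the helper name and subtopic list are illustrative; each dict matches the input schema documented above):

```python
def build_run_inputs(queries, language="en", max_results=100):
    """Turn a list of query variations into one actor input per run."""
    return [
        {
            "query": q,
            "language": language,
            "includeSummary": True,
            "maxResults": max_results,
        }
        for q in queries
    ]

subtopics = ["machine learning", "deep learning",
             "neural networks", "natural language processing"]
inputs = build_run_inputs(subtopics)
# Each dict can then be passed to client.actor(...).call(run_input=...)
```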
Responsible use
This actor accesses Wikipedia's free, public API and follows Wikimedia's API etiquette guidelines. Please use it responsibly:
- Rate limiting -- The built-in 200ms delay between enrichment requests prevents excessive load on Wikipedia's servers, staying well within Wikimedia's recommended request rates for automated tools.
- User-Agent identification -- Every request includes the `ApifyWikipediaSearch/1.0` User-Agent header as required by the Wikimedia API usage policy. This allows Wikimedia to identify and contact the operator if needed.
- Content licensing -- Wikipedia content is licensed under Creative Commons Attribution-ShareAlike 4.0. If you republish article extracts or descriptions, provide proper attribution and comply with the license terms. Metadata fields like `wordCount` and `pageId` are factual data not subject to copyright.
- No scraping of full articles -- This actor uses official API endpoints rather than scraping rendered HTML pages, which aligns with Wikimedia's preferred access method for programmatic data retrieval.
- Minimal data footprint -- The actor retrieves only metadata and summaries, not full article HTML, keeping data transfer and storage requirements low.
- Fair scheduling -- If you set up recurring scheduled runs, use reasonable intervals (daily or weekly rather than hourly) to avoid unnecessary load on the Wikipedia API infrastructure.
FAQ
Do I need a Wikipedia API key? No. Wikipedia's API is free and open. No API key, token, or registration is needed. You only need an Apify account to run the actor on the platform.
What is the difference between snippet and extract?
The snippet is a short, search-relevant excerpt returned by the MediaWiki Search API. It shows where your query terms appear in the article text and is always available. The extract is a longer, human-readable summary of the article's introductory section, fetched from the Wikipedia REST API. The extract is only available when includeSummary is enabled.
Can I search non-English Wikipedia editions? Yes. The actor supports 15 language editions: English, German, French, Spanish, Japanese, Chinese, Russian, Portuguese, Italian, Arabic, Dutch, Korean, Polish, Swedish, and Vietnamese. Select your language in the input settings.
How do Wikidata IDs work?
Each Wikipedia article is linked to a Wikidata entity (e.g., Q3552 for "Quantum computing"). This ID is universal across all Wikipedia language editions and connects to the broader Wikidata knowledge base. You can use it to find the same topic in other languages or to link Wikipedia data with other structured datasets like DBpedia.
Can I use this data commercially? Wikipedia content is licensed under CC BY-SA 4.0. You may use it commercially as long as you provide attribution and share derivative works under the same license. Always review the Wikimedia Terms of Use for your specific use case.
What happens if an article has no thumbnail?
The thumbnailUrl field will be null. Not all Wikipedia articles have a lead image. This is common for stub articles, lists, and recently created pages.
How do I get more than 500 results for a topic? Run the actor multiple times with different query variations. For example, instead of one search for "artificial intelligence", try separate searches for "machine learning", "deep learning", "neural networks", and "natural language processing" to cover the topic more broadly.
Does the actor handle redirects and disambiguation pages?
Redirect pages are resolved automatically by the MediaWiki Search API, so you will receive the target article rather than the redirect itself. Disambiguation pages, however, appear as regular search results. You can identify them by checking the description field, which typically contains "Wikimedia disambiguation page" for these entries.
What is the sizeBytes field useful for?
The sizeBytes field represents the total size of the article's wiki markup source in bytes. It is a rough proxy for article depth and detail. Articles with higher byte counts generally contain more sections, references, and inline content. This can be useful for filtering out stub articles or identifying the most comprehensive articles on a topic.
Related actors
| Actor | Description | Link |
|---|---|---|
| OpenAlex Research Paper Search | Search 250M+ academic papers from the OpenAlex database | View on Apify |
| Semantic Scholar Paper Search | Find academic papers with citation data and AI-generated summaries | View on Apify |
| Crossref Academic Paper Search | Search scholarly articles by DOI, author, title, or keyword | View on Apify |
| Internet Archive Search | Search the Internet Archive's collections of books, audio, video, and more | View on Apify |
| Wayback Machine Search | Look up historical snapshots of any web page over time | View on Apify |
| DBLP Publication Search | Search computer science publications from the DBLP bibliography | View on Apify |
| PubMed Biomedical Literature Search | Search biomedical and life science articles from the PubMed database | View on Apify |
| ArXiv Preprint Paper Search | Search preprint papers across physics, math, CS, and more from ArXiv | View on Apify |
| CORE Open Access Papers | Search 300M+ open access research papers and articles | View on Apify |
| Europe PMC Literature Search | Search European life science literature and abstracts | View on Apify |
