Wikipedia Scraper

A production-ready Apify Actor for scraping structured Wikipedia data from direct URLs, with optional keyword filtering of page content.

This Actor communicates with a backend Wikipedia Scraper API and is suitable for Apify Cloud, AWS Lambda, or any HTTP-based automation workflow.


Features

  • Fetch Wikipedia pages for single or multiple URLs.
  • Optional keyword search to detect specific terms in page content.
  • Accepts input as plain URLs or objects with a url key.
  • Returns structured JSON results (see the sample item after this list), including:
    • HTTP status
    • Request details
    • Content snippet or full data
    • Errors, if any
  • Fully asynchronous for high performance.
  • Automatically pushes results to the Apify Dataset.
  • Handles exceptions gracefully without stopping execution.
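
For illustration, a single dataset item might look like the following. The exact field names depend on the Actor's output schema; the ones shown here are assumptions:

```json
{
  "url": "https://en.wikipedia.org/wiki/JavaScript",
  "status": 200,
  "title": "JavaScript",
  "snippet": "JavaScript, often abbreviated as JS, is a programming language...",
  "keywordFound": true,
  "error": null
}
```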

Input

The Actor accepts input in JSON format:

Example 1: List of URLs

```json
{
  "urls": [
    "https://en.wikipedia.org/wiki/JavaScript",
    "https://en.wikipedia.org/wiki/Bangladesh"
  ],
  "keyword": "html"
}
```

Run the Actor

  • Save your input configuration
  • Click Run in Apify Console
  • Monitor logs for progress and errors
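
If you prefer to start runs programmatically, here is a minimal sketch using the official apify-client Python package; the Actor ID "my-user/wikipedia-scraper" and the token placeholder are assumptions to replace with your own values:

```python
from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Start the Actor and wait for the run to finish.
run = client.actor("my-user/wikipedia-scraper").call(
    run_input={
        "urls": ["https://en.wikipedia.org/wiki/JavaScript"],
        "keyword": "html",
    }
)
print(run["status"])  # e.g. "SUCCEEDED"
```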

Access Results

  • All scraped data is stored in the Apify Dataset.
  • Each URL produces a separate dataset item.
  • Data is returned in structured JSON format.
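
As a sketch of reading those items outside the Console, again assuming the apify-client package; "<RUN_DATASET_ID>" stands for the defaultDatasetId of a finished run:

```python
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Iterate over the dataset items produced by a finished run.
for item in client.dataset("<RUN_DATASET_ID>").iterate_items():
    print(item.get("url"), item.get("status"))
```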

Example 2: Single URL

```json
{
  "urls": [
    "https://en.wikipedia.org/wiki/JavaScript"
  ],
  "keyword": "html"
}
```

How It Works

  • Reads input from Apify (urls and optional keyword)
  • Fetches Wikipedia pages asynchronously
  • Parses page content: title, first paragraph, and sections
  • Checks for keyword presence (if provided)
  • Pushes structured JSON results to the dataset
  • Logs errors and handles retries automatically
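
A condensed sketch of this flow, assuming the Apify Python SDK and the httpx HTTP client; the parsing (a crude title regex and a raw snippet) and the output field names are illustrative, not the Actor's exact implementation:

```python
import asyncio
import re

import httpx
from apify import Actor


async def scrape_one(client: httpx.AsyncClient, url: str, keyword: str | None) -> dict:
    """Fetch one Wikipedia page and build a structured result item."""
    try:
        response = await client.get(url, follow_redirects=True, timeout=30)
        html = response.text
        # Crude title extraction, for illustration only.
        title_match = re.search(r"<title>(.*?)</title>", html, re.S)
        return {
            "url": url,
            "status": response.status_code,
            "title": title_match.group(1).strip() if title_match else None,
            "snippet": html[:500],
            "keywordFound": keyword.lower() in html.lower() if keyword else None,
            "error": None,
        }
    except httpx.HTTPError as exc:
        # Log the failure but keep the run going for the remaining URLs.
        Actor.log.error(f"Failed to fetch {url}: {exc}")
        return {"url": url, "status": None, "error": str(exc)}


async def main() -> None:
    async with Actor:
        actor_input = await Actor.get_input() or {}
        # Accept plain URL strings or objects with a "url" key.
        urls = [u["url"] if isinstance(u, dict) else u for u in actor_input.get("urls", [])]
        keyword = actor_input.get("keyword")

        async with httpx.AsyncClient() as client:
            results = await asyncio.gather(
                *(scrape_one(client, url, keyword) for url in urls)
            )

        # Each URL becomes a separate dataset item.
        for item in results:
            await Actor.push_data(item)
```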

Error Handling

  • Automatic retries for temporary failures
  • Invalid or non-Wikipedia URLs are safely skipped
  • Clear error messages are logged via Actor.log
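
As a hedged illustration of the retry idea, assuming transient failures surface as httpx.HTTPError; the Actor's actual retry policy may differ:

```python
import asyncio

import httpx


async def fetch_with_retries(client: httpx.AsyncClient, url: str, attempts: int = 3) -> httpx.Response:
    """Retry a GET request with exponential backoff before giving up."""
    for attempt in range(1, attempts + 1):
        try:
            return await client.get(url, follow_redirects=True, timeout=30)
        except httpx.HTTPError:
            if attempt == attempts:
                raise  # Out of attempts: let the caller log and skip the URL.
            # Back off 1s, 2s, 4s, ... between retries.
            await asyncio.sleep(2 ** (attempt - 1))
```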

Use Cases

  • Research and collect Wikipedia content
  • Build datasets for machine learning or NLP
  • Keyword trend analysis across Wikipedia
  • Academic or reference data collection
  • Quick access to structured page information

Support

  • Extend or customize this Actor for your workflow
  • Logs and dataset entries help you debug and monitor scraping runs