Wikipedia_Scraper
Pricing
$4.99/month + usage
Fetch Wikipedia page content for multiple URLs with optional keyword filtering. Returns structured JSON results and handles bulk requests asynchronously.
Developer: ZeroBreak (Maintained by Community)
Last modified: 2 days ago
Wikipedia Scraper
A production-ready Apify Actor for scraping structured Wikipedia data using direct URLs or keyword-based page scraping.
This actor communicates with a backend Wikipedia Scraper API and is suitable for Apify Cloud, AWS Lambda, or any HTTP-based automation workflow.
Features
- Fetch Wikipedia pages for single or multiple URLs.
- Optional keyword search to detect specific terms in page content.
- Accepts input as plain URLs or as objects with a url key.
- Returns structured JSON results including:
- HTTP status
- Request details
- Content snippet or full data
- Errors if any
- Fully asynchronous for high performance.
- Automatically pushes results to the Apify Dataset.
- Handles exceptions gracefully without stopping execution.
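To make the returned fields concrete, a dataset item might look like the following. The exact field names here are an illustration, not the Actor's documented schema — check a real run's dataset for the authoritative shape:

```json
{
  "url": "https://en.wikipedia.org/wiki/JavaScript",
  "status": 200,
  "title": "JavaScript",
  "snippet": "JavaScript is a programming language...",
  "keyword": "html",
  "keywordFound": true,
  "error": null
}
```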
Input
The Actor accepts input in JSON format:
Example 1: List of URLs
```json
{
  "urls": [
    "https://en.wikipedia.org/wiki/JavaScript",
    "https://en.wikipedia.org/wiki/Bangladesh"
  ],
  "keyword": "html"
}
```
Run the Actor
- Save your input configuration
- Click Run in Apify Console
- Monitor logs for progress and errors
Access Results
- All scraped data is stored in the Apify Dataset.
- Each URL produces a separate dataset item.
- Data is returned in structured JSON format.
Input fields
urls (required)
List of Wikipedia page URLs to scrape.
keyword (optional)
Keyword to check for in each page's content.
```json
{
  "urls": ["https://en.wikipedia.org/wiki/JavaScript"],
  "keyword": "html"
}
```
How It Works
- Reads input from Apify (urls and optional keyword)
- Fetches Wikipedia pages asynchronously
- Parses page content: title, first paragraph, and sections
- Checks for keyword presence (if provided)
- Pushes structured JSON results to the dataset
- Logs errors and handles retries automatically
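The backend's parser is not published, but steps 3–4 (parsing the page and checking for the keyword) can be sketched with only the Python standard library. Everything below — the `WikiExtractor` class, `parse_page`, and its output fields — is an illustrative assumption, not the Actor's actual code:

```python
from html.parser import HTMLParser


class WikiExtractor(HTMLParser):
    """Collects the page <title> and the text of the first <p> paragraph."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.first_paragraph = ""
        self._current = None      # tag currently being captured
        self._done_p = False      # stop after the first paragraph

    def handle_starttag(self, tag, attrs):
        if tag == "title" or (tag == "p" and not self._done_p):
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            if tag == "p":
                self._done_p = True
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "p":
            self.first_paragraph += data


def parse_page(html, keyword=None):
    """Parse raw HTML into a result dict; flag the keyword if one is given."""
    extractor = WikiExtractor()
    extractor.feed(html)
    result = {
        "title": extractor.title.strip(),
        "snippet": extractor.first_paragraph.strip(),
    }
    if keyword:
        # Case-insensitive substring check over the raw page, as a stand-in
        # for whatever matching rule the real backend applies.
        result["keywordFound"] = keyword.lower() in html.lower()
    return result
```

A run over one fetched page would then be `parse_page(response_text, "html")`, producing the title, first-paragraph snippet, and keyword flag that get pushed to the dataset.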
Error Handling
- Automatic retries for temporary failures
- Invalid or non-Wikipedia URLs are safely skipped
- Clear error messages are logged via Actor.log
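The retry-and-continue behavior described above can be sketched as a small wrapper. The function name, attempt count, and backoff schedule here are assumptions for illustration; the Actor's real retry settings are not documented on this page:

```python
import time


def fetch_with_retries(fetch, url, max_attempts=3, base_delay=1.0):
    """Call fetch(url), retrying on failure with exponential backoff.

    On final failure, return an error item instead of raising, so one
    bad URL does not stop the rest of the run.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return {"url": url, "status": "ok", "data": fetch(url)}
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                # Back off before the next attempt: 1s, 2s, 4s, ...
                time.sleep(base_delay * (2 ** attempt))
    return {"url": url, "status": "error", "error": str(last_error)}
```

Returning an error dict rather than raising mirrors the behavior listed under Features: exceptions are handled gracefully and surfaced as dataset items and log lines, not as a crashed run.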
Use Cases
- Research and collect Wikipedia content
- Build datasets for machine learning or NLP
- Keyword trend analysis across Wikipedia
- Academic or reference data collection
- Quick access to structured page information
Support
- Extend or customize this Actor for your workflow
- Logs and dataset entries help debug and monitor scraping