WordPress Posts Scraper - Extract Articles & Metadata
Pricing
$10.00 / 1,000 results
Extract posts, articles, and metadata from any WordPress site using the REST API. 20+ filters: date ranges, categories, tags, authors, search keywords. Get title, content, author bio, featured images & more. No WordPress account needed. Fast, reliable data extraction for content aggregation & research.
Developer

DevnaZ
WordPress Posts Scraper
The WordPress Posts Scraper is an Apify actor that extracts posts and metadata from any WordPress website using the WordPress REST API. It automatically handles pagination and fetches additional information like author details, categories, tags, and featured images.
This actor is perfect for researchers, content aggregators, and developers who need structured data from WordPress sites.
How It Works
- You provide one or more WordPress site URLs.
- The actor checks if the WordPress REST API is available.
- It fetches posts with your specified filters (dates, categories, keywords, etc.).
- It handles pagination automatically until all posts are retrieved.
- It extracts metadata such as author name, categories, tags, and featured images.
- It returns structured JSON output with all relevant post details.
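The pagination step above can be sketched as follows. This is a minimal illustration, not the actor's actual code: `collect_posts` and the `fetch_page` callable are hypothetical names, and the sketch assumes the fetcher wraps a GET to `<site>/wp-json/wp/v2/posts` with the `page` and `per_page` query parameters and reads the page count from the `X-WP-TotalPages` response header.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical fetcher type: takes (page, per_page) and returns
# (posts_on_that_page, total_pages). A real implementation would call
# <site>/wp-json/wp/v2/posts?page=N&per_page=M and read X-WP-TotalPages.
FetchPage = Callable[[int, int], Tuple[List[Dict], int]]


def collect_posts(fetch_page: FetchPage, max_posts: int, per_page: int = 50) -> List[Dict]:
    """Walk pages until max_posts is reached or the site runs out of posts."""
    posts: List[Dict] = []
    page = 1
    while len(posts) < max_posts:
        batch, total_pages = fetch_page(page, per_page)
        posts.extend(batch)
        # Stop on an empty page or once the last page has been read.
        if not batch or page >= total_pages:
            break
        page += 1
    # The last page may overshoot the limit, so trim to max_posts.
    return posts[:max_posts]
```

Passing the fetcher in as a callable keeps the loop independent of any HTTP library, so it can be exercised with an in-memory stub.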
Features
- ✅ Fetches posts from any WordPress site using REST API
- ✅ Supports pagination until all posts are retrieved
- ✅ 20+ advanced filters: date ranges, categories, tags, author, search keywords, status, and more
- ✅ Extracts metadata like author bio, categories, tags, and featured images
- ✅ Configurable sorting (by date, modified, title, author, relevance)
- ✅ Optional proxy support (not required for most sites)
- ✅ Clean and structured JSON output
- ✅ No WordPress account required
Getting Started
1. Input Parameters
To use the scraper, provide the following inputs:
| Parameter | Type | Required | Description |
|---|---|---|---|
| startUrls | Array | ✅ | List of WordPress site URLs to scrape (e.g., [{"url": "https://techcrunch.com"}]) |
| maxPosts | Integer | ❌ | Maximum total posts to extract per site (default: 5, max: 10000) |
| perPage | Integer | ❌ | [Advanced] Posts per API request (default: 50, max: 100). Higher = fewer requests = lower cost. Reduce to 10-20 if timeouts occur. |
| searchKeyword | String | ❌ | Filter posts by keyword search |
| after | String | ❌ | Posts published after this date (ISO8601: 2025-01-01T00:00:00) |
| before | String | ❌ | Posts published before this date (ISO8601: 2025-12-31T23:59:59) |
| modifiedAfter | String | ❌ | Posts modified after this date (ISO8601) |
| modifiedBefore | String | ❌ | Posts modified before this date (ISO8601) |
| categories | Array | ❌ | Filter by category IDs (e.g., ["1", "5", "12"]) |
| categoriesExclude | Array | ❌ | Exclude specific category IDs |
| tags | Array | ❌ | Filter by tag IDs |
| tagsExclude | Array | ❌ | Exclude specific tag IDs |
| author | Array | ❌ | Filter by author IDs |
| authorExclude | Array | ❌ | Exclude specific author IDs |
| status | String | ❌ | Post status: publish, draft, pending, private, future (default: publish) |
| orderBy | String | ❌ | Sort by: date, modified, title, author, id, relevance (default: date) |
| order | String | ❌ | Sort order: asc or desc (default: desc) |
| sticky | Boolean | ❌ | Include only sticky posts (default: false) |
| slug | String | ❌ | Filter by specific post slug |
| offset | Integer | ❌ | Skip a specific number of posts (default: 0) |
| proxyConfiguration | Object | ❌ | Proxy settings (optional - not needed for most WordPress sites) |
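To make the filters more concrete, here is a rough sketch of how the inputs above could translate into a WordPress REST API query string. The mapping in `PARAM_MAP` and the `build_query` helper are assumptions for illustration; the right-hand names are standard `/wp/v2/posts` query parameters, but how the actor maps its inputs internally is not documented here.

```python
from urllib.parse import urlencode

# Assumed mapping from actor input names to WordPress REST API query
# parameters. The actor's real internals may differ.
PARAM_MAP = {
    "searchKeyword": "search",
    "after": "after",
    "before": "before",
    "modifiedAfter": "modified_after",
    "modifiedBefore": "modified_before",
    "categories": "categories",
    "categoriesExclude": "categories_exclude",
    "tags": "tags",
    "tagsExclude": "tags_exclude",
    "author": "author",
    "authorExclude": "author_exclude",
    "status": "status",
    "orderBy": "orderby",
    "order": "order",
    "slug": "slug",
    "offset": "offset",
    "perPage": "per_page",
}


def build_query(actor_input: dict) -> str:
    """Turn actor-style input into a URL-encoded query string."""
    params = {}
    for key, wp_name in PARAM_MAP.items():
        value = actor_input.get(key)
        if value in (None, "", []):
            continue  # skip filters that were not set
        if isinstance(value, list):
            # ID lists are sent as comma-separated values.
            value = ",".join(str(v) for v in value)
        params[wp_name] = value
    if actor_input.get("sticky"):
        params["sticky"] = "true"
    return urlencode(params)
```

For example, `build_query({"searchKeyword": "ai", "categories": ["1", "5"], "perPage": 50})` yields `search=ai&categories=1%2C5&per_page=50`, which would be appended to `https://yoursite.com/wp-json/wp/v2/posts?`.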
2. Running the Actor
Using Apify Interface
- Navigate to the actor's Apify page.
- Enter the required parameters.
- Click Run and wait for the data to be scraped.
Using Apify API
    curl -X POST -H "Content-Type: application/json" \
      -d '{"startUrls": [{"url": "https://techcrunch.com"}], "maxPosts": 50, "after": "2025-01-01T00:00:00", "orderBy": "date", "order": "desc"}' \
      "https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs?token=YOUR_API_TOKEN"
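The same request can be prepared from Python using only the standard library. This is a sketch rather than an official client: `build_run_request` is a hypothetical helper, and `YOUR_ACTOR_ID` and `YOUR_API_TOKEN` are the same placeholders used in the curl command.

```python
import json
import urllib.request


def build_run_request(actor_id: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST request that starts an actor run."""
    url = f"https://api.apify.com/v2/acts/{actor_id}/runs?token={token}"
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request(
    "YOUR_ACTOR_ID",
    "YOUR_API_TOKEN",
    {
        "startUrls": [{"url": "https://techcrunch.com"}],
        "maxPosts": 50,
        "after": "2025-01-01T00:00:00",
        "orderBy": "date",
        "order": "desc",
    },
)
# To actually start the run, send it with urllib.request.urlopen(request).
```

Building the request separately from sending it makes it easy to inspect the URL and payload before spending any credits.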
Output Format
The output is a JSON dataset containing structured post details:
    [
      {
        "id": 19263,
        "date": "2025-11-04T15:34:27",
        "modified": "2025-11-04T16:08:02",
        "slug": "wordpress-6-9-beta-3",
        "link": "https://wordpress.org/news/2025/11/wordpress-6-9-beta-3/",
        "title": "WordPress 6.9 Beta 3",
        "content": "<p>WordPress 6.9 Beta 3 is available for download and testing!</p>...",
        "excerpt": "<p>WordPress 6.9 Beta 3 is available for download and testing!...</p>",
        "author": "Amy Kamala",
        "categories": ["Development", "General", "Releases"],
        "tags": ["6.9", "development", "release"],
        "featured_image": "https://wordpress.org/wp-content/uploads/featured.jpg",
        "extra_metadata": {
          "author_bio": "Full Stack Dev, Artist, Masters from UCLA",
          "author_url": "https://kittenkamala.com/",
          "category_description": "Development news and updates"
        }
      }
    ]
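Because `content` and `excerpt` arrive as HTML, downstream analysis (NLP, search indexing) often wants plain text. The sketch below shows one possible way to flatten a dataset item using the standard library; `html_to_text` and `summarize` are hypothetical helpers, not part of the actor.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html: str) -> str:
    """Strip tags, decode entities, and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


def summarize(post: dict) -> dict:
    """Keep a few fields from one dataset item, with content as plain text."""
    return {
        "title": post["title"],
        "link": post["link"],
        "author": post.get("author"),
        "text": html_to_text(post.get("content", "")),
    }
```

Applied to the item above, `summarize` would keep the title, link, and author and replace the HTML body with its text.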
Use Cases
- Content Aggregation – Collect and analyze posts from different WordPress sites.
- SEO Research – Extract content and metadata for SEO analysis.
- Data Science – Gather datasets for NLP or sentiment analysis.
- Backup and Archiving – Store blog content for future reference.
- Competitor Monitoring – Track competitor blog posts and content strategies.
- Research & Analysis – Extract posts by date range, category, or keyword for academic or business research.
Performance & Cost Optimization
Speed & Reliability
- Speed: ~2-5 seconds per 50 posts (using REST API)
- Success rate: 99%+ on WordPress sites with REST API enabled
- Concurrency: Supports multiple sites simultaneously
- No proxy required: WordPress REST API is public and doesn't require proxies in most cases
Cost Optimization with perPage Parameter
The perPage parameter controls how many posts are fetched per API request, directly impacting cost and speed:
Example: Extracting 100 posts
| perPage | API Requests | Compute Units | Speed | Notes |
|---|---|---|---|---|
| 10 | 10 requests | Higher cost | Slower | Use if large sites time out |
| 50 (default) | 2 requests | Lower cost | Faster | Recommended - best balance |
| 100 | 1 request | Lowest cost | Fastest | May time out on large sites (TechCrunch, etc.) |
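The request counts in the table follow from a ceiling division of the number of posts by perPage. A small sketch of that arithmetic, with `api_requests_needed` as a hypothetical helper name:

```python
import math


def api_requests_needed(total_posts: int, per_page: int) -> int:
    """Number of paginated requests needed to fetch total_posts."""
    if not 1 <= per_page <= 100:
        raise ValueError("perPage must be between 1 and 100")
    return math.ceil(total_posts / per_page)


for per_page in (10, 50, 100):
    print(per_page, api_requests_needed(100, per_page))
# prints:
# 10 10
# 50 2
# 100 1
```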
Recommendation:
- Default (50): Works on most sites, good balance between cost and reliability
- Large sites (TechCrunch, Wired, etc.): If timeouts occur, reduce to perPage: 20-30
- Small sites: Increase to perPage: 100 for maximum speed and lowest cost
Notes
- WordPress REST API required: This actor only works with sites that have the WordPress REST API enabled (enabled by default on most WordPress sites).
- API not available?: If a site has disabled the REST API, the actor will return an error message.
- Category/Tag IDs: To filter by categories or tags, you need the numeric IDs (not names). You can find these in the WordPress admin or via the API endpoints:
  - Categories: https://yoursite.com/wp-json/wp/v2/categories
  - Tags: https://yoursite.com/wp-json/wp/v2/tags
- Date format: Use ISO8601 format for date filters (e.g., 2025-01-01T00:00:00)
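Two of the notes above lend themselves to small helpers: turning category names into the numeric IDs the filters expect, and producing dates in the expected format. The sketch below assumes the categories endpoint returns a JSON list of objects with `id` and `name` fields (as the WordPress REST API does); `category_ids_by_name` and `iso_date` are hypothetical names.

```python
from datetime import datetime


def category_ids_by_name(categories: list, wanted: list) -> list:
    """Map category names to ID strings, skipping names that are not found.

    `categories` is the parsed JSON from /wp-json/wp/v2/categories.
    """
    lookup = {c["name"].lower(): str(c["id"]) for c in categories}
    return [lookup[name.lower()] for name in wanted if name.lower() in lookup]


def iso_date(value: datetime) -> str:
    """Format a datetime the way the date filters expect."""
    return value.isoformat(timespec="seconds")


sample = [{"id": 1, "name": "Development"}, {"id": 5, "name": "Releases"}]
print(category_ids_by_name(sample, ["releases"]))  # ['5']
print(iso_date(datetime(2025, 1, 1)))              # 2025-01-01T00:00:00
```

The same lookup works for tags by passing the parsed response from the tags endpoint instead.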
Support & Troubleshooting
Having issues? Check these common solutions:
- Timeout errors (large sites like TechCrunch): Reduce the perPage parameter to 20-30. This makes more API requests but prevents timeouts.
- WordPress REST API not available: The site may have disabled the REST API. Verify by visiting https://yoursite.com/wp-json/wp/v2/posts in your browser.
- No posts returned: Check your filters - they may be too restrictive (e.g., date range with no matching posts).
- Missing author data: Some WordPress sites may not include author information in the _embedded response.
- Category/Tag filtering not working: Ensure you're using numeric IDs, not names.
- High costs: Increase perPage to 80-100 for small/fast sites to reduce API requests and compute units.
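The timeout advice above can be automated by stepping perPage down until a request succeeds. This is only a sketch of the idea: `fetch_with_fallback` and the `fetch` callable are hypothetical, and it assumes timeouts surface as Python's built-in `TimeoutError`.

```python
from typing import Callable, List, Sequence


def fetch_with_fallback(
    fetch: Callable[[int], List[dict]],
    per_page_steps: Sequence[int] = (100, 50, 20),
) -> List[dict]:
    """Try progressively smaller perPage values until one does not time out."""
    last_error = None
    for per_page in per_page_steps:
        try:
            return fetch(per_page)
        except TimeoutError as exc:
            last_error = exc  # remember the failure and try a smaller page
    raise TimeoutError("all perPage values timed out") from last_error
```

Starting high keeps the request count low on sites that can handle it, while slower sites fall back to smaller pages.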
For bugs or feature requests, feel free to contact support. Happy scraping! 🚀
No WordPress account or subscription required. Get started analyzing WordPress content today!