Pricing

$10.00 / 1,000 results

WordPress Posts Scraper - Extract Articles & Metadata

Extract posts, articles, and metadata from any WordPress site using the REST API. 20+ filters: date ranges, categories, tags, authors, search keywords. Get title, content, author bio, featured images & more. No WordPress account needed. Fast, reliable data extraction for content aggregation & research.

Developer: DevnaZ

Last modified: 4 months ago

WordPress Posts Scraper

The WordPress Posts Scraper is an Apify actor that extracts posts and metadata from any WordPress website that has the WordPress REST API enabled. It handles pagination automatically and fetches additional information such as author details, categories, tags, and featured images.

This actor is perfect for researchers, content aggregators, and developers who need structured data from WordPress sites.

How It Works

You provide one or more WordPress site URLs.
The actor checks whether the WordPress REST API is available on each site.
It fetches posts that match your filters (dates, categories, keywords, etc.).
It follows pagination automatically until all matching posts are retrieved or `maxPosts` is reached.
It extracts metadata such as author name, categories, tags, and featured images.
It returns structured JSON output with all relevant post details.
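The steps above can be sketched as a simple pagination loop. This is a minimal Python illustration under stated assumptions, not the actor's source: `collect_posts` and the injected `fetch_page` callable are hypothetical. A real fetcher would call `/wp-json/wp/v2/posts` and read the total page count from the `X-WP-TotalPages` response header that WordPress sends; stopping at the reported last page also avoids the error WordPress returns for out-of-range page numbers.

```python
from typing import Callable, Dict, List, Tuple

Post = Dict[str, object]
# Hypothetical fetcher: (page, per_page) -> (posts on that page, total pages).
# A real one would GET /wp-json/wp/v2/posts?page=...&per_page=... and read
# the X-WP-TotalPages response header.
FetchPage = Callable[[int, int], Tuple[List[Post], int]]


def collect_posts(fetch_page: FetchPage, max_posts: int, per_page: int = 50) -> List[Post]:
    """Fetch pages in order until max_posts posts are collected or pages run out."""
    per_page = max(1, min(per_page, 100))  # WordPress caps per_page at 100
    collected: List[Post] = []
    page = 1
    while len(collected) < max_posts:
        posts, total_pages = fetch_page(page, per_page)
        if not posts:
            break  # an empty page means there is nothing left
        collected.extend(posts)
        if page >= total_pages:
            break  # the server reported this as the last page
        page += 1
    return collected[:max_posts]  # trim any overshoot from the final page


if __name__ == "__main__":
    # In-memory stand-in for a site with 120 posts.
    fake_site = [{"id": i} for i in range(1, 121)]

    def fake_fetch(page: int, per_page: int) -> Tuple[List[Post], int]:
        start = (page - 1) * per_page
        total_pages = (len(fake_site) + per_page - 1) // per_page
        return fake_site[start:start + per_page], total_pages

    print(len(collect_posts(fake_fetch, max_posts=5)))     # 5
    print(len(collect_posts(fake_fetch, max_posts=1000)))  # 120
```

Injecting the fetcher keeps the loop testable without network access; the HTTP, retry, and proxy handling would live in the real `fetch_page`.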

Features

✅ Fetches posts from any WordPress site with the REST API enabled
✅ Follows pagination until all matching posts are retrieved (up to `maxPosts`)
✅ 20+ advanced filters: date ranges, categories, tags, author, search keywords, status, and more
✅ Extracts metadata like author bio, categories, tags, and featured images
✅ Configurable sorting (by date, modified date, title, author, ID, relevance)
✅ Optional proxy support (not required for most sites)
✅ Clean, structured JSON output
✅ No WordPress account required

Getting Started

1. Input Parameters

To use the scraper, provide the following inputs:

Parameter	Type	Required	Description
`startUrls`	Array	✅	List of WordPress site URLs to scrape (e.g., `[{"url": "https://techcrunch.com"}]`)
`maxPosts`	Integer	❌	Maximum total posts to extract per site (default: 5, max: 10000)
`perPage`	Integer	❌	[Advanced] Posts per API request (default: 50, max: 100). Higher values mean fewer requests and lower cost. If timeouts occur, reduce it to 20-30, or to 10 on very large sites.
`searchKeyword`	String	❌	Filter posts by keyword search
`after`	String	❌	Posts published after this date (ISO8601: `2025-01-01T00:00:00`)
`before`	String	❌	Posts published before this date (ISO8601: `2025-12-31T23:59:59`)
`modifiedAfter`	String	❌	Posts modified after this date (ISO8601)
`modifiedBefore`	String	❌	Posts modified before this date (ISO8601)
`categories`	Array	❌	Filter by category IDs (e.g., `["1", "5", "12"]`)
`categoriesExclude`	Array	❌	Exclude specific category IDs
`tags`	Array	❌	Filter by tag IDs
`tagsExclude`	Array	❌	Exclude specific tag IDs
`author`	Array	❌	Filter by author IDs
`authorExclude`	Array	❌	Exclude specific author IDs
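`status`	String	❌	Post status: `publish`, `draft`, `pending`, `private`, `future` (default: `publish`). Statuses other than `publish` normally require an authenticated WordPress user, so public requests for them are usually rejected.
`orderBy`	String	❌	Sort by: `date`, `modified`, `title`, `author`, `id`, `relevance` (default: `date`). `relevance` only works together with `searchKeyword`.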
`order`	String	❌	Sort order: `asc` or `desc` (default: `desc`)
`sticky`	Boolean	❌	Include only sticky posts (default: false)
`slug`	String	❌	Filter by specific post slug
`offset`	Integer	❌	Skip a specific number of posts (default: 0)
`proxyConfiguration`	Object	❌	Proxy settings (optional - not needed for most WordPress sites)
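Most of these inputs look like renamed versions of the core WordPress `/wp-json/wp/v2/posts` query parameters (`searchKeyword` as `search`, `modifiedAfter` as `modified_after`, `orderBy` as `orderby`, and so on). The sketch below assumes that one-to-one mapping to show what a resulting request might look like; `PARAM_MAP` and `build_posts_url` are hypothetical, and the `_embed` parameter is an assumption about how author, term, and featured-image details are pulled in.

```python
from urllib.parse import urlencode

# Assumed mapping from this actor's input names to core WordPress REST API
# query parameters for GET /wp-json/wp/v2/posts.
PARAM_MAP = {
    "searchKeyword": "search",
    "after": "after",
    "before": "before",
    "modifiedAfter": "modified_after",
    "modifiedBefore": "modified_before",
    "categories": "categories",
    "categoriesExclude": "categories_exclude",
    "tags": "tags",
    "tagsExclude": "tags_exclude",
    "author": "author",
    "authorExclude": "author_exclude",
    "status": "status",
    "orderBy": "orderby",
    "order": "order",
    "sticky": "sticky",
    "slug": "slug",
    "offset": "offset",
}


def build_posts_url(site: str, actor_input: dict, page: int = 1, per_page: int = 50) -> str:
    """Translate actor-style input into a posts endpoint URL (hypothetical helper)."""
    # _embed asks WordPress to inline author, term, and featured-media objects.
    params = {"page": page, "per_page": min(per_page, 100), "_embed": 1}
    for key, wp_key in PARAM_MAP.items():
        value = actor_input.get(key)
        # Skip unset filters; sticky=False is treated as "no sticky filter".
        if value is None or value is False or value == "" or value == []:
            continue
        if isinstance(value, list):
            value = ",".join(str(v) for v in value)  # WordPress accepts comma-separated IDs
        elif value is True:
            value = "true"
        params[wp_key] = value
    return site.rstrip("/") + "/wp-json/wp/v2/posts?" + urlencode(params)


print(build_posts_url("https://techcrunch.com", {"searchKeyword": "ai", "categories": ["1", "5"]}))
```

Inputs such as `maxPosts` and `proxyConfiguration` have no WordPress counterpart and are presumably handled by the actor itself.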

2. Running the Actor

Using the Apify interface

Navigate to the actor's Apify page.
Enter the required parameters.
Click Run and wait for the data to be scraped.

Using the Apify API

curl -X POST -H "Content-Type: application/json" \
     -d '{
       "startUrls": [{"url": "https://techcrunch.com"}],
       "maxPosts": 50,
       "after": "2025-01-01T00:00:00",
       "orderBy": "date",
       "order": "desc"
     }' \
     "https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs?token=YOUR_API_TOKEN"

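The same run can be started from Python's standard library. This sketch only builds the request so it can be inspected offline; `build_run_request` is a hypothetical helper, `YOUR_ACTOR_ID` and `YOUR_API_TOKEN` are placeholders as in the curl example, and the token is sent in an `Authorization: Bearer` header, which the Apify API accepts as an alternative to the `?token=` query parameter.

```python
import json
import urllib.request

APIFY_API = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts an actor run."""
    return urllib.request.Request(
        f"{APIFY_API}/acts/{actor_id}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # instead of ?token= in the URL
        },
        method="POST",
    )


run_input = {
    "startUrls": [{"url": "https://techcrunch.com"}],
    "maxPosts": 50,
    "after": "2025-01-01T00:00:00",
    "orderBy": "date",
    "order": "desc",
}
request = build_run_request("YOUR_ACTOR_ID", "YOUR_API_TOKEN", run_input)

# To actually start the run (requires a real actor ID and token):
# with urllib.request.urlopen(request) as response:
#     run = json.load(response)["data"]
#     print(run["id"], run["defaultDatasetId"])
```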
Output Format

The output is a JSON dataset containing structured post details:

[
  {
    "id": 19263,
    "date": "2025-11-04T15:34:27",
    "modified": "2025-11-04T16:08:02",
    "slug": "wordpress-6-9-beta-3",
    "link": "https://wordpress.org/news/2025/11/wordpress-6-9-beta-3/",
    "title": "WordPress 6.9 Beta 3",
    "content": "<p>WordPress 6.9 Beta 3 is available for download and testing!</p>...",
    "excerpt": "<p>WordPress 6.9 Beta 3 is available for download and testing!...</p>",
    "author": "Amy Kamala",
    "categories": ["Development", "General", "Releases"],
    "tags": ["6.9", "development", "release"],
    "featured_image": "https://wordpress.org/wp-content/uploads/featured.jpg",
    "extra_metadata": {
      "author_bio": "Full Stack Dev, Artist, Masters from UCLA",
      "author_url": "https://kittenkamala.com/",
      "category_description": "Development news and updates"
    }
  }
]
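The `content` and `excerpt` fields hold raw HTML. For NLP or SEO analysis you will often want plain text instead. The sketch below uses only Python's built-in `html.parser` and assumes each item is a dict shaped like the example above; `html_to_text` and `flatten_post` are illustrative helpers, not part of the actor.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops tags; entities such as &amp; are decoded."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(fragment: str) -> str:
    """Convert an HTML fragment (e.g. `content` or `excerpt`) to plain text."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    # Join text nodes with spaces, then collapse runs of whitespace.
    return " ".join(" ".join(parser.parts).split())


def flatten_post(post: dict) -> dict:
    """Keep a few fields from one output item and replace HTML with plain text."""
    return {
        "id": post["id"],
        "title": post.get("title", ""),
        "author": post.get("author"),
        "categories": ", ".join(post.get("categories", [])),
        "text": html_to_text(post.get("content", "")),
    }
```

Joining text nodes with spaces keeps paragraphs apart but can split a word broken by inline markup (for example `Word<b>press</b>`); a dedicated HTML-to-text library handles such cases better.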

Use Cases

Content Aggregation – Collect and analyze posts from different WordPress sites.
SEO Research – Extract content and metadata for SEO analysis.
Data Science – Gather datasets for NLP or sentiment analysis.
Backup and Archiving – Store blog content for future reference.
Competitor Monitoring – Track competitor blog posts and content strategies.
Research & Analysis – Extract posts by date range, category, or keyword for academic or business research.

Performance & Cost Optimization

Speed & Reliability

Speed: ~2-5 seconds per 50 posts (via the REST API)
Success rate: 99%+ on WordPress sites with the REST API enabled
Concurrency: Supports multiple sites simultaneously
No proxy required: The WordPress REST API is public and usually works without proxies

Cost Optimization with `perPage` Parameter

The `perPage` parameter controls how many posts are fetched per API request, which directly affects cost and speed:

Example: Extracting 100 posts

perPage	API requests	Relative compute cost	Speed	Notes
10	10 requests	Higher	Slower	Use if large sites time out
50 (default)	2 requests	Lower	Faster	Recommended: best balance
100	1 request	Lowest	Fastest	May time out on large sites (TechCrunch, etc.)
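The request counts in the table above follow from ceiling division: requests = ceil(posts / perPage). A minimal sketch, assuming one request per page of posts and ignoring any extra calls the actor may make (such as its initial REST API check); `api_requests_needed` is a hypothetical name.

```python
def api_requests_needed(max_posts: int, per_page: int) -> int:
    """Paginated requests needed for max_posts posts (integer ceiling division)."""
    per_page = max(1, min(per_page, 100))  # WordPress caps per_page at 100
    return (max_posts + per_page - 1) // per_page


for per_page in (10, 50, 100):
    print(f"perPage={per_page} -> {api_requests_needed(100, per_page)}")
# perPage=10 -> 10
# perPage=50 -> 2
# perPage=100 -> 1
```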

Recommendation:

Default (50): Works on most sites, good balance between cost and reliability
Large sites (TechCrunch, Wired, etc.): If timeouts occur, reduce to perPage: 20-30
Small sites: Increase to perPage: 100 for maximum speed and lowest cost

Notes

WordPress REST API required: This actor only works with sites that have the WordPress REST API enabled. It is on by default in WordPress 4.7 and later, though some sites disable or restrict it.
API not available: If a site has disabled the REST API, the actor returns an error message.
Category/Tag IDs: To filter by categories or tags, you need the numeric IDs (not names). You can find these in the WordPress admin or via the API endpoints:
- Categories: https://yoursite.com/wp-json/wp/v2/categories
- Tags: https://yoursite.com/wp-json/wp/v2/tags
Date format: Use ISO8601 format for date filters (e.g., 2025-01-01T00:00:00)
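Looking up IDs by hand gets tedious when you need several categories. The sketch below assumes you have already downloaded the JSON from the categories (or tags) endpoint listed above, whose items include `id`, `name`, and `slug` fields, and it matches names or slugs case-insensitively. `ids_for_terms` is a hypothetical helper; because the endpoint returns at most 100 items per page (`?per_page=100`), sites with more terms need several pages.

```python
import html
from typing import Dict, Iterable, List


def ids_for_terms(terms: List[Dict], wanted: Iterable[str]) -> List[str]:
    """Map category/tag names or slugs to the string IDs used by the actor's filters."""
    lookup: Dict[str, str] = {}
    for term in terms:
        term_id = str(term["id"])
        # Names may come back HTML-escaped (e.g. "News &amp; Events"), so unescape them.
        lookup[html.unescape(term["name"]).lower()] = term_id
        lookup[term["slug"].lower()] = term_id
    wanted_list = list(wanted)
    missing = [w for w in wanted_list if w.lower() not in lookup]
    if missing:
        raise KeyError(f"No matching term for: {missing}")
    return [lookup[w.lower()] for w in wanted_list]


# `terms` would normally be the parsed JSON from
# https://yoursite.com/wp-json/wp/v2/categories?per_page=100
terms = [
    {"id": 1, "name": "Development", "slug": "development"},
    {"id": 5, "name": "News &amp; Events", "slug": "news-events"},
]
print(ids_for_terms(terms, ["Development", "news-events"]))  # ['1', '5']
```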

Support & Troubleshooting

Having issues? Check these common solutions:

Timeout errors (large sites like TechCrunch): Reduce the perPage parameter to 20-30. This makes more API requests but prevents timeouts.
WordPress REST API not available: The site may have disabled the REST API. Verify by visiting https://yoursite.com/wp-json/wp/v2/posts in your browser.
No posts returned: Check your filters - they may be too restrictive (e.g., date range with no matching posts).
Missing author data: Some WordPress sites omit author details from the `_embedded` response, for example when a security plugin blocks access to the users endpoint.
Category/Tag filtering not working: Ensure you're using numeric IDs, not names.
High costs: Increase perPage to 80-100 for small/fast sites to reduce API requests and compute units.
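The REST API availability check above can also be scripted. This is a hypothetical helper, not the actor's own logic: `rest_check_urls` builds the standard posts URL plus the `?rest_route=` fallback that WordPress serves when pretty permalinks are disabled, and `diagnose` interprets an HTTP status code and response body you have already fetched.

```python
import json


def rest_check_urls(site: str) -> list[str]:
    """Two URLs that should return posts if the REST API is reachable."""
    base = site.rstrip("/")
    return [
        f"{base}/wp-json/wp/v2/posts?per_page=1",
        # Fallback route that works even when pretty permalinks are off.
        f"{base}/?rest_route=/wp/v2/posts&per_page=1",
    ]


def diagnose(status: int, body: str) -> str:
    """Classify a response fetched from one of the check URLs.

    Returns a short code: "ok", "not-json", "unexpected-json",
    "blocked", "not-found", or "error".
    """
    if status == 200:
        try:
            data = json.loads(body)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            return "not-json"  # likely an HTML page, not the REST API
        # The posts endpoint returns a JSON array of post objects.
        return "ok" if isinstance(data, list) else "unexpected-json"
    if status in (401, 403):
        return "blocked"  # API disabled or restricted (plugin, firewall)
    if status == 404:
        return "not-found"  # try the ?rest_route= URL; the site may not be WordPress
    return "error"  # e.g. 5xx or a gateway timeout: retry, or lower perPage
```

For example, `diagnose(403, "")` returns `"blocked"`, which corresponds to the "WordPress REST API not available" case.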

For bugs or feature requests, feel free to contact support. Happy scraping! 🚀

No WordPress account or subscription required. Get started analyzing WordPress content today!

WordPress Articles Scraper

extremescrapes/wordpress-articles-scraper

The WordPress Articles Scraper is an Apify actor that extracts posts and metadata from any WordPress website using the WordPress REST API. It automatically handles pagination and fetches additional information like author details, categories, tags, and featured images.

Extreme Scrapes

✨ WordPress Content Extractor

ramman/wordpress-content-extractor

🔍 Easily scrape and export posts, pages, metadata, images, and comments from any WordPress site. ✨ WordPress content to JSON, CSV, or TXT — instantly.

ramman

Wordpress Content Extractor

simplifysme/wordpress-content-extractor

📝 Extract complete content from WordPress sites including posts, categories, and metadata. Perfect for content migration, blog aggregation, and CMS integration.

SimplifySME Toolbox

WordPress Scraper

jupri/wordpress

💫 Scrape WordPress and Woocommerce websites

cat

Wordpress Email Scraper - Advanced, Fast & Cheapest

contacts-api/wordpress-email-scraper-fast-advanced-and-cheapest

🌐 WordPress Email Scraper finds emails from WordPress websites, blogs, and author pages fast ⚡ Ideal for outreach, partnerships, and SEO campaigns 📧

Lead Heaven

Wordpress Phone Number Scraper

contacts-api/wordpress-phone-number-scraper

Collect contact numbers from WordPress websites with our WordPress Phone Number Scraper. Extract public phone numbers for sales and lead generation.

Lead Heaven

WordPress Email Scraper – Advanced, Cheapest & Reliable 📧⚡

contactminerlabs/wordpress-email-scraper---advanced-cheapest-reliable

🔍 Scrape WordPress emails: enter your search parameters to collect verified contact emails from public WordPress profiles, along with profile title, bio, source URL & platform info. ✉️📊 Perfect for lead generation, influencer outreach & data enrichment in tools like Google Sheets or CRMs. ⚡🧩

ContactMinerLabs

WordPress Integration - Auto Publisher

alizarin_refrigerator-owner/wordpress-integration

Automatically publish content to WordPress sites. Schedule posts, manage categories, upload media & sync with your content calendar. REST API & XML-RPC support.

The Howlers

Wordpress Detector

woundless_insurance/wordpress-detector

Bijay Puri

WordPress Universal Content Bridge

visita/wordpress-universal-content-bridge

The WordPress Universal Content Bridge is a specialized tool designed to solve the #1 problem in WordPress automation: Firewalls. Securely import AI articles, WooCommerce products, and Directory listings without getting blocked.