# WordPress Post Scraper (`hgservices/wordpress-post-scraper`) Actor

Extract every blog post from any WordPress site — title, content, date, author, image, categories and tags.

- **URL**: https://apify.com/hgservices/wordpress-post-scraper.md
- **Developed by:** [Harish Garg](https://apify.com/hgservices) (community)
- **Categories:** Automation, News, SEO tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $4.00 / 1,000 results

This Actor uses pay-per-event pricing: you aren't charged for Apify platform usage, only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anywhere from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
The word "Actor" is written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

Extract every blog post from any WordPress website in minutes, with no code, no logins to the target site, no setup and no plugins to install. **WordPress Post Scraper** turns any WordPress blog into clean, structured data (JSON, CSV, Excel, HTML or XML) you can drop straight into a spreadsheet, database, AI workflow or content pipeline. It's built to handle sites that block ordinary scrapers.

### What does WordPress Post Scraper do?

WordPress Post Scraper **extracts posts from any public WordPress site**, including blogs hosted on wordpress.com and self-hosted WordPress installations. Paste a website URL, click **Start**, and the scraper returns a clean list of every post on the site with title, full content, publish date, author, featured image, categories and tags.

It works on millions of sites: news outlets, company blogs, product update pages, agency portfolios, personal blogs, niche publications and more — anywhere WordPress powers the content. The scraper is engineered to look and behave like a real visitor, so it works reliably on sites that block generic scraping tools. You can try it on [wordpress.org/news](https://wordpress.org/news) in seconds, with zero configuration.

### Why use WordPress Post Scraper?

WordPress runs more than **40% of the web**, which means most of the world's blog content lives on it. This scraper lets you turn that content into data for whatever you're building:

- **Competitor and market research** — monitor what your competitors publish, how often, and on which topics.
- **SEO and content analysis** — analyze keyword usage, posting frequency, internal linking and content strategy across entire sites.
- **AI training data and RAG pipelines** — bulk-collect high-quality long-form content to feed into LLM fine-tunes, embeddings, summarization tools or knowledge bases.
- **News and trend monitoring** — build automated news feeds, alerts and dashboards from industry publications.
- **Content migration and backup** — export an entire WordPress blog before a redesign, platform migration, or as a recurring offsite archive.
- **Lead generation** — extract author names and bylines from target blogs to identify writers and contributors worth reaching out to.
- **Newsletter and social automation** — pull fresh posts into Make, Zapier, n8n, Google Sheets or your CMS on a schedule.

Because the scraper runs on Apify, you also get **scheduled runs**, a **REST API**, **webhooks**, **proxy rotation**, **dataset storage** and **integrations** with Make, Zapier, Google Drive, Slack, Airtable and more — all included.

### How to use WordPress Post Scraper

1. Click **Try for free** at the top of this page (you'll be asked to sign in — it's free).
2. Paste the homepage URL of the WordPress site you want to scrape into the **WordPress site URL** field (e.g. `https://wordpress.org/news`).
3. Optionally set **Maximum posts** to limit how many posts you get (the default is `100`). Set it to `0` to scrape the entire blog.
4. Click **Start**.
5. When the run finishes, open the **Output** tab and download the data as JSON, CSV, Excel, HTML or XML — or pull it via the Apify API.

That's it. You don't need an account, API key or plugin on the target site, and nothing gets installed there.

### Input

You only need to provide a website URL. Everything else has a sensible default.

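| Setting | Input field | Default | What it does |
|---|---|---|---|
| **WordPress site URL** | `siteUrl` | `https://wordpress.org/news` | The homepage (base URL) of the WordPress blog you want to scrape. Required. |
| **Maximum posts** | `maxPosts` | `100` | Total cap on posts returned. Use `0` to scrape the entire site. |
| **Posts per page** | `perPage` | `50` | How many posts to fetch per request (1 to 100). Higher values mean faster runs. |
| **Include full post content** | `includeContent` | `true` | Turn off to get only titles, excerpts and metadata (smaller output). |
| **Min / max delay between requests** | `minDelay`, `maxDelay` | `1`, `3` | Bounds, in seconds, of the random politeness delay between requests, so the target site stays happy. |
| **Proxy configuration** | `proxyConfiguration` | Apify Proxy off | Optional Apify Proxy, useful for sites that rate-limit cloud IPs. |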

### Output

Each post is saved as one row in the dataset. Here's an example item (the `excerpt` and `content` values are shortened):

```json
{
    "id": 18432,
    "title": "WordPress 6.7 \"Rollins\" Released",
    "slug": "wordpress-6-7-rollins",
    "url": "https://wordpress.org/news/2024/11/rollins/",
    "date": "2024-11-12T15:00:00",
    "modified": "2024-11-13T10:22:00",
    "excerpt": "Say hello to WordPress 6.7 \"Rollins\"...",
    "content": "<p>Full rendered HTML of the post body...</p>",
    "author": "WordPress Core Team",
    "featuredImage": "https://wordpress.org/news/files/2024/11/rollins.jpg",
    "categories": ["Releases"],
    "tags": ["wordpress-6-7"]
}
````

You can download the dataset in various formats such as **JSON, CSV, Excel, HTML or XML**, or query it directly through the Apify API and integrations.
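
Suppose you build the CSV yourself from items fetched through the API, for example with `iterate_items()` as in the Python example. In that case the list fields `categories` and `tags` need flattening into single cells. Below is a minimal standard-library sketch. `items_to_csv` and the `; ` separator are illustrative choices, not part of the Actor. Any missing field, such as `content` when full content is turned off, becomes an empty cell.

```python
import csv
import io

# Column order taken from the dataset fields documented on this page.
FIELDS = ["id", "title", "slug", "url", "date", "modified",
          "excerpt", "content", "author", "featuredImage", "categories", "tags"]


def items_to_csv(items, separator="; "):
    """Serialize dataset items to CSV text, joining list fields with `separator`."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops unknown keys; missing keys are written as "".
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        row = dict(item)
        for key in ("categories", "tags"):
            # A list doesn't fit in one CSV cell, so join it into a single string.
            if isinstance(row.get(key), list):
                row[key] = separator.join(row[key])
        writer.writerow(row)
    return buffer.getvalue()
```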

### Data fields

| Field | Description |
|---|---|
| `id` | Unique WordPress post ID. |
| `title` | Post headline. |
| `slug` | URL-friendly post identifier. |
| `url` | Full public URL of the post. |
| `date` | When the post was originally published. |
| `modified` | When the post was last updated — perfect for change tracking. |
| `excerpt` | Short summary of the post. |
| `content` | Full HTML body of the post. |
| `author` | Display name of the post's author. |
| `featuredImage` | URL of the post's main image. |
| `categories` | List of WordPress categories the post belongs to. |
| `tags` | List of tags attached to the post. |
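
RAG and embedding pipelines usually want plain text rather than the HTML stored in `content`. Below is a minimal standard-library sketch. It assumes `content` holds ordinary rendered post HTML. `html_to_text` is a hypothetical helper, and a dedicated parser such as BeautifulSoup may handle unusual markup better.

```python
from html.parser import HTMLParser

# Closing one of these tags is treated as a line break in the extracted text.
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6", "blockquote", "pre"}


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping <script>/<style> and breaking lines at block tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities like &amp; and &nbsp;
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(self._skip_depth - 1, 0)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return readable plain text from a post's `content` HTML."""
    parser = _TextExtractor()
    parser.feed(html or "")
    parser.close()
    # Collapse runs of whitespace (including non-breaking spaces) and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```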

### How much does it cost to scrape WordPress posts?

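WordPress Post Scraper uses pay-per-event pricing, starting from $4.00 per 1,000 results (see **Pricing** above), so your cost scales mainly with the number of posts you scrape. At $4.00 per 1,000 results, for example, a 250-post export costs about $1.00. The scraper also uses a lightweight fetch strategy that needs far less compute than browser-based scrapers, so runs finish quickly.

Keep in mind:

- Each scraped post is one result, so **Maximum posts** is the simplest way to cap your spend.
- Including the full post body makes each item larger (long articles take more storage), but you aren't charged for platform usage such as storage.
- Enabling Apify Proxy (recommended for protected sites) improves reliability on sites that rate-limit cloud IPs.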

For an exact estimate before a big run, try a small `Maximum posts` value first and check the run summary.
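
The arithmetic is simple enough to script before a large export. The minimal sketch below assumes each dataset item counts as one billed result. `estimate_cost_usd` is a hypothetical helper. The default `price_per_1000` is only the "from $4.00" figure in the Pricing section, so plug in the rate shown for your plan.

```python
def estimate_cost_usd(posts, price_per_1000=4.00):
    """Estimate the charge for `posts` results at `price_per_1000` USD per 1,000 results.

    Assumes one billed result per scraped post; adjust price_per_1000 to your plan's rate.
    """
    if posts < 0:
        raise ValueError("posts must be non-negative")
    return round(posts * price_per_1000 / 1000, 2)


# Example: a capped run of 250 posts at the listed rate.
print(estimate_cost_usd(250))
```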

### Tips for the best results

- **Start small.** Set `Maximum posts` to `10` on your first run to verify the site is supported before doing a full export.
- **Use Apify Proxy for protected sites.** If a site returns errors, switch the proxy on; this often clears up rate-limit issues.
- **Schedule recurring runs.** Use Apify's built-in scheduler to refresh your dataset daily, hourly or at whatever interval matches how often the target site publishes new content.
- **Connect to your stack.** Push results straight to Google Sheets, Airtable, Slack, your database or your CMS using Apify's no-code integrations.
- **Track changes over time.** The `modified` field lets you detect when existing posts are updated, not just when new ones are published.
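
Here's a minimal sketch of that comparison. It assumes you've kept the items from a previous run, for example as a list of dicts from `iterate_items()`. `diff_snapshots` is a hypothetical helper. It checks `modified` for inequality rather than "newer than", so it doesn't depend on the timestamp format. If you cap runs with **Maximum posts**, a post missing from the current run hasn't necessarily been deleted.

```python
def diff_snapshots(previous, current):
    """Classify posts in `current` as new or updated relative to `previous`.

    Posts are matched by `id`; a post counts as updated when its `modified`
    value differs from the one recorded in the previous snapshot.
    """
    before = {item["id"]: item.get("modified") for item in previous}
    new, updated = [], []
    for item in current:
        if item["id"] not in before:
            new.append(item)
        elif item.get("modified") != before[item["id"]]:
            updated.append(item)
    return new, updated
```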

### FAQ

**Which WordPress sites does this work on?**
Any public WordPress site where posts are visible to anonymous visitors — which is the default for both self-hosted WordPress and wordpress.com blogs. The vast majority of WordPress sites on the internet are supported out of the box, including many that block ordinary scraping tools.

**Do I need login credentials or an API key?**
No. The scraper only reads publicly available content, so no logins, tokens or plugins are required.

**Can I scrape an entire blog at once?**
Yes. Set **Maximum posts** to `0` and the scraper will paginate through the site until every post has been collected.

**Can I get just the most recent posts?**
Yes — set **Maximum posts** to however many recent posts you want. The scraper returns the newest posts first.

**Is web scraping WordPress sites legal?**
Scraping publicly available content is generally permitted, but you should respect each site's terms of service, robots.txt, copyright and applicable privacy laws. Don't republish copyrighted content without permission.

**My target site doesn't seem to work — what now?**
A small number of WordPress sites are configured to hide their posts from automated visitors entirely. If your target site is one of them, please open an issue on the **Issues** tab and we'll take a look — or reach out for a custom solution.

**Can I get help or request features?**
Yes. Use the **Issues** tab on this Actor's page to report bugs, ask questions or request new features. We read every report.

# Actor input Schema

## `siteUrl` (type: `string`):

The homepage (base) URL of the WordPress site you want to scrape (e.g. https://wordpress.org/news). No post, category or other extra path is needed.

## `maxPosts` (type: `integer`):

Maximum number of posts to scrape. The Actor keeps fetching until this limit is reached or there are no more posts. Use 0 for unlimited.

## `perPage` (type: `integer`):

How many posts to fetch per batch. Higher values run faster. Maximum 100.

## `includeContent` (type: `boolean`):

If enabled, the rendered HTML content of each post is included in the output. Disable to keep dataset items smaller.

## `minDelay` (type: `integer`):

Lower bound, in seconds, of the random delay applied between requests, to be polite to the target site.

## `maxDelay` (type: `integer`):

Upper bound, in seconds, of the random delay between requests. Keep it at or above `minDelay`.
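
The sketch below gives a rough feel for how `maxPosts`, `perPage`, `minDelay` and `maxDelay` interact. It estimates how many batch requests a run makes and how much total politeness delay that adds. It assumes one request per batch of `perPage` posts and one random delay between consecutive requests. That's a hypothetical model, not documented behavior; for example, the Actor may need an extra request to detect the last page. `estimate_requests_and_delay` is a hypothetical helper whose defaults mirror the input schema.

```python
import math


def estimate_requests_and_delay(total_posts, max_posts=100, per_page=50,
                                min_delay=1, max_delay=3):
    """Return (request_count, (min_total_delay_s, max_total_delay_s)) for a run.

    Hypothetical model: one request per batch, one delay between consecutive requests.
    """
    if not 1 <= per_page <= 100:
        raise ValueError("per_page must be between 1 and 100")
    # max_posts == 0 means "no cap", so the site's total post count is the limit.
    posts = total_posts if max_posts == 0 else min(max_posts, total_posts)
    requests = math.ceil(posts / per_page)
    gaps = max(requests - 1, 0)
    return requests, (gaps * min_delay, gaps * max_delay)


# Example: a full export of a 500-post blog with 100 posts per batch.
print(estimate_requests_and_delay(500, max_posts=0, per_page=100))
```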

## `proxyConfiguration` (type: `object`):

Optional proxy settings. Apify Proxy is recommended when scraping from cloud IPs that the target site may block.

## Actor input object example

```json
{
  "siteUrl": "https://wordpress.org/news",
  "maxPosts": 100,
  "perPage": 50,
  "includeContent": true,
  "minDelay": 1,
  "maxDelay": 3,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```
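
You can catch bad input before starting a run by checking it against the constraints in the OpenAPI specification on this page: the `siteUrl` pattern, `maxPosts` of at least 0, `perPage` from 1 to 100, and non-negative delays. Below is a minimal sketch. `build_input` and `validate_input` are hypothetical helpers. The `minDelay` versus `maxDelay` check is an extra sanity rule that the schema itself doesn't enforce. You could pass the result as `run_input` in the Python example.

```python
import copy
import re

# Defaults copied from the input schema in the OpenAPI specification.
DEFAULTS = {
    "siteUrl": "https://wordpress.org/news",
    "maxPosts": 100,
    "perPage": 50,
    "includeContent": True,
    "minDelay": 1,
    "maxDelay": 3,
    "proxyConfiguration": {"useApifyProxy": False},
}


def build_input(site_url, **overrides):
    """Return a fresh input dict: schema defaults, then siteUrl, then any overrides."""
    run_input = copy.deepcopy(DEFAULTS)  # deep copy so the nested proxy dict isn't shared
    run_input["siteUrl"] = site_url
    run_input.update(overrides)
    return run_input


def validate_input(run_input):
    """Return a list of human-readable problems; an empty list means the input looks valid."""
    errors = []
    if not re.match(r"https?://.+", run_input["siteUrl"]):
        errors.append("siteUrl must start with http:// or https://")
    if run_input["maxPosts"] < 0:
        errors.append("maxPosts must be >= 0 (0 means unlimited)")
    if not 1 <= run_input["perPage"] <= 100:
        errors.append("perPage must be between 1 and 100")
    if run_input["minDelay"] < 0 or run_input["maxDelay"] < 0:
        errors.append("minDelay and maxDelay must be >= 0")
    if run_input["minDelay"] > run_input["maxDelay"]:
        errors.append("minDelay should not exceed maxDelay")
    return errors
```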

# Actor output Schema

## `dataset` (type: `string`):

All scraped posts in JSON/CSV/Excel/HTML/XML.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "siteUrl": "https://wordpress.org/news"
};

// Run the Actor and wait for it to finish
const run = await client.actor("hgservices/wordpress-post-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "siteUrl": "https://wordpress.org/news" }

# Run the Actor and wait for it to finish
run = client.actor("hgservices/wordpress-post-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "siteUrl": "https://wordpress.org/news"
}' |
apify call hgservices/wordpress-post-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=hgservices/wordpress-post-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "WordPress Post Scraper",
        "description": "Extract every blog post from any WordPress site — title, content, date, author, image, categories and tags.",
        "version": "0.1",
        "x-build-id": "SChA9f5z0qxDzpX1p"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/hgservices~wordpress-post-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-hgservices-wordpress-post-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/hgservices~wordpress-post-scraper/runs": {
            "post": {
                "operationId": "runs-sync-hgservices-wordpress-post-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/hgservices~wordpress-post-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-hgservices-wordpress-post-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "siteUrl"
                ],
                "properties": {
                    "siteUrl": {
                        "title": "WordPress site URL",
                        "pattern": "^https?://.+",
                        "type": "string",
                        "description": "The homepage URL of the WordPress site you want to scrape (e.g. https://wordpress.org/news). Just the root URL — no extra path needed.",
                        "default": "https://wordpress.org/news"
                    },
                    "maxPosts": {
                        "title": "Maximum posts",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of posts to scrape. The Actor keeps fetching until this limit is reached or there are no more posts. Use 0 for unlimited.",
                        "default": 100
                    },
                    "perPage": {
                        "title": "Posts per page",
                        "minimum": 1,
                        "maximum": 100,
                        "type": "integer",
                        "description": "How many posts to fetch per batch. Higher values run faster. Maximum 100.",
                        "default": 50
                    },
                    "includeContent": {
                        "title": "Include full post content",
                        "type": "boolean",
                        "description": "If enabled, the rendered HTML content of each post is included in the output. Disable to keep dataset items smaller.",
                        "default": true
                    },
                    "minDelay": {
                        "title": "Min delay between requests (seconds)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Lower bound of the random delay applied between requests, to be polite to the target site.",
                        "default": 1
                    },
                    "maxDelay": {
                        "title": "Max delay between requests (seconds)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Upper bound of the random delay between requests.",
                        "default": 3
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Optional proxy settings. Apify Proxy is recommended when scraping from cloud IPs that the target site may block.",
                        "default": {
                            "useApifyProxy": false
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
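
The `run-sync-get-dataset-items` endpoint in the specification above can also be called without a client library. Below is a minimal sketch using only Python's standard library; it translates directly to any language with an HTTP client. It follows the spec: the token goes in the `token` query parameter and the Actor input is the JSON body. `build_run_sync_request` and `run_sync_get_items` are hypothetical helper names. Synchronous endpoints suit small runs. For full-site exports, the `/runs` endpoint or the client libraries are likely a better fit, because they don't hold one HTTP request open for the whole run.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "hgservices~wordpress-post-scraper"  # "~" replaces "/" in API paths


def build_run_sync_request(token, run_input):
    """Build the POST request for run-sync-get-dataset-items (token as a query parameter)."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items?{query}"
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def run_sync_get_items(token, run_input, timeout=300):
    """Run the Actor, wait for it to finish, and return the dataset items as a list."""
    request = build_run_sync_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```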
