✨ WordPress Content Extractor

Pricing

$29.00/month + usage

Developed by

ramman

Maintained by Community

🔍 Easily scrape and export posts, pages, metadata, images, and comments from any WordPress site. ✨ Export WordPress content to JSON, CSV, or TXT.

Last modified

4 hours ago

An Apify Actor that extracts content from WordPress websites. It automatically discovers and extracts posts, pages, metadata, media, and other WordPress-specific content by combining HTML parsing with the WordPress REST API.

🚀 Features

Comprehensive Content Extraction

  • Blog Posts - Extract all blog posts with full content, titles, and metadata
  • Static Pages - Extract WordPress pages and custom post types
  • Media Assets - Extract images, videos, and other media with alt text
  • SEO Metadata - Extract meta descriptions, Open Graph tags, and Twitter cards
  • Comments - Optional extraction of user comments and discussions
  • Taxonomies - Extract categories, tags, and custom taxonomies
  • Author Information - Extract post/page author details
  • Publication Dates - Extract publication and modification timestamps

Smart Discovery

  • Automatic URL Discovery - Finds posts and pages through navigation menus
  • WordPress REST API Integration - Leverages /wp-json/wp/v2/ endpoints when available
  • Pagination Support - Automatically follows pagination links
  • Category & Tag Pages - Discovers content through WordPress taxonomies
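
The REST API and pagination points above can be illustrated with a minimal sketch. It is not the Actor's actual code, which this page does not show. The helper names buildRestUrl and fetchAllItems are hypothetical, and the sketch assumes Node.js 18+ for the global fetch. The per_page and page query parameters and the X-WP-TotalPages response header are part of the standard WordPress REST API.

```javascript
// Sketch only: buildRestUrl and fetchAllItems are hypothetical helper names.

// Build a WordPress REST API collection URL, e.g. for "posts" or "pages".
function buildRestUrl(siteUrl, resource, page = 1, perPage = 100) {
  // Ensure a trailing slash so the relative path resolves below the site root.
  const base = siteUrl.endsWith('/') ? siteUrl : `${siteUrl}/`;
  const url = new URL(`wp-json/wp/v2/${resource}`, base);
  url.searchParams.set('per_page', String(perPage)); // WordPress caps this at 100
  url.searchParams.set('page', String(page));
  return url.toString();
}

// Walk every page of a collection using the X-WP-TotalPages response header.
// maxItems = 0 means "no limit" (an assumption modelled on the maxPages input).
async function fetchAllItems(siteUrl, resource, maxItems = 0) {
  const items = [];
  let totalPages = 1;
  for (let page = 1; page <= totalPages; page += 1) {
    const response = await fetch(buildRestUrl(siteUrl, resource, page));
    if (!response.ok) break; // REST API disabled or blocked: stop and return what we have
    totalPages = Number(response.headers.get('X-WP-TotalPages')) || 1;
    items.push(...(await response.json()));
    if (maxItems > 0 && items.length >= maxItems) return items.slice(0, maxItems);
  }
  return items;
}
```

Calling fetchAllItems('https://example.com', 'posts', 50) would request pages of up to 100 posts until 50 items are collected or the last page is reached. A site that blocks /wp-json/ simply yields an empty list, at which point a crawler would have to rely on HTML discovery instead.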

Advanced Configuration

  • Selective Extraction - Choose what content types to extract
  • Page Limits - Set maximum number of pages to process
  • SSL Support - Handles sites with certificate issues
  • Custom Headers - Uses realistic browser headers for better compatibility

📊 Extracted Data Structure

Each extracted page/post includes:

{
  "url": "https://example.com/post-title",
  "title": "Post Title",
  "content": "Full HTML content or text",
  "excerpt": "Post excerpt/summary",
  "metadata": {
    "description": "Meta description",
    "keywords": "Meta keywords",
    "ogTitle": "Open Graph title",
    "ogDescription": "Open Graph description",
    "ogImage": "Open Graph image URL",
    "canonical": "Canonical URL"
  },
  "media": [
    {
      "src": "image-url.jpg",
      "alt": "Image alt text",
      "type": "image"
    }
  ],
  "comments": [
    {
      "author": "Commenter Name",
      "content": "Comment text",
      "date": "Comment date"
    }
  ],
  "publishedDate": "2024-01-01T00:00:00Z",
  "author": "Post Author",
  "categories": ["Category 1", "Category 2"],
  "tags": ["tag1", "tag2"],
  "type": "post"
}
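
To show how part of such a record could be derived, here is a hedged sketch that maps a WordPress REST API post object (requested with ?_embed) onto the fields above. The function name toRecord is hypothetical and the mapping is an assumption about how the fields line up, not the Actor's documented behavior. The metadata, media, and comments fields are left out because they would come from the rendered HTML or other endpoints.

```javascript
// Hypothetical mapper: keys on the left follow the record structure above,
// values on the right follow the WordPress REST API post object with ?_embed.
function toRecord(post) {
  const embedded = post._embedded || {};
  // _embedded['wp:term'] is an array of term arrays, one per taxonomy.
  const terms = (embedded['wp:term'] || []).flat();
  const namesOf = (taxonomy) =>
    terms.filter((term) => term.taxonomy === taxonomy).map((term) => term.name);
  const authors = embedded.author || [];

  return {
    url: post.link,
    title: post.title.rendered,
    content: post.content.rendered,
    excerpt: post.excerpt ? post.excerpt.rendered : '',
    // date_gmt carries no timezone suffix, so mark it as UTC explicitly.
    publishedDate: `${post.date_gmt}Z`,
    author: authors.length > 0 ? authors[0].name : null,
    categories: namesOf('category'),
    tags: namesOf('post_tag'),
    type: post.type,
  };
}
```

Requesting ?_embed avoids separate lookups for author and term names, since the plain post object only carries numeric IDs for them.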

⚙️ Input Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| url | String | Required | WordPress website URL to extract from |
| extractPosts | Boolean | true | Whether to extract blog posts |
| extractPages | Boolean | true | Whether to extract static pages |
| extractMedia | Boolean | true | Whether to extract media URLs |
| extractMetadata | Boolean | true | Whether to extract SEO metadata |
| maxPages | Integer | 0 | Maximum pages to extract (0 = no limit) |
| includeComments | Boolean | false | Whether to extract comments |
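
As an illustration of how these parameters and defaults fit together, the sketch below merges a partial input with the documented defaults and rejects invalid values. INPUT_DEFAULTS and normalizeInput are hypothetical names, and the validation rules are assumptions inferred from the table rather than the Actor's actual input schema.

```javascript
// Defaults copied from the parameter table; normalizeInput is a hypothetical helper.
const INPUT_DEFAULTS = {
  extractPosts: true,
  extractPages: true,
  extractMedia: true,
  extractMetadata: true,
  maxPages: 0, // 0 = no limit
  includeComments: false,
};

function normalizeInput(input) {
  const merged = { ...INPUT_DEFAULTS, ...input };

  // url is the only required field and must be an absolute http(s) URL.
  let parsed;
  try {
    parsed = new URL(merged.url);
  } catch {
    throw new Error('Input "url" is required and must be an absolute URL');
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error('Input "url" must use http or https');
  }

  if (!Number.isInteger(merged.maxPages) || merged.maxPages < 0) {
    throw new Error('Input "maxPages" must be a non-negative integer');
  }
  return merged;
}
```

With this sketch, normalizeInput({ url: 'https://example.com' }) returns the URL plus every default from the table, so callers only need to state the options they want to change.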

🛠️ Technical Details

Built With

  • Apify SDK - Core actor framework
  • Axios - HTTP client with SSL support
  • Cheerio - Fast HTML parsing and manipulation
  • Node.js - Runtime environment

WordPress Compatibility

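  • All WordPress versions - Works with any WordPress site; the /wp-json/wp/v2/ endpoints are used only where the site exposes them (they ship with WordPress core from version 4.7)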
  • Custom themes - Adapts to different theme structures
  • Gutenberg blocks - Supports modern WordPress block editor
  • Custom post types - Extracts custom content types
  • Multisite networks - Works with WordPress multisite installations

Performance Features

  • Concurrent processing - Efficient parallel content extraction
  • Respectful crawling - Built-in delays to avoid overwhelming servers
  • Error handling - Robust error recovery and logging
  • Memory efficient - Optimized for large-scale extraction
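
One common way to combine concurrent processing with respectful crawling is to fetch pages in small batches and pause between batches. The sketch below shows that pattern under stated assumptions: chunk, sleep, and processInBatches are hypothetical names, and the batch size and delay are illustrative values, not the Actor's real settings.

```javascript
// Sketch of bounded concurrency with a pause between batches.

// Split a list into consecutive batches of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run `worker` on every URL: up to `batchSize` at once, `delayMs` between batches.
// Promise.allSettled keeps one failed page from aborting the whole run.
async function processInBatches(urls, worker, batchSize = 5, delayMs = 1000) {
  const results = [];
  const batches = chunk(urls, batchSize);
  for (let i = 0; i < batches.length; i += 1) {
    // The async wrapper turns synchronous throws into rejections as well.
    const settled = await Promise.allSettled(batches[i].map(async (url) => worker(url)));
    results.push(...settled);
    if (i < batches.length - 1) await sleep(delayMs);
  }
  return results;
}
```

Each entry in the returned array has a status of "fulfilled" or "rejected", so failed pages can be logged or retried without losing the results of the pages that succeeded.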

🚀 Getting Started

Quick Start

  1. Deploy the Actor - Build and deploy on Apify Platform
  2. Configure Input - Set your WordPress website URL
  3. Run Extraction - Start the actor and monitor progress
  4. Download Results - Get extracted data in JSON, CSV, or other formats

Example Usage

// Input configuration
{
  "url": "https://your-wordpress-site.com",
  "extractPosts": true,
  "extractPages": true,
  "extractMedia": true,
  "extractMetadata": true,
  "maxPages": 50,
  "includeComments": false
}
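
The same input can also be submitted programmatically. The sketch below uses the Apify API's run-sync-get-dataset-items endpoint, which starts a run and returns its dataset items in one request. buildRunRequest and runExtractor are hypothetical helper names, the sketch assumes Node.js 18+ for the global fetch, and the Actor ID is a placeholder because this page does not state it.

```javascript
// Sketch: start a run through the Apify API and return the dataset items.
// Copy the real Actor ID from the Actor's page in Apify Store; the
// "username~actor-name" form used in the usage note is only a placeholder.
function buildRunRequest(actorId, token, input) {
  const url = new URL(
    `https://api.apify.com/v2/acts/${encodeURIComponent(actorId)}/run-sync-get-dataset-items`,
  );
  url.searchParams.set('format', 'json');
  return {
    url: url.toString(),
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify(input), // same shape as the input configuration above
    },
  };
}

async function runExtractor(actorId, token, input) {
  const { url, options } = buildRunRequest(actorId, token, input);
  const response = await fetch(url, options);
  if (!response.ok) throw new Error(`Apify API returned ${response.status}`);
  return response.json(); // array of extracted records
}
```

A call would look like runExtractor('username~actor-name', process.env.APIFY_TOKEN, { url: 'https://your-wordpress-site.com', maxPages: 50 }). Synchronous runs are time-limited by the API, so a large site may be better served by starting the run asynchronously and reading the dataset afterwards.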

📈 Use Cases

Content Migration

  • Site Migration - Extract content for moving to new platforms
  • Backup Creation - Create comprehensive content backups
  • Platform Migration - Move from WordPress to other CMS platforms

Content Analysis

  • SEO Audit - Analyze meta tags and content structure
  • Content Inventory - Catalog all posts, pages, and media
  • Performance Analysis - Analyze content patterns and structure
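
For the SEO audit use case, a small script can scan the extracted records for common gaps. This is a sketch only: auditRecord and auditAll are hypothetical helpers, the chosen checks are examples, and the field names assume the record structure shown under Extracted Data Structure.

```javascript
// Hypothetical audit helper over extracted records; the checks are examples.
function auditRecord(record) {
  const issues = [];
  const metadata = record.metadata || {};
  if (!metadata.description) issues.push('missing meta description');
  if (!metadata.ogImage) issues.push('missing Open Graph image');
  if (!metadata.canonical) issues.push('missing canonical URL');

  // Count images whose alt text is absent or empty.
  const imagesWithoutAlt = (record.media || []).filter(
    (item) => item.type === 'image' && !item.alt,
  ).length;
  if (imagesWithoutAlt > 0) issues.push(`${imagesWithoutAlt} image(s) without alt text`);

  return { url: record.url, issues };
}

// Keep only the records that have at least one issue.
function auditAll(records) {
  return records.map(auditRecord).filter((result) => result.issues.length > 0);
}
```

Running auditAll over a downloaded JSON export produces a short list of URLs with their issues, which can be sorted or exported like any other dataset.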

Data Integration

  • API Development - Create APIs from WordPress content
  • Analytics Integration - Feed content data to analytics platforms
  • Content Syndication - Distribute content to multiple platforms