
✨ WordPress Content Extractor
Pricing
$29.00/month + usage

🔍 Easily scrape and export posts, pages, metadata, images, and comments from any WordPress site. ✨ Export WordPress content to JSON, CSV, or TXT.
Last modified
4 hours ago
An Apify Actor that extracts content from WordPress websites. It automatically discovers and extracts posts, pages, metadata, media, and other WordPress-specific content by parsing page HTML and, where available, querying the WordPress REST API.
🚀 Features
Comprehensive Content Extraction
- Blog Posts - Extract all blog posts with full content, titles, and metadata
- Static Pages - Extract WordPress pages and custom post types
- Media Assets - Extract images, videos, and other media with alt text
- SEO Metadata - Extract meta descriptions, Open Graph tags, and Twitter cards
- Comments - Optional extraction of user comments and discussions
- Taxonomies - Extract categories, tags, and custom taxonomies
- Author Information - Extract post/page author details
- Publication Dates - Extract publication and modification timestamps
Smart Discovery
- Automatic URL Discovery - Finds posts and pages through navigation menus
- WordPress REST API Integration - Leverages /wp-json/wp/v2/ endpoints when available
- Pagination Support - Automatically follows pagination links
- Category & Tag Pages - Discovers content through WordPress taxonomies
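The REST API discovery described above can be sketched roughly as follows. This is an illustration, not the Actor's actual code: `buildRestUrl` and `fetchAll` are hypothetical helper names, and the sketch assumes WordPress is installed at the domain root. The `per_page` and `page` query parameters and the `X-WP-TotalPages` response header are standard parts of the WordPress REST API.

```javascript
// Hypothetical helper: build a paginated WordPress REST API URL.
// Assumes WordPress lives at the domain root (not in a subdirectory).
function buildRestUrl(siteUrl, resource, page, perPage = 100) {
  const url = new URL(`/wp-json/wp/v2/${resource}`, siteUrl);
  url.searchParams.set('per_page', String(perPage)); // WordPress caps this at 100
  url.searchParams.set('page', String(page));
  return url.toString();
}

// Hypothetical helper: page through a collection until the last reported page.
// `fetchImpl` is injectable so the loop can be exercised without a network.
async function fetchAll(siteUrl, resource, { maxItems = 0, fetchImpl = fetch } = {}) {
  const items = [];
  let page = 1;
  let totalPages = 1;
  while (page <= totalPages) {
    const res = await fetchImpl(buildRestUrl(siteUrl, resource, page));
    // REST API disabled or blocked: stop and let the caller fall back to HTML crawling.
    if (!res.ok) break;
    totalPages = Number(res.headers.get('X-WP-TotalPages')) || 1;
    items.push(...(await res.json()));
    // maxItems = 0 means "no limit", mirroring the maxPages convention.
    if (maxItems > 0 && items.length >= maxItems) return items.slice(0, maxItems);
    page += 1;
  }
  return items;
}
```

Calling `fetchAll('https://example.com', 'posts')` would collect every published post the API exposes; passing `'pages'` instead targets static pages.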
Advanced Configuration
- Selective Extraction - Choose what content types to extract
- Page Limits - Set maximum number of pages to process
- SSL Support - Handles sites with certificate issues
- Custom Headers - Uses realistic browser headers for better compatibility
📊 Extracted Data Structure
Each extracted page/post includes:
{
  "url": "https://example.com/post-title",
  "title": "Post Title",
  "content": "Full HTML content or text",
  "excerpt": "Post excerpt/summary",
  "metadata": {
    "description": "Meta description",
    "keywords": "Meta keywords",
    "ogTitle": "Open Graph title",
    "ogDescription": "Open Graph description",
    "ogImage": "Open Graph image URL",
    "canonical": "Canonical URL"
  },
  "media": [
    {
      "src": "image-url.jpg",
      "alt": "Image alt text",
      "type": "image"
    }
  ],
  "comments": [
    {
      "author": "Commenter Name",
      "content": "Comment text",
      "date": "Comment date"
    }
  ],
  "publishedDate": "2024-01-01T00:00:00Z",
  "author": "Post Author",
  "categories": ["Category 1", "Category 2"],
  "tags": ["tag1", "tag2"],
  "type": "post"
}
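Because each record nests metadata, media, and comments, it can be handy to flatten it before loading it into a spreadsheet. The sketch below is one way to do that on the consumer side; `summarizeRecord` is a hypothetical helper, and it assumes records follow the structure shown above, with optional arrays possibly missing.

```javascript
// Hypothetical helper: flatten one extracted record into a single-level row.
function summarizeRecord(record) {
  const meta = record.metadata || {};
  return {
    url: record.url,
    title: record.title,
    type: record.type,
    publishedDate: record.publishedDate,
    author: record.author,
    description: meta.description || '',
    // Join list fields so each stays in one spreadsheet cell.
    categories: (record.categories || []).join('; '),
    tags: (record.tags || []).join('; '),
    // Counts stand in for the nested arrays.
    mediaCount: (record.media || []).length,
    commentCount: (record.comments || []).length,
  };
}
```

Mapping a downloaded dataset through `summarizeRecord` yields rows with scalar values only, which most CSV tools handle without further processing.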
⚙️ Input Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | String | (required) | WordPress website URL to extract from |
| extractPosts | Boolean | true | Whether to extract blog posts |
| extractPages | Boolean | true | Whether to extract static pages |
| extractMedia | Boolean | true | Whether to extract media URLs |
| extractMetadata | Boolean | true | Whether to extract SEO metadata |
| maxPages | Integer | 0 | Maximum pages to extract (0 = no limit) |
| includeComments | Boolean | false | Whether to extract comments |
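To make the defaults in the table concrete, here is a minimal sketch of how an input object could be validated and completed before a run. `normalizeInput` and `DEFAULTS` are hypothetical names used for illustration; the Actor's own validation may differ.

```javascript
// Defaults copied from the input table above.
const DEFAULTS = {
  extractPosts: true,
  extractPages: true,
  extractMedia: true,
  extractMetadata: true,
  maxPages: 0, // 0 = no limit
  includeComments: false,
};

// Hypothetical helper: check the required URL and fill in missing options.
function normalizeInput(input) {
  if (!input || typeof input.url !== 'string' || input.url.trim() === '') {
    throw new Error('"url" is required');
  }
  const parsed = new URL(input.url.trim()); // throws on a malformed URL
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error('"url" must use http or https');
  }
  const merged = { ...DEFAULTS, ...input, url: parsed.toString() };
  if (!Number.isInteger(merged.maxPages) || merged.maxPages < 0) {
    throw new Error('"maxPages" must be a non-negative integer');
  }
  return merged;
}
```

With this sketch, `normalizeInput({ url: 'https://example.com', maxPages: 50 })` keeps the explicit page limit and fills every other option from the table's defaults.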
🛠️ Technical Details
Built With
- Apify SDK - Core actor framework
- Axios - HTTP client with SSL support
- Cheerio - Fast HTML parsing and manipulation
- Node.js - Runtime environment
WordPress Compatibility
- All WordPress versions - Works with any WordPress site
- Custom themes - Adapts to different theme structures
- Gutenberg blocks - Supports modern WordPress block editor
- Custom post types - Extracts custom content types
- Multisite networks - Works with WordPress multisite installations
Performance Features
- Concurrent processing - Efficient parallel content extraction
- Respectful crawling - Built-in delays to avoid overwhelming servers
- Error handling - Robust error recovery and logging
- Memory efficient - Optimized for large-scale extraction
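Concurrent processing combined with built-in delays is commonly implemented as a small pool of workers that each pause between requests. The following is a generic sketch of that pattern under assumed defaults (3 lanes, 500 ms pause); `mapWithLimit` is a hypothetical name, and the Actor's real scheduler may work differently.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: run `worker` over `items` with at most `concurrency`
// calls in flight, pausing `delayMs` after each call in every lane.
async function mapWithLimit(items, worker, { concurrency = 3, delayMs = 500 } = {}) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function lane() {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await worker(items[index]) };
      } catch (error) {
        // One failed page should not abort the whole run.
        results[index] = { ok: false, error };
      }
      await sleep(delayMs);
    }
  }

  const laneCount = Math.min(concurrency, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

Results come back in input order, and each entry records either a value or the error for that item, so failures can be logged and retried separately.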
🚀 Getting Started
Quick Start
- Deploy the Actor - Build and deploy on Apify Platform
- Configure Input - Set your WordPress website URL
- Run Extraction - Start the actor and monitor progress
- Download Results - Get extracted data in JSON, CSV, or other formats
Example Usage
// Input configuration
{
  "url": "https://your-wordpress-site.com",
  "extractPosts": true,
  "extractPages": true,
  "extractMedia": true,
  "extractMetadata": true,
  "maxPages": 50,
  "includeComments": false
}
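The same input can also be sent programmatically. The sketch below uses the Apify API's `run-sync-get-dataset-items` endpoint, which starts a run and returns its dataset items in a single request. The Actor ID `your-username/wordpress-content-extractor` is a placeholder, since this page does not state the real ID, and `buildRunRequest` and `runActor` are hypothetical helper names.

```javascript
// Hypothetical helper: describe the HTTP request that starts a synchronous run.
function buildRunRequest(actorId, token, input) {
  // The API path uses "username~actor-name" rather than "username/actor-name".
  const id = encodeURIComponent(actorId.replace('/', '~'));
  return {
    url: `https://api.apify.com/v2/acts/${id}/run-sync-get-dataset-items`,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify(input),
    },
  };
}

// Hypothetical helper: run the Actor and resolve with the extracted records.
async function runActor(actorId, token, input, fetchImpl = fetch) {
  const { url, options } = buildRunRequest(actorId, token, input);
  const res = await fetchImpl(url, options);
  if (!res.ok) throw new Error(`Run failed with HTTP ${res.status}`);
  return res.json();
}

// Usage (placeholder Actor ID, token read from the environment):
// runActor('your-username/wordpress-content-extractor', process.env.APIFY_TOKEN, {
//   url: 'https://your-wordpress-site.com',
//   maxPages: 50,
// }).then((items) => console.log(items.length));
```

Synchronous runs suit small sites; for large extractions it is usually safer to start the run asynchronously and fetch the dataset once it finishes.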
📈 Use Cases
Content Migration
- Site Migration - Extract content for moving to new platforms
- Backup Creation - Create comprehensive content backups
- Platform Migration - Move from WordPress to other CMS platforms
Content Analysis
- SEO Audit - Analyze meta tags and content structure
- Content Inventory - Catalog all posts, pages, and media
- Performance Analysis - Analyze content patterns and structure
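As an illustration of the SEO audit use case, the extracted records can be checked for common metadata gaps. This sketch assumes the record structure shown earlier; `auditRecord` is a hypothetical helper, and the 160-character description limit is a rule of thumb, not something the Actor enforces.

```javascript
// Hypothetical helper: list basic SEO issues for one extracted record.
function auditRecord(record, { maxDescription = 160 } = {}) {
  const issues = [];
  const meta = record.metadata || {};

  if (!meta.description) {
    issues.push('missing meta description');
  } else if (meta.description.length > maxDescription) {
    issues.push(`meta description longer than ${maxDescription} characters`);
  }
  if (!meta.ogTitle) issues.push('missing Open Graph title');
  if (!meta.ogImage) issues.push('missing Open Graph image');
  if (!meta.canonical) issues.push('missing canonical URL');

  // Images without alt text hurt both accessibility and image search.
  const missingAlt = (record.media || []).filter((m) => m.type === 'image' && !m.alt).length;
  if (missingAlt > 0) issues.push(`${missingAlt} image(s) without alt text`);

  return { url: record.url, issues };
}
```

Filtering the results for entries with a non-empty `issues` array gives a quick worklist of pages to fix.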
Data Integration
- API Development - Create APIs from WordPress content
- Analytics Integration - Feed content data to analytics platforms
- Content Syndication - Distribute content to multiple platforms