Blog Scraper
Pricing
from $33.00 / 1,000 standard-fetches
Company Blog Scraper, Blog Post Scraper, Corporate Blog Crawler, Automatic Blog Discovery, Blog Content Extractor, Article Metadata Scraper, Multi-Domain Blog Scraper, Competitor Blog Analysis, Content Marketing Scraper, Blog Post Metadata Extraction, Company Announcements Scraper.
Developer
Wyald
A robust Apify Actor designed to scrape blog posts from company websites. Given a list of company domains and a maximum number of posts to fetch, this scraper automatically discovers blog sections, extracts blog posts, and collects comprehensive content and metadata.
Targeted Keywords
- Primary: Blog Scraper, Content Extraction, Company Blog Crawler, Article Scraper
- Secondary: Blog Post Metadata, Content Marketing Analysis, Blog Content Aggregation, Corporate Blog Mining
Features
✅ Automatic Blog Discovery: Intelligently finds blog sections on company websites
✅ Smart Content Extraction: Extracts comprehensive blog post data including:
* Title
* Author
* Publication date
* Full article content
* Excerpt/summary
* Tags
* Category
* URL
✅ Configurable Limits: Set the maximum number of posts per domain (up to 50)
✅ Multiple Domain Support: Scrape from multiple company websites in a single run
✅ Structured Output: Returns clean JSON data with all metadata
✅ Fast & Lightweight: Uses Crawlee with BeautifulSoup for efficient HTTP-based scraping (no headless browser overhead)
Input
| Field | Type | Description | Required | Default |
|---|---|---|---|---|
| company_urls | Array | List of company domain URLs or homepage URLs to scrape (e.g., ["https://stripe.com", "shopify.com"]) | Yes | - |
| max_blogposts_to_fetch | Number | Maximum number of blog posts to fetch per domain (1-50) | No | 10 |
| max_concurrency | Number | Number of concurrent requests | No | 2 |
Input Example
{
  "company_urls": [
    "https://www.stripe.com",
    "https://shopify.com",
    "https://ai-bees.io"
  ],
  "max_blogposts_to_fetch": 10,
  "max_concurrency": 2
}
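As a rough illustration of the input constraints, the sketch below assembles an input object and checks it before a run. The field names, the 1-50 range, and the defaults come from the input table; the helper name `build_run_input` and the rule that `max_concurrency` must be at least 1 are assumptions made for this example, not part of the Actor.

```python
from typing import Any


def build_run_input(
    company_urls: list[str],
    max_blogposts_to_fetch: int = 10,  # documented default
    max_concurrency: int = 2,          # documented default
) -> dict[str, Any]:
    """Assemble an input object using the documented field names (hypothetical helper)."""
    if not company_urls:
        raise ValueError("company_urls is required and must not be empty")
    if not 1 <= max_blogposts_to_fetch <= 50:
        raise ValueError("max_blogposts_to_fetch must be between 1 and 50")
    # Assumption: the Actor needs at least one concurrent request.
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    return {
        "company_urls": list(company_urls),
        "max_blogposts_to_fetch": max_blogposts_to_fetch,
        "max_concurrency": max_concurrency,
    }


run_input = build_run_input(["https://www.stripe.com", "shopify.com"])
print(run_input)
```

The resulting dictionary has the same shape as the input example; how it is submitted (Apify Console, API, or a client library) is left open here.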
Output Example
{
  "url": "https://www.stripe.com/blog/example-post",
  "domain": "www.stripe.com",
  "post_title": "How we scaled our payment infrastructure",
  "author": "Jane Doe",
  "published_date": "2024-01-15",
  "content": "Full article content here...",
  "excerpt": "Learn how we scaled our payment infrastructure to handle millions of transactions...",
  "tags": ["engineering", "infrastructure", "scaling"],
  "category": "Engineering",
  "scraped_at": "2024-01-20T10:30:00.000Z"
}
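Downstream code will usually want to group records by domain and parse dates. The following is a minimal sketch of that, assuming records look like the output example. The helper names are hypothetical, and the handling of missing or unparseable `published_date` values is an assumption, since the README does not say which fields are always present.

```python
from collections import defaultdict
from datetime import date
from typing import Any, Optional


def parse_published_date(record: dict[str, Any]) -> Optional[date]:
    """Return the post date, or None when it is missing or not an ISO date string."""
    raw = record.get("published_date")
    if not isinstance(raw, str):
        return None
    try:
        return date.fromisoformat(raw)
    except ValueError:
        return None


def group_by_domain(records: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Bucket records by their `domain` field, keeping the original order."""
    grouped: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for record in records:
        grouped[record.get("domain", "unknown")].append(record)
    return dict(grouped)


# Two illustrative records; the second one lacks a date on purpose.
records = [
    {"url": "https://www.stripe.com/blog/example-post",
     "domain": "www.stripe.com",
     "post_title": "How we scaled our payment infrastructure",
     "published_date": "2024-01-15"},
    {"url": "https://www.stripe.com/blog/undated-post",
     "domain": "www.stripe.com",
     "post_title": "Undated post"},
]

by_domain = group_by_domain(records)
dates = [parse_published_date(r) for r in records]
print(sorted(by_domain), dates)
```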
How It Works
- Domain Analysis: The scraper starts by visiting each provided company domain
- Blog Detection: It automatically searches for blog sections using common patterns (/blog, /news, /articles, etc.)
- Post Discovery: Once in the blog section, it identifies individual blog post URLs
- Content Extraction: For each post, it extracts:
  - Structured metadata (title, author, date)
  - Full article content
  - Additional metadata (tags, categories)
- Limit Enforcement: Respects the `max_blogposts_to_fetch` limit per domain
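The blog-detection and limit steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Actor's actual code: only the three path patterns named in the text are tried, and the function names plus the de-duplication of post URLs are hypothetical additions for this example.

```python
from urllib.parse import urljoin

# Patterns named in the text; the real scraper may check more ("etc.").
BLOG_PATH_CANDIDATES = ("/blog", "/news", "/articles")


def candidate_blog_urls(homepage_url: str) -> list[str]:
    """Build the URLs to probe for a blog section on one site."""
    return [urljoin(homepage_url, path) for path in BLOG_PATH_CANDIDATES]


def take_posts(post_urls: list[str], limit: int) -> list[str]:
    """Drop duplicate URLs (assumption), keep first-seen order, cap at `limit`."""
    unique = list(dict.fromkeys(post_urls))
    return unique[:limit]


probes = candidate_blog_urls("https://stripe.com")
selected = take_posts(
    [
        "https://stripe.com/blog/a",
        "https://stripe.com/blog/b",
        "https://stripe.com/blog/a",  # duplicate link on the listing page
        "https://stripe.com/blog/c",
    ],
    limit=2,
)
print(probes)
print(selected)
```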
Usage Tips
- URL Format: You can provide URLs with or without `https://`; the scraper will normalize them
- Rate Limiting: The scraper includes automatic delays to be respectful to target websites
- Post Limits: Maximum 50 posts per domain to prevent excessive scraping
- Concurrency: Adjust `max_concurrency` based on target website capacity (default: 2)
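To make the URL-format tip concrete, here is one way such normalization might look. The scraper's actual rules are not documented, so everything here is an assumption for illustration: the function name `normalize_company_url`, defaulting to `https://` when no scheme is given, lowercasing the host, and dropping any path.

```python
from urllib.parse import urlsplit


def normalize_company_url(raw: str) -> str:
    """Turn 'shopify.com' or 'https://Shopify.com/' into a bare https origin (hypothetical helper)."""
    cleaned = raw.strip()
    # Assumption: inputs without a scheme are treated as https.
    if "://" not in cleaned:
        cleaned = "https://" + cleaned
    parts = urlsplit(cleaned)
    if not parts.netloc:
        raise ValueError(f"not a usable URL: {raw!r}")
    # Assumption: only scheme and host are kept; any path is discarded.
    return f"{parts.scheme}://{parts.netloc.lower()}"


for raw in ["https://www.stripe.com", "shopify.com", " Ai-Bees.io/ "]:
    print(raw, "->", normalize_company_url(raw))
```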
Use Cases
- Content Marketing Analysis: Analyze competitor blog strategies
- Content Aggregation: Collect blog content for research or analysis
- Market Intelligence: Monitor company announcements and thought leadership
- SEO Research: Study content patterns and topics from successful blogs
- Training Data: Collect blog content for ML/AI model training
Notes
- The scraper respects robots.txt and includes reasonable delays between requests
- Blog structure varies by website, so extraction quality depends on how each site structures its posts
- Some blogs may require authentication or have anti-scraping measures
- Always ensure you have permission to scrape the target websites
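If you want to pre-check a URL against a site's robots.txt before adding it to a run, the standard library can evaluate the rules. The sketch below is independent of how the Actor itself handles robots.txt, which is not documented here. The sample rules and the helper name `is_allowed` are made up for illustration, and the rules are parsed from a string so that no network request is needed.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against robots.txt rules supplied as text (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative rules only; a real check would use the target site's own file.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

blog_ok = is_allowed(SAMPLE_ROBOTS, "https://example.com/blog/post-1")
private_ok = is_allowed(SAMPLE_ROBOTS, "https://example.com/private/draft")
print(blog_ok, private_ok)
```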