
CNN Article Scraper
Pricing
Pay per usage

Extract CNN articles by category or search query with date filtering. Scrape news from politics, business, world, tech, sports, and more. Get structured data: title, author, publication date, full content. Perfect for media monitoring, research, and content analysis.
Last modified
a day ago
Extract articles from CNN by category or search query with precise date filtering. This Actor scrapes article metadata and full content from CNN's website, making it ideal for media monitoring, content research, and data analysis.
What does CNN Article Scraper do?
This Actor retrieves articles from CNN.com based on your specified criteria:
- Category-based scraping: Extract articles from specific CNN sections (politics, business, world news, etc.)
- Search-based scraping: Find articles matching specific keywords or topics
- Date filtering: Precisely control the publication time window
- Concurrent processing: Adjust scraping speed with configurable concurrency
- Structured output: Get clean, organized article data including title, author, publication date, full content, and URL
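As a rough sketch, a run input combining these options might be assembled as below. The names `category`, `searchQuery`, `maxArticles`, and `concurrency` appear in this README; `startDate` and `endDate` are assumed names for the date filter, `build_input` is a hypothetical helper, and the one-mode-per-run rule is an assumption of this sketch. Check the Actor's input schema before relying on any of them.

```python
from datetime import date
from typing import Optional


def build_input(
    start: date,
    end: date,
    category: Optional[str] = None,
    search_query: Optional[str] = None,
    max_articles: int = 10,
    concurrency: int = 1,
) -> dict:
    """Build a run-input dict for one scraping mode (hypothetical helper)."""
    # This sketch assumes one mode per run: category browsing OR keyword search.
    if (category is None) == (search_query is None):
        raise ValueError("provide exactly one of category or search_query")
    if start > end:
        raise ValueError("start date must not be after end date")

    run_input = {
        # startDate / endDate are ASSUMED field names, not confirmed by the README.
        "startDate": start.isoformat(),
        "endDate": end.isoformat(),
        "maxArticles": max_articles,  # 0 means unlimited, per Recommended Settings
        "concurrency": concurrency,
    }
    if category is not None:
        run_input["category"] = category
    else:
        run_input["searchQuery"] = search_query
    return run_input
```

For example, `build_input(date(2025, 1, 1), date(2025, 1, 15), category="politics")` yields a dict with `category` set and no `searchQuery` key.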
Use Cases
- Media Monitoring: Track CNN's coverage of specific topics or events over time and identify trends in news reporting
- Market Research: Analyze business and technology news for competitive intelligence, industry trends, and market insights
- Academic Research: Collect news articles for content analysis, sentiment studies, or media studies research projects
- Content Aggregation: Build news feeds or newsletters by automatically collecting relevant CNN articles within specific timeframes
- Competitive Analysis: Track how CNN covers your industry, competitors, or specific topics compared to other news sources
Output Format
Each scraped article is stored as a separate item in the dataset with the following structure:
{
  "title": "Article headline",
  "author": "Reporter Name",
  "publicationDate": "2025-01-15",
  "content": "Full article text content...",
  "url": "https://www.cnn.com/2025/01/15/politics/article-slug/index.html",
  "scrapedAt": "2025-10-10T14:30:00.000Z"
}
Output Fields
- title: Article headline as it appears on CNN
- author: Article author name(s), or "Unknown" if not found
- publicationDate: Publication date in YYYY-MM-DD format
- content: Full article text with paragraphs separated by double line breaks
- url: Direct link to the article on CNN.com
- scrapedAt: ISO timestamp of when the article was scraped
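To make the field formats concrete, here is a minimal sketch of loading one dataset item into typed Python values. The `Article` class and the `parse_article` and `paragraphs` helpers are illustrative names, not part of the Actor; the sketch assumes every field is present and that `scrapedAt` ends in `Z` as in the sample above.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class Article:
    title: str
    author: str
    publication_date: date
    content: str
    url: str
    scraped_at: datetime


def parse_article(item: dict) -> Article:
    """Convert one dataset item (a dict) into a typed Article."""
    # fromisoformat() rejects a trailing "Z" before Python 3.11, so normalize it.
    scraped_at = datetime.fromisoformat(item["scrapedAt"].replace("Z", "+00:00"))
    return Article(
        title=item["title"],
        author=item["author"],
        publication_date=date.fromisoformat(item["publicationDate"]),
        content=item["content"],
        url=item["url"],
        scraped_at=scraped_at,
    )


def paragraphs(article: Article) -> list[str]:
    """Split content on the double line breaks that separate paragraphs."""
    return [p.strip() for p in article.content.split("\n\n") if p.strip()]
```

Parsing the dates up front makes later steps such as sorting by publication date or grouping by day simpler than working with raw strings.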
Features
- ✅ Dual scraping modes: Category browsing or keyword search
- ✅ Precise date filtering: Only scrapes articles within your specified date range
- ✅ Early filtering optimization: Filters articles by date before scraping full content
- ✅ Automatic retry logic: Handles temporary network errors with built-in retry mechanism
- ✅ Concurrent processing: Adjustable parallelization for faster scraping
- ✅ Clean content extraction: Filters out ads, JavaScript code, and non-article content
- ✅ Structured data output: Consistent JSON format for easy integration
- ✅ Duplicate prevention: Automatically removes duplicate article URLs
- ✅ Pay-per-use pricing: Only pay for what you scrape
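The early date filtering and duplicate prevention listed above can be pictured roughly as follows. This is not the Actor's code, and the README does not say how it determines an article's date before fetching it. The sketch assumes the date can be read from the `/YYYY/MM/DD/` segment visible in the sample URL under Output Format; `date_from_url` and `select_candidates` are hypothetical names.

```python
import re
from datetime import date
from typing import Iterable, List, Optional

# Matches the /YYYY/MM/DD/ segment seen in the sample URL in Output Format.
_URL_DATE = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")


def date_from_url(url: str) -> Optional[date]:
    """Return the date embedded in the URL path, or None if there is none."""
    match = _URL_DATE.search(url)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:  # e.g. /2025/13/40/ is not a real date
        return None


def select_candidates(urls: Iterable[str], start: date, end: date) -> List[str]:
    """Drop duplicate URLs and URLs whose embedded date is outside [start, end]."""
    seen = set()
    kept = []
    for url in urls:
        if url in seen:
            continue  # duplicate prevention
        seen.add(url)
        published = date_from_url(url)
        # URLs without a recognizable date are kept so they can be checked
        # after the page is fetched.
        if published is None or start <= published <= end:
            kept.append(url)
    return kept
```

Filtering on the link alone avoids fetching pages that would be discarded anyway, which is the saving the early-filtering feature describes.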
Performance & Limits
Speed Optimization
- Concurrency: Higher concurrency speeds up scraping but uses more resources
- Date filtering: Early date filtering reduces unnecessary requests
- Batch processing: Articles are processed in batches based on concurrency setting
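A minimal sketch of batch processing driven by a concurrency setting, using `asyncio`. The Actor's actual implementation is not shown in this README; `process_in_batches` is a hypothetical name and `fake_fetch` is a placeholder standing in for a real page fetch.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def process_in_batches(
    urls: Sequence[str],
    concurrency: int,
    fetch: Callable[[str], Awaitable[dict]],
) -> List[dict]:
    """Run `fetch` over urls, at most `concurrency` at a time, batch by batch."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    results: List[dict] = []
    for i in range(0, len(urls), concurrency):
        batch = urls[i : i + concurrency]
        # All requests in a batch run concurrently; the next batch starts
        # only after the whole batch has finished.
        results.extend(await asyncio.gather(*(fetch(u) for u in batch)))
    return results


async def fake_fetch(url: str) -> dict:
    """Placeholder for a real page fetch; it only echoes the URL."""
    await asyncio.sleep(0)
    return {"url": url}
```

With this shape, a larger `concurrency` means fewer, bigger batches, which matches the speed-versus-resources trade-off described above.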
Recommended Settings
- For quick tests: maxArticles: 10, concurrency: 1
- For moderate scraping: maxArticles: 100, concurrency: 5
- For large-scale scraping: maxArticles: 0 (unlimited), concurrency: 10-15
Troubleshooting
No articles found
Problem: Actor completes but returns zero articles.
Solutions:
- Verify your date range includes recent articles (CNN archive may be limited)
- Check if the category URL structure has changed
- Try using searchQuery instead of category for more reliable results
- Expand your date range to include more articles
Missing author or content
Problem: Some fields return "Unknown" or empty content.
Solutions:
- CNN's HTML structure varies by article type. Some articles (videos, opinion pieces) may have different layouts
- The Actor uses multiple selectors to extract data but cannot guarantee 100% success for all article types
- Consider filtering results by checking for non-empty fields in your post-processing
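One possible post-processing filter, sketched under the assumption that items are plain dicts with the fields listed under Output Fields. `is_complete` and `split_by_completeness` are illustrative helper names, not part of the Actor.

```python
from typing import Iterable, List, Tuple


def is_complete(item: dict) -> bool:
    """True if the item has a known author and non-empty title and content."""
    author = (item.get("author") or "").strip()
    title = (item.get("title") or "").strip()
    content = (item.get("content") or "").strip()
    # "Unknown" is the placeholder the Actor writes when no author is found.
    return bool(title) and bool(content) and bool(author) and author != "Unknown"


def split_by_completeness(items: Iterable[dict]) -> Tuple[List[dict], List[dict]]:
    """Return (complete, incomplete) lists, preserving input order."""
    complete: List[dict] = []
    incomplete: List[dict] = []
    for item in items:
        (complete if is_complete(item) else incomplete).append(item)
    return complete, incomplete
```

Keeping the incomplete items in a separate list, rather than discarding them, makes it easier to see which article types are failing extraction.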
Scraping too slow
Problem: Actor takes too long to complete.
Solutions:
- Increase concurrency to 10-15 for faster parallel processing
- Reduce maxArticles if you don't need all available articles
- Narrow your date range to reduce the number of articles to process
Limitations
- The Actor scrapes publicly available CNN articles only
- Article structure may vary, affecting data extraction accuracy
- Very old articles may have different HTML structures
- CNN may update their website structure, requiring Actor maintenance
- Search API results are limited to what CNN makes available through their search service
Support
Need help or have questions about this Actor?
- Open an issue in the Actor's Issues tab
- Check the Apify documentation for general platform guidance
- Review this README for configuration and troubleshooting tips
Feedback
If you found this Actor helpful, please leave a review on the Actor page. Your feedback helps improve the Actor and helps other users discover it.
Pricing: This Actor uses pay-per-use pricing. You only pay for the compute resources consumed during scraping. See the Apify pricing page for current rates.