Cnn Top Headlines
Pricing
$29.00/month + usage
Deprecated. Apify Actor that scrapes top headlines from CNN's homepage and article pages.
Last modified
12 days ago
CNN Top Headlines Scraper Actor
This Apify actor scrapes the latest top news headlines from CNN or CNN International, with optional extraction of full article details.
Features
- Extracts real news headlines from the CNN homepage or section pages
- Optionally follows links to extract full article content, author, and publish date
- Outputs clean, structured data for further processing or analysis
Usage
1. Input Options
Configure your run using the following input fields (see .actor/input_schema.json for details):
| Field | Type | Description | Default |
|---|---|---|---|
| `startUrls` | array | List of URLs to start scraping from (homepage or section pages) | `["https://www.cnn.com/"]` |
| `maxHeadlines` | integer | Maximum number of headlines to extract and visit | `20` |
| `includeArticleDetails` | boolean | If true, scrape full article details for each headline | `false` |
Example input:
{
  "startUrls": [{ "url": "https://edition.cnn.com/" }],
  "maxHeadlines": 10,
  "includeArticleDetails": true
}
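The same input can also be assembled programmatically. The sketch below is a minimal illustration, not part of the Actor: the helper name `build_run_input` is hypothetical, and the checks (positive `maxHeadlines`, absolute URLs) are assumptions rather than rules taken from `.actor/input_schema.json`. It follows the example input in wrapping each start URL in an object with a `url` key.

```python
import json


def build_run_input(start_urls=None, max_headlines=20, include_article_details=False):
    """Build an input dict from the three documented fields and their defaults.

    Hypothetical helper: the validation rules here are assumptions,
    not constraints read from the Actor's input schema.
    """
    urls = list(start_urls) if start_urls else ["https://www.cnn.com/"]

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_headlines, bool) or not isinstance(max_headlines, int) or max_headlines < 1:
        raise ValueError("maxHeadlines must be a positive integer")

    for url in urls:
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"not an absolute URL: {url!r}")

    return {
        # The example input wraps each URL in an object with a "url" key.
        "startUrls": [{"url": url} for url in urls],
        "maxHeadlines": max_headlines,
        "includeArticleDetails": bool(include_article_details),
    }


run_input = build_run_input(
    ["https://edition.cnn.com/"],
    max_headlines=10,
    include_article_details=True,
)
print(json.dumps(run_input, indent=2))
```

The printed JSON can be pasted into the Actor's input editor or passed to whichever client you use to start the run.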
2. Output Format
Each result in the dataset will look like:
Headline only:
{
  "title": "‘Superman’ smashes box office expectations, soaring towards $130 million opening",
  "url": "https://www.cnn.com/2025/07/13/entertainment/superman-box-office-intl",
  "source": "CNN",
  "scrapedAt": "2025-07-13T12:56:40.535Z"
}
With article details (if enabled):
{
  "title": "‘Superman’ smashes box office expectations, soaring towards $130 million opening",
  "content": "Full article text ...",
  "author": "CNN Staff",
  "publishedDate": "2025-07-13T10:00:00Z",
  "url": "https://www.cnn.com/2025/07/13/entertainment/superman-box-office-intl",
  "source": "CNN",
  "scrapedAt": "2025-07-13T12:56:40.535Z"
}
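When consuming the dataset, it helps to handle both record shapes with one type. The following is a minimal sketch using only the Python standard library. The names `HeadlineRecord`, `parse_timestamp`, and `parse_record` are hypothetical, and the sketch assumes the field names and ISO 8601 timestamps shown in the two examples above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class HeadlineRecord:
    title: str
    url: str
    source: str
    scraped_at: datetime
    # The next three fields appear only when article details are enabled.
    content: Optional[str] = None
    author: Optional[str] = None
    published_date: Optional[datetime] = None

    @property
    def has_details(self) -> bool:
        return self.content is not None


def parse_timestamp(value: str) -> datetime:
    # Older Python versions reject a trailing "Z", so map it to an explicit UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_record(item: dict) -> HeadlineRecord:
    published = item.get("publishedDate")
    return HeadlineRecord(
        title=item["title"],
        url=item["url"],
        source=item["source"],
        scraped_at=parse_timestamp(item["scrapedAt"]),
        content=item.get("content"),
        author=item.get("author"),
        published_date=parse_timestamp(published) if published else None,
    )


item = {
    "title": "‘Superman’ smashes box office expectations, soaring towards $130 million opening",
    "url": "https://www.cnn.com/2025/07/13/entertainment/superman-box-office-intl",
    "source": "CNN",
    "scrapedAt": "2025-07-13T12:56:40.535Z",
}
record = parse_record(item)
print(record.has_details, record.scraped_at.isoformat())
```

Treating the detail fields as optional lets the same parser handle runs with and without `includeArticleDetails`.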
How It Works
- The actor visits the provided start URL(s) and extracts up to `maxHeadlines` news headlines.
- If `includeArticleDetails` is true, it follows each headline link and scrapes the article's content, author, and publish date.
- Results are saved to the default Apify dataset for download in JSON, CSV, or Excel formats.
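The headline-extraction step above can be sketched roughly as follows. This is not the Actor's actual implementation, which is not shown here; it is an illustration using Python's standard `html.parser`. It assumes that article links contain a `/YYYY/MM/DD/` path segment, as the sample output URL does. The names `HeadlineParser` and `extract_headlines` and the sample HTML (including the `/2025/07/12/world/example-story` link) are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: article URLs contain a /YYYY/MM/DD/ segment, as in the sample output.
ARTICLE_PATH = re.compile(r"/\d{4}/\d{2}/\d{2}/")


class HeadlineParser(HTMLParser):
    """Collect up to max_headlines unique article links with their link text."""

    def __init__(self, base_url, max_headlines):
        super().__init__()
        self.base_url = base_url
        self.max_headlines = max_headlines
        self.headlines = []
        self._seen = set()
        self._href = None  # absolute URL of the article link being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and ARTICLE_PATH.search(href):
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "a" or self._href is None:
            return
        title = " ".join("".join(self._text).split())  # collapse whitespace
        under_limit = len(self.headlines) < self.max_headlines
        if title and under_limit and self._href not in self._seen:
            self._seen.add(self._href)
            self.headlines.append({"title": title, "url": self._href, "source": "CNN"})
        self._href = None


def extract_headlines(html, base_url, max_headlines=20):
    parser = HeadlineParser(base_url, max_headlines)
    parser.feed(html)
    return parser.headlines


sample_html = """
<a href="/2025/07/13/entertainment/superman-box-office-intl"><span>Superman box office</span></a>
<a href="/about">About</a>
<a href="/2025/07/13/entertainment/superman-box-office-intl">Duplicate link</a>
<a href="/2025/07/12/world/example-story">Example story</a>
"""
for headline in extract_headlines(sample_html, "https://www.cnn.com/", max_headlines=5):
    print(headline["title"], headline["url"])
```

The sketch skips non-article links and duplicate URLs and stops collecting at the limit. Following each link to scrape article details is left out.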
Customization
- To target a specific section (e.g., World, Business), set `startUrls` to that section's page.
- Adjust `maxHeadlines` to cap how many headlines are extracted and visited.
- Set `includeArticleDetails` to `true` for full article scraping.
⚠️ Important Notes
- Respect CNN's Terms of Service - Use this Actor responsibly and in accordance with CNN's policies.
- Rate Limiting - Avoid making too many requests in a short period to prevent overloading CNN's servers.
- Proxy Usage - For large-scale scraping, consider using proxies to avoid IP blocking.
- Data Usage - Ensure you have permission to use scraped data for your intended purpose.
- Public Content Only - This Actor can only scrape publicly accessible CNN news articles.
⚖️ Legal Disclaimer
This project is intended for educational and research purposes only. When using this Actor, please comply with CNN's Terms of Service and relevant robots.txt policies. Use this tool responsibly and avoid aggressive scraping that could negatively impact CNN's website infrastructure.
🔗 Related Actors
- Booking.com Hotel Scraper: Scrape hotel data, prices, ratings, and more from Booking.com with advanced anti-detection and flexible extraction limits.
- Product Hunt Scraper: Extract product listings, launch details, votes, and comments from Product Hunt for market research and trend analysis.
