
Cnn Top Headlines
Pricing
$150.00 / 1,000 results

Apify Actor that scrapes top headlines from CNN's homepage and article pages.
CNN Top Headlines Scraper Actor
This Apify actor scrapes the latest top news headlines from CNN or CNN International, with optional extraction of full article details.
Features
- Extracts real news headlines from the CNN homepage or section pages
- Optionally follows links to extract full article content, author, and publish date
- Outputs clean, structured data for further processing or analysis
Usage
1. Input Options
Configure your run using the following input fields (see .actor/input_schema.json for details):
Field | Type | Description | Default |
---|---|---|---|
startUrls | array | List of URLs to start scraping from (homepage or section pages) | ["https://www.cnn.com/"] |
maxHeadlines | integer | Maximum number of headlines to extract and visit | 20 |
includeArticleDetails | boolean | If true, scrape full article details for each headline | false |
Example input:
{"startUrls": [{"url": "https://edition.cnn.com/"}], "maxHeadlines": 10, "includeArticleDetails": true}
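The input fields and defaults in the table can be checked before a run is started. The following is a minimal sketch, not part of the actor: `build_input` and `DEFAULT_INPUT` are hypothetical names, and accepting both plain URL strings and `{"url": ...}` objects is an assumption made because the table and the example input show different shapes.

```python
from typing import Any, Dict, List, Optional

# Defaults copied from the input table. The object form of startUrls
# follows the example input; this normalisation is an assumption.
DEFAULT_INPUT: Dict[str, Any] = {
    "startUrls": [{"url": "https://www.cnn.com/"}],
    "maxHeadlines": 20,
    "includeArticleDetails": False,
}


def build_input(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Merge overrides onto the documented defaults and check basic types."""
    merged = {**DEFAULT_INPUT, **(overrides or {})}
    unknown = set(merged) - set(DEFAULT_INPUT)
    if unknown:
        raise ValueError(f"Unknown input fields: {sorted(unknown)}")

    urls = merged["startUrls"]
    if not isinstance(urls, list) or not urls:
        raise ValueError("startUrls must be a non-empty list")
    # Accept plain strings or {"url": ...} objects and normalise to objects.
    normalised: List[Dict[str, str]] = []
    for entry in urls:
        url = entry.get("url") if isinstance(entry, dict) else entry
        if not isinstance(url, str) or not url.startswith(("http://", "https://")):
            raise ValueError(f"Invalid start URL: {entry!r}")
        normalised.append({"url": url})
    merged["startUrls"] = normalised

    max_headlines = merged["maxHeadlines"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if (
        isinstance(max_headlines, bool)
        or not isinstance(max_headlines, int)
        or max_headlines < 1
    ):
        raise ValueError("maxHeadlines must be a positive integer")

    if not isinstance(merged["includeArticleDetails"], bool):
        raise ValueError("includeArticleDetails must be a boolean")
    return merged
```

Calling `build_input()` with no arguments returns the documented defaults; passing a dictionary overrides individual fields and raises `ValueError` on an unknown field or a wrong type.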
2. Output Format
Each result in the dataset will look like:
Headline only:
{"title": "‘Superman’ smashes box office expectations, soaring towards $130 million opening", "url": "https://www.cnn.com/2025/07/13/entertainment/superman-box-office-intl", "source": "CNN", "scrapedAt": "2025-07-13T12:56:40.535Z"}
With article details (if enabled):
{"title": "‘Superman’ smashes box office expectations, soaring towards $130 million opening", "content": "Full article text ...", "author": "CNN Staff", "publishedDate": "2025-07-13T10:00:00Z", "url": "https://www.cnn.com/2025/07/13/entertainment/superman-box-office-intl", "source": "CNN", "scrapedAt": "2025-07-13T12:56:40.535Z"}
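Because the article fields appear only when article details are enabled, downstream code has to treat them as optional. One way to load either record shape is sketched below. The `Headline` class and `parse_timestamp` helper are hypothetical names for illustration; the field names come from the two examples above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2025-07-13T12:56:40.535Z."""
    # Older Python versions reject a trailing "Z" in fromisoformat,
    # so it is rewritten as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class Headline:
    title: str
    url: str
    source: str
    scraped_at: datetime
    # Present only when includeArticleDetails was enabled.
    content: Optional[str] = None
    author: Optional[str] = None
    published_date: Optional[datetime] = None

    @classmethod
    def from_record(cls, record: Dict[str, Any]) -> "Headline":
        published = record.get("publishedDate")
        return cls(
            title=record["title"],
            url=record["url"],
            source=record["source"],
            scraped_at=parse_timestamp(record["scrapedAt"]),
            content=record.get("content"),
            author=record.get("author"),
            published_date=parse_timestamp(published) if published else None,
        )
```

A headline-only record yields `None` for `content`, `author`, and `published_date`, so callers can branch on those fields rather than on the run configuration.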
How It Works
- The actor visits the provided start URL(s) and extracts up to maxHeadlines news headlines.
- If includeArticleDetails is true, it follows each headline link and scrapes the article's content, author, and publish date.
- Results are saved to the default Apify dataset for download in JSON, CSV, or Excel formats.
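The headline extraction step can be illustrated with a small standard-library sketch. This is not the actor's implementation: the class and function names are hypothetical, and the rule that article links contain a /YYYY/MM/DD/ path segment is an assumption based only on the sample URL in the output examples. CNN's real markup may require different selectors.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List, Optional, Set
from urllib.parse import urljoin

# Assumption: article URLs contain a /YYYY/MM/DD/ segment, as in the
# sample output URL. Other link patterns are ignored by this sketch.
ARTICLE_PATH = re.compile(r"/\d{4}/\d{2}/\d{2}/")


class HeadlineLinkParser(HTMLParser):
    """Collect link text and absolute URLs for article-looking anchors."""

    def __init__(self, base_url: str, max_headlines: int) -> None:
        super().__init__()
        self.base_url = base_url
        self.max_headlines = max_headlines
        self.headlines: List[Dict[str, str]] = []
        self._seen: Set[str] = set()
        self._current_url: Optional[str] = None
        self._text_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Stop collecting once the maxHeadlines cap is reached.
        if tag != "a" or len(self.headlines) >= self.max_headlines:
            return
        href = dict(attrs).get("href")
        if href and ARTICLE_PATH.search(href):
            url = urljoin(self.base_url, href)
            if url not in self._seen:  # skip duplicate links
                self._current_url = url
                self._text_parts = []

    def handle_data(self, data):
        if self._current_url is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_url is not None:
            # Collapse whitespace across nested elements in the link.
            title = " ".join("".join(self._text_parts).split())
            if title:
                self._seen.add(self._current_url)
                self.headlines.append({"title": title, "url": self._current_url})
            self._current_url = None


def extract_headlines(html: str, base_url: str, max_headlines: int = 20) -> List[Dict[str, str]]:
    parser = HeadlineLinkParser(base_url, max_headlines)
    parser.feed(html)
    parser.close()
    return parser.headlines
```

Given the HTML of a start page, `extract_headlines(html, "https://www.cnn.com/", max_headlines=10)` returns at most ten unique title and URL pairs in document order.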
Customization
- To target a specific section (e.g., World, Business), set startUrls to that section's URL.
- Adjust maxHeadlines to control how many headlines are extracted.
- Set includeArticleDetails to true for full article scraping.
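Since maxHeadlines caps the number of results, it also bounds the cost of a run at the listed price of $150.00 per 1,000 results. The sketch below works through that arithmetic. It assumes each headline is billed as one result at a flat rate and that maxHeadlines applies to the whole run; neither is confirmed by this README, and `estimate_max_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Listed price: $150.00 per 1,000 results.
PRICE_PER_1000_RESULTS = Decimal("150.00")


def estimate_max_cost(max_headlines: int) -> Decimal:
    """Upper bound on run cost in USD, assuming one billed result per headline."""
    if max_headlines < 0:
        raise ValueError("max_headlines must not be negative")
    cost = PRICE_PER_1000_RESULTS * max_headlines / 1000
    # Round to whole cents.
    return cost.quantize(Decimal("0.01"))
```

Under these assumptions the default of 20 headlines costs at most $3.00 per run, and each additional headline adds $0.15.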
⚠️ Important Notes
- Respect CNN's Terms of Service - Use this Actor responsibly and in accordance with CNN's policies.
- Rate Limiting - Avoid making too many requests in a short period to prevent overloading CNN's servers.
- Proxy Usage - For large-scale scraping, consider using proxies to avoid IP blocking.
- Data Usage - Ensure you have permission to use scraped data for your intended purpose.
- Public Content Only - This Actor can only scrape publicly accessible CNN news articles.
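For the rate limiting note, one simple approach in client code that issues its own requests is to enforce a minimum interval between them. The sketch below is a generic illustration, not something the actor exposes: `MinIntervalThrottle` is a hypothetical class, and the interval value is for the caller to choose.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Enforce a minimum number of seconds between consecutive calls."""

    def __init__(
        self,
        min_interval_s: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if min_interval_s < 0:
            raise ValueError("min_interval_s must not be negative")
        self._min_interval = min_interval_s
        # clock and sleep are injectable so the class can be tested
        # without real waiting.
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self._min_interval - now)
        if delay > 0:
            self._sleep(delay)
        # Record when this call was actually allowed to proceed.
        self._last = now + delay
        return delay
```

Calling `throttle.wait()` immediately before each request keeps requests at least `min_interval_s` apart while adding no delay when they are already spaced out.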
⚖️ Legal Disclaimer
This project is intended for educational and research purposes only. When using this Actor, please comply with CNN's Terms of Service and relevant robots.txt policies. Use this tool responsibly and avoid aggressive scraping that could negatively impact CNN's website infrastructure.
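A robots.txt policy can be checked programmatically with Python's standard `urllib.robotparser` module before URLs are queued. The sketch below parses rules from a string so that it runs offline. `allowed_urls` is a hypothetical helper, and any rules passed to it in examples are illustrative, not CNN's actual robots.txt.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt: str, user_agent: str, urls: Iterable[str]) -> List[str]:
    """Return only the URLs that the given robots.txt text permits."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network
    # request is made here.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]
```

In practice the rules text would be downloaded from the target site's /robots.txt, and the filtered list would be used as the set of pages to visit.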
🔗 Related Actors
- Booking.com Hotel Scraper: Scrape hotel data, prices, ratings, and more from Booking.com with advanced anti-detection and flexible extraction limits.