Cnn Top Headlines

Pricing

$150.00 / 1,000 results

Maintained by Community

Apify Actor that scrapes top headlines from CNN's homepage and article pages.

5.0 (1)

Total users: 1
Monthly users: 1
Runs succeeded: >99%
Last modified: 4 days ago

CNN Top Headlines Scraper Actor

This Apify Actor scrapes the latest top headlines from CNN (www.cnn.com) or CNN International (edition.cnn.com), and can optionally extract full article details.

Features

  • Extracts real news headlines from the CNN homepage or section pages
  • Optionally follows links to extract full article content, author, and publish date
  • Outputs clean, structured data for further processing or analysis

Usage

1. Input Options

Configure your run using the following input fields (see .actor/input_schema.json for details):

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| startUrls | array | List of URLs to start scraping from (homepage or section pages) | ["https://www.cnn.com/"] |
| maxHeadlines | integer | Maximum number of headlines to extract and visit | 20 |
| includeArticleDetails | boolean | If true, scrape full article details for each headline | false |

Example input:

{
  "startUrls": [
    { "url": "https://edition.cnn.com/" }
  ],
  "maxHeadlines": 10,
  "includeArticleDetails": true
}
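The table lists startUrls defaults as plain strings, while the example passes objects with a url key. The following is a minimal Python sketch of how a caller might apply the documented defaults and accept both shapes before starting a run. The function name normalize_input and its validation rules are illustrative assumptions, not part of the Actor.

```python
from typing import Any, Optional

# Defaults copied from the input table above.
DEFAULTS = {
    "startUrls": ["https://www.cnn.com/"],
    "maxHeadlines": 20,
    "includeArticleDetails": False,
}


def normalize_input(raw: Optional[dict] = None) -> dict:
    """Hypothetical helper: fill in defaults and reduce startUrls to URL strings."""
    merged: dict = {**DEFAULTS, **(raw or {})}

    urls = []
    for entry in merged["startUrls"]:
        # Accept both "https://..." and {"url": "https://..."} entries.
        url: Any = entry.get("url") if isinstance(entry, dict) else entry
        if not isinstance(url, str) or not url.startswith(("http://", "https://")):
            raise ValueError(f"Invalid start URL: {entry!r}")
        urls.append(url)

    max_headlines = merged["maxHeadlines"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_headlines, bool) or not isinstance(max_headlines, int) or max_headlines < 1:
        raise ValueError("maxHeadlines must be a positive integer")

    return {
        "startUrls": urls,
        "maxHeadlines": max_headlines,
        "includeArticleDetails": bool(merged["includeArticleDetails"]),
    }
```

Calling normalize_input(None) returns the defaults from the table, and a missing field in a partial input falls back to its default.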

2. Output Format

Each result in the dataset has one of two shapes, depending on whether includeArticleDetails is enabled:

Headline only:

{
  "title": "‘Superman’ smashes box office expectations, soaring towards $130 million opening",
  "url": "https://www.cnn.com/2025/07/13/entertainment/superman-box-office-intl",
  "source": "CNN",
  "scrapedAt": "2025-07-13T12:56:40.535Z"
}

With article details (if enabled):

{
  "title": "‘Superman’ smashes box office expectations, soaring towards $130 million opening",
  "content": "Full article text ...",
  "author": "CNN Staff",
  "publishedDate": "2025-07-13T10:00:00Z",
  "url": "https://www.cnn.com/2025/07/13/entertainment/superman-box-office-intl",
  "source": "CNN",
  "scrapedAt": "2025-07-13T12:56:40.535Z"
}
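When consuming an exported dataset, it can help to tell the two record shapes apart and to parse the timestamps. The sketch below assumes only the field names shown in the samples above; the helper names (has_article_details, parse_timestamp, scrape_lag) are hypothetical.

```python
from datetime import datetime, timedelta

# Fields that appear only when includeArticleDetails is enabled (per the samples above).
DETAIL_FIELDS = {"content", "author", "publishedDate"}


def has_article_details(item: dict) -> bool:
    """Return True if the record carries the full-article fields."""
    return DETAIL_FIELDS.issubset(item)


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime."""
    # Older Python versions do not accept 'Z' in fromisoformat, so map it to +00:00.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def scrape_lag(item: dict) -> timedelta:
    """Time between publication and scraping; requires a record with article details."""
    return parse_timestamp(item["scrapedAt"]) - parse_timestamp(item["publishedDate"])
```

For the detailed sample record above, scrape_lag gives the gap between the 10:00:00Z publication time and the 12:56:40.535Z scrape time.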

How It Works

  • The actor visits the provided start URL(s) and extracts up to maxHeadlines news headlines.
  • If includeArticleDetails is true, it follows each headline link and scrapes the article's content, author, and publish date.
  • Results are saved to the default Apify dataset for download in JSON, CSV, or Excel formats.
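The first step above can be sketched in Python with the standard library only. This is not the Actor's actual implementation: it assumes that article links can be recognised by a /YYYY/MM/DD/ path prefix (as in the sample URL in the output section), and the class and function names are made up for illustration.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: article paths start with a date, e.g. /2025/07/13/...
ARTICLE_PATH = re.compile(r"^/\d{4}/\d{2}/\d{2}/")


class HeadlineParser(HTMLParser):
    """Collect (title, url) pairs from <a> tags that look like article links."""

    def __init__(self, base_url: str, max_headlines: int):
        super().__init__()
        self.base_url = base_url
        self.max_headlines = max_headlines
        self.headlines = []
        self._seen = set()
        self._href = None   # URL of the article link currently open, if any
        self._text = []     # text fragments collected inside that link

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            url = urljoin(self.base_url, href)
            if ARTICLE_PATH.match(urlparse(url).path):
                self._href, self._text = url, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "a" or self._href is None:
            return
        title = " ".join("".join(self._text).split())  # collapse whitespace
        url, self._href = self._href, None
        # Skip empty titles and duplicates, and stop at the headline limit.
        if title and url not in self._seen and len(self.headlines) < self.max_headlines:
            self._seen.add(url)
            self.headlines.append({"title": title, "url": url})


def extract_headlines(html: str, base_url: str, max_headlines: int = 20) -> list:
    parser = HeadlineParser(base_url, max_headlines)
    parser.feed(html)
    parser.close()
    return parser.headlines
```

A real page would need more robust selectors; the point here is only the shape of the loop: resolve each link against the start URL, keep article-like links, deduplicate, and stop at maxHeadlines.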

Customization

  • To target a specific section (e.g., World, Business), set the appropriate startUrls.
  • Adjust maxHeadlines to control how many headlines are extracted (and, with includeArticleDetails enabled, how many article pages are visited).
  • Set includeArticleDetails to true for full article scraping.
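As an illustration of the options above, the following Python sketch builds an input object for one or more sections. The helper build_section_input is hypothetical, and the section slugs used in the usage note ("world", "business") are assumptions about CNN's URL layout rather than values documented by the Actor.

```python
import json
from urllib.parse import urljoin


def build_section_input(sections, base_url="https://www.cnn.com/",
                        max_headlines=20, include_article_details=False):
    """Hypothetical helper: turn section slugs into an Actor input dict."""
    start_urls = [
        # Strip stray slashes so "world", "/world" and "world/" all resolve the same way.
        {"url": urljoin(base_url, slug.strip("/"))}
        for slug in sections
    ]
    return {
        "startUrls": start_urls,
        "maxHeadlines": max_headlines,
        "includeArticleDetails": include_article_details,
    }


if __name__ == "__main__":
    run_input = build_section_input(["world", "business"], max_headlines=5,
                                    include_article_details=True)
    print(json.dumps(run_input, indent=2))
```

The printed JSON can be pasted into the Actor's input editor; the base_url parameter can be pointed at https://edition.cnn.com/ for the international edition.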

⚠️ Important Notes

  1. Respect CNN's Terms of Service - Use this Actor responsibly and in accordance with CNN's policies.
  2. Rate Limiting - Avoid making too many requests in a short period to prevent overloading CNN's servers.
  3. Proxy Usage - For large-scale scraping, consider using proxies to avoid IP blocking.
  4. Data Usage - Ensure you have permission to use scraped data for your intended purpose.
  5. Public Content Only - This Actor can only scrape publicly accessible CNN news articles.
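For notes 1 and 2, a client that fetches pages itself could combine a robots.txt check with a minimum delay between requests. The Python sketch below uses the standard library's urllib.robotparser. The robots.txt rules in the usage note are invented for illustration and are not CNN's actual rules; the Throttle class and the "my-research-bot" user agent are likewise hypothetical.

```python
import time
from urllib.robotparser import RobotFileParser


def make_robots_checker(robots_txt: str, user_agent: str):
    """Return a function that reports whether a URL may be fetched under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)


class Throttle:
    """Enforce a minimum interval (in seconds) between consecutive requests."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


if __name__ == "__main__":
    # Invented example rules, NOT CNN's real robots.txt.
    sample_rules = "User-agent: *\nDisallow: /private/\n"
    allowed = make_robots_checker(sample_rules, "my-research-bot")
    throttle = Throttle(min_interval=2.0)
    for url in ["https://www.cnn.com/world", "https://www.cnn.com/private/x"]:
        if allowed(url):
            throttle.wait()
            print("would fetch", url)
        else:
            print("skipping", url)
```

The clock and sleep parameters exist so the delay logic can be exercised in tests without real waiting.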

This project is intended for educational and research purposes only. When using this Actor, please comply with CNN's Terms of Service and relevant robots.txt policies. Use this tool responsibly and avoid aggressive scraping that could negatively impact CNN's website infrastructure.
