Google News Scraper - Headlines and Sources

Pricing: from $10.00 / 1,000 results

Scrape Google News search results with article titles, sources, dates, and snippets.

Rating: 0.0 (0 ratings)

Developer: Donny (maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 11 hours ago

Overview

The Google News Scraper collects news articles, headlines, and metadata from Google News search results. You can search for any topic and retrieve structured data for each article: title, source publication, publication time, article URL, content snippet, and thumbnail image URL. Typical uses include media monitoring, trend analysis, competitive intelligence, and content research.

Features

  • Multi-Query Support: Search for multiple topics in a single run to efficiently gather news from different areas of interest.
  • Time Range Filtering: Restrict results to the past 24 hours, past week, past month, or past year to focus on recent coverage.
  • Geo-Targeting: Specify country and language codes to get localized news results from specific regions.
  • Article Metadata: Captures headline, source name, publication time, article URL, content snippet, and thumbnail image URL for each result.
  • Pagination: Automatically pages through Google News results until it has collected maxResults articles for a query or the results run out.
  • Position Tracking: Records the rank position of each article in the search results for SERP analysis.

How It Works

  1. The actor takes your search queries and constructs Google News search URLs with the appropriate parameters for language, country, and time range.
  2. It fetches the search results page and parses the HTML using Cheerio to extract article information from the news result cards.
  3. For each article found, it captures the headline, source, publication time, URL, snippet text, and thumbnail image.
  4. The actor automatically paginates through multiple pages of results until it reaches your maxResults limit or runs out of results.
  5. The actor waits a random delay between page requests to reduce the risk of being rate-limited.
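The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the actor's code (the actor itself parses HTML with Cheerio). It assumes the search URL uses Google's news mode (tbm=nws) with the hl, gl, tbs=qdr:d/w/m/y and start parameters, and that each page holds about 10 results. fetch_page is a hypothetical stand-in for the download-and-parse step.

```python
import random
import time
from typing import Callable, Dict, List, Tuple
from urllib.parse import urlencode

# Assumed mapping from the actor's timeRange values to Google's tbs=qdr: codes.
TIME_RANGE_CODES = {"day": "d", "week": "w", "month": "m", "year": "y"}
PAGE_SIZE = 10  # assumption: roughly 10 news results per page


def build_search_url(query: str, country: str = "us", language: str = "en",
                     time_range: str = "", start: int = 0) -> str:
    """Build a Google search URL in news mode (tbm=nws) for one results page."""
    params = {"q": query, "tbm": "nws", "hl": language, "gl": country}
    if time_range:  # empty string means "any time": no filter parameter
        params["tbs"] = "qdr:" + TIME_RANGE_CODES[time_range]
    if start:
        params["start"] = start  # offset of the first result on this page
    return "https://www.google.com/search?" + urlencode(params)


def scrape_query(query: str, fetch_page: Callable[[str], List[Dict]],
                 max_results: int = 30,
                 delay_range: Tuple[float, float] = (1.0, 3.0),
                 **url_opts) -> List[Dict]:
    """Collect up to max_results articles for one query, page by page.

    fetch_page is hypothetical: it downloads a URL and returns the article
    cards parsed from that page (the Cheerio step in the actor).
    """
    results: List[Dict] = []
    start = 0
    while len(results) < max_results:
        if start:
            # Random pause before every page after the first.
            time.sleep(random.uniform(*delay_range))
        cards = fetch_page(build_search_url(query, start=start, **url_opts))
        if not cards:
            break  # ran out of results
        for card in cards:
            if len(results) >= max_results:
                break
            # Positions run across pages instead of restarting at 1 per page.
            results.append({"searchQuery": query,
                            "position": len(results) + 1, **card})
        start += PAGE_SIZE
    return results
```

Because fetch_page is passed in, the loop can be exercised with a stub that returns canned cards, without making any network requests.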

Input Configuration

  • queries (required): An array of search terms to look up on Google News. Each query runs as a separate search.
  • maxResults: Maximum number of articles to collect per query. Defaults to 30, maximum 100.
  • timeRange: Filter by time period. Options are empty string (any time), "day" (past 24 hours), "week" (past week), "month" (past month), or "year" (past year).
  • country: Two-letter country code for localized results (e.g., "us", "uk", "de"). Defaults to "us".
  • language: Two-letter language code for results language (e.g., "en", "de", "fr"). Defaults to "en".
  • proxyConfiguration: Optional proxy settings to avoid rate limiting on large-scale scraping runs.
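A minimal sketch of how an input object could be checked against the documented defaults and limits. normalize_input is a hypothetical helper; clamping maxResults to 100 (rather than rejecting larger values), trimming blank queries, and lower-casing the codes are assumptions, not confirmed actor behavior.

```python
from typing import Any, Dict

VALID_TIME_RANGES = {"", "day", "week", "month", "year"}


def normalize_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the documented defaults and limits to an actor input dict."""
    # queries is required: drop empty entries and surrounding whitespace.
    queries = [q.strip() for q in raw.get("queries", []) if q and q.strip()]
    if not queries:
        raise ValueError("'queries' is required and must contain at least one term")
    max_results = int(raw.get("maxResults", 30))  # documented default: 30
    if max_results < 1:
        raise ValueError("'maxResults' must be at least 1")
    time_range = raw.get("timeRange", "")  # "" means any time
    if time_range not in VALID_TIME_RANGES:
        raise ValueError(f"unsupported timeRange: {time_range!r}")
    return {
        "queries": queries,
        "maxResults": min(max_results, 100),  # documented maximum: 100
        "timeRange": time_range,
        "country": raw.get("country", "us").lower(),
        "language": raw.get("language", "en").lower(),
        "proxyConfiguration": raw.get("proxyConfiguration"),  # optional
    }
```

For example, an input of {"queries": ["electric vehicles"], "timeRange": "week"} comes back with maxResults 30, country "us", and language "en" filled in.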

Output Data

Each result in the dataset includes the following fields:

Field          Description
searchQuery    The original search query
position       Rank position in search results
title          Article headline
url            Direct link to the article
source         Name of the publishing outlet
publishedTime  When the article was published
snippet        Brief excerpt of article content
imageUrl       Thumbnail image URL
scrapedAt      Timestamp of data collection
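One way to model a dataset item in Python is a dataclass that uses the documented field names. The types, which fields are optional, and the ISO 8601 UTC format for scrapedAt are assumptions; the page does not specify them. NewsResult is a hypothetical name.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class NewsResult:
    """One dataset item, mirroring the documented output fields."""
    searchQuery: str
    position: int
    title: str
    url: str
    source: str
    # Kept as a plain string; the exact format is not documented (assumption).
    publishedTime: Optional[str] = None
    snippet: Optional[str] = None
    imageUrl: Optional[str] = None
    # Collection time as an ISO 8601 UTC string (assumed format).
    scrapedAt: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
```

asdict(NewsResult(...)) turns an item into a plain dict with the fields in the order listed above, ready for json.dumps.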

Use Cases

  • Monitor news coverage for your brand, competitors, or industry keywords
  • Track trending topics and breaking news across different regions
  • Research news sources and publication patterns for media analysis
  • Build news aggregation datasets for machine learning and NLP projects
  • Conduct competitive intelligence by monitoring competitor mentions in the press
  • Generate content ideas by analyzing trending news in your niche
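For media analysis, a common first step is to remove articles that appear under several queries and then count articles per outlet. A minimal sketch over exported dataset items (plain dicts with the fields listed under Output Data); dedupe_by_url and top_sources are hypothetical helpers, not part of the actor.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def dedupe_by_url(items: Iterable[Dict]) -> List[Dict]:
    """Keep the first occurrence of each article URL (items without a URL are kept)."""
    seen = set()
    unique = []
    for item in items:
        url = item.get("url")
        if url is not None:
            if url in seen:
                continue  # same article already seen under another query
            seen.add(url)
        unique.append(item)
    return unique


def top_sources(items: Iterable[Dict], n: int = 5) -> List[Tuple[str, int]]:
    """Return the n outlets with the most articles, most frequent first."""
    counts = Counter(item.get("source") or "unknown" for item in items)
    return counts.most_common(n)
```

Deduplicating first matters for multi-query runs: otherwise an outlet whose article matches several queries is counted once per query.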

Tips for Best Results

  • Use specific search queries for more targeted results. Broad terms may return less relevant articles.
  • Combine time range filtering with specific queries to focus on recent developments.
  • Set appropriate country and language codes to get results most relevant to your target audience.
  • Use proxy configuration when running multiple queries or large result sets to maintain reliable access.
  • Results are saved progressively, so partial data is available even if the run is interrupted.