Google News Scraper - Headlines and Sources
Pricing
From $10.00 per 1,000 results
Scrape Google News search results with article titles, sources, dates, and snippets.
Overview
The Google News Scraper collects news articles, headlines, and metadata from Google News search results. You can search for any topic and retrieve structured data, including article titles, source publications, publication times, article URLs, content snippets, and thumbnail image URLs. Typical uses include media monitoring, trend analysis, competitive intelligence, and content research.
Features
- Multi-Query Support: Search for multiple topics in a single run to efficiently gather news from different areas of interest.
- Time Range Filtering: Restrict results to the past 24 hours, past week, past month, or past year to focus on the most relevant news.
- Geo-Targeting: Specify country and language codes to get localized news results from specific regions.
- Article Metadata: Captures headline, source name, publication time, article URL, content snippet, and thumbnail image URL for each result.
- Pagination: Automatically paginates through Google News results to collect up to your specified maximum number of articles per query.
- Position Tracking: Records the rank position of each article in the search results for SERP analysis.
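Time-range filtering and geo-targeting map naturally onto URL query parameters. The sketch below assumes the actor queries the News tab of Google Search (`https://www.google.com/search` with `tbm=nws`) and uses Google's `hl` (language), `gl` (country), `tbs=qdr:d`/`qdr:w`/`qdr:m`/`qdr:y` (time range), and `start` (result offset) parameters. The listing does not document the exact endpoint or parameters, and `build_search_url` is a hypothetical helper, not the actor's code.

```python
from urllib.parse import urlencode

# Assumed mapping from the actor's timeRange values to Google's "qdr" codes.
TIME_RANGE_CODES = {"day": "d", "week": "w", "month": "m", "year": "y"}


def build_search_url(query: str, country: str = "us", language: str = "en",
                     time_range: str = "", start: int = 0) -> str:
    """Build a Google News-tab search URL (assumes the tbm=nws endpoint)."""
    params = {
        "q": query,
        "tbm": "nws",    # News tab of Google Search
        "hl": language,  # language of the results page
        "gl": country,   # country used for localization
    }
    if time_range:  # "" means any time, so no filter is added
        if time_range not in TIME_RANGE_CODES:
            raise ValueError(f"unsupported timeRange: {time_range!r}")
        params["tbs"] = f"qdr:{TIME_RANGE_CODES[time_range]}"
    if start:
        params["start"] = str(start)  # result offset used for pagination
    return "https://www.google.com/search?" + urlencode(params)
```

For example, `build_search_url("electric vehicles", country="de", language="de", time_range="week")` returns `https://www.google.com/search?q=electric+vehicles&tbm=nws&hl=de&gl=de&tbs=qdr%3Aw` (`urlencode` percent-encodes the colon).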
How It Works
- The actor takes your search queries and constructs Google News search URLs with the appropriate parameters for language, country, and time range.
- It fetches the search results page and parses the HTML using Cheerio to extract article information from the news result cards.
- For each article found, it captures the headline, source, publication time, URL, snippet text, and thumbnail image.
- The actor automatically paginates through multiple pages of results until it reaches your maxResults limit or runs out of results.
- A random delay between page requests reduces the chance of being rate-limited or blocked.
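The pagination and delay steps above can be sketched as a loop that keeps a running 1-based `position` across pages and stops at `maxResults` or at the first empty page. `collect_query` and `fetch_page` are hypothetical names (`fetch_page` stands in for the fetch-and-parse step), and the 1 to 3 second delay range is an assumption; the listing does not state the actual interval.

```python
import random
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: fetch_page(query, start_offset) -> list of parsed article dicts.
FetchPage = Callable[[str, int], List[Dict]]


def collect_query(
    query: str,
    fetch_page: FetchPage,
    max_results: int = 30,
    delay_range: Tuple[float, float] = (1.0, 3.0),  # assumed, not documented
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict]:
    """Paginate one query until max_results is reached or a page comes back empty."""
    results: List[Dict] = []
    start = 0
    while len(results) < max_results:
        page = fetch_page(query, start)
        if not page:
            break  # no more results for this query
        for item in page:
            if len(results) >= max_results:
                break
            # Positions are 1-based and continue across pages.
            results.append({**item, "searchQuery": query, "position": len(results) + 1})
        start += len(page)  # advance the offset by the number of results received
        if len(results) < max_results:
            sleep(random.uniform(*delay_range))  # random pause before the next page request
    return results
```

Passing `sleep` as a parameter lets tests substitute a no-op instead of actually waiting.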
Input Configuration
- queries (required): An array of search terms to look up on Google News. Each query runs as a separate search.
- maxResults: Maximum number of articles to collect per query. Defaults to 30, maximum 100.
- timeRange: Filter by time period. Options are an empty string (any time), "day" (past 24 hours), "week" (past week), "month" (past month), or "year" (past year).
- country: Two-letter country code for localized results (e.g., "us", "uk", "de"). Defaults to "us".
- language: Two-letter language code for the language of the results (e.g., "en", "de", "fr"). Defaults to "en".
- proxyConfiguration: Optional proxy settings that help avoid rate limiting on large-scale runs.
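The documented defaults and limits can be expressed as a small normalization step. This is a hypothetical sketch (`normalize_input` is not part of the actor): it treats a missing `timeRange` as an empty string (any time), lowercases the country and language codes, and clamps `maxResults` to the range 1 to 100 rather than rejecting larger values. The listing does not say how the actor itself handles missing or out-of-range input.

```python
from typing import Any, Dict

# Documented timeRange options, default, and maximum for maxResults.
ALLOWED_TIME_RANGES = {"", "day", "week", "month", "year"}
DEFAULT_MAX_RESULTS = 30
MAX_RESULTS_CAP = 100


def normalize_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the documented defaults and limits to a raw input object (illustrative only)."""
    queries = [q.strip() for q in raw.get("queries", []) if isinstance(q, str) and q.strip()]
    if not queries:
        raise ValueError("'queries' is required and must contain at least one non-empty term")

    # Design choice of this sketch: clamp out-of-range values instead of rejecting them.
    max_results = int(raw.get("maxResults", DEFAULT_MAX_RESULTS))
    max_results = max(1, min(max_results, MAX_RESULTS_CAP))

    time_range = raw.get("timeRange", "")  # assumption: missing means "any time"
    if time_range not in ALLOWED_TIME_RANGES:
        raise ValueError(f"unsupported timeRange: {time_range!r}")

    return {
        "queries": queries,
        "maxResults": max_results,
        "timeRange": time_range,
        "country": str(raw.get("country", "us")).lower(),
        "language": str(raw.get("language", "en")).lower(),
        "proxyConfiguration": raw.get("proxyConfiguration"),  # passed through unchanged
    }
```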
Output Data
Each result in the dataset includes the following fields:
| Field | Description |
|---|---|
| searchQuery | The original search query |
| position | Rank position in search results |
| title | Article headline |
| url | Direct link to the article |
| source | Name of the publishing outlet |
| publishedTime | When the article was published |
| snippet | Brief excerpt of article content |
| imageUrl | Thumbnail image URL |
| scrapedAt | Timestamp of data collection |
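When consuming the dataset from Python, the table above can be mirrored as a typed record. This is a minimal sketch assuming items arrive as dicts keyed exactly by the listed field names; `NewsResult` is a hypothetical name, and which fields are optional, as well as treating `publishedTime` and `scrapedAt` as plain strings, are assumptions, since the listing does not document formats or nullability.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class NewsResult:
    # Field names mirror the dataset keys (camelCase), not Python naming style.
    searchQuery: str
    position: int
    title: str
    url: str
    source: str
    scrapedAt: str
    # Assumed optional: the listing does not say which fields may be missing.
    publishedTime: Optional[str] = None  # raw value; format not documented
    snippet: Optional[str] = None
    imageUrl: Optional[str] = None

    @classmethod
    def from_item(cls, item: Dict[str, Any]) -> "NewsResult":
        """Build a record from one dataset item, ignoring keys not in the table."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in item.items() if k in known})
```

`asdict(record)` converts a record back into a plain dict, for example before writing it to CSV or JSON.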
Use Cases
- Monitor news coverage for your brand, competitors, or industry keywords
- Track trending topics and breaking news across different regions
- Research news sources and publication patterns for media analysis
- Build news aggregation datasets for machine learning and NLP projects
- Conduct competitive intelligence by monitoring competitor mentions in the press
- Generate content ideas by analyzing trending news in your niche
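For source and publication-pattern research, a first pass over exported results might count how often each outlet appears for each query. This assumes the items have been exported as a list of dicts keyed by the field names in the Output Data table; `top_sources` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_sources(items: Iterable[Dict], n: int = 5) -> Dict[str, List[Tuple[str, int]]]:
    """Count articles per publishing outlet, separately for each search query."""
    per_query: Dict[str, Counter] = {}
    for item in items:
        counts = per_query.setdefault(item["searchQuery"], Counter())
        counts[item.get("source") or "(unknown)"] += 1  # group missing sources together
    # most_common returns (source, count) pairs sorted by count, highest first.
    return {query: counts.most_common(n) for query, counts in per_query.items()}
```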
Tips for Best Results
- Use specific search queries for more targeted results. Broad terms may return less relevant articles.
- Combine time range filtering with specific queries to focus on recent developments.
- Set appropriate country and language codes to get results most relevant to your target audience.
- Use proxy configuration when running multiple queries or large result sets to maintain reliable access.
- Results are saved progressively, so partial data is available even if the run is interrupted.
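Progressive saving means each result is persisted as soon as it is scraped instead of being buffered until the run ends. The actor stores its results in an Apify dataset; the sketch below only illustrates the same pattern locally with a JSON Lines file, and `save_progressively` is a hypothetical helper, not the actor's code.

```python
import json
from pathlib import Path
from typing import Dict, Iterable


def save_progressively(records: Iterable[Dict], path: Path) -> int:
    """Append each record to a JSON Lines file as soon as it is produced."""
    written = 0
    with path.open("a", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            fh.flush()  # hand the line to the OS now, not when the run finishes
            written += 1
    return written
```

If the record source fails partway through, every line written before the failure stays in the file, which is the partial-data behavior this tip describes.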
