Google News Scraper

Pricing

$19.99/month + usage

📰 Google News Scraper collects real-time headlines, publishers, snippets, dates & links from Google News. 🔎 Filter by keywords, topics, country & language. 📊 Export JSON/CSV, deduplicate & schedule crawls. 🚀 Perfect for media monitoring, trend tracking & research.

Rating

0.0

(0)

Developer

ScraperForge

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

1

Monthly active users

2 days ago

Last modified

Google News Scraper

Google News Scraper is a fast, reliable tool that collects headlines, publishers, snippets, dates, links, and images from Google News RSS. It is aimed at marketers, developers, data analysts, and researchers who need to scrape Google News at scale. It targets the Google News RSS feed, handles regions and languages, and delivers clean, structured results for media monitoring, trend tracking, and research. Async fetching, proxy fallback, and GUID-based de-duplication keep extraction consistent without manual effort.

What data / output can you get?

Below are the exact fields this Google News crawler stores for each article it collects and pushes to the Apify dataset.

| Field | Description | Example value |
|---|---|---|
| position | Result index in the current run (1-based) | 1 |
| title | Article headline | Tesla announces new factory plans in Mexico |
| link | Direct article URL (resolved from RSS redirect where possible) | https://example.com/tesla-factory-plans |
| domain | Domain derived from source name or article URL | example.com |
| source | Publisher/source name parsed from RSS entry | Bloomberg |
| date | Human-friendly relative time computed from pubDate | 2 hours ago |
| date_utc | ISO 8601 UTC timestamp computed from pubDate | 2026-03-15T10:30:00+00:00 |
| snippet | Cleaned snippet extracted from the RSS description | Tesla is planning a new manufacturing facility in Mexico... |
| thumbnail | Base64 data URL for a fetched article image (Open Graph/Twitter Card/inline) | data:image/jpeg;base64,/9j/4AAQSkZJRgABAQ... |
| block_position | Same as position; maintained for compatibility | 1 |

Notes:

  • The actor de-duplicates by GUID during parsing to prevent duplicate items.
  • Thumbnails are retrieved from the article page when possible and encoded as base64; when the image cannot be determined or fetched, this field may be empty.
  • Snippets are derived by cleaning HTML from the RSS description.
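
As a rough illustration of the snippet cleaning described above, the sketch below strips tags from an RSS description and collapses whitespace. It uses only the standard library's `HTMLParser`; the actor itself is described as using BeautifulSoup, so `clean_snippet` and `_TextExtractor` are hypothetical names and this is not the actor's actual code.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores all tags (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def clean_snippet(description_html: str) -> str:
    """Strip HTML from an RSS description and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(description_html)
    parser.close()
    # Joining with a space keeps adjacent block elements from fusing together;
    # the split/join pass then normalizes all whitespace, including &nbsp;.
    return " ".join(" ".join(parser.parts).split())
```

A call such as `clean_snippet("<p>Tesla  is <b>planning</b> a plant</p>")` yields plain text suitable for the snippet field.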

Key features

  • 🔁 Proxy fallback workflow
    Starts without a proxy and automatically escalates to datacenter and then residential proxies on blocks or failures (with exponential backoff and retries). This boosts reliability for Google News scraping without API access.

  • 🌍 Region & language controls
    Configure Google country (gl), UI language (hl), language-limited results (lr), and country-limited results (cr) to tailor your Google News data extraction by market.

  • 🕒 Flexible time filtering
    Filter by last hour, day, week, month, year, or a custom date range using time_period with time_period_min/time_period_max in MM/DD/YYYY format.

  • 🧹 Clean snippets & readable dates
    HTML is stripped from RSS descriptions to produce clean snippets, while pubDate is converted to both a relative “time ago” string and ISO 8601 UTC timestamp.

  • 🖼️ Smart thumbnail capture
    Fetches images via Open Graph or Twitter Card tags, with fallbacks to article content images. Valid images are encoded as base64 data URLs for portable use.

  • 🚦 De-duplication & multi-strategy harvesting
    Prevents duplicates via GUID tracking and augments collection by trying multiple time-range strategies (e.g., day/week/month) when no specific time period is set.

  • ⚙️ Async performance & stability
    Built on Python asyncio + aiohttp for speed, with per-request timeouts, rate limiting between requests, and up to 3 retries per proxy level to maximize success rates.

  • 📦 Real-time dataset writes
    Items are saved incrementally during the run, so you can monitor results as they arrive and consume them from the run’s dataset stream.
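
The proxy fallback and retry behavior listed above could look roughly like the following. This is a minimal sketch, not the actor's implementation: the `fetch` callable, `fetch_with_fallback`, and the `base_delay` value are assumptions, and the real actor is described as using aiohttp for the requests themselves.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Escalation order and retry count as described in the feature list.
PROXY_LEVELS = ("none", "datacenter", "residential")
MAX_RETRIES_PER_LEVEL = 3


async def fetch_with_fallback(
    url: str,
    fetch: Callable[[str, str], Awaitable[Optional[str]]],
    base_delay: float = 1.0,  # assumed starting delay; the source gives no value
) -> Optional[str]:
    """Try each proxy level in order, retrying with exponential backoff."""
    for level in PROXY_LEVELS:
        for attempt in range(MAX_RETRIES_PER_LEVEL):
            try:
                body = await fetch(url, level)
                if body is not None:
                    return body
            except Exception:
                pass  # this sketch treats any error as a failed attempt
            if attempt < MAX_RETRIES_PER_LEVEL - 1:
                # Delays grow as base_delay * 1, 2, 4, ...
                await asyncio.sleep(base_delay * (2 ** attempt))
    return None  # every level exhausted
```

Passing the fetch function in as a parameter keeps the escalation logic independent of the HTTP client, which also makes it easy to exercise with a fake fetcher.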

How to use Google News Scraper - step by step

  1. Create or log in to your Apify account
    Access the actor from your Apify dashboard.

  2. Open Google News Scraper
    Navigate to the “google-news-scraper” actor.

  3. Enter your input parameters
    At minimum, provide query and maxItems. Optionally add gl, hl, lr, cr, time_period (and custom dates), nfpr, filter, and proxyConfiguration.

  4. Tune filters and locale

    • Use gl (Google Country) and hl (UI Language) to localize results.
    • Use lr and/or cr to limit results by language or country.
    • Use time_period to constrain recency, including a custom date range.

  5. Control result volume & behavior

    • Set maxItems (100–5000).
    • Toggle nfpr (exclude autocorrect) and filter (Similar/Omitted Results).

  6. Start the run
    The actor fetches Google News RSS data, applies retry logic and proxy fallback as needed, and writes items to the dataset in real time.

  7. Review and download your results
    Open the run’s Dataset to view, filter, and export items as needed for your workflow.

Pro tip: For precise date windows, set time_period to custom and provide time_period_min/time_period_max in MM/DD/YYYY.
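
To illustrate the custom date window, the sketch below builds an input object with dates formatted as MM/DD/YYYY. The field names come from the parameter list in this document; the helper name `custom_range_input` is hypothetical.

```python
from datetime import date


def custom_range_input(query: str, start: date, end: date, max_items: int = 100) -> dict:
    """Build actor input for a custom date range (hypothetical helper)."""
    if start > end:
        raise ValueError("start must not be after end")
    return {
        "query": query,
        "maxItems": max_items,
        "time_period": "custom",
        # The actor expects MM/DD/YYYY for both bounds.
        "time_period_min": start.strftime("%m/%d/%Y"),
        "time_period_max": end.strftime("%m/%d/%Y"),
    }


run_input = custom_range_input("Tesla", date(2026, 3, 1), date(2026, 3, 31), max_items=200)
```

The resulting dictionary can be pasted into the actor's JSON input editor or passed as the run input when starting the actor programmatically.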

Use cases

| Use case | Description |
|---|---|
| Media monitoring & alerts | Track breaking stories and publishers for your topics and brands; structured articles are saved continuously during the run. |
| SEO & content planning | Identify trending topics and headlines to inform content calendars, using consistently structured headline output. |
| Competitive intelligence | Monitor competitors’ press coverage and announcements by filtering results with country/language parameters. |
| Market & financial tracking | Follow sector-specific news (e.g., “earnings”, “acquisition”) with time-based filters for the last day or week. |
| Academic & policy research | Build structured corpora of articles for analysis using language-restricted results (lr) and region constraints (gl/cr). |
| Data pipelines & dashboards | Use the dataset output as a Google News API alternative to power dashboards and analytics without running a browser-based scraper. |

Why choose Google News Scraper?

This production-ready Google News scraping tool combines precision, automation, and reliability.

  • ✅ Accurate structured output with consistent fields (title, link, domain, source, date, snippet, thumbnail, positions).
  • 🌐 Multilingual and multi-region support via gl, hl, lr, and cr parameters.
  • 📈 Scales reliably with async requests, rate limiting, and up to 3 automatic retries per proxy level.
  • 🧑‍💻 Developer-friendly dataset output ready for integrations and downstream processing.
  • 🔐 Safe-by-design proxy fallback (none → datacenter → residential) to reduce blocks and keep runs stable.
  • 🕒 Real-time saves to the dataset so long-running queries produce usable data immediately.
  • 🧰 More robust than browser extensions or ad‑hoc scripts — built with aiohttp, BeautifulSoup, and clear retry logic.

Bottom line: if you need a dependable way to scrape Google News without an API, this actor delivers consistent, clean results at scale.

Is it legal to scrape Google News?

Yes, when done responsibly. The actor processes publicly accessible Google News RSS content and does not access private or authenticated data.

Guidelines for compliant use:

  • Respect platform terms and robots.txt directives.
  • Avoid abusive behavior (high request rates, excessive retries).
  • Use data for lawful purposes and follow applicable laws (e.g., copyright and fair-use rules).
  • Attribute original publishers when required by your use case.
  • Consult your legal team for edge cases and jurisdiction-specific requirements.

Input parameters & output format

Example JSON input

{
  "query": "Tesla",
  "maxItems": 200,
  "gl": "United States",
  "hl": "English",
  "lr": "English",
  "cr": "United States",
  "time_period": "last_week",
  "time_period_min": "03/01/2026",
  "time_period_max": "03/31/2026",
  "nfpr": 1,
  "filter": 1,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}

Parameters

| Field | Type | Description | Default | Required |
|---|---|---|---|---|
| maxItems | integer | Maximum number of search results to retrieve (100–5000 enforced) | 100 | Yes |
| query | string | The search term to use | Elon Musk | Yes |
| gl | string | The Google country to use for the query | | No |
| hl | string | The Google UI language to return results | | No |
| lr | string | Limit the results to a specific language | | No |
| cr | string | Limit the results to a specific country | | No |
| time_period | string | Time period for results: last_hour, last_day, last_week, last_month, last_year, custom | | No |
| time_period_min | string | Minimum date for custom time period (MM/DD/YYYY) | | No |
| time_period_max | string | Maximum date for custom time period (MM/DD/YYYY) | | No |
| nfpr | integer | Exclude results from auto-corrected queries (0 or 1) | 0 | No |
| filter | integer | Enable/disable Similar Results and Omitted Results filters (0 or 1) | 1 | No |
| proxyConfiguration | object | Configure proxy settings. The actor starts with no proxy, then falls back to datacenter, then residential proxies if needed. | {"useApifyProxy": false} | No |

Notes:

  • If maxItems is set below 100, the actor automatically raises it to 100; above 5000, it caps at 5000.
  • For time_period="custom", both time_period_min and time_period_max must be provided in MM/DD/YYYY format.
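
The two rules above can be expressed as a small pre-flight check on your side before starting a run. This is a sketch of the documented behavior only; `normalize_input` is a hypothetical helper and the actor's own validation may differ in detail.

```python
from datetime import datetime

MIN_ITEMS, MAX_ITEMS = 100, 5000  # range stated in the parameter table


def normalize_input(raw: dict) -> dict:
    """Clamp maxItems and check custom date fields (hypothetical helper)."""
    params = dict(raw)  # work on a copy, leave the caller's dict untouched
    requested = int(params.get("maxItems", MIN_ITEMS))
    params["maxItems"] = max(MIN_ITEMS, min(MAX_ITEMS, requested))

    if params.get("time_period") == "custom":
        for key in ("time_period_min", "time_period_max"):
            value = params.get(key)
            if not value:
                raise ValueError(f"{key} is required when time_period is 'custom'")
            # Raises ValueError for values that are not month/day/year dates.
            datetime.strptime(value, "%m/%d/%Y")
    return params
```

Running this locally surfaces a missing or malformed date before any platform usage is consumed.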

Example JSON output

{
  "position": 1,
  "title": "Tesla announces new factory plans in Mexico",
  "link": "https://example.com/tesla-factory-plans",
  "domain": "example.com",
  "source": "Bloomberg",
  "date": "2 hours ago",
  "date_utc": "2026-03-15T10:30:00+00:00",
  "snippet": "Tesla is planning a new manufacturing facility in Mexico...",
  "thumbnail": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQ...",
  "block_position": 1
}

Field notes:

  • thumbnail may be empty if no suitable image is found or the image is not retrievable.
  • date and date_utc are derived from the RSS pubDate; if parsing fails, the actor uses fallbacks.
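
For readers who want to reproduce the two date fields from a raw RSS pubDate, here is one way to do it with the standard library. The function names and the rounding rules for the relative string are assumptions; the actor's exact wording and fallbacks are not documented here.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _parse_pub_date(pub_date: str) -> datetime:
    """Parse an RFC 2822 style pubDate into an aware UTC datetime."""
    dt = parsedate_to_datetime(pub_date)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assume UTC when no zone is given
    return dt.astimezone(timezone.utc)


def to_utc_iso(pub_date: str) -> str:
    """ISO 8601 UTC timestamp, in the shape of the date_utc field."""
    return _parse_pub_date(pub_date).isoformat()


def time_ago(pub_date: str, now: datetime) -> str:
    """Relative string in the shape of the date field (assumed rounding: floor)."""
    seconds = max(0, int((now - _parse_pub_date(pub_date)).total_seconds()))
    for unit, size in (("day", 86400), ("hour", 3600), ("minute", 60)):
        count = seconds // size
        if count:
            return f"{count} {unit}{'s' if count != 1 else ''} ago"
    return "just now"
```

Passing `now` explicitly keeps the relative string reproducible, which matters if you recompute it later from stored date_utc values.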

FAQ

Is there a free trial or free tier?

Yes. This actor includes a 120-minute trial window in its current pricing plan, so you can evaluate it before subscribing.

Does it support Google News scraping with Python?

Yes. The actor is implemented in Python using asyncio and aiohttp, and produces structured dataset items suitable for downstream Python workflows.

How many results can it collect per run?

You can request between 100 and 5000 items via maxItems. The actor enforces this range for stability and performance.

Can I filter by language and country?

Yes. Use hl (UI language), lr (language-limited results), gl (Google country), and cr (country-limited results) to localize your results.
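
For context, locale parameters on the public Google News RSS search endpoint are typically passed as query-string values. The sketch below is an assumption about what such a request can look like, not the actor's documented request logic: the code-style values (`en-US`, `US`), the `ceid` pairing, and the helper name `build_rss_url` are illustrative, and the actor's own input accepts display names such as "United States". How lr and cr are mapped is not covered.

```python
from urllib.parse import urlencode


def build_rss_url(query: str, hl: str = "en-US", gl: str = "US") -> str:
    """Build a Google News RSS search URL for a locale (illustrative only)."""
    language = hl.split("-")[0]
    params = {
        "q": query,
        "hl": hl,                      # UI language
        "gl": gl,                      # country edition
        "ceid": f"{gl}:{language}",    # country:language edition id
    }
    return "https://news.google.com/rss/search?" + urlencode(params)


url = build_rss_url("Tesla factory")
```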

Can I filter by time range?

Yes. Set time_period to last_hour, last_day, last_week, last_month, last_year, or custom. For custom, provide time_period_min and time_period_max in MM/DD/YYYY format.

How does proxy handling work?

The actor starts with no proxy, then automatically falls back to datacenter proxies, and finally residential proxies if blocks or errors occur. It also retries requests up to three times per proxy level with backoff.

Does it de-duplicate results?

Yes. The actor uses item GUIDs from the RSS feed to avoid saving duplicate articles during a run.
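
GUID-based de-duplication can be sketched as a set of seen identifiers shared across feed fetches. The parsing below uses the standard library's ElementTree and a hypothetical `unique_items` helper; falling back to the link when an item has no guid is an assumption, not documented actor behavior.

```python
import xml.etree.ElementTree as ET


def unique_items(rss_xml: str, seen: set[str]) -> list[dict]:
    """Return items whose GUID has not been seen yet; updates `seen` in place."""
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        # Assumed fallback: use the link if the feed item has no <guid>.
        guid = item.findtext("guid") or item.findtext("link") or ""
        if not guid or guid in seen:
            continue
        seen.add(guid)
        fresh.append({"guid": guid, "title": item.findtext("title", default="")})
    return fresh
```

Keeping one `seen` set for the whole run means articles that reappear across multiple time-range fetches are stored only once.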

What images are returned?

The actor attempts to fetch an article thumbnail by checking Open Graph and Twitter Card tags and then scanning suitable in-page images. Valid images are returned as base64 data URLs in the thumbnail field.
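
On the consuming side, a thumbnail value can be turned back into raw image bytes with a few lines. This sketch assumes the field follows the `data:<mime>;base64,<payload>` shape shown in the example output; `decode_thumbnail` is a hypothetical helper.

```python
import base64
from typing import Optional


def decode_thumbnail(data_url: str) -> Optional[tuple[str, bytes]]:
    """Split a base64 data URL into (mime_type, image_bytes); None if unusable."""
    if not data_url or not data_url.startswith("data:"):
        return None  # empty field or not a data URL
    header, sep, payload = data_url.partition(",")
    if not sep or not header.endswith(";base64"):
        return None
    mime_type = header[len("data:"):-len(";base64")] or "application/octet-stream"
    return mime_type, base64.b64decode(payload)
```

The returned bytes can be written to disk or uploaded to object storage, which is usually preferable to keeping large base64 strings in downstream tables.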

Is this a Google News API alternative?

For many use cases, yes. It provides structured article data from Google News RSS that you can use in pipelines and dashboards without relying on a separate API.

What do nfpr and filter options do?

  • nfpr: Excludes results from auto-corrected queries when set to 1.
  • filter: Enables (1) or disables (0) Google’s Similar/Omitted Results filters.

Final thoughts

Google News Scraper is built for accurate, scalable collection of structured Google News data. With locale controls, flexible time filters, async performance, and robust proxy fallback, it provides dependable results for marketers, developers, analysts, and researchers. Configure your query, set maxItems and filters, and start capturing real-time news signals with clean titles, links, snippets, timestamps, and thumbnails. If you’re building a Google News scraping pipeline or seeking a Google News API alternative, this actor gives you production-ready, structured output to power your apps and analysis.