Google News Scraper
Pricing
$19.99/month + usage
📰 Google News Scraper collects real-time headlines, publishers, snippets, dates & links from Google News. 🔎 Filter by keywords, topics, country & language. 📊 Export JSON/CSV, deduplicate & schedule crawls. 🚀 Perfect for media monitoring, trend tracking & research.
Developer
ScraperForge
Google News Scraper is a fast, reliable tool that collects headlines, publishers, snippets, dates, links, and images from the Google News RSS feed. It is aimed at marketers, developers, data analysts, and researchers who need to scrape Google News at scale. It handles regions and languages and delivers clean, structured results for media monitoring, trend tracking, and research. With async fetching, proxy fallback, and de-duplication, it extracts Google News data consistently without manual effort.
What data / output can you get?
Below are the exact fields this Google News crawler stores for each article it collects and pushes to the Apify dataset.
| Field | Description | Example value |
|---|---|---|
| position | Result index in the current run (1-based) | 1 |
| title | Article headline | Tesla announces new factory plans in Mexico |
| link | Direct article URL (resolved from RSS redirect where possible) | https://example.com/tesla-factory-plans |
| domain | Domain derived from source name or article URL | example.com |
| source | Publisher/source name parsed from RSS entry | Bloomberg |
| date | Human-friendly relative time computed from pubDate | 2 hours ago |
| date_utc | ISO 8601 UTC timestamp computed from pubDate | 2026-03-15T10:30:00+00:00 |
| snippet | Cleaned snippet extracted from the RSS description | Tesla is planning a new manufacturing facility in Mexico... |
| thumbnail | Base64 data URL for a fetched article image (Open Graph/Twitter Card/inline) | data:image/jpeg;base64,/9j/4AAQSkZJRgABAQ... |
| block_position | Same as position; maintained for compatibility | 1 |
Notes:
- The actor de-duplicates by GUID during parsing to prevent duplicate items.
- Thumbnails are retrieved from the article page when possible and encoded as base64; when the image cannot be determined or fetched, this field may be empty.
- Snippets are derived by cleaning HTML from the RSS description.
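The GUID de-duplication and snippet cleaning described in these notes can be sketched as follows. This is a minimal illustration using only the Python standard library, not the actor's actual code: the sample feed, `clean_snippet`, and `parse_items` are hypothetical names introduced here, and the actor itself is stated to use BeautifulSoup rather than a regular expression.

```python
import html
import re
import xml.etree.ElementTree as ET

# Hypothetical sample feed; real Google News RSS items carry more fields.
SAMPLE_RSS = """<rss version="2.0"><channel>
<item><guid>a1</guid><title>Tesla announces new factory plans</title>
<description>&lt;a href="https://example.com/x"&gt;Tesla is planning&lt;/a&gt;&amp;nbsp;a new facility</description></item>
<item><guid>a1</guid><title>Tesla announces new factory plans</title>
<description>duplicate entry</description></item>
<item><guid>b2</guid><title>Second story</title>
<description>&lt;b&gt;Plain&lt;/b&gt; text</description></item>
</channel></rss>"""

TAG_RE = re.compile(r"<[^>]+>")


def clean_snippet(raw: str) -> str:
    """Strip HTML tags, decode entities, and collapse whitespace."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)
    return " ".join(text.split())


def parse_items(rss_text: str) -> list:
    """Return one dict per unique GUID, numbered from 1 in feed order."""
    seen = set()
    items = []
    for node in ET.fromstring(rss_text).iter("item"):
        # Assumption: fall back to the link when an item has no GUID.
        guid = (node.findtext("guid") or node.findtext("link") or "").strip()
        if not guid or guid in seen:
            continue
        seen.add(guid)
        items.append({
            "position": len(items) + 1,
            "title": (node.findtext("title") or "").strip(),
            "snippet": clean_snippet(node.findtext("description") or ""),
        })
    return items
```

Calling `parse_items(SAMPLE_RSS)` keeps the first `a1` item and the `b2` item and skips the repeated `a1` entry.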
Key features
- 🔁 Bold proxy fallback workflow
  Starts without a proxy and automatically escalates to datacenter and then residential proxies on blocks or failures (with exponential backoff and retries). This boosts reliability for Google News scraping without API access.
- 🌍 Region & language controls
  Configure Google country (gl), UI language (hl), language-limited results (lr), and country-limited results (cr) to tailor your Google News data extraction by market.
- 🕒 Flexible time filtering
  Filter by last hour, day, week, month, year, or a custom date range using time_period with time_period_min/time_period_max in MM/DD/YYYY format.
- 🧹 Clean snippets & readable dates
  HTML is stripped from RSS descriptions to produce clean snippets, while pubDate is converted to both a relative “time ago” string and ISO 8601 UTC timestamp.
- 🖼️ Smart thumbnail capture
  Fetches images via Open Graph or Twitter Card tags, with fallbacks to article content images. Valid images are encoded as base64 data URLs for portable use.
- 🚦 De-duplication & multi-strategy harvesting
  Prevents duplicates via GUID tracking and augments collection by trying multiple time-range strategies (e.g., day/week/month) when no specific time period is set.
- ⚙️ Async performance & stability
  Built on Python asyncio + aiohttp for speed, with per-request timeouts, rate limiting between requests, and up to 3 retries per proxy level to maximize success rates.
- 📦 Real-time dataset writes
  Items are saved incrementally during the run, so you can monitor results as they arrive and consume them from the run’s dataset stream.
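The async behavior listed above (bounded concurrency, per-request timeouts, and pacing between requests) can be sketched with asyncio alone. This is an illustrative sketch under stated assumptions, not the actor's implementation: `fetch_with_limits` and its parameters are hypothetical, and `fetch` stands in for whatever aiohttp-based coroutine performs the real request.

```python
import asyncio


async def fetch_with_limits(urls, fetch, *, concurrency=5, delay=0.0, timeout=10.0):
    """Run fetch(url) for each URL with bounded concurrency and a per-request timeout.

    Returns a dict mapping each URL to its result, or None if the request timed out.
    """
    semaphore = asyncio.Semaphore(concurrency)

    async def one(url):
        async with semaphore:
            try:
                result = await asyncio.wait_for(fetch(url), timeout)
            except asyncio.TimeoutError:
                result = None  # treat a timeout as a failed request
            await asyncio.sleep(delay)  # simple pacing before releasing the slot
            return url, result

    return dict(await asyncio.gather(*(one(u) for u in urls)))
```

Passing the fetch coroutine in as an argument keeps the limiter independent of the HTTP client, so it can be exercised with a stub before any network code exists.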
How to use Google News Scraper - step by step
- Create or log in to your Apify account
  Access the actor from your Apify dashboard.
- Open Google News Scraper
  Navigate to the “google-news-scraper” actor.
- Enter your input parameters
  At minimum, provide query and maxItems. Optionally add gl, hl, lr, cr, time_period (and custom dates), nfpr, filter, and proxyConfiguration.
- Tune filters and locale
  - Use gl (Google Country) and hl (UI Language) to localize results.
  - Use lr and/or cr to limit results by language or country.
  - Use time_period to constrain recency, including a custom date range.
- Control result volume & behavior
  - Set maxItems (100–5000).
  - Toggle nfpr (exclude autocorrect) and filter (Similar/Omitted Results).
- Start the run
  The actor fetches Google News RSS data, applies retry logic and proxy fallback as needed, and writes items to the dataset in real time.
- Review and download your results
  Open the run’s Dataset to view, filter, and export items as needed for your workflow.
Pro tip: For precise date windows, set time_period to custom and provide time_period_min/time_period_max in MM/DD/YYYY.
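If you download the dataset as JSON and want a CSV with only selected columns, a small post-processing step is enough. The sketch below assumes the items are already loaded as a list of dicts with the field names from the output table; `FIELDS` and `items_to_csv` are hypothetical names, and the base64 thumbnail is dropped on purpose to keep the file small.

```python
import csv
import io

# Columns to keep; the bulky base64 thumbnail is deliberately left out.
FIELDS = ["position", "title", "link", "domain", "source", "date", "date_utc", "snippet"]


def items_to_csv(items):
    """Serialize dataset items to CSV text, one row per article."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for item in items:
        # Missing fields become empty cells instead of raising an error.
        writer.writerow({name: item.get(name, "") for name in FIELDS})
    return buffer.getvalue()
```

Write the returned text to a file opened with `newline=""` so the CSV line endings are preserved as written.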
Use cases
| Use case name | Description |
|---|---|
| Media monitoring & alerts | Track breaking stories and publishers for your topics and brands with a real-time Google News scraper that saves structured articles continuously. |
| SEO & content planning | Identify trending topics and headlines to inform content calendars, using the scraper's consistent headline output. |
| Competitive intelligence | Monitor competitors’ press coverage and announcements by filtering results with country/language parameters. |
| Market & financial tracking | Follow sector-specific news (e.g., “earnings”, “acquisition”) with time-based filters for last day/week. |
| Academic & policy research | Build structured corpora of articles for analysis using language-restricted results (lr) and region constraints (gl/cr). |
| Data pipelines & dashboards | Use the dataset output as a Google News API alternative to power dashboards and analytics without running browser-based scrapers. |
Why choose Google News Scraper?
This production-ready Google News scraping tool combines precision, automation, and reliability.
- ✅ Accurate structured output with consistent fields (title, link, domain, source, date, snippet, thumbnail, positions).
- 🌐 Multilingual and multi-region support via gl, hl, lr, and cr parameters.
- 📈 Scales reliably with async requests, rate limiting, and up to 3 automatic retries per proxy level.
- 🧑‍💻 Developer-friendly dataset output ready for integrations and downstream processing.
- 🔐 Safe-by-design proxy fallback (none → datacenter → residential) to reduce blocks and keep runs stable.
- 🕒 Real-time saves to the dataset so long-running queries produce usable data immediately.
- 🧰 More robust than browser extensions or ad‑hoc scripts — built with aiohttp, BeautifulSoup, and clear retry logic.
Bottom line: if you need a dependable way to scrape Google News without an API, this actor delivers consistent, clean results at scale.
Is it legal / ethical to use Google News Scraper?
Yes — when done responsibly. The actor processes publicly accessible Google News RSS content and does not access private or authenticated data.
Guidelines for compliant use:
- Respect platform terms and robots.txt directives.
- Avoid abusive behavior (high request rates, excessive retries).
- Use data for lawful purposes and follow applicable laws (e.g., copyright law and its fair-use limits).
- Attribute original publishers when required by your use case.
- Consult your legal team for edge cases and jurisdiction-specific requirements.
Input parameters & output format
Example JSON input
{"query": "Tesla","maxItems": 200,"gl": "United States","hl": "English","lr": "English","cr": "United States","time_period": "custom","time_period_min": "03/01/2026","time_period_max": "03/31/2026","nfpr": 1,"filter": 1,"proxyConfiguration": {"useApifyProxy": false}}
Parameters
| Field | Type | Description | Default | Required |
|---|---|---|---|---|
| maxItems | integer | Maximum number of search results to retrieve (100–5000 enforced) | 100 | Yes |
| query | string | The search term to use | Elon Musk | Yes |
| gl | string | The Google country to use for the query | — | No |
| hl | string | The Google UI language to return results | — | No |
| lr | string | Limit the results to a specific language | — | No |
| cr | string | Limit the results to a specific country | — | No |
| time_period | string | Time period for results: last_hour, last_day, last_week, last_month, last_year, custom | — | No |
| time_period_min | string | Minimum date for custom time period (MM/DD/YYYY) | — | No |
| time_period_max | string | Maximum date for custom time period (MM/DD/YYYY) | — | No |
| nfpr | integer | Exclude results from auto-corrected queries (0 or 1) | 0 | No |
| filter | integer | Enable/disable Similar Results and Omitted Results filters (0 or 1) | 1 | No |
| proxyConfiguration | object | Configure proxy settings. The actor will start with no proxy, then fallback to datacenter, then residential proxies if needed. | {"useApifyProxy": false} | No |
Notes:
- If maxItems is set below 100, the actor automatically raises it to 100; above 5000, it caps at 5000.
- For time_period="custom", both time_period_min and time_period_max must be provided in MM/DD/YYYY format.
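The two rules above can be checked on the client side before starting a run. The following is a minimal sketch, not the actor's own validation: `normalize_input` is a hypothetical helper, and the check that the minimum date is not later than the maximum is an extra assumption that the documentation does not state.

```python
from datetime import datetime

MIN_ITEMS, MAX_ITEMS = 100, 5000
DATE_FORMAT = "%m/%d/%Y"  # MM/DD/YYYY


def normalize_input(run_input: dict) -> dict:
    """Return a copy of run_input with maxItems clamped and custom dates validated."""
    normalized = dict(run_input)
    requested = int(normalized.get("maxItems", MIN_ITEMS))
    normalized["maxItems"] = max(MIN_ITEMS, min(MAX_ITEMS, requested))

    if normalized.get("time_period") == "custom":
        try:
            # strptime raises ValueError if a date is not in MM/DD/YYYY format.
            start = datetime.strptime(normalized["time_period_min"], DATE_FORMAT)
            end = datetime.strptime(normalized["time_period_max"], DATE_FORMAT)
        except KeyError as exc:
            raise ValueError(f"custom time_period requires {exc.args[0]}") from exc
        # Assumption: an inverted range is treated as an input error.
        if start > end:
            raise ValueError("time_period_min must not be later than time_period_max")
    return normalized
```

For example, `normalize_input({"query": "Tesla", "maxItems": 10})` returns a copy with `maxItems` raised to 100, mirroring the documented clamping.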
Example JSON output
{"position": 1,"title": "Tesla announces new factory plans in Mexico","link": "https://example.com/tesla-factory-plans","domain": "example.com","source": "Bloomberg","date": "2 hours ago","date_utc": "2026-03-15T10:30:00+00:00","snippet": "Tesla is planning a new manufacturing facility in Mexico...","thumbnail": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQ...","block_position": 1}
Field notes:
- thumbnail may be empty if no suitable image is found or the image is not retrievable.
- date and date_utc are derived from the RSS pubDate; if parsing fails, the actor uses fallbacks.
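One way to derive both date fields from an RSS pubDate is sketched below using only the standard library. It is an illustration of the conversion, not the actor's code: `to_date_fields` is a hypothetical name, the rounding rules for the relative string are assumptions, and the actor's fallback behavior for unparseable dates is not reproduced.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def to_date_fields(pub_date: str, now: datetime) -> dict:
    """Convert an RFC 2822 pubDate into a relative string and an ISO 8601 UTC timestamp."""
    published = parsedate_to_datetime(pub_date)
    if published.tzinfo is None:
        # Assumption: a pubDate without an offset is treated as UTC.
        published = published.replace(tzinfo=timezone.utc)
    published = published.astimezone(timezone.utc)

    seconds = max(0, int((now - published).total_seconds()))
    for unit, size in (("day", 86400), ("hour", 3600), ("minute", 60)):
        if seconds >= size:
            count = seconds // size
            relative = f"{count} {unit}{'s' if count != 1 else ''} ago"
            break
    else:
        relative = "just now"
    return {"date": relative, "date_utc": published.isoformat()}
```

Passing `now` explicitly (for example `datetime.now(timezone.utc)`) keeps the function deterministic and easy to test.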
FAQ
Is there a free trial or free tier?
Yes. This actor includes a 120-minute trial window in its current pricing plan, so you can evaluate it before subscribing.
Does it support Google News scraping with Python?
Yes. The actor is implemented in Python using asyncio and aiohttp, and produces structured dataset items suitable for downstream Python workflows.
How many results can it collect per run?
You can request between 100 and 5000 items via maxItems. The actor enforces this range for stability and performance.
Can I filter by language and country?
Yes. Use hl (UI language), lr (language-limited results), gl (Google country), and cr (country-limited results) to localize your results.
Can I filter by time range?
Yes. Set time_period to last_hour, last_day, last_week, last_month, last_year, or custom. For custom, provide time_period_min and time_period_max in MM/DD/YYYY format.
How does proxy handling work?
The actor starts with no proxy, then automatically falls back to datacenter proxies, and finally residential proxies if blocks or errors occur. It also retries requests up to three times per proxy level with backoff.
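The escalation described above can be sketched as two nested loops: one over proxy levels and one over retries with exponential backoff. This is a simplified illustration rather than the actor's implementation. `fetch_with_fallback`, `PROXY_LEVELS`, and the `fetch(url, level)` callback are hypothetical, and the base delay and doubling factor are assumptions.

```python
import asyncio

PROXY_LEVELS = ("none", "datacenter", "residential")


async def fetch_with_fallback(url, fetch, *, retries=3, base_delay=1.0):
    """Try each proxy level in order, retrying with exponential backoff.

    Returns (level, result) for the first successful attempt.
    """
    last_error = None
    for level in PROXY_LEVELS:
        for attempt in range(retries):
            try:
                return level, await fetch(url, level)
            except Exception as exc:  # a real client would catch narrower errors
                last_error = exc
                if attempt < retries - 1:
                    # Wait base_delay, then 2x, 4x, ... before retrying this level.
                    await asyncio.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"all proxy levels failed for {url}") from last_error
```

With the defaults, a request makes at most nine attempts (three per level) before the error is raised.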
Does it de-duplicate results?
Yes. The actor uses item GUIDs from the RSS feed to avoid saving duplicate articles during a run.
What images are returned?
The actor attempts to fetch an article thumbnail by checking Open Graph and Twitter Card tags and then scanning suitable in-page images. Valid images are returned as base64 data URLs in the thumbnail field.
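To turn a thumbnail value back into image bytes in your own code, a sketch like the following may help. `decode_thumbnail` is a hypothetical helper that assumes the `data:<mime>;base64,<payload>` shape shown in the example output.

```python
import base64


def decode_thumbnail(data_url: str):
    """Return (mime_type, image_bytes), or None for an empty or non-base64 value."""
    if not data_url or not data_url.startswith("data:"):
        return None
    header, _, payload = data_url.partition(",")
    if not header.endswith(";base64"):
        return None
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)
```

The returned bytes can be written straight to a file whose extension matches the MIME type.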
Is this a Google News API alternative?
For many use cases, yes. It provides structured article data from Google News RSS that you can use in pipelines and dashboards without relying on a separate API.
What do nfpr and filter options do?
- nfpr: Excludes results from auto-corrected queries when set to 1.
- filter: Enables (1) or disables (0) Google’s Similar/Omitted Results filters.
Final thoughts
Google News Scraper is built for accurate, scalable collection of structured Google News data. With locale controls, flexible time filters, async performance, and robust proxy fallback, it provides dependable results for marketers, developers, analysts, and researchers. Configure your query, set maxItems and filters, and start capturing real-time news signals with clean titles, links, snippets, timestamps, and thumbnails. If you’re building a Google News scraping pipeline or seeking a Google News API alternative, this actor gives you production-ready, structured output to power your apps and analysis.