Google News Scraper
Pricing
Pay per usage
Fast Google News scraper. Search by keyword with time-range filter, scrape top stories per country, topic sections, geo headlines, or topic IDs. Extracts title, publisher, date, snippet, image, related coverage. Optional decode of the wrapped Google URL into the real publisher URL.
Developer
Crikit
Maintained by Community
Fast, reliable Google News scraper. Pulls news articles by keyword, country, topic section, local geo headlines, or direct Google News topic ID. Pure RSS read, no headless browser, no anti-bot stack.
What it does
Scrapes Google News for news articles and returns structured records with title, publisher, publish date, snippet, thumbnail, Google News URL, and clustered related coverage. Optionally resolves the wrapped Google News URL into the publisher's real article URL via Google's own batchexecute RPC.
Inputs
| Field | Type | Description |
|---|---|---|
| `queries` | string[] | Keywords or phrases. Supports Google operators: `site:reuters.com`, `intitle:`, `OR`, `-exclude`, `"exact phrase"`. |
| `topics` | string[] | Editorial sections: `WORLD`, `NATION`, `BUSINESS`, `TECHNOLOGY`, `ENTERTAINMENT`, `SPORTS`, `SCIENCE`, `HEALTH`. |
| `geos` | string[] | Free-text city names like `San Francisco`, `New York`, `Tokyo`. Google resolves each to a geo topic ID. |
| `topicIds` | string[] | Direct Google News topic IDs (`CAAq...` base64). |
| `includeTopStories` | boolean | Also scrape each locale's Top Stories feed. |
| `locales` | object[] | Country editions. Each entry is `{hl, gl, ceid}`. Default: US English. Examples: `{hl: "fr-FR", gl: "FR", ceid: "FR:fr"}`, `{hl: "ja-JP", gl: "JP", ceid: "JP:ja"}`. |
| `timeRange` | enum | Restrict keyword searches to a window: `1h`, `12h`, `1d`, `7d`, `30d`, `1y`. Use `any` for no restriction. |
| `resolvePublisherUrls` | boolean | When true, decodes each Google News URL into the publisher's real URL. Adds one HTTP request per article. |
| `maxResultsPerTarget` | integer | Cap per single feed (one query × one locale). |
| `maxResults` | integer | Hard cap across the entire run. |
| `maxConcurrency` | integer | Number of feeds fetched in parallel. |
| `decodeConcurrency` | integer | Number of concurrent URL decoder calls. |
| `proxyConfiguration` | object | Apify proxy settings. Defaults to RESIDENTIAL US. |
You can mix any combination of queries, topics, geos, topicIds, and includeTopStories in a single run.
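As a rough illustration of how these fields combine, the sketch below builds one run input and estimates an upper bound on the records it can return. The field names come from the table above; the helper names `count_feeds` and `max_records`, the `en-US` shape of the default locale, and the assumption that every target is fetched once per locale are illustrative guesses, not confirmed behaviour of the actor.

```python
# Per-feed limit stated in the Tips section of this README.
GOOGLE_FEED_CAP = 100

# Assumed shape of the "US English" default locale (not confirmed by the README).
DEFAULT_LOCALES = [{"hl": "en-US", "gl": "US", "ceid": "US:en"}]

run_input = {
    "queries": ['"electric vehicles" site:reuters.com', "intitle:battery OR intitle:charging"],
    "topics": ["TECHNOLOGY"],
    "includeTopStories": True,
    "locales": [
        {"hl": "en-US", "gl": "US", "ceid": "US:en"},
        {"hl": "fr-FR", "gl": "FR", "ceid": "FR:fr"},
    ],
    "timeRange": "7d",
    "resolvePublisherUrls": False,
    "maxResultsPerTarget": 50,
    "maxResults": 500,
}


def count_feeds(cfg):
    """Count feeds, assuming each target is fetched once per locale."""
    locales = cfg.get("locales") or DEFAULT_LOCALES
    targets = sum(len(cfg.get(key, [])) for key in ("queries", "topics", "geos", "topicIds"))
    if cfg.get("includeTopStories"):
        targets += 1  # one Top Stories feed per locale
    return targets * len(locales)


def max_records(cfg):
    """Upper bound on records: per-feed cap times feed count, clipped by maxResults."""
    per_feed = min(cfg.get("maxResultsPerTarget", GOOGLE_FEED_CAP), GOOGLE_FEED_CAP)
    bound = count_feeds(cfg) * per_feed
    hard_cap = cfg.get("maxResults")
    return bound if hard_cap is None else min(bound, hard_cap)


print(count_feeds(run_input), max_records(run_input))
```

Here the two queries, one topic, and the Top Stories feed give four targets across two locales, so `maxResults` only matters once it drops below the per-feed bound.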
Outputs
Every record includes:
- `articleId`: Google's stable CBM identifier for the article
- `title`: article title (publisher suffix stripped)
- `googleNewsUrl`: the wrapped Google News URL
- `publisherUrl`: the publisher's real URL, when `resolvePublisherUrls=true`
- `publisherName`, `publisherHomeUrl`: publication details
- `publishedAt`: ISO-8601 UTC
- `imageUrl`: thumbnail URL (note: Google News RSS rarely emits per-item images; this field is usually null)
- `snippetHtml`: raw HTML snippet
- `relatedCoverage[]`: other articles Google clustered under the same story
- `feedKind`, `query`, `topic`, `topicId`, `geo`, `locale`, `scrapedAt`
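To make the record shape concrete, here is a minimal sketch of how one RSS `<item>` could map onto some of these fields using only the standard library. It is not the actor's parser: the sample XML is made up, and taking `articleId` from `<guid>` and the publisher details from `<source>` are assumptions about the feed layout.

```python
import xml.etree.ElementTree as ET
from datetime import timezone
from email.utils import parsedate_to_datetime

# Hypothetical feed fragment, shaped like a typical RSS 2.0 news item.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <item>
      <title>Example headline - Example Times</title>
      <link>https://news.google.com/rss/articles/CBMiEXAMPLE</link>
      <guid isPermaLink="false">CBMiEXAMPLE</guid>
      <pubDate>Mon, 06 Jan 2025 14:30:00 GMT</pubDate>
      <description>&lt;a href="https://news.google.com/rss/articles/CBMiEXAMPLE"&gt;Example headline&lt;/a&gt;</description>
      <source url="https://www.example.com">Example Times</source>
    </item>
  </channel>
</rss>"""


def parse_item(item):
    """Map one <item> element onto a subset of the output fields."""
    source = item.find("source")
    publisher = source.text if source is not None else None
    title = item.findtext("title", default="")
    suffix = f" - {publisher}" if publisher else ""
    if suffix and title.endswith(suffix):
        title = title[: -len(suffix)]  # strip the trailing " - Publisher"
    published = parsedate_to_datetime(item.findtext("pubDate"))
    return {
        "articleId": item.findtext("guid"),  # assumption: guid carries the CBM id
        "title": title,
        "googleNewsUrl": item.findtext("link"),
        "publisherName": publisher,
        "publisherHomeUrl": source.get("url") if source is not None else None,
        "publishedAt": published.astimezone(timezone.utc).isoformat(),
        "snippetHtml": item.findtext("description"),
    }


def parse_feed(xml_text):
    """Parse every <item> in an RSS document."""
    return [parse_item(item) for item in ET.fromstring(xml_text).iter("item")]


records = parse_feed(SAMPLE_RSS)
print(records[0]["title"], records[0]["publishedAt"])
```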
Example use cases
- PR firms tracking brand mentions across publishers
- Hedge funds running event-driven signal pipelines
- Market intelligence teams monitoring competitor coverage
- Sentiment analysis on a stream of topic-tagged news
- Local-news dashboards by city
- Country-by-country media coverage comparison
Performance and pricing
- Pure RSS reads: ~5 sec for 100 records on a single feed
- No JS execution, no headless browser, datacenter proxy sufficient (RESIDENTIAL default for safety margin)
- Pay per dataset item. Each successful record costs the same flat fee regardless of whether you used a query, topic, geo, or topic-id feed
- The `resolvePublisherUrls` flag adds an extra HTTP request per article and slows the run roughly 3x. Disable it for the cheapest, fastest scrape
Tips
- For high-volume keywords, slice by `timeRange` (`1h`, `1d`, `7d`) and run the same query multiple times across windows. Each Google feed is hard-capped at 100 items
- Fan a single query across multiple locales (`US:en`, `GB:en`, `FR:fr`, `JP:ja`) to multiply your coverage per run
- Geo topic IDs are stable. The first time you scrape `San Francisco`, capture the topic ID from the run's `googleNewsUrl` and use `topicIds` directly in later runs to skip the redirect
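The first two tips can be sketched as follows. Because `timeRange` is a single value per run, each window becomes its own run input, while several locales can share one run. Overlapping windows return some of the same articles, so the results are merged on `articleId`. The helper names `windowed_inputs` and `merge_unique` are hypothetical, as is the exact `hl` value used for each locale.

```python
# Assumed locale shapes; only the ceid values appear in the tips above.
LOCALES = [
    {"hl": "en-US", "gl": "US", "ceid": "US:en"},
    {"hl": "en-GB", "gl": "GB", "ceid": "GB:en"},
]


def windowed_inputs(query, windows, locales):
    """Build one run input per time window, each fanned across all locales."""
    return [
        {"queries": [query], "timeRange": window, "locales": locales}
        for window in windows
    ]


def merge_unique(batches):
    """Merge record batches, keeping the first record seen for each articleId."""
    seen = {}
    for batch in batches:
        for record in batch:
            seen.setdefault(record["articleId"], record)
    return list(seen.values())


inputs = windowed_inputs("semiconductor shortage", ["1h", "1d", "7d"], LOCALES)

# Made-up records standing in for the datasets of two runs.
hour_batch = [{"articleId": "CBMiA", "title": "A"}]
day_batch = [{"articleId": "CBMiA", "title": "A"}, {"articleId": "CBMiB", "title": "B"}]
merged = merge_unique([hour_batch, day_batch])
print(len(inputs), [record["articleId"] for record in merged])
```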
Stack
Python + httpx + Apify SDK. No browser. Simple parser over Google's public RSS XML.
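For readers curious what a "pure RSS read" looks like, the sketch below builds feed URLs from a locale entry. The URL paths and the `when:` search operator are commonly observed patterns for Google News RSS and are assumptions here; this README does not confirm which endpoints the actor actually requests, and the function names are illustrative.

```python
from urllib.parse import quote, urlencode

BASE = "https://news.google.com/rss"  # assumed base path for public feeds


def search_feed_url(query, locale, time_range="any"):
    """Keyword feed; the time window is appended as a `when:` operator (assumption)."""
    q = query if time_range == "any" else f"{query} when:{time_range}"
    params = {"q": q, "hl": locale["hl"], "gl": locale["gl"], "ceid": locale["ceid"]}
    return f"{BASE}/search?{urlencode(params)}"


def section_feed_url(kind, name, locale):
    """Topic or geo section feed; `kind` is "topic" or "geo" (assumed path layout)."""
    params = {"hl": locale["hl"], "gl": locale["gl"], "ceid": locale["ceid"]}
    return f"{BASE}/headlines/section/{kind}/{quote(name)}?{urlencode(params)}"


us = {"hl": "en-US", "gl": "US", "ceid": "US:en"}
print(search_feed_url("tesla", us, "7d"))
print(section_feed_url("geo", "San Francisco", us))
```

A feed URL built this way could then be fetched with any HTTP client (the stack above names httpx) and handed to an XML parser.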