Google News Scraper - Real URLs & Archives

Pricing

$0.70 / 1,000 results

💰 $0.70/1K results 💰 Fast Google News scraper for keywords, topics, publications, and URLs. Get titles, dates, publishers, domains, thumbnails, Google News IDs, optional real publisher URLs, locale presets, filters, dedupe, ticker/entity signals, and archives up to 50K results.

Rating

5.0

(1)

Developer

VortexData

Maintained by Community

Actor stats

Bookmarked: 1

Total users: 9

Monthly active users: 7

Last modified: 2 days ago

📰 Google News Scraper

Scrape Google News search results, topics, publications, and Google News URLs into clean, structured datasets.

Use this actor for media monitoring, brand tracking, market intelligence, SEO research, financial news signals, and AI/RAG news pipelines. It returns article titles, RSS descriptions, publishers, source domains, publication dates, thumbnails, stable Google News IDs, Google News URLs, optional real publisher URLs, locale metadata, and ticker/entity signals.

✅ No browser automation, no Google account, no Google API key, and no proxy required for normal runs.

✨ Why Use It

| Need | What this actor gives you |
| --- | --- |
| 👀 Monitor brands, people, products, or topics | Search one or many keywords across Google News. |
| 🔗 Get real article URLs | Resolve Google News redirects into direct publisher URLs when decodeUrls is enabled. |
| 📚 Scrape more than 100 results | Large keyword requests are split by date and deduplicated by Google News article ID. |
| 🌍 Work across countries and languages | Choose from 49 country/language presets such as US:en, DE:de, FR:fr, JP:ja, and BR:pt-419. |
| 📊 Export clean data | Dataset rows are flat and CSV-friendly, with fields like title, sourceDomain, publishedAt, publisherUrl, and imageUrl. |
| 🎯 Keep results focused | Include or exclude publisher domains such as reuters.com, bbc.com, msn.com, or yahoo.com. |
| 💸 Reduce unnecessary proxy cost | Runs direct first by default and uses Apify Proxy only as fallback when configured through API options. |

🔎 What You Can Scrape

  • 🔍 Keyword searches, including Google operators such as "exact phrase", OR, -exclude, site:domain.com, and intitle:term
  • 🗂️ Google News topics such as Business, Technology, Sports, Health, Science, World, and Entertainment
  • 🔗 Google News topic, publication, section, search, and RSS URLs
  • 🌍 Localized editions of Google News by country and language
  • 🗄️ Recent monitoring windows or larger historical archives
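The search operators listed above can be combined programmatically before being passed in as a keyword. The following is a minimal sketch, not the actor's own code: `build_query` and `build_feed_url` are hypothetical helpers, and the `hl`, `gl`, and `ceid` parameters follow the publicly visible Google News RSS URL pattern, which is an assumption about how a `region_language` preset such as `FR:fr` maps to a feed URL.

```python
from urllib.parse import urlencode


def build_query(exact=None, any_of=(), exclude=(), site=None, intitle=None):
    """Combine Google search operators into one query string."""
    parts = []
    if exact:
        parts.append(f'"{exact}"')           # "exact phrase"
    if any_of:
        parts.append(" OR ".join(any_of))    # term1 OR term2
    parts.extend(f"-{term}" for term in exclude)  # -exclude
    if site:
        parts.append(f"site:{site}")
    if intitle:
        parts.append(f"intitle:{intitle}")
    return " ".join(parts)


def build_feed_url(query, region_language="US:en"):
    """Assumed mapping of a COUNTRY:language preset to RSS search parameters."""
    country, language = region_language.split(":", 1)
    params = {"q": query, "hl": language, "gl": country, "ceid": region_language}
    return "https://news.google.com/rss/search?" + urlencode(params)


query = build_query(
    exact="supply chain",
    any_of=["NVIDIA", "AMD"],
    exclude=["rumor"],
    site="reuters.com",
)
print(query)  # "supply chain" NVIDIA OR AMD -rumor site:reuters.com
print(build_feed_url("OpenAI", "FR:fr"))
```

The resulting `query` string can be used as one entry of `keywords`; the feed URL is only shown to illustrate what a localized search feed might look like.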

⚡ Quick Start

  1. Open the actor on Apify and go to the Input tab.
  2. Enter one or more search terms in Search keywords.
  3. Choose Country and language.
  4. Set Articles per search.
  5. Keep Real publisher URLs on if you need direct article links.
  6. Run the actor and export the dataset as JSON, CSV, Excel, HTML, XML, or RSS.

Example input:

{
  "keywords": ["OpenAI", "Anthropic"],
  "maxArticles": 50,
  "timeframe": "1d",
  "region_language": "US:en",
  "decodeUrls": true,
  "extractImages": true
}
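For automation, the same input can be sent through the Apify API instead of the Input tab. The sketch below only builds the HTTP request with the Python standard library and does not send it. It assumes Apify's synchronous `run-sync-get-dataset-items` endpoint; the actor ID `username~actor-name` and the token `YOUR_APIFY_TOKEN` are placeholders, and the real actor ID should be taken from the actor's API tab.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST request that runs an actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


run_input = {
    "keywords": ["OpenAI", "Anthropic"],
    "maxArticles": 50,
    "timeframe": "1d",
    "region_language": "US:en",
    "decodeUrls": True,
    "extractImages": True,
}

# Placeholder actor ID and token: replace both before sending.
request = build_run_request("username~actor-name", "YOUR_APIFY_TOKEN", run_input)

# To actually run it (needs a valid token and network access):
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect or log the payload before spending any credits.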

💼 Common Use Cases

👀 Brand Monitoring

Track brand mentions, executives, products, lawsuits, partnerships, or incidents.

{
  "keywords": ["Acme Corp", "Acme CEO", "\"Acme\" lawsuit"],
  "timeframe": "1d",
  "maxArticles": 100,
  "excludeDomains": ["msn.com", "yahoo.com"]
}
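If you export a dataset first and want to apply the same kind of domain filter afterwards, it can be approximated locally. This is a rough sketch under one stated assumption: a listed domain also matches its subdomains (for example `uk.finance.yahoo.com` for `yahoo.com`). The actor's exact matching rules are not documented here, and `matches_domain` and `filter_items` are hypothetical helper names.

```python
def matches_domain(source_domain, domains):
    """True if source_domain equals, or is a subdomain of, any listed domain."""
    host = source_domain.lower().strip(".")
    return any(
        host == domain.lower() or host.endswith("." + domain.lower())
        for domain in domains
    )


def filter_items(items, include_domains=(), exclude_domains=()):
    """Keep items allowed by the include list and not hit by the exclude list."""
    kept = []
    for item in items:
        domain = item.get("sourceDomain") or ""
        if include_domains and not matches_domain(domain, include_domains):
            continue
        if matches_domain(domain, exclude_domains):
            continue
        kept.append(item)
    return kept


items = [
    {"title": "A", "sourceDomain": "reuters.com"},
    {"title": "B", "sourceDomain": "uk.finance.yahoo.com"},
    {"title": "C", "sourceDomain": "notyahoo.com"},
]
kept = filter_items(items, exclude_domains=["msn.com", "yahoo.com"])
print([item["title"] for item in kept])  # ['A', 'C']
```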

📈 Market And Competitor Research

Collect news around companies, industries, funding rounds, regulations, or competitors.

{
  "keywords": ["AI chip market", "NVIDIA OR AMD", "semiconductor supply chain"],
  "timeframe": "7d",
  "region_language": "US:en",
  "maxArticles": 200,
  "decodeUrls": true
}

🌍 Localized News Tracking

Get results from a specific country and language edition of Google News.

{
  "keywords": ["election"],
  "region_language": "FR:fr",
  "timeframe": "7d",
  "maxArticles": 100
}

🗂️ Topic Feeds

Scrape top stories from Google News topic feeds.

{
  "topics": ["TECHNOLOGY", "BUSINESS"],
  "region_language": "GB:en",
  "maxArticles": 50
}

🗄️ Historical Archive

To collect more results than a single Google News RSS feed usually returns, set a larger article limit over an explicit date range.

{
  "query": "semiconductor supply chain",
  "dateFrom": "2025-01-01",
  "dateTo": "2025-12-31",
  "maxArticles": 5000,
  "region_language": "US:en"
}
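This page describes large keyword requests as being split by date and deduplicated by Google News article ID. The sketch below illustrates that general idea only; it is not the actor's implementation. The 7-day window size and the helper names `split_date_range` and `dedupe_by_article_id` are assumptions chosen for the example.

```python
from datetime import date, timedelta


def split_date_range(date_from: date, date_to: date, window_days: int = 7):
    """Yield inclusive (start, end) windows that cover date_from..date_to."""
    start = date_from
    while start <= date_to:
        end = min(start + timedelta(days=window_days - 1), date_to)
        yield start, end
        start = end + timedelta(days=1)


def dedupe_by_article_id(items):
    """Keep the first item seen for each articleId (falling back to guid)."""
    seen = set()
    unique = []
    for item in items:
        key = item.get("articleId") or item.get("guid")
        if key in seen:
            continue
        if key:
            seen.add(key)
        unique.append(item)  # items without any ID are kept as-is
    return unique


windows = list(split_date_range(date(2025, 1, 1), date(2025, 1, 20)))
print(len(windows))  # 3

merged = dedupe_by_article_id([
    {"articleId": "CBMi-a", "title": "First"},
    {"articleId": "CBMi-b", "title": "Second"},
    {"articleId": "CBMi-a", "title": "First (seen again in another window)"},
])
print(len(merged))  # 2
```

Adjacent windows often return overlapping results, which is why deduplication on a stable ID matters when the per-window results are merged.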

📦 Output

Each dataset item is one Google News result.

{
  "position": 1,
  "sourcePosition": 3,
  "keyword": "OpenAI",
  "title": "OpenAI announces new enterprise AI tools",
  "description": "OpenAI announces new enterprise AI tools - Example News",
  "source": "Example News",
  "sourceUrl": "https://www.example.com",
  "sourceDomain": "example.com",
  "url": "https://www.example.com/openai-enterprise-ai-tools",
  "publisherUrl": "https://www.example.com/openai-enterprise-ai-tools",
  "googleNewsUrl": "https://news.google.com/rss/articles/CBMi...",
  "rssLink": "https://news.google.com/rss/articles/CBMi...",
  "guid": "CBMi...",
  "articleId": "CBMi...",
  "publishedAt": "2026-05-29T08:30:00.000Z",
  "publishedTimestamp": 1780043400000,
  "image": "https://news.google.com/api/attachments/CC8i...",
  "imageUrl": "https://news.google.com/api/attachments/CC8i...",
  "sourceType": "keyword",
  "tickers": ["OPENAI"],
  "entities": [
    {"ticker": "OPENAI", "name": "OpenAI", "type": "private"}
  ],
  "metadata": {
    "scrapeTimestamp": "2026-05-29T08:31:02.123Z",
    "sourceType": "keyword",
    "timeframe": "1d",
    "region": "US",
    "language": "en",
    "feedUrl": "https://news.google.com/rss/search?q=OpenAI..."
  }
}
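In the sample item, `publishedTimestamp` appears to be the same instant as `publishedAt`, expressed as Unix epoch milliseconds. The sketch below shows one way to turn both into timezone-aware datetimes and to measure how old an article was at scrape time. `parse_utc` and `from_epoch_ms` are hypothetical helpers, and the millisecond interpretation is inferred from the sample values rather than stated by the actor.

```python
from datetime import datetime, timezone


def parse_utc(iso_text: str) -> datetime:
    """Parse an ISO 8601 string with a trailing 'Z' into an aware UTC datetime."""
    return datetime.fromisoformat(iso_text.replace("Z", "+00:00"))


def from_epoch_ms(epoch_ms: int) -> datetime:
    """Convert epoch milliseconds into an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


item = {
    "publishedAt": "2026-05-29T08:30:00.000Z",
    "publishedTimestamp": 1780043400000,
    "metadata": {"scrapeTimestamp": "2026-05-29T08:31:02.123Z"},
}

published = from_epoch_ms(item["publishedTimestamp"])
scraped = parse_utc(item["metadata"]["scrapeTimestamp"])
age_seconds = (scraped - published).total_seconds()
print(published.isoformat())  # 2026-05-29T08:30:00+00:00
print(age_seconds)            # 62.123
```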

📌 Important output fields:

| Field | Meaning |
| --- | --- |
| title | Article headline from Google News. |
| description | Plain-text RSS description when Google News provides it. |
| source | Publisher name. |
| sourceDomain | Normalized publisher domain. |
| publishedAt | Publication time in UTC. |
| url | Best available article URL. If URL decoding succeeds, this is the publisher URL; otherwise it is the Google News URL. |
| publisherUrl | Direct publisher article URL when decodeUrls is enabled and decoding succeeds. |
| googleNewsUrl | Original Google News result URL. |
| imageUrl | Google News thumbnail URL when available. |
| guid / articleId | Stable Google News identifiers, useful for deduplication. |
| tickers / entities | Stock, crypto, and company signals detected from headline and RSS description. |
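The fallback rule for `url` (publisher URL when decoding succeeds, Google News URL otherwise) is easy to reproduce when post-processing exported items. The following sketch applies that rule and writes a small CSV with the standard library. The column selection in `CSV_FIELDS` and the helper names `best_url` and `to_csv` are assumptions for illustration.

```python
import csv
import io

# Assumed subset of columns; adjust to the fields you actually need.
CSV_FIELDS = ["title", "source", "sourceDomain", "publishedAt", "url", "imageUrl"]


def best_url(item):
    """Prefer the decoded publisher URL, else fall back to the Google News URL."""
    return item.get("publisherUrl") or item.get("googleNewsUrl")


def to_csv(items):
    """Serialize items to CSV text, ignoring fields outside CSV_FIELDS."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_FIELDS, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        row = dict(item)
        row["url"] = best_url(item)
        writer.writerow(row)
    return buffer.getvalue()


items = [
    {
        "title": "Decoded",
        "publisherUrl": "https://www.example.com/story",
        "googleNewsUrl": "https://news.google.com/rss/articles/CBMi-1",
    },
    {
        "title": "Not decoded",
        "publisherUrl": None,
        "googleNewsUrl": "https://news.google.com/rss/articles/CBMi-2",
    },
]
csv_text = to_csv(items)
print(csv_text)
```

Missing fields such as `imageUrl` come out as empty cells, which matches the note that thumbnails are not always available.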

🧭 Input Reference

Most users only need these fields:

| Field | Default | Description |
| --- | --- | --- |
| keywords | [] | Search terms. Add one query per line. |
| topics | [] | Optional Google News topic feeds. |
| topicUrls | [] | Optional Google News URLs pasted from your browser. |
| timeframe | 1d | Search window: 1h, 1d, 7d, 1m, 1y, or all. |
| dateFrom / dateTo | - | Optional exact date range. Format: YYYY-MM-DD. |
| region_language | US:en | Country and language preset. |
| maxArticles | 10 | Number of articles to return per keyword, topic, or URL. |
| decodeUrls | true | Resolve Google News links to real publisher URLs. Turn off for the fastest RSS-only runs. |
| extractImages | true | Add thumbnails when Google News exposes them. |
| extractEntities | true | Extract ticker and company signals. |
| includeDomains | [] | Return only these publisher domains. |
| excludeDomains | [] | Remove these publisher domains. |
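When inputs are generated by a script, a small local check can catch typos before a run is started. The sketch below uses the defaults and allowed values from the table above. `build_input` is a hypothetical helper, and the `region_language` pattern is an assumption based on the example presets (such as `US:en` and `BR:pt-419`); the actor itself may validate differently.

```python
import re
from datetime import datetime

ALLOWED_TIMEFRAMES = {"1h", "1d", "7d", "1m", "1y", "all"}


def build_input(keywords, timeframe="1d", region_language="US:en",
                max_articles=10, date_from=None, date_to=None):
    """Return an input dict using the documented defaults, after basic checks."""
    if timeframe not in ALLOWED_TIMEFRAMES:
        raise ValueError(f"timeframe must be one of {sorted(ALLOWED_TIMEFRAMES)}")
    # Assumed preset shape: two-letter country, colon, language code.
    if not re.fullmatch(r"[A-Z]{2}:[a-z]{2,3}(-[A-Za-z0-9]+)?", region_language):
        raise ValueError("region_language should look like 'US:en' or 'BR:pt-419'")
    run_input = {
        "keywords": list(keywords),
        "timeframe": timeframe,
        "region_language": region_language,
        "maxArticles": max_articles,
    }
    for name, value in (("dateFrom", date_from), ("dateTo", date_to)):
        if value is not None:
            datetime.strptime(value, "%Y-%m-%d")  # raises ValueError on a bad date
            run_input[name] = value
    return run_input


print(build_input(["OpenAI"], timeframe="7d", region_language="BR:pt-419"))
```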

🛠️ Advanced API-only fields are also supported for automation and compatibility: query, q, searchQuery, queries, startUrls, maxResults, numberOfResults, limit, maxItems, country, language, gl, hl, lr, cr, nfpr, filter, deduplicate, proxyMode, proxyConfiguration, and concurrency/retry tuning.

⚡ Speed And Cost

💨 For the fastest and cheapest runs, turn decodeUrls off. RSS-only mode is enough if Google News URLs are acceptable.

🔗 Keep decodeUrls on when you need real publisher links. URL decoding adds extra Google News requests per article, so it is slower than RSS-only mode.

🖼️ extractImages is usually worth keeping on. It uses Google News thumbnail data and a feed-level image index, not one publisher-page request per article. Some Google News results do not expose thumbnails, so imageUrl can still be null.

💸 By default, networking is direct-first. Proxy settings are hidden from the visual UI to avoid accidental proxy spend. API users can still set proxyMode to direct, auto, or apify.
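For budgeting, the listed price of $0.70 per 1,000 results gives a simple upper bound on a run. The sketch below assumes billing is purely per returned result at that rate and that every search returns its full `maxArticles`. Real runs can return fewer results, and other platform charges are not modeled. `estimate_max_cost` is a hypothetical helper.

```python
from decimal import Decimal

PRICE_PER_1000_RESULTS = Decimal("0.70")  # listed price in USD


def estimate_max_cost(num_searches: int, max_articles: int) -> Decimal:
    """Upper-bound cost in USD if every search returns max_articles results."""
    results = num_searches * max_articles
    cost = PRICE_PER_1000_RESULTS * results / 1000
    return cost.quantize(Decimal("0.01"))


print(estimate_max_cost(2, 50))    # 0.07 (two keywords, 50 articles each)
print(estimate_max_cost(1, 5000))  # 3.50 (one archive query, 5,000 articles)
```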

⚠️ Notes And Limitations

  • 📝 This actor does not extract full article text. It focuses on stable Google News data: headline, RSS description, publisher, date, source domain, image, Google News URL, and optional resolved publisher URL.
  • 🖼️ Some articles do not have thumbnails in Google News.
  • 🔗 Publisher URL decoding depends on Google News redirect/decode behavior and can occasionally fail. When that happens, the item is still returned with the Google News URL.
  • 🌍 Google News ranking and availability vary by country, language, query, and time.
  • 🗄️ Very large archive runs depend on how many unique results Google News exposes for the query and date range.

❓ FAQ

🔗 Can I get real publisher URLs instead of Google News redirect links?

Yes. Keep decodeUrls enabled. The actor will resolve Google News redirects into publisher article URLs when possible.

📚 Can I scrape more than 100 Google News results?

Yes. Set maxArticles above 100. The actor splits large keyword requests by date and deduplicates results by Google News ID.

🌍 Can I scrape Google News in different countries and languages?

Yes. Use region_language, for example US:en, DE:de, FR:fr, JP:ja, or BR:pt-419.

💸 Does it use Apify Proxy?

Normal runs are direct-first and do not require a proxy. Advanced API users can enable Apify Proxy fallback or force proxy mode if needed.

📝 Does it extract full article content?

No. Full publisher-page extraction is slower, less reliable across paywalls and consent pages, and often produces sparse or inconsistent records. This actor is optimized for reliable Google News result data.

⚖️ Is this actor affiliated with Google?

No. This actor is not affiliated with Google. Users are responsible for using the data lawfully and respecting Google News and publisher terms.

🛟 Need help?

Open the Issues tab on the actor page and include your input JSON, run ID, expected result, and one example of missing or incorrect data.