Pricing

from $0.39 / 1,000 results

Global News Archive API - Rise of the Phoenix

Search archived global news articles by country, publisher, and date. Export clean article text and metadata for media monitoring, PR research, market intelligence, RAG, and LLM workflows.

Rating: 5.0 (1 review)

Developer: Inus Grobler

Last modified: 5 days ago

Why Use This Actor

Archive-first workflow: query stored article records instead of crawling pages live.
Useful filters: narrow exports by country, publisher, and publication date.
Estimate before export: check available volume before paying for a larger dataset.
Pagination built in: continue broad exports with nextContinuationToken.
Clean outputs: get article title, body, URL, source, country, language, image, and metadata fields in the Apify dataset.
Daily LLM briefs: retrieve summaries already generated and stored by the scraper pipeline; the Actor does not make a new LLM request.

Quick Starts

Monitor Recent AP News Articles

{
  "websites": ["AP News"],
  "published_from": "10 days",
  "published_to": "0 days",
  "max_results": 25
}

Build A Country-Level News Dataset

{
  "countries": ["United States"],
  "published_from": "30 days",
  "published_to": "0 days",
  "max_results": 100
}

Pull Daily LLM Summaries

{
  "output_mode": "daily_summaries",
  "countries": ["South Africa", "Ghana"],
  "published_from": "7 days",
  "published_to": "0 days",
  "max_results": 100
}

Check Archive Volume Before Exporting

{
  "countries": ["South Africa"],
  "published_from": "12 months",
  "published_to": "0 days",
  "estimate_only": true,
  "max_results": 100
}

Media Monitoring For A Market

{
  "countries": ["United Kingdom"],
  "published_from": "7 days",
  "published_to": "0 days",
  "max_results": 100
}

Publisher-Specific Research

{
  "websites": ["Reuters", "AP News"],
  "published_from": "30 days",
  "published_to": "0 days",
  "max_results": 100
}

What You Can Do

Search historical news articles by country or website.
Pull persisted daily country/topic LLM summaries by country and date.
Filter article data by publication date range.
Estimate result counts before exporting a large dataset.
Export records from the Apify dataset as JSON, CSV, Excel, XML, RSS, or HTML.
Continue large exports with cursor-based pagination.
Use the results for media monitoring, news research, market intelligence, academic research, lead enrichment, and AI or LLM dataset preparation.

Common Use Cases

Media monitoring: track recent coverage in a country or across selected publishers.
PR and reputation research: export historical mentions and articles for review.
Market intelligence: collect regional news context for companies, sectors, products, and public events.
AI and RAG datasets: feed clean article text into analytics, NLP, summarization, classification, retrieval, and evaluation pipelines.
Research scoping: estimate how much article data is available before launching a larger export.

Best For

Users who need structured article records from an archive.
Analysts who need generated daily briefs with links back to supporting articles.
Teams that want repeatable exports from the same source and date filters.
Workflows where result count estimates and pagination matter.
API users who want JSON, CSV, Excel, XML, RSS, or HTML exports from Apify datasets.

Not For

Real-time breaking-news crawling from arbitrary websites.
Scraping every page from a website outside the hosted archive.
Bypassing publisher restrictions, paywalls, or access controls.

Troubleshooting

If a country or publisher returns no rows, widen the date range or check that the selected country/source exists in the Actor input list.
If a large export stops before all available rows are returned, read nextContinuationToken from the OUTPUT record and pass it as continuation_token in the next run.
If you only need a volume check, enable estimate_only before exporting article rows.
If you need current breaking-news crawling from arbitrary websites, use a live crawler instead of this archive API.

Pricing and Cost Notes

Keep public example tasks between 25 and 100 results so they are quick and affordable.
Use estimate_only to scope large country/date searches before exporting.
Increase max_results only when you need larger exports; broad searches can return many article rows.

How To Run The Actor

Open the Actor on Apify.
Choose Archived articles or Daily LLM summaries in Output mode.
For articles, leave AP News selected for a quick test or choose Countries/Websites. For daily summaries, optionally choose Countries; no country selection returns all countries in the bounded date window.
Set Date from and Date to.
Optional: enable Advanced: estimate results first to get a count without returning rows.
Set Max results, run the Actor, then download the Dataset and read pagination details from the OUTPUT record.

Input

The Actor input form contains the full list of available countries and websites. The main fields are:

Field	Type	Required	Description
`output_mode`	`string`	No	`articles` (default) or `daily_summaries`.
`countries`	`string[]`	No	Search one or more countries. Use this or `websites`, not both.
`websites`	`string[]`	No	Search publishers in article mode. Daily summaries are country-level and do not accept website filters.
`published_from`	`string`	No	Article publication start or summary-date start. Supports ISO dates and relative values.
`published_to`	`string`	No	Article publication end or summary-date end. Supports ISO dates and relative values.
`max_results`	`integer`	No	Maximum records to return. Default is `10`; maximum is `5000`.

Advanced fields:

Field	Type	Required	Description
`estimate_only`	`boolean`	No	Estimate matching records without returning dataset rows. Works in both modes.
`continuation_token`	`string`	No	Token from a previous run in the same output mode.
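Before starting a run, it can help to catch input mistakes locally. The function below is a minimal, hypothetical pre-flight check (validate_run_input is not part of the Actor or of apify_client) that encodes only the rules from the tables above: the two output modes, countries or websites but not both, no websites in daily_summaries mode, and the 5000 cap on max_results. The lower bound of 1 for max_results is an assumption, and the Actor still validates its own input.

```python
from typing import Any

# Limits and modes taken from the Input tables; the lower bound of 1 is an assumption.
MAX_RESULTS_LIMIT = 5000
OUTPUT_MODES = {"articles", "daily_summaries"}


def validate_run_input(run_input: dict[str, Any]) -> list[str]:
    """Return a list of problems with run_input; an empty list means it looks valid.

    Hypothetical client-side helper. The Actor validates its input independently.
    """
    problems: list[str] = []

    mode = run_input.get("output_mode", "articles")
    if mode not in OUTPUT_MODES:
        problems.append(f"unknown output_mode: {mode!r}")

    countries = run_input.get("countries") or []
    websites = run_input.get("websites") or []
    if countries and websites:
        problems.append("use countries or websites, not both")
    if mode == "daily_summaries" and websites:
        problems.append("daily_summaries does not accept websites")

    max_results = run_input.get("max_results", 10)  # documented default is 10
    is_int = isinstance(max_results, int) and not isinstance(max_results, bool)
    if not is_int or not 1 <= max_results <= MAX_RESULTS_LIMIT:
        problems.append(f"max_results must be an integer from 1 to {MAX_RESULTS_LIMIT}")

    token = run_input.get("continuation_token")
    if token is not None and not isinstance(token, str):
        problems.append("continuation_token must be a string")

    return problems
```

For example, validate_run_input({"countries": ["Ghana"], "websites": ["AP News"]}) returns ["use countries or websites, not both"].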

Helpful defaults:

AP News is selected by default. If you clear all countries and websites, the Actor still searches AP News.
In daily_summaries mode, the implicit AP News default is removed. An empty country selection returns all countries, still bounded by the date window and max_results.
If you provide no dates, the Actor searches the last 10 days.
If you provide only published_from, published_to defaults to 0 days.
If you provide only published_to, the Actor derives a one-day window ending at that date.
Duplicate countries or websites are ignored automatically.
Invalid or cross-mode continuation tokens fail fast before the Actor queries storage.
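To preview which filters a run will actually use, the defaults above can be mirrored on the client. This is a sketch under assumptions: apply_documented_defaults is a hypothetical helper, it writes the implicit 10-day window as "10 days" and "0 days" (the relative format used in the examples), and it leaves a published_to-only input untouched because the Actor derives that one-day window itself.

```python
from typing import Any


def apply_documented_defaults(run_input: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of run_input with the documented defaults filled in.

    Hypothetical client-side preview; the Actor applies its own defaults.
    """
    resolved = dict(run_input)
    mode = resolved.setdefault("output_mode", "articles")
    resolved.setdefault("max_results", 10)

    # Duplicate countries or websites are ignored; keep first-seen order.
    for key in ("countries", "websites"):
        if key in resolved:
            resolved[key] = list(dict.fromkeys(resolved[key]))

    # Article mode falls back to AP News when nothing is selected.
    # daily_summaries mode has no implicit AP News default (empty means all countries).
    if mode == "articles" and not resolved.get("countries") and not resolved.get("websites"):
        resolved["websites"] = ["AP News"]

    has_from = bool(resolved.get("published_from"))
    has_to = bool(resolved.get("published_to"))
    if not has_from and not has_to:
        # No dates: the last 10 days, written in the relative format (assumption).
        resolved["published_from"] = "10 days"
        resolved["published_to"] = "0 days"
    elif has_from and not has_to:
        resolved["published_to"] = "0 days"
    # Only published_to: the Actor derives the one-day window itself, so leave it.

    return resolved
```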

Accepted date examples:

2025-01-01
2025-01-01T00:00:00Z
7 days
30 days
12 months
2 years
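A quick local check can reject malformed date values before a run. The sketch below, a hypothetical is_accepted_date_value helper, accepts only the shapes listed above: ISO dates, ISO timestamps, and relative values such as "7 days", "12 months", or "2 years". Accepting singular units such as "1 day" is an assumption, and the Actor may accept formats this check rejects.

```python
import re
from datetime import date, datetime

# Relative values shaped like "7 days", "12 months", "2 years".
# Accepting singular units ("1 day") is an assumption.
RELATIVE_RE = re.compile(r"^\d+ (day|month|year)s?$")


def is_accepted_date_value(value: str) -> bool:
    """Hypothetical check that a date string matches one of the example formats."""
    value = value.strip()
    if RELATIVE_RE.match(value):
        return True
    try:
        date.fromisoformat(value)  # e.g. 2025-01-01
        return True
    except ValueError:
        pass
    try:
        # Older Pythons reject a trailing "Z", so normalize it to +00:00 first.
        datetime.fromisoformat(value.replace("Z", "+00:00"))  # e.g. 2025-01-01T00:00:00Z
        return True
    except ValueError:
        return False
```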

Example Inputs

Search By Country

{
  "countries": ["South Africa"],
  "published_from": "30 days",
  "published_to": "0 days",
  "max_results": 100
}

Search By Website

{
  "websites": ["AP News", "Reuters"],
  "published_from": "2025-01-01",
  "published_to": "2025-12-31",
  "max_results": 100
}

Estimate Results Before Exporting

{
  "countries": ["United States"],
  "published_from": "12 months",
  "published_to": "0 days",
  "estimate_only": true,
  "max_results": 500
}

Continue A Large Export

{
  "countries": ["Africa"],
  "published_from": "2 years",
  "published_to": "0 days",
  "max_results": 100,
  "continuation_token": "{\"date_published\":\"2026-05-06T15:44:00+00:00\",\"url_hash\":\"81a489c65af24950956dd717c2f7b4be\"}"
}

Pull Daily Summaries For A Date Range

{
  "output_mode": "daily_summaries",
  "countries": ["South Africa"],
  "published_from": "2026-07-01",
  "published_to": "2026-07-22",
  "max_results": 100
}

Output Data

When estimate_only is false, the selected record type is pushed to the default Apify dataset while the run is in progress. If a run stops early, already pushed records remain available. Every row has record_type set to article or daily_summary.

Archived Article Fields

Field	Type	Description
`site_name`	`string`	News website or publisher name.
`country`	`string`	Country associated with the source.
`region`	`string`	Broader region associated with the source.
`language`	`string`	Source language metadata.
`article_title`	`string`	Article headline.
`author`	`string | null`	Author or byline when available.
`article_body`	`string`	Normalized article text.
`tags`	`string[]`	Tags or keywords when available.
`date_published`	`string`	ISO 8601 publication timestamp.
`article_url`	`string`	Canonical article URL.
`main_image_url`	`string | null`	Featured image URL when available.
`seo_description`	`string | null`	Article summary or meta description when available.
`execution_mode`	`string`	`current` for incremental scrapes or `historic` for archive backfills.
`summary_available`	`boolean`	Whether processed LLM enrichment is attached.
`short_summary`	`string | null`	Concise grounded article summary.
`event_summary`	`string | null`	Summary of the real-world event.
`bullet_summary`	`string[]`	Three concise summary bullets when enrichment is available.
`main_topic`	`string | null`	Normalized primary topic.
`event_type`	`string | null`	Normalized event classification.
`entities`	`object`	Extracted people, companies, organisations, and locations.
`keywords`	`string[]`	Grounded keywords extracted during enrichment.
`summary_confidence`	`number | null`	Enrichment confidence from 0 to 1.

Example dataset item:

{
  "record_type": "article",
  "site_name": "AP News",
  "country": "United States",
  "region": "North America",
  "language": "en",
  "article_title": "Sample archived news headline",
  "author": null,
  "article_body": "Normalized article text appears here...",
  "tags": [],
  "date_published": "2026-05-06T16:13:00+00:00",
  "article_url": "https://example.com/news/sample-article",
  "main_image_url": null,
  "seo_description": null,
  "execution_mode": "current",
  "summary_available": true,
  "main_topic": "energy",
  "short_summary": "A concise grounded summary of the article.",
  "event_summary": "The article reports a material development in the energy sector.",
  "bullet_summary": ["Development announced.", "Stakeholders are affected.", "Further action is expected."],
  "summary_confidence": 0.91
}
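For RAG or LLM pipelines, article rows usually need to become text documents with metadata. The sketch below shows one way to do that; to_rag_documents and the 0.8 confidence cutoff are hypothetical choices, not part of the Actor. It always keeps the article text, but attaches short_summary only when summary_available is true and summary_confidence meets the cutoff.

```python
from typing import Any, Iterable

# Hypothetical cutoff; the docs only state that summary_confidence ranges from 0 to 1.
MIN_CONFIDENCE = 0.8


def to_rag_documents(
    items: Iterable[dict[str, Any]], min_confidence: float = MIN_CONFIDENCE
) -> list[dict[str, Any]]:
    """Convert article dataset rows into simple {id, text, metadata} documents."""
    docs: list[dict[str, Any]] = []
    for item in items:
        # Skip non-article rows and rows without body text.
        if item.get("record_type", "article") != "article" or not item.get("article_body"):
            continue

        confidence = item.get("summary_confidence")
        trusted = (
            bool(item.get("summary_available"))
            and confidence is not None
            and confidence >= min_confidence
        )

        docs.append(
            {
                "id": item["article_url"],
                "text": f"{item['article_title']}\n\n{item['article_body']}",
                "metadata": {
                    "source": item.get("site_name"),
                    "country": item.get("country"),
                    "language": item.get("language"),
                    "published": item.get("date_published"),
                    "summary": item.get("short_summary") if trusted else None,
                },
            }
        )
    return docs
```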

Daily LLM Summary Fields

Daily summary mode returns one stored country/topic row per dataset item. It does not call OpenRouter or another model during the Actor run.

Field	Type	Description
`summary_date`	`string`	UTC country-day represented by the brief.
`country`	`string`	Country covered by the topic.
`topic_rank`	`integer`	Topic order within the country/day brief.
`topic_category`	`string`	Normalized topic category.
`topic_headline`	`string`	Brief headline.
`topic_summary`	`string`	Grounded country/topic summary.
`topic_importance_score`	`number`	Calibrated topic importance score from 0 to 100.
`key_points`	`string[]`	Concise facts supporting the brief.
`article_refs`	`object[]`	Supporting article IDs, URLs, sources, and timestamps.
`source_count`	`integer`	Distinct sources supporting the topic.
`article_count`	`integer`	Articles linked to the topic.
`input_article_count`	`integer`	Qualifying daily articles considered for the country.
`llm_model`	`string`	Model that generated the stored brief.
`generated_at`	`string`	Generation timestamp.

Example daily summary item:

{
  "record_type": "daily_summary",
  "summary_date": "2026-07-22",
  "country": "South Africa",
  "topic_rank": 1,
  "topic_category": "Infrastructure & Energy",
  "topic_headline": "Grid investment plan advances",
  "topic_summary": "Officials advanced a national grid investment plan. Implementation details remain the key uncertainty.",
  "topic_importance_score": 78,
  "source_count": 2,
  "article_count": 3,
  "llm_model": "qwen/qwen3-8b"
}
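Because each dataset item is a single country/topic row, rebuilding a full daily brief means grouping rows by country and summary_date and ordering them by topic_rank. A minimal sketch; group_daily_briefs is a hypothetical helper that relies only on the fields documented above.

```python
from collections import defaultdict
from typing import Any, Iterable


def group_daily_briefs(
    items: Iterable[dict[str, Any]],
) -> dict[tuple[str, str], list[dict[str, Any]]]:
    """Group daily_summary rows by (country, summary_date), ordered by topic_rank."""
    briefs: dict[tuple[str, str], list[dict[str, Any]]] = defaultdict(list)
    for item in items:
        if item.get("record_type") != "daily_summary":
            continue  # ignore article rows or anything unexpected
        briefs[(item["country"], item["summary_date"])].append(item)

    for topics in briefs.values():
        topics.sort(key=lambda topic: topic["topic_rank"])  # rank 1 first
    return dict(briefs)
```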

Output Summary And Pagination

Each run also writes an OUTPUT record with summary and pagination metadata:

resultCount
summarizedResultCount in article mode
hasMore
nextContinuationToken
filters
estimatedMatchCount when estimate_only is enabled
estimatedReturnedThisRun when estimate_only is enabled

If hasMore is true, run the Actor again with the same filters and pass nextContinuationToken into continuation_token.
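The loop below sketches that rerun-until-done pattern. export_all_pages, apify_page_runner, and the max_pages safety cap are hypothetical; the page runner is passed in as a function so the loop can be tested without network access. apify_page_runner uses the same apify_client calls as the Python API example (actor().call, dataset().iterate_items, and key_value_store().get_record) and assumes you pass an authenticated ApifyClient. Only continuation_token changes between pages.

```python
from typing import Any, Callable

# A page runner takes one run input and returns (dataset rows, OUTPUT record value).
PageRunner = Callable[[dict[str, Any]], tuple[list[dict[str, Any]], dict[str, Any]]]


def export_all_pages(
    run_page: PageRunner, base_input: dict[str, Any], max_pages: int = 20
) -> list[dict[str, Any]]:
    """Run pages with identical filters until OUTPUT reports hasMore = false."""
    all_items: list[dict[str, Any]] = []
    page_input = dict(base_input)
    for _ in range(max_pages):  # hypothetical cap so a bad token cannot loop forever
        items, output = run_page(page_input)
        all_items.extend(items)
        token = output.get("nextContinuationToken")
        if not output.get("hasMore") or not token:
            break
        # Keep every filter; only the cursor changes between pages.
        page_input = {**page_input, "continuation_token": token}
    return all_items


def apify_page_runner(client: Any, actor_id: str) -> PageRunner:
    """Build a page runner on top of an authenticated apify_client.ApifyClient."""

    def run_page(run_input: dict[str, Any]) -> tuple[list[dict[str, Any]], dict[str, Any]]:
        run = client.actor(actor_id).call(run_input=run_input)
        items = list(client.dataset(run["defaultDatasetId"]).iterate_items(clean=True))
        record = client.key_value_store(run["defaultKeyValueStoreId"]).get_record("OUTPUT")
        return items, (record["value"] if record else {})

    return run_page
```

Usage: items = export_all_pages(apify_page_runner(client, ACTOR_ID), run_input), where client and ACTOR_ID are set up as in the Python API example.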

In estimate mode:

The dataset remains empty.
The OUTPUT record includes the estimated match count.
You can rerun the same input with estimate_only: false to fetch rows in the selected mode.
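Once an estimate-only run finishes, the estimate can be turned into a page plan. A minimal sketch, assuming the OUTPUT value has been loaded as a dict as in the Python API example; plan_export is hypothetical, and because estimatedMatchCount is an estimate, the actual number of pages can differ.

```python
import math
from typing import Any

MAX_RESULTS_LIMIT = 5000  # documented maximum for max_results


def plan_export(estimate_output: dict[str, Any], page_size: int = MAX_RESULTS_LIMIT) -> dict[str, int]:
    """Turn an estimate-only OUTPUT value into a rough page plan."""
    if not 1 <= page_size <= MAX_RESULTS_LIMIT:
        raise ValueError(f"page_size must be between 1 and {MAX_RESULTS_LIMIT}")
    matches = int(estimate_output.get("estimatedMatchCount") or 0)
    return {
        "estimated_matches": matches,
        "max_results": page_size,
        # Each page is one run; ceiling division rounds a partial last page up.
        "runs_needed": math.ceil(matches / page_size),
    }
```

For example, an estimate of 12,001 matches with max_results set to 5000 works out to three runs.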

Python API Example

Use the Actor ID from the Actor's API tab.

import os

from apify_client import ApifyClient

# Authenticate with your Apify API token, read here from an environment variable.
client = ApifyClient(os.environ["APIFY_TOKEN"])

# Replace with the Actor ID shown on the Actor's API tab.
ACTOR_ID = "thescrapelab/Apify-The-Rise-of-the-Phoenix"

run_input = {
    "output_mode": "articles",
    "countries": ["South Africa"],
    "published_from": "30 days",
    "published_to": "0 days",
    "max_results": 25,
}

# Start the Actor and wait for the run to finish.
run = client.actor(ACTOR_ID).call(run_input=run_input)

# Read every row the run pushed to its default dataset.
dataset_items = list(
    client.dataset(run["defaultDatasetId"]).iterate_items(clean=True)
)

# The OUTPUT record in the run's key-value store holds counts and pagination metadata.
output_record = client.key_value_store(
    run["defaultKeyValueStoreId"]
).get_record("OUTPUT")
summary = output_record["value"] if output_record else {}

print("Run status:", run["status"])
print("Records returned:", len(dataset_items))
print("Has more:", summary.get("hasMore"))
print("Next cursor:", summary.get("nextContinuationToken"))

To check volume before exporting rows, set estimate_only to true. The Actor will return the estimate in the OUTPUT record and leave the dataset empty.

To pull daily briefs instead, use the same API flow with:

run_input = {
    "output_mode": "daily_summaries",
    "countries": ["South Africa"],
    "published_from": "7 days",
    "published_to": "0 days",
    "max_results": 100,
}

Tips For Better Results

Use estimate_only before broad searches such as large countries, long date ranges, or global sources.
Use narrower date ranges when you need smaller, faster exports.
Use websites when you need publisher-specific news data.
Use daily_summaries when you need stored country/day topics; no new LLM charge is incurred by retrieval.
Keep the same filters when using continuation_token; only the token should change between pages.
Increase max_results, up to the limit of 5000, when you want fewer pages and therefore fewer runs.

Pricing

This Actor uses pay-per-event pricing. You are charged a small Actor start event plus a per-result event for each dataset item returned. Estimate-only runs do not return dataset items, so they are useful for checking query size before exporting a large dataset.
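The listed price gives a rough lower bound for budgeting. The sketch below is simple arithmetic, not Apify's billing logic: estimate_cost_usd is hypothetical, 0.39 is the listed "from $0.39 / 1,000 results" rate (actual rates may be higher), and start_fee defaults to 0 because the start-event price is not stated on this page.

```python
def estimate_cost_usd(
    result_count: int,
    price_per_1000: float = 0.39,
    start_fee: float = 0.0,
    runs: int = 1,
) -> float:
    """Rough pay-per-event estimate: per-result charges plus one start event per run.

    0.39 is the listed "from" rate, so treat the result as a lower bound;
    start_fee is a placeholder because the start-event price is not given here.
    """
    if result_count < 0 or runs < 0:
        raise ValueError("result_count and runs must be non-negative")
    return round(result_count / 1000 * price_per_1000 + runs * start_fee, 4)
```

For example, 5,000 results at the listed rate come to about $1.95 before start events, while an estimate-only run adds no per-result charge.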

Recommended usage:

Run a small query first with max_results between 10 and 100.
Use estimate_only for broad archive searches.
Export larger pages with max_results up to 5000 when the estimate looks right.
The measured default run configuration is 128 MB of memory with a 300-second timeout. The Actor is database-backed, so higher memory is rarely needed unless Apify support asks you to test a higher setting.

FAQ

Does this Actor scrape websites live during the run?

No. It reads matching articles or daily summaries from hosted storage. This makes it useful as a fast historical news API rather than a live crawling job.

Does daily-summary mode generate new summaries?

No. It pulls country/topic summaries already generated by the scraper's scheduled LLM pipeline. If no summary was stored for a country and day, the run returns no rows for it; summaries are never generated on demand.

Do I need to manage infrastructure to use it?

No. Run the Actor from Apify Console, tasks, schedules, or the Apify API. Provide your search filters and download the results from the dataset.

Why did I get zero dataset items?

The most common reasons are:

Your filters matched no archived articles or stored daily summaries.
Your date range is too narrow.
estimate_only was enabled.
Your continuation_token points beyond the available results.

How do I fetch more than one page of results?

Check the OUTPUT record. If hasMore is true, copy nextContinuationToken into the next run as continuation_token and keep the same output mode and filters.

What can I export?

You can export the dataset from Apify as JSON, CSV, Excel, XML, RSS, or HTML. Programmatic users can fetch the dataset through the Apify API.
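Programmatic exports can use the Apify dataset items endpoint with a format query parameter. The sketch below only builds the URL; dataset_export_url is a hypothetical helper, the dataset ID comes from the run's defaultDatasetId, and xlsx is the format name for Excel. Send your API token in an Authorization: Bearer header rather than in the URL.

```python
from urllib.parse import quote, urlencode

# Format names for the export types listed above; "xlsx" is the Excel export.
EXPORT_FORMATS = {"json", "csv", "xlsx", "xml", "rss", "html"}


def dataset_export_url(dataset_id: str, fmt: str = "csv", clean: bool = True) -> str:
    """Build a download URL for a dataset's items in the requested format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    query = {"format": fmt}
    if clean:
        query["clean"] = "true"  # skip empty items and hidden fields
    return f"https://api.apify.com/v2/datasets/{quote(dataset_id, safe='')}/items?{urlencode(query)}"
```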

Are results streamed during the run?

Yes. Result batches are written to the default dataset as they are read from the archive, so partial results remain available if a run is interrupted.

Responsible Use

This Actor returns article data and derived summaries from hosted storage. You are responsible for using the data in line with applicable laws, publisher terms, and your own compliance requirements.

You might also like

Google News Article Scraper

technicaldost/google-news-scraper

Search Google News by topic and export structured articles: title, source, publish date, URL and snippet. For media monitoring, PR and research.

Technical Dost Solutions

Google News Scraper API | PR & Sentiment Monitoring

andok/google-news-scraper

Instantly scrape recent news articles and headlines by keyword from Google News. Automate your media monitoring and PR tracking.

Andok

Google News Scraper | PR & Sentiment Monitoring

andok/google-news-scraper-pr-sentiment-monitoring

Instantly scrape recent news articles and headlines by keyword from Google News. Automate your media monitoring and PR tracking.

Andok

GDELT Global News Search Scraper

automation-lab/gdelt-global-news-search-scraper

🌍 Search worldwide news with GDELT and export clean, deduplicated article metadata for media monitoring, research, alerts, and RAG pipelines.

Stas Persiianenko

Google News Scraper - Articles by Keyword & Source

benthepythondev/google-news-scraper

Search Google News by keyword and get structured articles: title, publisher, date, link and snippet, for any topic, language and country. Powered by Google's public News RSS, so it's fast and reliable. Great for brand monitoring, PR, market research and content aggregation.

Ben

Google News Search Scraper

bgfc97/google-news-search-scraper

Search Google News by keyword and get articles: title, source, date, link and snippet. Uses Apify Proxy. For media monitoring, PR and research.

Bruno

GDELT Global News Article Scraper

chrisp1211/gdelt-news-scraper-max

Monitor global news across 100+ languages with GDELT. Search articles by keyword, language, country and time window. Returns title, URL, domain, source country and date. No API key. Pay per article; empty runs free.

Christian Pichichero

Google News Scraper — Real Publisher URLs, MCP-Ready

paulovitor18/google-news-scraper-real-urls

Get the real publisher URL for every Google News article — not the opaque news.google.com redirect. Fast RSS-based (no browser); search by keyword, topic, language & country. Built for news monitoring, LLM/RAG pipelines and AI agents (MCP).

MoreLock

GDELT Global News Monitoring Scraper

scrapers_lat/gdelt-news-events-scraper

Monitor worldwide news coverage by keyword, country and language. Scrape matching articles with title, link, source domain, source country, language, publish time and lead image. Great for brand and PR monitoring, media research and geo signals. Export to JSON, CSV or Excel.

Scrapers Lat

News Archive Scraper

quarterly_jingo/news-archive-scraper

Petey Boy
