Global News Archive API - Rise of the Phoenix

Pricing

from $0.39 / 1,000 results

Developer

Inus Grobler

Maintained by Community

Search a hosted global news archive by country, publisher, and publication date. Export clean article text and metadata for media monitoring, PR research, market intelligence, compliance review, RAG, and LLM data workflows.

Rise of the Phoenix is a historical news API Actor for researchers, analysts, media teams, and AI builders who need article data without running a live crawl every time.

Because it searches a hosted archive, runs are fast, predictable, easy to paginate, and suitable for repeatable monitoring jobs or large historical exports.

Use it when you need a news API for:

  • Media monitoring by country, publisher, or date range.
  • Historical article exports for research, PR, reputation monitoring, or market intelligence.
  • Clean news text datasets for RAG, LLM evaluation, NLP, classification, and summarization workflows.
  • A fast first pass before deciding whether a broader live crawl is worth running.

Why Use This Actor

  • Archive-first workflow: query stored article records instead of crawling pages live.
  • Useful filters: narrow exports by country, publisher, and publication date.
  • Estimate before export: check available volume before paying for a larger dataset.
  • Pagination built in: continue broad exports with nextContinuationToken.
  • Clean outputs: get article title, body, URL, source, country, language, image, and metadata fields in the Apify dataset.

Quick Starts

Monitor Recent AP News Articles

{
  "websites": ["AP News"],
  "published_from": "10 days",
  "published_to": "0 days",
  "max_results": 25
}

Build A Country-Level News Dataset

{
  "countries": ["United States"],
  "published_from": "30 days",
  "published_to": "0 days",
  "max_results": 100
}

Check Archive Volume Before Exporting

{
  "countries": ["South Africa"],
  "published_from": "12 months",
  "published_to": "0 days",
  "estimate_only": true,
  "max_results": 100
}

Media Monitoring For A Market

{
  "countries": ["United Kingdom"],
  "published_from": "7 days",
  "published_to": "0 days",
  "max_results": 100
}

Publisher-Specific Research

{
  "websites": ["Reuters", "AP News"],
  "published_from": "30 days",
  "published_to": "0 days",
  "max_results": 100
}

What You Can Do

  • Search historical news articles by country or website.
  • Filter article data by publication date range.
  • Estimate result counts before exporting a large dataset.
  • Export article records to the Apify dataset in JSON, CSV, Excel, XML, RSS, or HTML.
  • Continue large exports with cursor-based pagination.
  • Use the results for media monitoring, news research, market intelligence, academic research, lead enrichment, and AI or LLM dataset preparation.

Common Use Cases

  • Media monitoring: track recent coverage in a country or across selected publishers.
  • PR and reputation research: export historical mentions and articles for review.
  • Market intelligence: collect regional news context for companies, sectors, products, and public events.
  • AI and RAG datasets: feed clean article text into analytics, NLP, summarization, classification, retrieval, and evaluation pipelines.
  • Research scoping: estimate how much article data is available before launching a larger export.

Best For

  • Users who need structured article records from an archive.
  • Teams that want repeatable exports from the same source and date filters.
  • Workflows where result count estimates and pagination matter.
  • API users who want JSON, CSV, Excel, XML, RSS, or HTML exports from Apify datasets.

Not For

  • Real-time breaking-news crawling from arbitrary websites.
  • Scraping every page from a website outside the hosted archive.
  • Bypassing publisher restrictions, paywalls, or access controls.

How To Run The Actor

  1. Open the Actor on Apify.
  2. Leave AP News selected for a quick test, or choose either Countries or Websites (not both) to search other sources.
  3. Set Published from and Published to.
  4. Optional: enable "Advanced: estimate results first" to get a count before returning article rows.
  5. Set Max results for the number of records to return in this run.
  6. Run the Actor.
  7. Download article records from the Dataset and read pagination details from the OUTPUT record.

Input

The Actor input form contains the full list of available countries and websites. The main fields are:

| Field | Type | Required | Description |
|---|---|---|---|
| countries | string[] | No | Search one or more countries. Use this or websites, not both. |
| websites | string[] | No | Search one or more news websites or publishers. AP News is selected by default for a quick test. Use this or countries, not both. |
| published_from | string | No | Return articles published on or after this date. Supports ISO dates and relative values. |
| published_to | string | No | Return articles published on or before this date. Supports ISO dates and relative values. |
| max_results | integer | No | Maximum number of article records to return in this run. Default is 10; maximum is 5000. |

Advanced fields:

| Field | Type | Required | Description |
|---|---|---|---|
| estimate_only | boolean | No | Estimate how many matching article records are available without returning dataset rows. |
| continuation_token | string | No | Token from a previous run used to continue from the next page of results. |

Helpful defaults:

  • AP News is selected by default. If you clear all countries and websites, the Actor still searches AP News.
  • If you provide no dates, the Actor searches the last 10 days.
  • If you provide only published_from, published_to defaults to 0 days.
  • If you provide only published_to, the Actor derives a one-day window ending at that date.
  • Duplicate countries or websites are ignored automatically.
  • Invalid continuation tokens fail fast before the Actor queries the archive.
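
The constraints and defaults above can also be checked locally before a run is started. The following is a minimal sketch, not part of the Actor or the Apify client: `build_run_input` is a hypothetical helper, and the lower bound of 1 for max_results is an assumption (only the default of 10 and the maximum of 5000 are documented). The Actor still applies its own defaults on the server side.

```python
from typing import Any, Optional, Sequence


def build_run_input(
    countries: Optional[Sequence[str]] = None,
    websites: Optional[Sequence[str]] = None,
    published_from: Optional[str] = None,
    published_to: Optional[str] = None,
    max_results: int = 10,
    estimate_only: bool = False,
    continuation_token: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: assemble an input dict and catch obvious mistakes locally."""
    if countries and websites:
        raise ValueError("Use countries or websites, not both.")
    # Assumption: 1 is the smallest useful value; 5000 is the documented maximum.
    if not 1 <= max_results <= 5000:
        raise ValueError("max_results must be between 1 and 5000.")

    run_input: dict[str, Any] = {"max_results": max_results}
    # dict.fromkeys drops duplicates while keeping the original order.
    if countries:
        run_input["countries"] = list(dict.fromkeys(countries))
    if websites:
        run_input["websites"] = list(dict.fromkeys(websites))
    # Omitted dates are left out so the Actor applies its documented defaults.
    if published_from:
        run_input["published_from"] = published_from
    if published_to:
        run_input["published_to"] = published_to
    if estimate_only:
        run_input["estimate_only"] = True
    if continuation_token:
        run_input["continuation_token"] = continuation_token
    return run_input
```

Calling `build_run_input(websites=["AP News", "Reuters"], published_from="30 days", max_results=100)` yields a dict that can be passed as the Actor input.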

Accepted date examples:

  • 2025-01-01
  • 2025-01-01T00:00:00Z
  • 7 days
  • 30 days
  • 12 months
  • 2 years
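
To preview which window a given pair of values covers, the accepted formats can be resolved to timestamps on the client. This is only a sketch under stated assumptions: `resolve_date` is a hypothetical helper, relative values are read as "that long before now", and months and years are approximated as 30 and 365 days. The Actor's own date arithmetic is not documented here and may differ.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed approximations; the Actor may resolve months and years differently.
_DAYS_PER_UNIT = {"day": 1, "month": 30, "year": 365}
_RELATIVE = re.compile(r"^(\d+)\s+(day|month|year)s?$")


def resolve_date(value: str, now: Optional[datetime] = None) -> datetime:
    """Hypothetical helper: turn an ISO date or a relative value into a UTC datetime."""
    now = now or datetime.now(timezone.utc)
    match = _RELATIVE.match(value.strip().lower())
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        return now - timedelta(days=amount * _DAYS_PER_UNIT[unit])
    # Older Python versions reject a trailing "Z", so normalize it to an offset.
    parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: dates without an offset are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```

For example, `resolve_date("7 days")` and `resolve_date("0 days")` give the start and end of a one-week window ending now.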

Example Inputs

Search By Country

{
  "countries": ["South Africa"],
  "published_from": "30 days",
  "published_to": "0 days",
  "max_results": 100
}

Search By Website

{
  "websites": ["AP News", "Reuters"],
  "published_from": "2025-01-01",
  "published_to": "2025-12-31",
  "max_results": 100
}

Estimate Results Before Exporting

{
  "countries": ["United States"],
  "published_from": "12 months",
  "published_to": "0 days",
  "estimate_only": true,
  "max_results": 500
}

Continue A Large Export

{
  "countries": ["Africa"],
  "published_from": "2 years",
  "published_to": "0 days",
  "max_results": 100,
  "continuation_token": "{\"date_published\":\"2026-05-06T15:44:00+00:00\",\"url_hash\":\"81a489c65af24950956dd717c2f7b4be\"}"
}
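
The backslashes in the example above are ordinary JSON string escaping, because the token is itself a JSON-formatted string. When the input is built in code, the token can be passed through unchanged and the serializer adds the escapes. A small sketch, with the filters and token copied from the example above; treating the token as opaque (rather than editing its fields) is an assumption about safe usage, not documented behavior.

```python
import json

# Token as it would be read from nextContinuationToken (value copied from the example).
token = (
    '{"date_published":"2026-05-06T15:44:00+00:00",'
    '"url_hash":"81a489c65af24950956dd717c2f7b4be"}'
)

run_input = {
    "countries": ["Africa"],
    "published_from": "2 years",
    "published_to": "0 days",
    "max_results": 100,
    "continuation_token": token,  # passed through unchanged
}

# json.dumps produces the escaped form shown in the hand-written JSON input.
serialized = json.dumps(run_input, indent=2)
print(serialized)
```

When the Apify client is used, the dict can be passed directly as the run input, so no manual escaping is needed.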

Output Data

When estimate_only is false, article records are pushed to the default Apify dataset while the run is still in progress. If a run stops early, already pushed records remain available in the dataset. Common fields include:

| Field | Type | Description |
|---|---|---|
| site_name | string | News website or publisher name. |
| country | string | Country associated with the source. |
| region | string | Broader region associated with the source. |
| language | string | Source language metadata. |
| article_title | string | Article headline. |
| author | string or null | Author or byline when available. |
| article_body | string | Normalized article text. |
| tags | string[] | Tags or keywords when available. |
| date_published | string | ISO 8601 publication timestamp. |
| article_url | string | Canonical article URL. |
| main_image_url | string or null | Featured image URL when available. |
| seo_description | string or null | Article summary or meta description when available. |

Example dataset item:

{
  "site_name": "AP News",
  "country": "United States",
  "region": "North America",
  "language": "en",
  "article_title": "Sample archived news headline",
  "author": null,
  "article_body": "Normalized article text appears here...",
  "tags": [],
  "date_published": "2026-05-06T16:13:00+00:00",
  "article_url": "https://example.com/news/sample-article",
  "main_image_url": null,
  "seo_description": null
}
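
Downstream code may be easier to maintain if dataset items are loaded into a typed record. The sketch below mirrors the common fields listed above; `Article` and `from_item` are hypothetical names, and ignoring any additional keys is an assumption based on the fields being described as "common" rather than exhaustive.

```python
from dataclasses import dataclass, field, fields
from datetime import datetime
from typing import Any, Optional


@dataclass
class Article:
    """Hypothetical typed view of one dataset item."""

    site_name: str
    country: str
    region: str
    language: str
    article_title: str
    article_body: str
    article_url: str
    date_published: datetime
    author: Optional[str] = None
    main_image_url: Optional[str] = None
    seo_description: Optional[str] = None
    tags: list[str] = field(default_factory=list)

    @classmethod
    def from_item(cls, item: dict[str, Any]) -> "Article":
        # Keep only the fields declared above; other keys are ignored.
        known = {f.name for f in fields(cls)}
        data = {key: value for key, value in item.items() if key in known}
        # Parse the ISO 8601 timestamp into a timezone-aware datetime.
        data["date_published"] = datetime.fromisoformat(data["date_published"])
        data["tags"] = list(data.get("tags") or [])
        return cls(**data)
```

With this in place, `Article.from_item(item)` can be applied to each row returned from the dataset.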

Output Summary And Pagination

Each run also writes an OUTPUT record with summary and pagination metadata:

  • resultCount
  • hasMore
  • nextContinuationToken
  • filters
  • estimatedMatchCount when estimate_only is enabled
  • estimatedReturnedThisRun when estimate_only is enabled

If hasMore is true, run the Actor again with the same filters and pass nextContinuationToken into continuation_token.
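
The loop described above can be sketched as a generator that keeps the filters fixed and changes only the token. The names `iterate_archive`, `PageRunner`, `apify_page_runner`, and `max_pages` are hypothetical; the Apify client calls inside `apify_page_runner` are the same ones used in the Python API Example section of this README. The page limit is a safety assumption, since each page is a separate billed run.

```python
from typing import Any, Callable, Iterator, Optional

# A page runner takes an input dict and returns (dataset items, OUTPUT summary).
PageRunner = Callable[[dict[str, Any]], tuple[list[dict[str, Any]], dict[str, Any]]]


def iterate_archive(
    run_page: PageRunner,
    base_input: dict[str, Any],
    max_pages: int = 10,
) -> Iterator[dict[str, Any]]:
    """Yield items page by page, following nextContinuationToken while hasMore is true."""
    token: Optional[str] = base_input.get("continuation_token")
    for _ in range(max_pages):
        page_input = dict(base_input)  # same filters on every page
        if token:
            page_input["continuation_token"] = token
        items, summary = run_page(page_input)
        yield from items
        token = summary.get("nextContinuationToken")
        if not summary.get("hasMore") or not token:
            break


def apify_page_runner(actor_id: str, api_token: str) -> PageRunner:
    """Build a page runner backed by the Apify client (requires apify-client)."""
    from apify_client import ApifyClient  # imported lazily so the loop is testable offline

    client = ApifyClient(api_token)

    def run_page(page_input: dict[str, Any]) -> tuple[list[dict[str, Any]], dict[str, Any]]:
        run = client.actor(actor_id).call(run_input=page_input)
        items = list(client.dataset(run["defaultDatasetId"]).iterate_items(clean=True))
        record = client.key_value_store(run["defaultKeyValueStoreId"]).get_record("OUTPUT")
        return items, (record["value"] if record else {})

    return run_page
```

Separating the loop from the client call keeps the pagination logic easy to test with a stub runner before any paid run is started.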

In estimate mode:

  • The dataset remains empty.
  • The OUTPUT record includes the estimated match count.
  • You can rerun the same input with estimate_only: false to fetch article rows.
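
An estimate can be turned into a rough page plan before any rows are fetched. The sketch below assumes only the `estimatedMatchCount` field named above; `plan_export` is a hypothetical helper, and because the count is an estimate, the real export should still follow hasMore rather than a fixed number of runs.

```python
import math
from typing import Any


def plan_export(summary: dict[str, Any], max_results: int = 5000) -> dict[str, int]:
    """Hypothetical helper: estimate how many runs a full export would need."""
    if not 1 <= max_results <= 5000:
        raise ValueError("max_results must be between 1 and 5000.")
    # A missing or null estimate is treated as zero matches.
    estimated = int(summary.get("estimatedMatchCount") or 0)
    return {
        "estimated_matches": estimated,
        "runs_needed": math.ceil(estimated / max_results),
    }
```

For an estimate of 12,345 matches and max_results of 5000, this suggests three runs.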

Python API Example

Use the Actor ID from the Actor's API tab.

# Requires the apify-client package: pip install apify-client
import os

from apify_client import ApifyClient

# Read the API token from the environment instead of hard-coding it.
client = ApifyClient(os.environ["APIFY_TOKEN"])
ACTOR_ID = "thescrapelab/Apify-The-Rise-of-the-Phoenix"

run_input = {
    "countries": ["South Africa"],
    "published_from": "30 days",
    "published_to": "0 days",
    "max_results": 25,
}

# Start the Actor and wait for the run to finish.
run = client.actor(ACTOR_ID).call(run_input=run_input)

# Article rows are stored in the run's default dataset.
dataset_items = list(
    client.dataset(run["defaultDatasetId"]).iterate_items(clean=True)
)

# Summary and pagination metadata are stored in the OUTPUT record.
output_record = client.key_value_store(
    run["defaultKeyValueStoreId"]
).get_record("OUTPUT")
summary = output_record["value"] if output_record else {}

print("Run status:", run["status"])
print("Articles returned:", len(dataset_items))
print("Has more:", summary.get("hasMore"))
print("Next cursor:", summary.get("nextContinuationToken"))

To check volume before exporting rows, set estimate_only to true. The Actor will return the estimate in the OUTPUT record and leave the dataset empty.

Tips For Better Results

  • Use estimate_only before broad searches such as large countries, long date ranges, or global sources.
  • Use narrower date ranges when you need smaller, faster exports.
  • Use websites when you need publisher-specific news data.
  • Keep the same filters when using continuation_token; only the token should change between pages.
  • Increase max_results when you want fewer API calls, up to the Actor limit.

Pricing

This Actor uses pay-per-event pricing. Each run is charged a small Actor start event plus one per-result event for every dataset item returned. Estimate-only runs return no dataset items, which makes them a low-cost way to check query size before exporting a large dataset.
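
The two event types can be combined into a rough budget check. This is a sketch only: `estimate_cost_usd` is a hypothetical helper, the per-result default uses the listed "from $0.39 / 1,000 results" price (which may be higher on some plans), and the start event price is not stated in this README, so it is left as a parameter defaulting to zero.

```python
def estimate_cost_usd(
    result_count: int,
    runs: int = 1,
    price_per_1000_results: float = 0.39,  # listed "from" price; may differ by plan
    start_event_price: float = 0.0,  # not documented here; supply the real value
) -> float:
    """Hypothetical helper: approximate the cost of an export in US dollars."""
    per_result_total = result_count / 1000 * price_per_1000_results
    start_total = runs * start_event_price
    return round(per_result_total + start_total, 4)
```

For example, 10,000 results fetched over two runs come to roughly $3.90 in per-result charges, plus two start events.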

Recommended usage:

  • Run a small query first with max_results between 10 and 100.
  • Use estimate_only for broad archive searches.
  • Export larger pages with max_results up to 5000 when the estimate looks right.
  • The default run configuration is 128 MB of memory with a 300-second timeout. Because the Actor is database-backed, more memory is rarely needed unless Apify support asks you to test it.

FAQ

Does this Actor scrape websites live during the run?

No. It searches the hosted article archive and returns matching article records. This makes it useful as a fast historical news API rather than a live crawling job.

Do I need to manage infrastructure to use it?

No. Run the Actor from Apify Console, tasks, schedules, or the Apify API. Provide your search filters and download the results from the dataset.

Why did I get zero dataset items?

The most common reasons are:

  • Your filters matched no archived articles.
  • Your date range is too narrow.
  • estimate_only was enabled.
  • Your continuation_token points beyond the available results.

How do I fetch more than one page of results?

Check the OUTPUT record. If hasMore is true, copy nextContinuationToken into the next run as continuation_token and keep the same country, website, and date filters.

What can I export?

You can export the dataset from Apify as JSON, CSV, Excel, XML, RSS, or HTML. Programmatic users can fetch the dataset through the Apify API.

Are results streamed during the run?

Yes. Result batches are written to the default dataset as they are read from the archive, so partial results remain available if a run is interrupted.

Responsible Use

This Actor returns article data from a hosted archive. You are responsible for using the data in line with applicable laws, publisher terms, and your own compliance requirements.