Global News Archive API - Rise of the Phoenix
Pricing
from $0.39 / 1,000 results
Search archived global news articles by country, publisher, and date. Export clean article text and metadata for media monitoring, PR research, market intelligence, RAG, and LLM workflows.
Developer
Inus Grobler
Maintained by Community
Search a hosted global news archive by country, publisher, and publication date. Export clean article text and metadata for media monitoring, PR research, market intelligence, compliance review, RAG, and LLM data workflows.
Rise of the Phoenix is a historical news API Actor for researchers, analysts, media teams, and AI builders who need article data without running a live crawl every time.
Because it searches a hosted archive, runs are fast, predictable, easy to paginate, and suitable for repeatable monitoring jobs or large historical exports.
Use it when you need a news API for:
- Media monitoring by country, publisher, or date range.
- Historical article exports for research, PR, reputation monitoring, or market intelligence.
- Clean news text datasets for RAG, LLM evaluation, NLP, classification, and summarization workflows.
- A fast first pass before deciding whether a broader live crawl is worth running.
Why Use This Actor
- Archive-first workflow: query stored article records instead of crawling pages live.
- Useful filters: narrow exports by country, publisher, and publication date.
- Estimate before export: check available volume before paying for a larger dataset.
- Pagination built in: continue broad exports with `nextContinuationToken`.
- Clean outputs: get article title, body, URL, source, country, language, image, and metadata fields in the Apify dataset.
Quick Starts
Monitor Recent AP News Articles
```json
{
  "websites": ["AP News"],
  "published_from": "10 days",
  "published_to": "0 days",
  "max_results": 25
}
```
Build A Country-Level News Dataset
```json
{
  "countries": ["United States"],
  "published_from": "30 days",
  "published_to": "0 days",
  "max_results": 100
}
```
Check Archive Volume Before Exporting
```json
{
  "countries": ["South Africa"],
  "published_from": "12 months",
  "published_to": "0 days",
  "estimate_only": true,
  "max_results": 100
}
```
Media Monitoring For A Market
```json
{
  "countries": ["United Kingdom"],
  "published_from": "7 days",
  "published_to": "0 days",
  "max_results": 100
}
```
Publisher-Specific Research
```json
{
  "websites": ["Reuters", "AP News"],
  "published_from": "30 days",
  "published_to": "0 days",
  "max_results": 100
}
```
What You Can Do
- Search historical news articles by country or website.
- Filter article data by publication date range.
- Estimate result counts before exporting a large dataset.
- Export article records to the Apify dataset in JSON, CSV, Excel, XML, RSS, or HTML.
- Continue large exports with cursor-based pagination.
- Use the results for media monitoring, news research, market intelligence, academic research, lead enrichment, and AI or LLM dataset preparation.
Common Use Cases
- Media monitoring: track recent coverage in a country or across selected publishers.
- PR and reputation research: export historical mentions and articles for review.
- Market intelligence: collect regional news context for companies, sectors, products, and public events.
- AI and RAG datasets: feed clean article text into analytics, NLP, summarization, classification, retrieval, and evaluation pipelines.
- Research scoping: estimate how much article data is available before launching a larger export.
Best For
- Users who need structured article records from an archive.
- Teams that want repeatable exports from the same source and date filters.
- Workflows where result count estimates and pagination matter.
- API users who want JSON, CSV, Excel, XML, RSS, or HTML exports from Apify datasets.
Not For
- Real-time breaking-news crawling from arbitrary websites.
- Scraping every page from a website outside the hosted archive.
- Bypassing publisher restrictions, paywalls, or access controls.
How To Run The Actor
- Open the Actor on Apify.
- Leave AP News selected for a quick test, or choose either Countries or Websites for a narrower export.
- Set Published from and Published to.
- Optional: enable Advanced: estimate results first to get a count before returning article rows.
- Set Max results for the number of records to return in this run.
- Run the Actor.
- Download article records from the Dataset and read pagination details from the OUTPUT record.
Input
The Actor input form contains the full list of available countries and websites. The main fields are:
| Field | Type | Required | Description |
|---|---|---|---|
| `countries` | string[] | No | Search one or more countries. Use this or `websites`, not both. |
| `websites` | string[] | No | Search one or more news websites or publishers. `AP News` is selected by default for a quick test. Use this or `countries`, not both. |
| `published_from` | string | No | Return articles published on or after this date. Accepts ISO dates and relative values. |
| `published_to` | string | No | Return articles published on or before this date. Accepts ISO dates and relative values. |
| `max_results` | integer | No | Maximum number of article records to return in this run. Default: 10; maximum: 5000. |
Advanced fields:
| Field | Type | Required | Description |
|---|---|---|---|
| `estimate_only` | boolean | No | Estimate how many matching article records are available without returning dataset rows. |
| `continuation_token` | string | No | Token from a previous run, used to continue from the next page of results. |
Helpful defaults:
- `AP News` is selected by default. If you clear all countries and websites, the Actor still searches `AP News`.
- If you provide no dates, the Actor searches the last `10 days`.
- If you provide only `published_from`, `published_to` defaults to `0 days`.
- If you provide only `published_to`, the Actor derives a one-day window ending at that date.
- Duplicate countries or websites are ignored automatically.
- Invalid continuation tokens fail fast before the Actor queries the archive.
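The input rules above can be mirrored on the client so mistakes surface before a run starts. This is a minimal sketch: `build_run_input` is a hypothetical helper, not part of the Actor or of `apify_client`, and it only encodes the documented rules (countries or websites but not both, `max_results` default 10 and maximum 5000, duplicates removed). Rejecting values below 1 is an added assumption. The Actor still applies its own defaults and validation.

```python
from typing import Any, Optional

MAX_RESULTS_LIMIT = 5000  # documented maximum for max_results
DEFAULT_MAX_RESULTS = 10  # documented default


def _dedupe(values: list[str]) -> list[str]:
    """Drop duplicates while keeping the first-seen order."""
    return list(dict.fromkeys(values))


def build_run_input(
    countries: Optional[list[str]] = None,
    websites: Optional[list[str]] = None,
    published_from: Optional[str] = None,
    published_to: Optional[str] = None,
    max_results: int = DEFAULT_MAX_RESULTS,
    estimate_only: bool = False,
    continuation_token: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: assemble an Actor input dict and check the documented rules."""
    if countries and websites:
        raise ValueError("Use countries or websites, not both.")
    # Assumption: values below 1 are treated as invalid here.
    if not 1 <= max_results <= MAX_RESULTS_LIMIT:
        raise ValueError(f"max_results must be between 1 and {MAX_RESULTS_LIMIT}.")

    run_input: dict[str, Any] = {"max_results": max_results}
    if countries:
        run_input["countries"] = _dedupe(countries)
    if websites:
        run_input["websites"] = _dedupe(websites)
    # Omitted dates are left to the Actor's own defaults (last 10 days, etc.).
    if published_from is not None:
        run_input["published_from"] = published_from
    if published_to is not None:
        run_input["published_to"] = published_to
    if estimate_only:
        run_input["estimate_only"] = True
    if continuation_token:
        run_input["continuation_token"] = continuation_token
    return run_input


print(build_run_input(countries=["South Africa", "South Africa"], published_from="30 days"))
```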
Accepted date examples:
- `2025-01-01`
- `2025-01-01T00:00:00Z`
- `7 days`
- `30 days`
- `12 months`
- `2 years`
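To plan which window a relative value covers, you can resolve it to an absolute timestamp yourself. The sketch below assumes a relative value means "N units before now, in UTC", approximates a month as 30 days and a year as 365 days, and treats naive ISO values as UTC. These are assumptions, not the Actor's documented semantics, so use the result for planning only.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed day lengths for each relative unit; months and years are approximations.
_UNIT_DAYS = {"day": 1, "month": 30, "year": 365}
_RELATIVE = re.compile(r"^\s*(\d+)\s+(day|month|year)s?\s*$", re.IGNORECASE)


def resolve_date(value: str, now: Optional[datetime] = None) -> datetime:
    """Turn an ISO date or a relative value like '30 days' into an aware UTC datetime."""
    now = now or datetime.now(timezone.utc)
    match = _RELATIVE.match(value)
    if match:
        amount, unit = int(match.group(1)), match.group(2).lower()
        return now - timedelta(days=amount * _UNIT_DAYS[unit])
    # fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalize it first.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assume naive values are UTC
    return parsed


now = datetime(2026, 5, 7, tzinfo=timezone.utc)
print(resolve_date("7 days", now))  # 2026-04-30 00:00:00+00:00
```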
Example Inputs
Search By Country
```json
{
  "countries": ["South Africa"],
  "published_from": "30 days",
  "published_to": "0 days",
  "max_results": 100
}
```
Search By Website
```json
{
  "websites": ["AP News", "Reuters"],
  "published_from": "2025-01-01",
  "published_to": "2025-12-31",
  "max_results": 100
}
```
Estimate Results Before Exporting
```json
{
  "countries": ["United States"],
  "published_from": "12 months",
  "published_to": "0 days",
  "estimate_only": true,
  "max_results": 500
}
```
Continue A Large Export
```json
{
  "countries": ["Africa"],
  "published_from": "2 years",
  "published_to": "0 days",
  "max_results": 100,
  "continuation_token": "{\"date_published\":\"2026-05-06T15:44:00+00:00\",\"url_hash\":\"81a489c65af24950956dd717c2f7b4be\"}"
}
```
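In the example above the token is itself a JSON object serialized into a string, which is why its inner quotes appear escaped in the run input. When paging, treat the token as opaque and pass it through unchanged. The sketch below decodes it only for debugging; `describe_token` is a hypothetical helper that assumes the `date_published`/`url_hash` shape shown in this example, which may change. It also shows that a plain string in a Python dict is escaped correctly by `json.dumps`, so you do not need to escape it yourself.

```python
import json


def describe_token(token: str) -> str:
    """Decode a continuation token for debugging; assumes the JSON shape shown above."""
    cursor = json.loads(token)
    return f"cursor: date_published={cursor['date_published']}, url_hash={cursor['url_hash']}"


# The token from the example input, as a plain (unescaped) Python string.
token = '{"date_published":"2026-05-06T15:44:00+00:00","url_hash":"81a489c65af24950956dd717c2f7b4be"}'

run_input = {
    "countries": ["Africa"],
    "published_from": "2 years",
    "published_to": "0 days",
    "max_results": 100,
    # Pass the string unchanged; serializing run_input escapes the inner quotes.
    "continuation_token": token,
}

print(describe_token(token))
print(json.dumps(run_input))
```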
Output Data
When estimate_only is false, article records are pushed to the default Apify dataset while the run is still in progress. If a run stops early, already pushed records remain available in the dataset. Common fields include:
| Field | Type | Description |
|---|---|---|
| `site_name` | string | News website or publisher name. |
| `country` | string | Country associated with the source. |
| `region` | string | Broader region associated with the source. |
| `language` | string | Source language metadata. |
| `article_title` | string | Article headline. |
| `author` | string or null | Author or byline when available. |
| `article_body` | string | Normalized article text. |
| `tags` | string[] | Tags or keywords when available. |
| `date_published` | string | ISO 8601 publication timestamp. |
| `article_url` | string | Canonical article URL. |
| `main_image_url` | string or null | Featured image URL when available. |
| `seo_description` | string or null | Article summary or meta description when available. |
Example dataset item:
```json
{
  "site_name": "AP News",
  "country": "United States",
  "region": "North America",
  "language": "en",
  "article_title": "Sample archived news headline",
  "author": null,
  "article_body": "Normalized article text appears here...",
  "tags": [],
  "date_published": "2026-05-06T16:13:00+00:00",
  "article_url": "https://example.com/news/sample-article",
  "main_image_url": null,
  "seo_description": null
}
```
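For RAG or NLP pipelines, each dataset item can be flattened into one text string plus a metadata dict, a shape many vector stores accept. The `to_document` helper and its output shape are assumptions for illustration; the field names are the ones listed in the output table, and nullable fields are kept as `None` rather than filled in.

```python
from typing import Any


def to_document(item: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: turn one dataset item into {text, metadata} for indexing."""
    title = item.get("article_title") or ""
    body = item.get("article_body") or ""
    # Prepend the headline so retrieval can match on it; skip blank parts.
    text = "\n\n".join(part for part in (title, body) if part)
    metadata = {
        "source": item.get("site_name"),
        "country": item.get("country"),
        "language": item.get("language"),
        "published": item.get("date_published"),
        "url": item.get("article_url"),
        # author may be null in the dataset; keep None instead of inventing a value.
        "author": item.get("author"),
        "tags": item.get("tags") or [],
    }
    return {"text": text, "metadata": metadata}


sample = {
    "site_name": "AP News",
    "country": "United States",
    "language": "en",
    "article_title": "Sample archived news headline",
    "author": None,
    "article_body": "Normalized article text appears here...",
    "tags": [],
    "date_published": "2026-05-06T16:13:00+00:00",
    "article_url": "https://example.com/news/sample-article",
}
print(to_document(sample)["text"])
```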
Output Summary And Pagination
Each run also writes an OUTPUT record with summary and pagination metadata:
- `resultCount`
- `hasMore`
- `nextContinuationToken`
- `filters`
- `estimatedMatchCount` when `estimate_only` is enabled
- `estimatedReturnedThisRun` when `estimate_only` is enabled
If hasMore is true, run the Actor again with the same filters and pass nextContinuationToken into continuation_token.
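The paging rule above can be automated with a loop. This is a sketch: `fetch_all_pages` is a hypothetical helper that accepts any `run_page(run_input) -> (items, summary)` callable, so the paging logic can be tested without network access. The `apify_client` wiring uses the same calls as the Python API example in this README and only runs when `APIFY_TOKEN` is set. The `max_pages` cap is an added safeguard against runaway cost, not an Actor feature.

```python
import os
from typing import Any, Callable, Iterator

RunPage = Callable[[dict[str, Any]], tuple[list[dict[str, Any]], dict[str, Any]]]


def fetch_all_pages(run_page: RunPage, base_input: dict[str, Any], max_pages: int = 10) -> Iterator[dict[str, Any]]:
    """Yield items page by page, keeping the filters fixed and changing only the token."""
    token = base_input.get("continuation_token")
    for _ in range(max_pages):
        run_input = dict(base_input)  # copy so the caller's filters are never mutated
        if token:
            run_input["continuation_token"] = token
        items, summary = run_page(run_input)
        yield from items
        token = summary.get("nextContinuationToken")
        if not summary.get("hasMore") or not token:
            break


def apify_run_page(run_input: dict[str, Any]) -> tuple[list[dict[str, Any]], dict[str, Any]]:
    """Run the Actor once and return (dataset items, OUTPUT summary)."""
    from apify_client import ApifyClient  # imported lazily so the sketch works offline

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor("thescrapelab/Apify-The-Rise-of-the-Phoenix").call(run_input=run_input)
    items = list(client.dataset(run["defaultDatasetId"]).iterate_items(clean=True))
    record = client.key_value_store(run["defaultKeyValueStoreId"]).get_record("OUTPUT")
    return items, (record["value"] if record else {})


if os.environ.get("APIFY_TOKEN"):
    filters = {"countries": ["South Africa"], "published_from": "30 days", "published_to": "0 days", "max_results": 100}
    articles = list(fetch_all_pages(apify_run_page, filters, max_pages=3))
    print("Articles fetched:", len(articles))
```

Injecting `run_page` keeps the rule "same filters, new token" in one place and lets you swap in a task, a schedule, or a mock without touching the loop.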
In estimate mode:
- The dataset remains empty.
- The `OUTPUT` record includes the estimated match count.
- You can rerun the same input with `estimate_only: false` to fetch article rows.
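The estimate-then-export flow can be scripted as a decision step between two runs. This is a sketch: `plan_export` and its `budget_rows` threshold are hypothetical, and it assumes `estimatedMatchCount` in the `OUTPUT` record is an integer count of matching records. It returns the input for the follow-up run (same filters, `estimate_only` set to false, `max_results` capped at the documented 5000) and leaves the actual Actor call to your own client code.

```python
from typing import Any, Optional

MAX_RESULTS_LIMIT = 5000  # documented per-run maximum


def plan_export(run_input: dict[str, Any], estimate_summary: dict[str, Any], budget_rows: int) -> Optional[dict[str, Any]]:
    """Return the input for the real export run, or None if nothing matches or it is over budget."""
    estimated = int(estimate_summary.get("estimatedMatchCount") or 0)
    if estimated == 0 or estimated > budget_rows:
        return None
    export_input = dict(run_input)
    export_input["estimate_only"] = False  # rerun the same filters, now returning rows
    export_input["max_results"] = min(estimated, MAX_RESULTS_LIMIT)
    return export_input


estimate_input = {
    "countries": ["United States"],
    "published_from": "12 months",
    "published_to": "0 days",
    "estimate_only": True,
    "max_results": 500,
}
print(plan_export(estimate_input, {"estimatedMatchCount": 1200}, budget_rows=2000))
```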
Python API Example
Use the Actor ID from the Actor's API tab.
```python
import os

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient(os.environ["APIFY_TOKEN"])
ACTOR_ID = "thescrapelab/Apify-The-Rise-of-the-Phoenix"

run_input = {
    "countries": ["South Africa"],
    "published_from": "30 days",
    "published_to": "0 days",
    "max_results": 25,
}

# Start the Actor and wait for the run to finish.
run = client.actor(ACTOR_ID).call(run_input=run_input)

# Article records are stored in the run's default dataset.
dataset_items = list(client.dataset(run["defaultDatasetId"]).iterate_items(clean=True))

# Summary and pagination metadata are stored in the OUTPUT record of the default key-value store.
output_record = client.key_value_store(run["defaultKeyValueStoreId"]).get_record("OUTPUT")
summary = output_record["value"] if output_record else {}

print("Run status:", run["status"])
print("Articles returned:", len(dataset_items))
print("Has more:", summary.get("hasMore"))
print("Next cursor:", summary.get("nextContinuationToken"))
```
To check volume before exporting rows, set estimate_only to true. The Actor will return the estimate in the OUTPUT record and leave the dataset empty.
Tips For Better Results
- Use `estimate_only` before broad searches such as large countries, long date ranges, or global sources.
- Use narrower date ranges when you need smaller, faster exports.
- Use `websites` when you need publisher-specific news data.
- Keep the same filters when using `continuation_token`; only the token should change between pages.
- Increase `max_results`, up to the Actor limit of 5000, when you want fewer API calls.
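To follow the narrower-date-range tip, a long ISO range can be split into consecutive windows and exported one window at a time. `split_range` is a hypothetical helper. It assumes `published_from` and `published_to` are inclusive calendar dates, as the input table's "on or after" and "on or before" wording suggests, so each window starts the day after the previous one ends. If the Actor instead treats a bare date as midnight, adjacent windows could miss part of a day; an `estimate_only` run per window is a cheap way to check.

```python
from datetime import date, timedelta


def split_range(start: date, end: date, days_per_window: int) -> list[tuple[str, str]]:
    """Split [start, end] into inclusive, non-overlapping ISO date windows."""
    if days_per_window < 1 or start > end:
        raise ValueError("need days_per_window >= 1 and start <= end")
    windows = []
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=days_per_window - 1), end)
        windows.append((current.isoformat(), window_end.isoformat()))
        current = window_end + timedelta(days=1)
    return windows


for published_from, published_to in split_range(date(2025, 1, 1), date(2025, 3, 31), 30):
    print({"published_from": published_from, "published_to": published_to})
```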
Pricing
This Actor uses pay-per-event pricing. Each run is charged a small Actor start event plus one per-result event for each dataset item returned. Estimate-only runs return no dataset items, so no per-result events are charged, which makes them a cheap way to check query size before exporting a large dataset.
Recommended usage:
- Run a small query first with `max_results` between `10` and `100`.
- Use `estimate_only` for broad archive searches.
- Export larger pages with `max_results` up to `5000` when the estimate looks right.
- The default run settings are 128 MB of memory and a 300-second timeout. Because the Actor is database-backed, more memory is rarely needed unless Apify support asks you to test it.
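As a worked example of the per-result charge: at the listed starting rate of $0.39 per 1,000 results, one full 5,000-row page costs 5 × $0.39 = $1.95 in result events. The start-event price is not stated in this README, so the sketch below takes it as a parameter (default 0); because each paginated page is a separate run, it also counts one start event per run. `estimate_cost` is a hypothetical helper, and the "from" in the listed price means you should confirm both rates on the Actor's pricing tab.

```python
def estimate_cost(results: int, price_per_1000: float = 0.39, start_event_price: float = 0.0, runs: int = 1) -> float:
    """Rough pay-per-event cost in USD: one start event per run plus per-result events."""
    return round(runs * start_event_price + results / 1000 * price_per_1000, 4)


# Result events only, for one full page of 5,000 rows.
print(estimate_cost(5000))
# 10,000 rows fetched as two pages, with a hypothetical start-event price filled in.
print(estimate_cost(10000, start_event_price=0.01, runs=2))
```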
FAQ
Does this Actor scrape websites live during the run?
No. It searches the hosted article archive and returns matching article records. This makes it useful as a fast historical news API rather than a live crawling job.
Do I need to manage infrastructure to use it?
No. Run the Actor from Apify Console, tasks, schedules, or the Apify API. Provide your search filters and download the results from the dataset.
Why did I get zero dataset items?
The most common reasons are:
- Your filters matched no archived articles.
- Your date range is too narrow.
- `estimate_only` was enabled.
- Your `continuation_token` points beyond the available results.
How do I fetch more than one page of results?
Check the OUTPUT record. If hasMore is true, copy nextContinuationToken into the next run as continuation_token and keep the same country, website, and date filters.
What can I export?
You can export the dataset from Apify as JSON, CSV, Excel, XML, RSS, or HTML. Programmatic users can fetch the dataset through the Apify API.
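For programmatic exports, the Apify API serves dataset items through the dataset items endpoint (`GET https://api.apify.com/v2/datasets/{datasetId}/items`), with a `format` query parameter selecting the output format (`xlsx` is the Excel option). The helper below only builds that URL; `dataset_export_url` is a hypothetical name, the dataset ID comes from a run's `defaultDatasetId`, and `clean=true` mirrors `iterate_items(clean=True)` in the Python example. Send your API token in an `Authorization: Bearer` header rather than embedding it in URLs you share.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
# Format values for the export types listed above (xlsx is the Excel export).
FORMATS = {"json", "csv", "xlsx", "xml", "rss", "html"}


def dataset_export_url(dataset_id: str, fmt: str = "csv", clean: bool = True) -> str:
    """Build a download URL for a run's dataset in the requested format."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    query = {"format": fmt}
    if clean:
        query["clean"] = "true"
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(query)}"


print(dataset_export_url("YOUR_DATASET_ID", "xlsx"))
```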
Are results streamed during the run?
Yes. Result batches are written to the default dataset as they are read from the archive, so partial results remain available if a run is interrupted.
Responsible Use
This Actor returns article data from a hosted archive. You are responsible for using the data in line with applicable laws, publisher terms, and your own compliance requirements.