Ultimate News Scraper - Rise of the Phoenix
Pricing
from $0.49 / 1,000 results
Search a news archive by country, website, and publication date. Estimate result counts, fetch paginated historical articles, and export clean news datasets without running a live scrape.
Rating: 5.0 (1)
Developer: Inus Grobler
Maintained by Community
Actor stats:
Bookmarked: 2
Total users: 3
Monthly active users: 2
Last modified: 4 days ago
The Rise of the Phoenix - Historical News Archive API
Search and export historical news articles from a growing global archive. The Rise of the Phoenix helps researchers, analysts, media teams, and AI data workflows find article records by country, publisher, and publication date, then download clean results from Apify.
This Actor is best for archive search and data export. It returns articles that are already available in the archive, so runs are fast, predictable, and easy to paginate.
What You Can Do
- Search historical news articles by country or website.
- Filter article data by publication date range.
- Estimate result counts before exporting a large dataset.
- Export article records to the Apify dataset in JSON, CSV, Excel, XML, RSS, or HTML.
- Continue large exports with cursor-based pagination.
- Use the results for media monitoring, news research, market intelligence, academic research, lead enrichment, and AI or LLM dataset preparation.
Common Use Cases
- Monitor country-level news coverage over a date range.
- Build a source-specific news article dataset.
- Research historical coverage of public events, companies, markets, or regions.
- Feed clean article text into analytics, NLP, summarization, classification, or retrieval workflows.
- Estimate how much article data is available before launching a larger export.
How To Run The Actor
- Open the Actor on Apify.
- Choose either Countries or Websites.
- Set Published from and Published to.
- Optional: enable Estimate Results First to get a count before returning article rows.
- Set Max results for the number of records to return in this run.
- Run the Actor.
- Download article records from the Dataset and read pagination details from the OUTPUT record.
Input
The Actor input form contains the full list of available countries and websites. The main fields are:
| Field | Type | Required | Description |
|---|---|---|---|
| `countries` | `string[]` | No | Search one or more countries. Use this or `websites`, not both. |
| `websites` | `string[]` | No | Search one or more news websites or publishers. Use this or `countries`, not both. |
| `published_from` | `string` | No | Return articles published on or after this date. Supports ISO dates and relative values. |
| `published_to` | `string` | No | Return articles published on or before this date. Supports ISO dates and relative values. |
| `estimate_only` | `boolean` | No | Estimate how many matching article records are available without returning dataset rows. |
| `max_results` | `integer` | No | Maximum number of article records to return in this run. Default is 10; maximum is 5000. |
| `continuation_token` | `string` | No | Token from a previous run used to continue from the next page of results. |
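
The rules in the table above can be checked on the client before a run is started. The following is a minimal sketch, not part of the Actor: `build_run_input` and `MAX_RESULTS_LIMIT` are hypothetical names, and the lower bound of 1 for `max_results` is an assumption, since the table only states the default and the maximum.

```python
from typing import Optional, Sequence

MAX_RESULTS_LIMIT = 5000  # maximum stated in the input table


def build_run_input(
    countries: Optional[Sequence[str]] = None,
    websites: Optional[Sequence[str]] = None,
    published_from: Optional[str] = None,
    published_to: Optional[str] = None,
    estimate_only: bool = False,
    max_results: int = 10,
    continuation_token: Optional[str] = None,
) -> dict:
    """Hypothetical helper: assemble Actor input and catch obvious mistakes early."""
    if countries and websites:
        raise ValueError("Use countries or websites, not both.")
    # Assumption: the lower bound of 1 is ours; the table only states the maximum.
    if not 1 <= max_results <= MAX_RESULTS_LIMIT:
        raise ValueError(f"max_results must be between 1 and {MAX_RESULTS_LIMIT}.")

    optional_fields = {
        "countries": list(countries) if countries else None,
        "websites": list(websites) if websites else None,
        "published_from": published_from,
        "published_to": published_to,
        "continuation_token": continuation_token,
    }
    run_input = {"estimate_only": estimate_only, "max_results": max_results}
    # Leave unset fields out so the Actor applies its own defaults.
    run_input.update({k: v for k, v in optional_fields.items() if v is not None})
    return run_input
```

Omitting unset fields rather than sending `null` keeps the Actor's documented defaults in play; whether the Actor also tolerates explicit `null` values is not stated here.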
Helpful defaults:
- If you choose neither Countries nor Websites, the Actor searches `AP News`.
- If you provide no dates, the Actor searches the last `10 days`.
- If you provide only `published_from`, `published_to` defaults to `0 days`.
- If you provide only `published_to`, the Actor derives a one-day window ending at that date.
Accepted date examples:
- `2025-01-01`
- `2025-01-01T00:00:00Z`
- `7 days`
- `30 days`
- `12 months`
- `2 years`
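
To catch typos in date values before starting a run, a client-side format check can help. This is a rough sketch based only on the examples above: `looks_like_accepted_date` is a hypothetical helper, and the pattern assumes relative values are written as a number followed by `day(s)`, `month(s)`, or `year(s)`. The Actor may accept other forms that this check would reject.

```python
import re
from datetime import datetime

# Assumption: relative values look like "<number> <unit>", using only the
# units that appear in the documented examples.
_RELATIVE = re.compile(r"^\d+ (day|month|year)s?$")


def looks_like_accepted_date(value: str) -> bool:
    """Return True if value resembles one of the documented date formats."""
    if _RELATIVE.match(value):
        return True
    try:
        # Older Python versions reject a trailing "Z", so normalise it first.
        datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return False
    return True
```

The check only validates the shape of the string; how the Actor resolves a relative value such as `12 months` into a concrete date is not covered.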
Example Inputs
Search By Country
{"countries": ["South Africa"],"published_from": "30 days","published_to": "0 days","max_results": 100}
Search By Website
{"websites": ["AP News", "Reuters"],"published_from": "2025-01-01","published_to": "2025-12-31","max_results": 100}
Estimate Results Before Exporting
{"countries": ["United States"],"published_from": "12 months","published_to": "0 days","estimate_only": true,"max_results": 500}
Continue A Large Export
{"countries": ["Africa"],"published_from": "2 years","published_to": "0 days","max_results": 100,"continuation_token": "{\"date_published\":\"2026-05-06T15:44:00+00:00\",\"url_hash\":\"81a489c65af24950956dd717c2f7b4be\"}"}
Output Data
When estimate_only is false, article records are pushed to the default Apify dataset. Common fields include:
| Field | Type | Description |
|---|---|---|
| `site_name` | `string` | News website or publisher name. |
| `country` | `string` | Country associated with the source. |
| `region` | `string` | Broader region associated with the source. |
| `language` | `string` | Source language metadata. |
| `article_title` | `string` | Article headline. |
| `author` | `string \| null` | Author or byline when available. |
| `article_body` | `string` | Normalized article text. |
| `tags` | `string[]` | Tags or keywords when available. |
| `date_published` | `string` | ISO 8601 publication timestamp. |
| `article_url` | `string` | Canonical article URL. |
| `main_image_url` | `string \| null` | Featured image URL when available. |
| `seo_description` | `string \| null` | Article summary or meta description when available. |
Example dataset item:
{"site_name": "AP News","country": "United States","region": "North America","language": "en","article_title": "Sample archived news headline","author": null,"article_body": "Normalized article text appears here...","tags": [],"date_published": "2026-05-06T16:13:00+00:00","article_url": "https://example.com/news/sample-article","main_image_url": null,"seo_description": null}
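
For downstream processing it can be convenient to load each dataset item into a typed structure. The sketch below is one possible approach: `Article` and `parse_article` are hypothetical names, the field list mirrors the table above, and it assumes `date_published` is always present and parseable by `datetime.fromisoformat`, as in the example item.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import List, Optional


@dataclass
class Article:
    site_name: Optional[str]
    country: Optional[str]
    region: Optional[str]
    language: Optional[str]
    article_title: Optional[str]
    author: Optional[str]
    article_body: Optional[str]
    tags: List[str]
    date_published: datetime
    article_url: Optional[str]
    main_image_url: Optional[str]
    seo_description: Optional[str]


def parse_article(item: dict) -> Article:
    """Convert one dataset item (a dict) into an Article."""
    # Missing keys become None, because the documented fields are only "common" ones.
    data = {f.name: item.get(f.name) for f in fields(Article)}
    data["tags"] = data["tags"] or []
    # Assumption: date_published is present and in ISO 8601 form.
    data["date_published"] = datetime.fromisoformat(item["date_published"])
    return Article(**data)
```

Parsing the timestamp into a timezone-aware `datetime` makes date filtering and sorting on the client less error-prone than comparing strings.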
Output Summary And Pagination
Each run also writes an OUTPUT record with summary and pagination metadata:
- `resultCount`
- `hasMore`
- `nextContinuationToken`
- `filters`
- `estimatedMatchCount` when `estimate_only` is enabled
- `estimatedReturnedThisRun` when `estimate_only` is enabled
If hasMore is true, run the Actor again with the same filters and pass nextContinuationToken into continuation_token.
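
The continuation rule above can be expressed as a loop. The following sketch keeps the network call out of the loop so the logic is easy to test: `run_page` is a hypothetical callable that you supply, which starts one Actor run with the given input and returns the dataset items together with the `OUTPUT` summary. The `max_pages` cap is an added safety limit, not an Actor feature.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical signature: takes Actor input, returns (dataset items, OUTPUT summary).
RunPage = Callable[[dict], Tuple[List[dict], Dict]]


def iterate_all_articles(
    run_page: RunPage, base_input: dict, max_pages: int = 100
) -> Iterator[dict]:
    """Yield articles page by page until the summary reports no more results."""
    run_input = dict(base_input)
    for _ in range(max_pages):
        items, summary = run_page(run_input)
        yield from items
        token = summary.get("nextContinuationToken")
        if not summary.get("hasMore") or not token:
            return
        # Keep the filters identical; only the token changes between pages.
        run_input = {**base_input, "continuation_token": token}
```

A `run_page` implementation would typically wrap the calls shown in the Python API Example section: start the run, read the default dataset, and read the `OUTPUT` record from the default key-value store.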
In estimate mode:
- The dataset remains empty.
- The `OUTPUT` record includes the estimated match count.
- You can rerun the same input with `estimate_only: false` to fetch article rows.
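
An estimate can be turned into a rough plan for the export. As a small worked example, the sketch below computes how many runs a full export would take for a given `max_results`. `runs_needed` is a hypothetical helper, and because the input is an estimate, the real number of pages may differ.

```python
import math


def runs_needed(estimated_match_count: int, max_results: int) -> int:
    """Approximate number of runs needed to export every matching record."""
    if max_results < 1:
        raise ValueError("max_results must be at least 1.")
    return math.ceil(estimated_match_count / max_results)


# Illustrative figure only: an estimate of 12,345 matches exported at the
# maximum page size of 5000 would take about three runs.
print(runs_needed(12_345, 5000))  # prints 3
```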
Python API Example
Copy the Actor ID from the Actor's API tab and use it in ACTOR_ID.

    import os

    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    ACTOR_ID = "YOUR_USERNAME/Apify-The-Rise-of-the-Phoenix"

    run_input = {
        "countries": ["South Africa"],
        "published_from": "30 days",
        "published_to": "0 days",
        "max_results": 25,
    }

    run = client.actor(ACTOR_ID).call(run_input=run_input)
    dataset_items = list(client.dataset(run["defaultDatasetId"]).iterate_items(clean=True))
    output_record = client.key_value_store(run["defaultKeyValueStoreId"]).get_record("OUTPUT")
    summary = output_record["value"] if output_record else {}

    print("Run status:", run["status"])
    print("Articles returned:", len(dataset_items))
    print("Has more:", summary.get("hasMore"))
    print("Next token:", summary.get("nextContinuationToken"))

To check volume before exporting rows, set estimate_only to true. The Actor will return the estimate in the OUTPUT record and leave the dataset empty.
Tips For Better Results
- Use `estimate_only` before broad searches such as large countries, long date ranges, or global sources.
- Use narrower date ranges when you need smaller, faster exports.
- Use `websites` when you need publisher-specific news data.
- Keep the same filters when using `continuation_token`; only the token should change between pages.
- Increase `max_results` when you want fewer API calls, up to the Actor limit.
FAQ
Does this Actor scrape websites live during the run?
No. It searches the hosted article archive and returns matching article records. This makes it useful as a fast historical news API rather than a live crawling job.
Do I need to manage infrastructure to use it?
No. Run the Actor from Apify Console, tasks, schedules, or the Apify API. Provide your search filters and download the results from the dataset.
Why did I get zero dataset items?
The most common reasons are:
- Your filters matched no archived articles.
- Your date range is too narrow.
- `estimate_only` was enabled.
- Your `continuation_token` points beyond the available results.
How do I fetch more than one page of results?
Check the OUTPUT record. If hasMore is true, copy nextContinuationToken into the next run as continuation_token and keep the same country, website, and date filters.
What can I export?
You can export the dataset from Apify as JSON, CSV, Excel, XML, RSS, or HTML. Programmatic users can fetch the dataset through the Apify API.
Responsible Use
This Actor returns article data from a hosted archive. You are responsible for using the data in line with applicable laws, publisher terms, and your own compliance requirements.