Ultimate News Scraper - Rise of the Phoenix

Pricing: from $0.49 / 1,000 results

Search a news archive by country, website, and publication date. Estimate result counts, fetch paginated historical articles, and export clean news datasets without running a live scrape.

Rating: 5.0 (1)

Developer: Inus Grobler. Maintained by Community.

Actor stats: 2 bookmarked, 3 total users, 2 monthly active users, last modified 4 days ago

The Rise of the Phoenix - Historical News Archive API

Search and export historical news articles from a growing global archive. The Rise of the Phoenix helps researchers, analysts, media teams, and AI data workflows find article records by country, publisher, and publication date, then download clean results from Apify.

This Actor is best for archive search and data export. It returns articles that are already available in the archive, so runs are fast, predictable, and easy to paginate.

What You Can Do

  • Search historical news articles by country or website.
  • Filter article data by publication date range.
  • Estimate result counts before exporting a large dataset.
  • Export article records to the Apify dataset in JSON, CSV, Excel, XML, RSS, or HTML.
  • Continue large exports with cursor-based pagination.
  • Use the results for media monitoring, news research, market intelligence, academic research, lead enrichment, and AI or LLM dataset preparation.

Common Use Cases

  • Monitor country-level news coverage over a date range.
  • Build a source-specific news article dataset.
  • Research historical coverage of public events, companies, markets, or regions.
  • Feed clean article text into analytics, NLP, summarization, classification, or retrieval workflows.
  • Estimate how much article data is available before launching a larger export.

How To Run The Actor

  1. Open the Actor on Apify.
  2. Choose either Countries or Websites.
  3. Set Published from and Published to.
  4. Optional: enable Estimate Results First to get a count before returning article rows.
  5. Set Max results for the number of records to return in this run.
  6. Run the Actor.
  7. Download article records from the Dataset and read pagination details from the OUTPUT record.

Input

The Actor input form contains the full list of available countries and websites. The main fields are:

Field               Type      Required  Description
countries           string[]  No        Search one or more countries. Use this or websites, not both.
websites            string[]  No        Search one or more news websites or publishers. Use this or countries, not both.
published_from      string    No        Return articles published on or after this date. Supports ISO dates and relative values.
published_to        string    No        Return articles published on or before this date. Supports ISO dates and relative values.
estimate_only       boolean   No        Estimate how many matching article records are available without returning dataset rows.
max_results         integer   No        Maximum number of article records to return in this run. Default is 10; maximum is 5000.
continuation_token  string    No        Token from a previous run used to continue from the next page of results.
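
The constraints in the table can be checked on the client before a run starts. The sketch below uses a hypothetical helper, build_run_input, that is not part of the Actor or the Apify client. It encodes only the rules stated above (countries or websites but not both, max_results at most 5000); the lower bound of 1 is an assumption.

```python
from __future__ import annotations

from typing import Optional

MAX_RESULTS_LIMIT = 5000   # maximum stated in the input table
DEFAULT_MAX_RESULTS = 10   # default stated in the input table


def build_run_input(
    countries: Optional[list[str]] = None,
    websites: Optional[list[str]] = None,
    published_from: Optional[str] = None,
    published_to: Optional[str] = None,
    estimate_only: bool = False,
    max_results: int = DEFAULT_MAX_RESULTS,
    continuation_token: Optional[str] = None,
) -> dict:
    """Hypothetical helper: validate the fields and build an input dict."""
    if countries and websites:
        raise ValueError("Use countries or websites, not both.")
    # Assumption: values below 1 are invalid; the table states only the maximum.
    if not 1 <= max_results <= MAX_RESULTS_LIMIT:
        raise ValueError(f"max_results must be between 1 and {MAX_RESULTS_LIMIT}.")

    run_input: dict = {"estimate_only": estimate_only, "max_results": max_results}
    optional_fields = {
        "countries": countries,
        "websites": websites,
        "published_from": published_from,
        "published_to": published_to,
        "continuation_token": continuation_token,
    }
    # Omit unset fields so the Actor can apply its own defaults.
    for key, value in optional_fields.items():
        if value:
            run_input[key] = value
    return run_input
```

Calling build_run_input(countries=["South Africa"], published_from="30 days", max_results=100) returns a dict that can be passed as the run input, while passing both countries and websites raises ValueError before any run is started.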

Helpful defaults:

  • If you choose neither Countries nor Websites, the Actor searches AP News.
  • If you provide no dates, the Actor searches the last 10 days.
  • If you provide only published_from, published_to defaults to 0 days (the current date).
  • If you provide only published_to, the Actor derives a one-day window ending at that date.
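
The Actor applies these defaults itself, so nothing below is required. For logging or reproducibility it can still help to record which filters a run effectively used. The sketch mirrors the documented defaults in a hypothetical function, apply_documented_defaults; writing "the last 10 days" as published_from "10 days" with published_to "0 days" is an assumption about how that default maps onto the input fields.

```python
def apply_documented_defaults(run_input: dict) -> dict:
    """Hypothetical helper: mirror the defaults described in the docs."""
    resolved = dict(run_input)  # do not mutate the caller's dict

    # Neither Countries nor Websites chosen: the Actor searches AP News.
    if not resolved.get("countries") and not resolved.get("websites"):
        resolved["websites"] = ["AP News"]

    has_from = bool(resolved.get("published_from"))
    has_to = bool(resolved.get("published_to"))

    if not has_from and not has_to:
        # Assumption: "the last 10 days" expressed as relative values.
        resolved["published_from"] = "10 days"
        resolved["published_to"] = "0 days"
    elif has_from and not has_to:
        resolved["published_to"] = "0 days"
    # Only published_to given: the Actor derives a one-day window ending at
    # that date. The exact start is left to the Actor, so nothing is filled in.
    return resolved
```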

Accepted date examples:

  • 2025-01-01
  • 2025-01-01T00:00:00Z
  • 7 days
  • 30 days
  • 12 months
  • 2 years
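
How the Actor resolves relative values internally is not documented here. The sketch below shows one plausible reading for previewing a date window locally, in which "N days" means N days before the current time. The function resolve_date is hypothetical, months and years are approximated as 30 and 365 days, and ISO dates without an offset are treated as UTC; all three are assumptions.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

_RELATIVE = re.compile(r"^(\d+)\s+(day|month|year)s?$")


def resolve_date(value: str, now: Optional[datetime] = None) -> datetime:
    """Hypothetical helper: turn an accepted date value into a datetime."""
    now = now or datetime.now(timezone.utc)
    text = value.strip()

    match = _RELATIVE.match(text.lower())
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        # Assumption: months and years are approximated, not calendar-exact.
        days_per_unit = {"day": 1, "month": 30, "year": 365}
        return now - timedelta(days=amount * days_per_unit[unit])

    # ISO input: map a trailing "Z" to an offset for Python versions before 3.11.
    parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))
    # Assumption: values without an offset are interpreted as UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)
```

For example, resolve_date("7 days") gives a timestamp one week before the current time, and resolve_date("2025-01-01") gives midnight UTC on that date under the assumptions above.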

Example Inputs

Search By Country

{
    "countries": ["South Africa"],
    "published_from": "30 days",
    "published_to": "0 days",
    "max_results": 100
}

Search By Website

{
    "websites": ["AP News", "Reuters"],
    "published_from": "2025-01-01",
    "published_to": "2025-12-31",
    "max_results": 100
}

Estimate Results Before Exporting

{
    "countries": ["United States"],
    "published_from": "12 months",
    "published_to": "0 days",
    "estimate_only": true,
    "max_results": 500
}

Continue A Large Export

{
    "countries": ["South Africa"],
    "published_from": "2 years",
    "published_to": "0 days",
    "max_results": 100,
    "continuation_token": "{\"date_published\":\"2026-05-06T15:44:00+00:00\",\"url_hash\":\"81a489c65af24950956dd717c2f7b4be\"}"
}
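
The continuation_token in the example above is itself a JSON string, which is why its quotes are escaped inside the input. The sketch below shows that in Python the token can be passed through unchanged and that json.dumps produces the escaping. Decoding the token is only for inspection; its internal layout is taken from the example and should be treated as opaque, since it may change.

```python
import json

# Token copied verbatim from a previous run's nextContinuationToken.
token = (
    '{"date_published":"2026-05-06T15:44:00+00:00",'
    '"url_hash":"81a489c65af24950956dd717c2f7b4be"}'
)

# Pass the token through as a plain string; do not edit it.
run_input = {"max_results": 100, "continuation_token": token}

# Serializing the input escapes the inner quotes, as in the example above.
serialized = json.dumps(run_input)

# Optional: decode the token to see where the next page resumes.
cursor = json.loads(token)
print("Resumes after:", cursor["date_published"])
```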

Output Data

When estimate_only is false, article records are pushed to the default Apify dataset. Common fields include:

Field            Type           Description
site_name        string         News website or publisher name.
country          string         Country associated with the source.
region           string         Broader region associated with the source.
language         string         Source language metadata.
article_title    string         Article headline.
author           string | null  Author or byline when available.
article_body     string         Normalized article text.
tags             string[]       Tags or keywords when available.
date_published   string         ISO 8601 publication timestamp.
article_url      string         Canonical article URL.
main_image_url   string | null  Featured image URL when available.
seo_description  string | null  Article summary or meta description when available.

Example dataset item:

{
    "site_name": "AP News",
    "country": "United States",
    "region": "North America",
    "language": "en",
    "article_title": "Sample archived news headline",
    "author": null,
    "article_body": "Normalized article text appears here...",
    "tags": [],
    "date_published": "2026-05-06T16:13:00+00:00",
    "article_url": "https://example.com/news/sample-article",
    "main_image_url": null,
    "seo_description": null
}
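
Several fields can be null or empty, so downstream code should not assume they are populated. The sketch below flattens one dataset item into a compact row for analysis. The function to_row and its output keys are illustrative choices, not part of the Actor's output.

```python
from datetime import datetime


def to_row(item: dict) -> dict:
    """Illustrative helper: flatten one dataset item, tolerating nulls."""
    # The example timestamp carries an explicit +00:00 offset, which
    # datetime.fromisoformat parses directly.
    published = datetime.fromisoformat(item["date_published"])
    return {
        "site": item["site_name"],
        "title": item["article_title"],
        "author": item.get("author") or "unknown",  # author may be null
        "published_date": published.date().isoformat(),
        "word_count": len((item.get("article_body") or "").split()),
        "tags": ", ".join(item.get("tags") or []),  # tags may be empty
        "url": item["article_url"],
    }


example_item = {
    "site_name": "AP News",
    "article_title": "Sample archived news headline",
    "author": None,
    "article_body": "Normalized article text appears here...",
    "tags": [],
    "date_published": "2026-05-06T16:13:00+00:00",
    "article_url": "https://example.com/news/sample-article",
}
row = to_row(example_item)
```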

Output Summary And Pagination

Each run also writes an OUTPUT record with summary and pagination metadata:

  • resultCount
  • hasMore
  • nextContinuationToken
  • filters
  • estimatedMatchCount when estimate_only is enabled
  • estimatedReturnedThisRun when estimate_only is enabled

If hasMore is true, run the Actor again with the same filters and pass nextContinuationToken into continuation_token.
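
The continuation rule above can be sketched as a loop. To keep the example independent of any client library, it takes a caller-supplied function, run_page, which is hypothetical: it is assumed to perform one Actor run with the same filters and the given token, and to return the dataset items together with the OUTPUT summary. The max_pages guard is an added safety limit, not an Actor feature.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# run_page(token) -> (dataset items, OUTPUT summary) for one run.
RunPage = Callable[[Optional[str]], Tuple[List[Dict], Dict]]


def iterate_all_pages(run_page: RunPage, max_pages: int = 10) -> Iterator[Dict]:
    """Yield items page by page until hasMore is false or the guard trips."""
    token: Optional[str] = None  # the first run sends no continuation_token
    for _ in range(max_pages):
        items, summary = run_page(token)
        yield from items
        token = summary.get("nextContinuationToken")
        # Stop when the summary reports no further results.
        if not summary.get("hasMore") or not token:
            break
```

In practice, run_page would add continuation_token to an otherwise unchanged run input, start the Actor, and read the dataset and the OUTPUT record for that run.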

In estimate mode:

  • The dataset remains empty.
  • The OUTPUT record includes the estimated match count.
  • You can rerun the same input with estimate_only: false to fetch article rows.
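
An estimate is mainly useful for planning the export that follows. The sketch below computes how many runs a full export would take at a given max_results. The function runs_needed is hypothetical and the summary values are made up for illustration; only the field name estimatedMatchCount comes from the OUTPUT record described above.

```python
import math


def runs_needed(estimated_match_count: int, max_results: int) -> int:
    """Number of runs required to page through the estimated matches."""
    if max_results < 1:
        raise ValueError("max_results must be at least 1.")
    return math.ceil(estimated_match_count / max_results)


# Illustrative OUTPUT summary from an estimate_only run (values are made up).
summary = {"estimatedMatchCount": 12345}

pages = runs_needed(summary["estimatedMatchCount"], max_results=5000)
print("Runs needed at max_results=5000:", pages)
```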

Python API Example

Copy the Actor ID from the Actor's API tab and use it in ACTOR_ID.

import os

from apify_client import ApifyClient

# Read the API token from the environment rather than hard-coding it.
client = ApifyClient(os.environ["APIFY_TOKEN"])
ACTOR_ID = "YOUR_USERNAME/Apify-The-Rise-of-the-Phoenix"

run_input = {
    "countries": ["South Africa"],
    "published_from": "30 days",
    "published_to": "0 days",
    "max_results": 25,
}

# Start the Actor and wait for the run to finish.
run = client.actor(ACTOR_ID).call(run_input=run_input)

# Article records are stored in the run's default dataset.
dataset_items = list(
    client.dataset(run["defaultDatasetId"]).iterate_items(clean=True)
)

# Summary and pagination metadata are stored in the OUTPUT record.
output_record = client.key_value_store(
    run["defaultKeyValueStoreId"]
).get_record("OUTPUT")
summary = output_record["value"] if output_record else {}

print("Run status:", run["status"])
print("Articles returned:", len(dataset_items))
print("Has more:", summary.get("hasMore"))
print("Next token:", summary.get("nextContinuationToken"))

To check volume before exporting rows, set estimate_only to true. The Actor will return the estimate in the OUTPUT record and leave the dataset empty.

Tips For Better Results

  • Use estimate_only before broad searches such as large countries, long date ranges, or global sources.
  • Use narrower date ranges when you need smaller, faster exports.
  • Use websites when you need publisher-specific news data.
  • Keep the same filters when using continuation_token; only the token should change between pages.
  • Increase max_results, up to the Actor limit of 5000, when you want fewer API calls.

FAQ

Does this Actor scrape websites live during the run?

No. It searches the hosted article archive and returns matching article records. This makes it useful as a fast historical news API rather than a live crawling job.

Do I need to manage infrastructure to use it?

No. Run the Actor from Apify Console, tasks, schedules, or the Apify API. Provide your search filters and download the results from the dataset.

Why did I get zero dataset items?

The most common reasons are:

  • Your filters matched no archived articles.
  • Your date range is too narrow.
  • estimate_only was enabled.
  • Your continuation_token points beyond the available results.

How do I fetch more than one page of results?

Check the OUTPUT record. If hasMore is true, copy nextContinuationToken into the next run as continuation_token and keep the same country, website, and date filters.

What can I export?

You can export the dataset from Apify as JSON, CSV, Excel, XML, RSS, or HTML. Programmatic users can fetch the dataset through the Apify API.
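
For programmatic export, the Apify API's dataset items endpoint accepts a format query parameter. The sketch below builds the download URL for each format named above. The helper dataset_export_url and the dataset ID are placeholders; the request itself still needs your API token, for example in an Authorization: Bearer header.

```python
from urllib.parse import urlencode

# Display names used in this document mapped to the API's format values.
EXPORT_FORMATS = {
    "JSON": "json",
    "CSV": "csv",
    "Excel": "xlsx",
    "XML": "xml",
    "RSS": "rss",
    "HTML": "html",
}


def dataset_export_url(dataset_id: str, export_format: str, clean: bool = True) -> str:
    """Placeholder helper: build the items URL for one export format."""
    params = {"format": EXPORT_FORMATS[export_format]}
    if clean:
        # clean=true omits empty items and hidden fields.
        params["clean"] = "true"
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(params)}"


# "abc123" stands in for the run's defaultDatasetId.
print(dataset_export_url("abc123", "CSV"))
```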

Responsible Use

This Actor returns article data from a hosted archive. You are responsible for using the data in line with applicable laws, publisher terms, and your own compliance requirements.