Ultimate News Scraper - Rise of the Phoenix

Search a news archive by country, website, and publication date. Estimate result counts, fetch paginated historical articles, and export clean news datasets without running a live scrape.

Pricing: from $0.49 / 1,000 results

Rating: 5.0 (1)

Developer: Inus Grobler

Last modified: 4 days ago

The Rise of the Phoenix - Historical News Archive API

Search and export historical news articles from a growing global archive. The Rise of the Phoenix helps researchers, analysts, media teams, and AI data workflows find article records by country, publisher, and publication date, then download clean results from Apify.

This Actor is best for archive search and data export. It returns articles that are already available in the archive, so runs are fast, predictable, and easy to paginate.

What You Can Do

Search historical news articles by country or website.
Filter article data by publication date range.
Estimate result counts before exporting a large dataset.
Export article records to the Apify dataset in JSON, CSV, Excel, XML, RSS, or HTML.
Continue large exports with cursor-based pagination.
Use the results for media monitoring, news research, market intelligence, academic research, lead enrichment, and AI or LLM dataset preparation.

Common Use Cases

Monitor country-level news coverage over a date range.
Build a source-specific news article dataset.
Research historical coverage of public events, companies, markets, or regions.
Feed clean article text into analytics, NLP, summarization, classification, or retrieval workflows.
Estimate how much article data is available before launching a larger export.

How To Run The Actor

Open the Actor on Apify.
Choose either Countries or Websites.
Set Published from and Published to.
Optional: enable Estimate Results First to get a count before returning article rows.
Set Max results for the number of records to return in this run.
Run the Actor.
Download article records from the Dataset and read pagination details from the OUTPUT record.

Input

The Actor input form contains the full list of available countries and websites. The main fields are:

Field	Type	Required	Description
`countries`	`string[]`	No	Search one or more countries. Use this or `websites`, not both.
`websites`	`string[]`	No	Search one or more news websites or publishers. Use this or `countries`, not both.
`published_from`	`string`	No	Return articles published on or after this date. Supports ISO dates and relative values.
`published_to`	`string`	No	Return articles published on or before this date. Supports ISO dates and relative values.
`estimate_only`	`boolean`	No	Estimate how many matching article records are available without returning dataset rows.
`max_results`	`integer`	No	Maximum number of article records to return in this run. Default is `10`; maximum is `5000`.
`continuation_token`	`string`	No	Token from a previous run used to continue from the next page of results.
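Before starting a run, it can help to check an input against the rules in the table. The sketch below uses `build_run_input`, a hypothetical client-side helper that is not part of the Actor or the Apify client. It enforces only the documented constraints: `countries` and `websites` are mutually exclusive, and `max_results` defaults to 10 with a maximum of 5000. The lower bound of 1 is an assumption.

```python
from typing import Any, Optional


def build_run_input(
    countries: Optional[list[str]] = None,
    websites: Optional[list[str]] = None,
    published_from: Optional[str] = None,
    published_to: Optional[str] = None,
    estimate_only: bool = False,
    max_results: int = 10,
    continuation_token: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: assemble an input dict and enforce the documented rules."""
    if countries and websites:
        # The table says to use countries or websites, not both.
        raise ValueError("Use either countries or websites, not both.")
    # Assumption: at least one record per run; 5000 is the documented maximum.
    if not 1 <= max_results <= 5000:
        raise ValueError("max_results must be between 1 and 5000.")

    run_input: dict[str, Any] = {"max_results": max_results}
    if countries:
        run_input["countries"] = countries
    if websites:
        run_input["websites"] = websites
    # Omitted dates are left out so the Actor applies its own defaults.
    if published_from is not None:
        run_input["published_from"] = published_from
    if published_to is not None:
        run_input["published_to"] = published_to
    if estimate_only:
        run_input["estimate_only"] = True
    if continuation_token is not None:
        run_input["continuation_token"] = continuation_token
    return run_input
```

Leaving unset fields out of the dictionary, rather than sending empty values, keeps the Actor's documented defaults in effect.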

Helpful defaults:

If you choose neither Countries nor Websites, the Actor searches AP News.
If you provide no dates, the Actor searches the last 10 days.
If you provide only published_from, published_to defaults to 0 days (that is, the present).
If you provide only published_to, the Actor derives a one-day window ending at that date.

Accepted date examples:

2025-01-01
2025-01-01T00:00:00Z
7 days
30 days
12 months
2 years
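The defaults and date formats above can be modeled as a small resolver. This is a hypothetical sketch, not the Actor's code. It assumes a relative value such as `7 days` means that many units before the current UTC time, approximates a month as 30 days and a year as 365 days, and treats the "one-day window" as the 24 hours before the parsed `published_to` timestamp. The Actor's own calendar arithmetic and day-boundary handling may differ.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed unit lengths for relative values; months and years are approximations.
_UNIT_DAYS = {"day": 1, "month": 30, "year": 365}
_RELATIVE = re.compile(r"^\s*(\d+)\s+(day|month|year)s?\s*$", re.IGNORECASE)


def parse_date(value: str, now: datetime) -> datetime:
    """Parse an ISO date/timestamp or a relative value like '30 days'."""
    match = _RELATIVE.match(value)
    if match:
        amount, unit = int(match.group(1)), match.group(2).lower()
        return now - timedelta(days=amount * _UNIT_DAYS[unit])
    # datetime.fromisoformat before Python 3.11 rejects a trailing 'Z'.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Treat naive values (such as '2025-01-01') as UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def resolve_window(
    published_from: Optional[str],
    published_to: Optional[str],
    now: Optional[datetime] = None,
) -> tuple[datetime, datetime]:
    """Apply the documented defaults and return a (start, end) window."""
    now = now or datetime.now(timezone.utc)
    if published_from is None and published_to is None:
        # No dates: search the last 10 days.
        return now - timedelta(days=10), now
    if published_to is None:
        # Only published_from: published_to defaults to '0 days' (now).
        return parse_date(published_from, now), now
    end = parse_date(published_to, now)
    if published_from is None:
        # Only published_to: assumed 24-hour window ending at that timestamp.
        return end - timedelta(days=1), end
    return parse_date(published_from, now), end
```

Passing a fixed `now` makes the resolver deterministic, which is handy when comparing a planned window with the `filters` reported in the OUTPUT record.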

Example Inputs

Search By Country

{
  "countries": ["South Africa"],
  "published_from": "30 days",
  "published_to": "0 days",
  "max_results": 100
}

Search By Website

{
  "websites": ["AP News", "Reuters"],
  "published_from": "2025-01-01",
  "published_to": "2025-12-31",
  "max_results": 100
}

Estimate Results Before Exporting

{
  "countries": ["United States"],
  "published_from": "12 months",
  "published_to": "0 days",
  "estimate_only": true,
  "max_results": 500
}

Continue A Large Export

{
  "countries": ["Africa"],
  "published_from": "2 years",
  "published_to": "0 days",
  "max_results": 100,
  "continuation_token": "{\"date_published\":\"2026-05-06T15:44:00+00:00\",\"url_hash\":\"81a489c65af24950956dd717c2f7b4be\"}"
}
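In the example above, the token is itself a JSON string embedded in the input, which is why its inner quotes are escaped. From Python, pass `nextContinuationToken` through unchanged. The hypothetical helper below only decodes a token for logging, assuming the `date_published` / `url_hash` shape seen in the example. That shape is not documented as stable, so treat tokens as opaque and do not build them by hand.

```python
import json
from datetime import datetime


def describe_token(token: str) -> str:
    """Hypothetical helper: summarize a continuation token for logs.

    Assumes the JSON shape shown in the example input; falls back to the raw
    string if the token is not JSON or lacks the expected keys.
    """
    try:
        cursor = json.loads(token)
        published = datetime.fromisoformat(cursor["date_published"])
        return f"cursor at {published.isoformat()} (url_hash {cursor['url_hash'][:8]})"
    except (ValueError, KeyError, TypeError):
        # json.JSONDecodeError is a subclass of ValueError.
        return f"opaque token: {token}"
```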

Output Data

When estimate_only is false, article records are pushed to the default Apify dataset. Common fields include:

Field	Type	Description
`site_name`	`string`	News website or publisher name.
`country`	`string`	Country associated with the source.
`region`	`string`	Broader region associated with the source.
`language`	`string`	Source language metadata.
`article_title`	`string`	Article headline.
`author`	`string | null`	Author or byline when available.
`article_body`	`string`	Normalized article text.
`tags`	`string[]`	Tags or keywords when available.
`date_published`	`string`	ISO 8601 publication timestamp.
`article_url`	`string`	Canonical article URL.
`main_image_url`	`string | null`	Featured image URL when available.
`seo_description`	`string | null`	Article summary or meta description when available.

Example dataset item:

{
  "site_name": "AP News",
  "country": "United States",
  "region": "North America",
  "language": "en",
  "article_title": "Sample archived news headline",
  "author": null,
  "article_body": "Normalized article text appears here...",
  "tags": [],
  "date_published": "2026-05-06T16:13:00+00:00",
  "article_url": "https://example.com/news/sample-article",
  "main_image_url": null,
  "seo_description": null
}
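To work with dataset items in typed code, you can map each item onto a small record type. The `Article` dataclass below is a hypothetical client-side type that mirrors the field table: nullable fields become `Optional`, and `date_published` is parsed into a timezone-aware `datetime`. Fields missing from an item fall back to empty values, since the table lists only common fields; `date_published` is treated as required.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional


@dataclass
class Article:
    site_name: str
    article_title: str
    article_url: str
    date_published: datetime
    country: str = ""
    region: str = ""
    language: str = ""
    article_body: str = ""
    author: Optional[str] = None
    main_image_url: Optional[str] = None
    seo_description: Optional[str] = None
    tags: list[str] = field(default_factory=list)

    @classmethod
    def from_item(cls, item: dict[str, Any]) -> "Article":
        """Build an Article from one dataset item (raises KeyError without a date)."""
        return cls(
            site_name=item.get("site_name") or "",
            article_title=item.get("article_title") or "",
            article_url=item.get("article_url") or "",
            # datetime.fromisoformat before Python 3.11 rejects 'Z'; normalize it.
            date_published=datetime.fromisoformat(
                item["date_published"].replace("Z", "+00:00")
            ),
            country=item.get("country") or "",
            region=item.get("region") or "",
            language=item.get("language") or "",
            article_body=item.get("article_body") or "",
            author=item.get("author"),
            main_image_url=item.get("main_image_url"),
            seo_description=item.get("seo_description"),
            tags=list(item.get("tags") or []),
        )
```

For example, `[Article.from_item(i) for i in dataset_items]` converts the list produced in the Python API example.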

Output Summary And Pagination

Each run also writes an OUTPUT record with summary and pagination metadata:

resultCount
hasMore
nextContinuationToken
filters
estimatedMatchCount when estimate_only is enabled
estimatedReturnedThisRun when estimate_only is enabled

If hasMore is true, run the Actor again with the same filters and pass nextContinuationToken into continuation_token.
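That rule can be wrapped in a loop. In this sketch, `run_page` is a hypothetical callable you supply: it starts one Actor run with the given input and returns the dataset items together with the OUTPUT record's value. The loop keeps the filters fixed and changes only `continuation_token`. The `max_pages` cap is an added safeguard, not an Actor setting.

```python
from typing import Any, Callable, Iterator

# (items, summary) for one run; summary is the OUTPUT record's value.
RunPage = Callable[[dict[str, Any]], tuple[list[dict[str, Any]], dict[str, Any]]]


def iterate_pages(
    run_page: RunPage, base_input: dict[str, Any], max_pages: int = 100
) -> Iterator[dict[str, Any]]:
    """Yield article records across runs by following nextContinuationToken."""
    run_input = dict(base_input)  # Copy so the caller's filters are not mutated.
    for _ in range(max_pages):
        items, summary = run_page(run_input)
        yield from items
        token = summary.get("nextContinuationToken")
        if not summary.get("hasMore") or not token:
            return
        # Same filters; only the token changes between pages.
        run_input = {**base_input, "continuation_token": token}
```

A real `run_page` could call `client.actor(ACTOR_ID).call(run_input=run_input)`, then read the dataset items and the OUTPUT record exactly as the Python API example does. Injecting it as a callable keeps the pagination logic testable without network access.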

In estimate mode:

The dataset remains empty.
The OUTPUT record includes the estimated match count.
You can rerun the same input with estimate_only: false to fetch article rows.
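An estimate run is most useful for planning. Assuming each export run returns up to `max_results` records, the number of runs needed is the estimated match count divided by `max_results`, rounded up. The hypothetical helper below does that arithmetic from the OUTPUT record's value. Because `estimatedMatchCount` is an estimate, treat the result as approximate and still follow `hasMore`.

```python
import math
from typing import Any


def plan_export(summary: dict[str, Any], max_results: int) -> dict[str, int]:
    """Estimate how many runs a full export needs (hypothetical helper)."""
    if not 1 <= max_results <= 5000:
        raise ValueError("max_results must be between 1 and 5000.")
    # Missing or null estimates are treated as zero matches.
    matches = int(summary.get("estimatedMatchCount") or 0)
    return {"estimated_matches": matches, "runs_needed": math.ceil(matches / max_results)}
```

For instance, an estimate of 12,345 matches with `max_results` set to 5000 works out to three runs.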

Python API Example

Copy the Actor ID from the Actor's API tab and use it in ACTOR_ID.

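import os
from apify_client import ApifyClient

# Read the API token from an environment variable instead of hard-coding it.
client = ApifyClient(os.environ["APIFY_TOKEN"])

# Replace with the Actor ID from the Actor's API tab.
ACTOR_ID = "YOUR_USERNAME/Apify-The-Rise-of-the-Phoenix"

run_input = {
    "countries": ["South Africa"],
    "published_from": "30 days",
    "published_to": "0 days",
    "max_results": 25,
}

# Start the Actor and wait for the run to finish.
run = client.actor(ACTOR_ID).call(run_input=run_input)
if run is None:
    raise RuntimeError("No run object was returned for the Actor call.")

# Article records are stored in the run's default dataset.
dataset_items = list(
    client.dataset(run["defaultDatasetId"]).iterate_items(clean=True)
)

# Summary and pagination metadata are stored in the OUTPUT record
# of the run's default key-value store.
output_record = client.key_value_store(
    run["defaultKeyValueStoreId"]
).get_record("OUTPUT")
summary = output_record["value"] if output_record else {}

print("Run status:", run["status"])
print("Articles returned:", len(dataset_items))
print("Has more:", summary.get("hasMore"))
print("Next token:", summary.get("nextContinuationToken"))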

To check volume before exporting rows, set estimate_only to true. The Actor will return the estimate in the OUTPUT record and leave the dataset empty.

Tips For Better Results

Use estimate_only before broad searches such as large countries, long date ranges, or global sources.
Use narrower date ranges when you need smaller, faster exports.
Use websites when you need publisher-specific news data.
Keep the same filters when using continuation_token; only the token should change between pages.
Increase max_results (up to 5000 per run) so each run returns more records and a large export needs fewer runs.
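One way to apply the narrower-date-range tip to a long export is to split the range into consecutive windows and export each window separately, each with its own pagination. The hypothetical helper below produces ISO date pairs for `published_from` and `published_to`. It assumes both bounds are inclusive whole days (the input table says "on or after" and "on or before"), so consecutive windows do not overlap; verify this against your results, since exact day-boundary handling is not documented here.

```python
from datetime import date, timedelta
from typing import Iterator


def date_windows(start: date, end: date, days: int) -> Iterator[tuple[str, str]]:
    """Yield (published_from, published_to) ISO date pairs covering start..end.

    Assumes both bounds are inclusive whole days, so windows do not overlap.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=days - 1), end)
        yield current.isoformat(), window_end.isoformat()
        current = window_end + timedelta(days=1)
```

Each pair can be dropped into an input such as the Search By Website example in place of its full-year range.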

FAQ

Does this Actor scrape websites live during the run?

No. It searches the hosted article archive and returns matching article records. This makes it useful as a fast historical news API rather than a live crawling job.

Do I need to manage infrastructure to use it?

No. Run the Actor from Apify Console, tasks, schedules, or the Apify API. Provide your search filters and download the results from the dataset.

Why did I get zero dataset items?

The most common reasons are:

Your filters matched no archived articles.
Your date range is too narrow.
estimate_only was enabled.
Your continuation_token points beyond the available results.
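These checks can be automated after a run. The hypothetical helper below looks at the run input and the OUTPUT record's value and returns the matching reasons from the list above. It can only flag likely causes; an empty result with no other signal falls back to the "too narrow" suggestion.

```python
from typing import Any


def explain_empty_run(run_input: dict[str, Any], summary: dict[str, Any]) -> list[str]:
    """List likely reasons a run returned no dataset items (hypothetical helper)."""
    reasons = []
    if run_input.get("estimate_only"):
        reasons.append("estimate_only was enabled, so the dataset stays empty by design.")
    if run_input.get("continuation_token"):
        reasons.append("continuation_token may point beyond the available results.")
    if summary.get("estimatedMatchCount") == 0:
        reasons.append("The filters matched no archived articles.")
    if not reasons:
        reasons.append("The filters or date range may be too narrow; try widening them.")
    return reasons
```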

How do I fetch more than one page of results?

Check the OUTPUT record. If hasMore is true, copy nextContinuationToken into the next run as continuation_token and keep the same country, website, and date filters.

What can I export?

You can export the dataset from Apify as JSON, CSV, Excel, XML, RSS, or HTML. Programmatic users can fetch the dataset through the Apify API.

Responsible Use

This Actor returns article data from a hosted archive. You are responsible for using the data in line with applicable laws, publisher terms, and your own compliance requirements.

You might also like

Ultimate News API

glitch_404/Ultimate-News-Scraper

Scrape up to 10000 news articles from over 4500 news sources in less than 20 minutes, news from over 20 categories, e.g., Crypto news, World News, Latest News, Celebrities, and a lot more. You can find news on websites such as Fox News, BBC News, CNN, and Cryptocurrency-Related News Sources.

Yousif Wael

252

1.0

Free Google News API — Search News by Keyword + Country

s-r/google-news

Free Google News scraper — get clean structured news results for any query, country, and language. Use it as a Google News API for brand monitoring, topic alerts, news clipping, and bulk article URL harvesting.

News Archive Scraper

fortuitous_pirate/news-archive-scraper

News Archive Aggregator. Structured data export for lead generation, enrichment, and competitive research.

Fortuitous Pirate

Google News Scraper

fortuitous_pirate/google-news-scraper

Scrape news articles from Google News by search query or topic. Extracts article title, source, published date, and URL. Supports language and country filtering. Export to JSON, CSV, or Excel.

Fortuitous Pirate

Bing News Scraper

piotrv1001/bing-news-scraper

Scrapes news articles from Bing News search results, extracting titles, URLs, sources, publication dates, descriptions, and thumbnails. Ideal for media monitoring, trend analysis, and news aggregation.

FalconScrape

Google News Scraper

piotrv1001/google-news-scraper

Scrapes news articles from Google News, extracting titles, sources, publication dates, and links. Search by keywords, browse by topic, or get top headlines with multi-language and region support. Ideal for news monitoring, media analysis, and content aggregation.

FalconScrape

Google News Scraper

muscular_quadruplet/google-news-scraper

Scrape Google News articles by keyword or topic. Get headlines, sources, publish dates, snippets. Monitor news mentions, track industry trends, build news aggregators. Real-time news scraping.

Do It

Google News PPR

devisty/google-news-ppr

Provide real-time news and articles sourced from Google News (Pay per result)

Devisty

News Website Crawler & Article Extractor

xtech/news-source-crawler

Scrape all articles from any news website. Extract full text, metadata, keywords, and summaries. Ideal for content analysis, research, and news aggregation.

Xtech

398

4.8

Google News Scraper

easyapi/google-news-scraper

Powerful Google News scraper, collect up to 5000 news articles with flexible search options, language support. Perfect for news aggregation, market research, and sentiment analysis. 📰🔍

EasyApi

1.7K

3.8
