Ultimate News Scraper - Rise of the Phoenix
Pricing
from $1.50 / 1,000 results
Search a news archive by country, website, and publication date. Estimate result counts, fetch paginated historical articles, and export clean news datasets without running a live scrape.
Rating
5.0
(1)
Developer
Inus Grobler
Maintained by Community
Global News Archive Search
Search historical news articles from a Supabase-powered news archive by country, website, and published date. This Apify Actor is built for fast article retrieval, result estimation, and cursor-based pagination without running a live scrape during the Actor run.
What this Actor does
- Searches archived news articles already stored in Supabase
- Filters results by `countries` or `websites`
- Filters results by `published_from` and `published_to`
- Supports `estimate_only` so you can check result size before fetching rows
- Returns article data to the default Apify dataset
- Supports continuation tokens for paging through large result sets
What this Actor does not do
- It does not scrape websites live during the run
- It does not open browsers or crawl pages on demand
- It only returns data that already exists in the underlying archive
Best use cases
- News monitoring
- Media intelligence
- Historical article lookup
- Research workflows
- Data enrichment pipelines
- Country-level or source-level article exports
Quick start
- Choose `countries` or `websites`
- Set your date range
- Turn on `estimate_only` if you want a count first
- Run the Actor
- Read rows from the dataset and paging info from `OUTPUT`
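The steps above can also be driven from code. The following is a minimal sketch using the `apify-client` Python package. The helper names `country_query` and `run_archive_search` are hypothetical, the Actor ID and API token are values you must supply yourself, and it assumes the `OUTPUT` record lives in the run's default key-value store, which is the usual Apify convention but is not stated on this page.

```python
from typing import Any


def country_query(countries: list[str], last_days: int, max_results: int = 100) -> dict[str, Any]:
    """Build an input shaped like the 'Search by country' example, using relative dates."""
    return {
        "countries": countries,
        "published_from": f"{last_days} days",
        "published_to": "0 days",
        "max_results": max_results,
    }


def run_archive_search(token: str, actor_id: str, run_input: dict[str, Any]):
    """Run the Actor once and return (dataset rows, OUTPUT record value)."""
    # Imported lazily so country_query() works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("The Actor run did not complete")

    # Article rows are pushed to the run's default dataset.
    rows = client.dataset(run["defaultDatasetId"]).list_items().items

    # Assumption: OUTPUT is stored in the run's default key-value store.
    record = client.key_value_store(run["defaultKeyValueStoreId"]).get_record("OUTPUT")
    output = record["value"] if record else {}
    return rows, output
```

A call would look like `run_archive_search(my_token, my_actor_id, country_query(["United States"], 30))`, where both `my_token` and `my_actor_id` are placeholders.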
Simple input examples
Search by country
{"countries": ["Africa"],"published_from": "1000 days","published_to": "0 days","max_results": 100}
Search by website
{"websites": ["Reuters", "AP News"],"published_from": "2025-01-01","published_to": "2025-12-31","max_results": 100}
Estimate results first
{"countries": ["United States"],"published_from": "30 days","published_to": "0 days","estimate_only": true,"max_results": 500}
Continue to the next page
{"countries": ["Africa"],"published_from": "1000 days","published_to": "0 days","max_results": 100,"continuation_token": "{\"date_published\":\"2026-05-06T15:44:00+00:00\",\"url_hash\":\"81a489c65af24950956dd717c2f7b4be\"}"}
Input guide
| Field | Type | Required | How it works |
|---|---|---|---|
| `countries` | string[] | No | Search by one or more countries. Use this or `websites`, not both. |
| `websites` | string[] | No | Search by one or more source names. Use this or `countries`, not both. |
| `published_from` | string | No | Start of the date range. Supports ISO dates like `2025-01-01` and relative dates like `30 days`. |
| `published_to` | string | No | End of the date range. Supports ISO dates like `2025-12-31` and relative dates like `0 days`. |
| `estimate_only` | boolean | No | If `true`, the Actor returns a count estimate and no article rows. |
| `max_results` | integer | No | Maximum number of rows to return. Default is 10. Maximum is 5000. |
| `continuation_token` | string | No | Use the token from the previous run to fetch the next page. |
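The rules in the table can be checked on the client before starting a run. This is a rough sketch, not the Actor's own validation: `validate_input` is a hypothetical helper, and the lower bound of 1 for `max_results` is an assumption, since the table only states the default and the maximum.

```python
from typing import Any

MAX_RESULTS_LIMIT = 5000  # maximum stated in the input guide


def validate_input(run_input: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the input looks fine."""
    problems = []

    # The guide says to use countries or websites, not both.
    if run_input.get("countries") and run_input.get("websites"):
        problems.append("use countries or websites, not both")

    max_results = run_input.get("max_results", 10)  # documented default is 10
    if isinstance(max_results, bool) or not isinstance(max_results, int):
        problems.append("max_results must be an integer")
    elif not 1 <= max_results <= MAX_RESULTS_LIMIT:  # lower bound is an assumption
        problems.append(f"max_results must be between 1 and {MAX_RESULTS_LIMIT}")

    if "estimate_only" in run_input and not isinstance(run_input["estimate_only"], bool):
        problems.append("estimate_only must be a boolean")

    return problems
```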
Helpful defaults
- If you provide neither `countries` nor `websites`, the Actor defaults to `AP News`
- If you provide no dates, the Actor defaults to the last `10 days`
- If you provide only `published_from`, `published_to` defaults to `0 days`
- If you provide only `published_to`, `published_from` is derived automatically
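To preview what a sparse input is likely to resolve to, the defaults above can be mirrored locally. This is an illustrative sketch only: `apply_defaults` is a hypothetical helper, it assumes the `AP News` default is applied as a `websites` filter and that "the last 10 days" means `10 days` to `0 days`, and it leaves `published_from` unset when only `published_to` is given because the derivation rule is not documented here.

```python
from typing import Any


def apply_defaults(run_input: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of run_input with the documented defaults filled in."""
    resolved = dict(run_input)

    # Neither filter given: assume the AP News default is a websites filter.
    if not resolved.get("countries") and not resolved.get("websites"):
        resolved["websites"] = ["AP News"]

    has_from = bool(resolved.get("published_from"))
    has_to = bool(resolved.get("published_to"))
    if not has_from and not has_to:
        # Assumption: "the last 10 days" means from 10 days ago until now.
        resolved["published_from"] = "10 days"
        resolved["published_to"] = "0 days"
    elif has_from and not has_to:
        resolved["published_to"] = "0 days"
    # Only published_to given: the Actor derives published_from itself,
    # and the rule is not documented, so nothing is filled in here.

    resolved.setdefault("max_results", 10)  # default from the input guide
    return resolved
```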
Date format examples
- `2025-01-01`
- `2025-01-01T00:00:00Z`
- `7 days`
- `30 days`
- `12 months`
- `2 years`
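If you want to see which calendar window a given value corresponds to, the formats above can be resolved locally. The sketch below is one plausible reading, not the Actor's actual parser: `resolve_date` is a hypothetical helper, relative values are assumed to mean "this long before now", months and years are approximated as 30 and 365 days, and dates without a timezone are assumed to be UTC.

```python
import re
from datetime import datetime, timedelta, timezone

_RELATIVE = re.compile(r"^(\d+)\s+(day|month|year)s?$")
_DAYS_PER_UNIT = {"day": 1, "month": 30, "year": 365}  # rough approximation (assumption)


def resolve_date(value: str, now: datetime) -> datetime:
    """Turn an ISO or relative date string into a timezone-aware datetime."""
    text = value.strip()
    match = _RELATIVE.match(text.lower())
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        return now - timedelta(days=amount * _DAYS_PER_UNIT[unit])

    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assume UTC for bare dates
    return parsed
```

For example, `resolve_date("30 days", datetime.now(timezone.utc))` gives the start of a 30-day window ending now.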
Output
Article rows are pushed to the default Apify dataset.
Common dataset fields:
- `site_name`
- `country`
- `region`
- `language`
- `article_title`
- `author`
- `article_body`
- `tags`
- `date_published`
- `article_url`
- `main_image_url`
- `seo_description`
Example dataset item:
{"site_name": "Africa News","country": "Africa","region": "Western Africa | Eastern Africa | Southern Africa | Middle Africa | Northern Africa","language": "en|fr","article_title": "Trump hosts Dutch royals at the White House for dinner and overnight stay | Africanews","author": null,"article_body": "Normalized article text...","tags": [],"date_published": "2026-05-06T16:13:00+00:00","article_url": "https://www.euronews.com/2026/04/14/trump-hosts-dutch-royals-at-the-white-house-for-dinner-and-overnight-stay","main_image_url": null,"seo_description": null}
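For exporting rows shaped like the item above, a small standard-library sketch can flatten them to CSV. `rows_to_csv` is a hypothetical helper, and joining `tags` with `|` is an arbitrary choice made here, not something the Actor does.

```python
import csv
import io
from typing import Any

# Common dataset fields listed in this section.
FIELDS = [
    "site_name", "country", "region", "language", "article_title", "author",
    "article_body", "tags", "date_published", "article_url",
    "main_image_url", "seo_description",
]


def rows_to_csv(rows: list[dict[str, Any]]) -> str:
    """Serialize dataset rows to CSV text, one column per common field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        # Keep only the known fields; missing or null values become empty cells.
        flat = {name: row.get(name) for name in FIELDS}
        flat["tags"] = "|".join(flat["tags"] or [])
        writer.writerow(flat)
    return buffer.getvalue()
```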
OUTPUT record
Each run also writes a lightweight OUTPUT record with summary metadata.
Typical fields:
- `resultCount`
- `hasMore`
- `nextContinuationToken`
- `filters`
- `estimatedMatchCount` in estimate mode
- `estimatedReturnedThisRun` in estimate mode
Estimate mode
Use `estimate_only: true` when you want to see how many articles match before pulling rows.
In estimate mode:
- no dataset rows are returned
- the `OUTPUT` record includes the estimated match count
- you can rerun the same input with `estimate_only: false` to fetch rows
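The estimate-then-fetch workflow can be sketched as two small helpers. Both names (`plan_fetch`, `to_fetch_input`) are hypothetical, and the code assumes the estimate appears in the `OUTPUT` record under `estimatedMatchCount`, as listed under the typical fields; since the count is an estimate, the number of runs is only a planning figure.

```python
import math
from typing import Any


def plan_fetch(output: dict[str, Any], max_results: int) -> dict[str, int]:
    """From an estimate-mode OUTPUT record, work out how many runs an export needs."""
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    estimated = output.get("estimatedMatchCount")
    if estimated is None:
        raise ValueError("no estimatedMatchCount; was estimate_only set to true?")
    # Each run returns at most max_results rows, so round up.
    runs_needed = math.ceil(estimated / max_results)
    return {"estimated_rows": estimated, "runs_needed": runs_needed}


def to_fetch_input(estimate_input: dict[str, Any]) -> dict[str, Any]:
    """Copy an estimate-mode input and switch it to fetching rows."""
    fetch_input = dict(estimate_input)
    fetch_input["estimate_only"] = False
    return fetch_input
```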
Pagination
When `hasMore` is true, the `OUTPUT` record includes `nextContinuationToken`.
To fetch the next page:
- Copy `nextContinuationToken` from `OUTPUT`
- Use the same filters again
- Paste the token into `continuation_token`
- Run the Actor again
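The manual steps above amount to a simple loop. Here is a minimal sketch in which `run_once` is a hypothetical callable you provide: it takes an input dict, performs one Actor run, and returns the dataset rows together with the `OUTPUT` record. The `max_pages` cap is a safety limit added for illustration, not an Actor setting.

```python
import json
from typing import Any, Callable

# run_once(input) -> (dataset rows, OUTPUT record); supplied by the caller.
RunFn = Callable[[dict[str, Any]], tuple[list[dict[str, Any]], dict[str, Any]]]


def fetch_all_pages(run_once: RunFn, filters: dict[str, Any], max_pages: int = 20) -> list[dict[str, Any]]:
    """Repeat runs with the same filters, following nextContinuationToken."""
    rows: list[dict[str, Any]] = []
    token = None
    for _ in range(max_pages):
        run_input = dict(filters)  # same filters on every page
        if token is not None:
            run_input["continuation_token"] = token
        page_rows, output = run_once(run_input)
        rows.extend(page_rows)
        token = output.get("nextContinuationToken")
        if not output.get("hasMore") or not token:
            break
    return rows


def describe_token(token: str) -> dict[str, Any]:
    """Decode a token shaped like the one in the input examples (a JSON string)."""
    return json.loads(token)
```

Pass the token back unchanged between runs; `describe_token` is only meant for debugging, and the token's internal shape may differ from the example shown in the input section.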
Why you might get zero results
- The archive does not currently contain matching rows
- The country or website filter is too narrow
- The date range is too small
- You requested a page beyond the available results
Production notes
- The Actor is optimized for archive retrieval, not live crawling
- Default run memory is `128 MB`
- Results are returned newest first
- The Actor is suitable for production use when your Supabase archive is populated and `DATABASE_URL` is configured correctly