
Rise of the Phoenix: Website Scraper

Pricing

from $4.50 / 1,000 results

Powerful Apify news scraper for real-time and historical article extraction across 800+ global publishers. Built with smart fallback crawling (Scrapling, PyDoll, Selenium), category targeting, proxy support, and clean JSON output with error analytics for reliable, scalable intelligence pipelines.


Rating: 0.0 (0)

Developer: Inus Grobler (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 2 days ago


The Rise of the Phoenix - Apify News Scraper

A high-scale Apify news scraper for real-time and historical news extraction across a large global publisher catalog.

Built for production data pipelines, monitoring, intelligence, media analysis, and research workflows.

Why This Actor

  • Scrapes current and historic articles from hundreds of tracked news sites
  • Uses resilient fallback fetching: scrapling -> pydoll -> selenium
  • Supports targeted site/category runs or broad catalog runs
  • Returns structured article output plus structured scrape error telemetry
  • Works with Apify Proxy for difficult sites
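
The fallback order above (scrapling -> pydoll -> selenium) follows a simple try-next-tool pattern. The sketch below illustrates that pattern only; the fetcher functions are stand-ins for the real Scrapling, PyDoll, and Selenium integrations, not the Actor's actual code.

```python
from typing import Callable, List, Optional, Tuple

# Stand-in fetchers: in the real Actor these would wrap Scrapling,
# PyDoll, and Selenium respectively (names here are illustrative).
def fetch_scrapling(url: str) -> Optional[str]:
    raise RuntimeError("blocked")        # simulate a failed fetch

def fetch_pydoll(url: str) -> Optional[str]:
    return "<html>article body</html>"   # simulate a successful fetch

def fetch_selenium(url: str) -> Optional[str]:
    return "<html>article body</html>"

def fetch_with_fallback(url: str, fetchers: List[Callable[[str], Optional[str]]]):
    """Try each fetcher in order; return (tool_name, html) on the first success,
    or (None, errors) with per-tool error telemetry if every tool fails."""
    errors = {}
    for fetcher in fetchers:
        try:
            html = fetcher(url)
            if html:
                return fetcher.__name__, html
        except Exception as exc:         # record the failure, fall through to next tool
            errors[fetcher.__name__] = str(exc)
    return None, errors

tool, html = fetch_with_fallback(
    "https://example.com/article",
    [fetch_scrapling, fetch_pydoll, fetch_selenium],
)
print(tool)  # -> fetch_pydoll (the scrapling stand-in failed, pydoll succeeded)
```

Collecting the per-tool errors instead of raising immediately is what lets the Actor emit fallback diagnostics to the error-log dataset rather than silently dropping a URL.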

Apify Input Reference

Use these exact input keys in your Apify run:

  • sites_to_scrape (array[string], optional): Select one or more active catalog sites. The default run target is ["AP News"].
  • categories_to_scrape (array[string], optional): Manual category override values in the format Site Name|||Category URL (see input example 3).
  • execution_mode (string, required): current or historic.
  • historic_cutoff_date (string, required in historic mode): ISO timestamp cutoff, e.g. 2025-01-01T00:00:00Z.
  • max_items_per_site (integer, optional): Per-site cap applied when no_items_limit is false (default 10).
  • no_items_limit (boolean, optional): If true, ignores max_items_per_site.
  • proxy_config (object, optional): Apify Proxy or custom proxy URLs.
  • site_category_filters (array[object], optional): Advanced legacy override for explicit site-to-category mapping.
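
A small pre-flight check of the rules above (execution_mode is required, historic_cutoff_date only in historic mode, category values carry the ||| separator) can catch input mistakes before a run is started. This helper is an illustration, not part of the Actor:

```python
def validate_run_input(run_input: dict) -> list:
    """Return a list of problems with a run input dict, per the field reference above."""
    problems = []
    mode = run_input.get("execution_mode")
    if mode not in ("current", "historic"):
        problems.append("execution_mode must be 'current' or 'historic'")
    if mode == "historic" and not run_input.get("historic_cutoff_date"):
        problems.append("historic_cutoff_date is required in historic mode")
    for cat in run_input.get("categories_to_scrape", []):
        if "|||" not in cat:
            problems.append(f"bad category value (expected 'Site Name|||URL'): {cat}")
    return problems

print(validate_run_input({"execution_mode": "historic"}))
# -> ['historic_cutoff_date is required in historic mode']
```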

Input Examples

1) Current scraping (selected sites)

{
  "sites_to_scrape": ["Reuters", "Gulf News"],
  "execution_mode": "current",
  "max_items_per_site": 50,
  "no_items_limit": false,
  "proxy_config": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

2) Historic scraping (with cutoff)

{
  "sites_to_scrape": ["The Punch"],
  "execution_mode": "historic",
  "historic_cutoff_date": "2025-01-01T00:00:00Z",
  "no_items_limit": true,
  "proxy_config": {
    "useApifyProxy": true
  }
}
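
The cutoff in historic mode is an ISO-8601 timestamp. A client-side sanity filter on collected records might look like the sketch below; the published_at field name is an assumption for illustration, not the Actor's documented schema:

```python
from datetime import datetime

def parse_iso(ts: str) -> datetime:
    # fromisoformat() on Pythons before 3.11 rejects a trailing "Z", so normalize it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def after_cutoff(articles: list, cutoff: str) -> list:
    """Keep only records published on or after the cutoff timestamp."""
    limit = parse_iso(cutoff)
    return [a for a in articles if parse_iso(a["published_at"]) >= limit]

articles = [
    {"url": "a", "published_at": "2025-03-10T08:00:00Z"},
    {"url": "b", "published_at": "2024-11-02T17:30:00Z"},
]
print(after_cutoff(articles, "2025-01-01T00:00:00Z"))  # keeps only "a"
```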

3) Category-targeted scraping

{
  "sites_to_scrape": ["Gulf News", "Reuters"],
  "categories_to_scrape": [
    "Gulf News|||https://gulfnews.com/business",
    "Reuters|||https://www.reuters.com/world/"
  ],
  "execution_mode": "current",
  "max_items_per_site": 100
}
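
With the apify-client Python package, starting a run with example 1's input might look like the following sketch. The "username/actor-id" placeholder is an assumption: substitute the Actor's real ID from its store page, and supply your own API token.

```python
import os

# Run input mirroring input example 1 above.
run_input = {
    "sites_to_scrape": ["Reuters", "Gulf News"],
    "execution_mode": "current",
    "max_items_per_site": 50,
    "no_items_limit": False,
    "proxy_config": {"useApifyProxy": True},
}

if __name__ == "__main__" and os.environ.get("APIFY_TOKEN"):
    from apify_client import ApifyClient  # pip install apify-client
    client = ApifyClient(os.environ["APIFY_TOKEN"])
    # Placeholder Actor ID: replace with the real one from the store page.
    run = client.actor("username/actor-id").call(run_input=run_input)
    items = client.dataset(run["defaultDatasetId"]).list_items().items
    print(f"scraped {len(items)} articles")
```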

Output

This Actor writes:

  • Default dataset: successful article records
  • Named dataset error-log: failed URLs, tool fallback diagnostics, and extraction errors
  • Key-value store OUTPUT: run summary (successItemCount, errorItemCount, mode, and site scope)
  • Apify Output tab links: configured via .actor/output_schema.json for quick access to dataset items and run summary
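
Because the error-log dataset records which site and fallback tool a failure came from, it can feed simple run diagnostics. The record shape below is illustrative, not the Actor's exact schema:

```python
from collections import Counter

def summarize_errors(error_records: list) -> dict:
    """Count error-log records by site and by the tool that failed."""
    by_site = Counter(r.get("site", "unknown") for r in error_records)
    by_tool = Counter(r.get("tool", "unknown") for r in error_records)
    return {"by_site": dict(by_site), "by_tool": dict(by_tool)}

# Sample records; the field names are assumptions for illustration.
sample = [
    {"site": "Reuters", "tool": "selenium", "error": "timeout"},
    {"site": "Reuters", "tool": "scrapling", "error": "403"},
    {"site": "Gulf News", "tool": "scrapling", "error": "403"},
]
print(summarize_errors(sample)["by_site"])  # -> {'Reuters': 2, 'Gulf News': 1}
```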

Best Practices

  • Use execution_mode: "current" for daily monitoring and near-real-time ingestion.
  • Use execution_mode: "historic" with historic_cutoff_date for backfills.
  • Use categories_to_scrape for precise topical runs without editing catalog files.
  • Keep proxy_config.useApifyProxy enabled for better stability on protected domains.

Keywords

Apify news scraper, historical news scraping, web scraping API, article extraction, media monitoring, dataset automation, scalable scraping pipeline.