News Scraper — Monitor News Articles & Headlines

Pricing

$12.00 / 1,000 results

📰 Scrape news articles from top publishers worldwide: extract headlines, full text, authors, publish dates, and featured images. Monitor breaking news, track industry coverage, and feed content aggregation platforms. Supports keyword search, domain filtering, and multi-language extraction.

Developer

Luan M.

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 13 hours ago

News & Monitoring Scraper — Apify Actor

Scrapes news articles from specified sources using Playwright (headless browser) and Crawlee, with content extraction via Mozilla Readability. Supports keyword filtering, deduplication, date range filtering, and optional full-text extraction.

Features

  • Playwright-powered — handles JavaScript-rendered pages
  • Smart extraction — article title, author, date, category, content summary, and optional full text
  • Keyword filtering — only keep articles matching one or more keywords
  • Date range filtering — restrict by dateFrom / dateTo
  • Deduplication — URL-based dedup within a single run
  • Keyword auto-extraction — extracts relevant keywords from article content
  • Proxy support — integrates with Apify proxy for reliable scraping
  • Multiple output formats — JSON, CSV, or JSON array
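
The keyword, date range, and deduplication features listed above can be pictured with a small sketch. This is not the Actor's actual code: the function names are hypothetical, matching is assumed to run against the title and summary, articles without a detected date are assumed to be kept, and `dateTo` is assumed to include the whole day.

```javascript
// Hypothetical helpers illustrating the filtering features; not the Actor's code.

// Keep an article if the keyword list is empty or any keyword appears
// (case-insensitively) in its title or summary.
function matchesKeywords(article, keywords) {
  if (keywords.length === 0) return true;
  const haystack = `${article.title} ${article.contentSummary}`.toLowerCase();
  return keywords.some((kw) => haystack.includes(kw.toLowerCase()));
}

// Keep an article whose publishedDate falls within [dateFrom, dateTo].
// Empty bounds are ignored; undated articles are kept (an assumption).
function withinDateRange(article, dateFrom, dateTo) {
  if (!article.publishedDate) return true;
  const published = Date.parse(article.publishedDate);
  if (dateFrom && published < Date.parse(dateFrom)) return false;
  // Treat dateTo as inclusive of that whole day (UTC).
  const oneDayMs = 24 * 60 * 60 * 1000;
  if (dateTo && published >= Date.parse(dateTo) + oneDayMs) return false;
  return true;
}

// Drop repeated URLs within a single run, ignoring the #fragment.
function dedupeByUrl(articles) {
  const seen = new Set();
  return articles.filter((article) => {
    const url = new URL(article.url);
    url.hash = "";
    if (seen.has(url.href)) return false;
    seen.add(url.href);
    return true;
  });
}

// Deduplicate first, then apply both filters.
function filterArticles(articles, { keywords = [], dateFrom = "", dateTo = "" } = {}) {
  return dedupeByUrl(articles).filter(
    (a) => matchesKeywords(a, keywords) && withinDateRange(a, dateFrom, dateTo),
  );
}
```

Calling `filterArticles(items, { keywords: ["chip"], dateFrom: "2025-01-01" })` would keep only unique, matching articles published on or after that date. Stripping the fragment before comparing URLs is one possible normalization; the real Actor may compare URLs differently.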

Input

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrls | array | | | One or more news article / index URLs to scrape |
| maxArticles | integer | | 100 | Max articles to scrape (0 = unlimited) |
| proxyConfiguration | object | | Apify proxy on | Proxy settings |
| keywords | array | | [] | Case-insensitive keyword whitelist (empty = keep all) |
| sources | array | | [] | Descriptive source names (defaults to domain) |
| dateFrom | string | | "" | ISO date filter (e.g. "2025-01-01") |
| dateTo | string | | "" | ISO date filter (e.g. "2025-12-31") |
| includeFullText | boolean | | false | Extract full article text via Readability |
| outputFormat | string | | "json" | json, csv, or jsonArray |
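
As a rough illustration of how these fields fit together, the sketch below merges a partial input with the defaults from the table and rejects obviously invalid values. `normalizeInput` is a hypothetical helper, not part of the Actor. It assumes `startUrls` is mandatory and takes `{ url }` objects, and that "Apify proxy on" corresponds to `{ useApifyProxy: true }`; none of these details is confirmed by the table.

```javascript
// Hypothetical input normalizer; defaults are copied from the input table.
const OUTPUT_FORMATS = ["json", "csv", "jsonArray"];

function normalizeInput(input) {
  const merged = {
    maxArticles: 100,
    // Assumption: "Apify proxy on" maps to this common Apify input shape.
    proxyConfiguration: { useApifyProxy: true },
    keywords: [],
    sources: [],
    dateFrom: "",
    dateTo: "",
    includeFullText: false,
    outputFormat: "json",
    ...input,
  };

  // Assumption: startUrls is mandatory and non-empty.
  if (!Array.isArray(merged.startUrls) || merged.startUrls.length === 0) {
    throw new Error("startUrls must be a non-empty array");
  }
  if (!Number.isInteger(merged.maxArticles) || merged.maxArticles < 0) {
    throw new Error("maxArticles must be a non-negative integer (0 = unlimited)");
  }
  if (!OUTPUT_FORMATS.includes(merged.outputFormat)) {
    throw new Error(`outputFormat must be one of: ${OUTPUT_FORMATS.join(", ")}`);
  }
  // Empty date strings mean "no bound"; anything else must parse as a date.
  for (const field of ["dateFrom", "dateTo"]) {
    if (merged[field] !== "" && Number.isNaN(Date.parse(merged[field]))) {
      throw new Error(`${field} must be an ISO date such as "2025-01-01"`);
    }
  }
  return merged;
}

// Example input: two keywords, a start date, and full text enabled.
const exampleInput = normalizeInput({
  startUrls: [{ url: "https://example.com/news" }],
  keywords: ["semiconductor", "chip"],
  dateFrom: "2025-01-01",
  includeFullText: true,
});
```

Here `exampleInput.maxArticles` falls back to 100 and `exampleInput.outputFormat` to `"json"`, while the supplied fields override their defaults.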

Output

Each dataset item contains:

| Field | Type | Description |
|---|---|---|
| title | string | Article title |
| url | string | Source URL |
| publishedDate | string (ISO) or null | Detected publish date |
| author | string or null | Detected author |
| contentSummary | string | First 500 chars or meta description |
| fullText | string or null | Full article text (if includeFullText enabled) |
| source | string | Source name or domain |
| category | string or null | Detected category/section |
| keywords | array | Auto-extracted keywords |
| scrapedAt | string (ISO) | Timestamp of scrape |
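
To make the item shape concrete, here is a minimal sketch of how one record with these fields could be assembled. `buildItem` and its parameter names are hypothetical. The sketch assumes the meta description takes precedence over the first 500 characters and that `source` falls back to the URL's hostname; the table does not state either rule.

```javascript
// Hypothetical builder for one dataset item; field names follow the output table.
function buildItem(article, includeFullText = false) {
  const {
    url,
    title,
    text = "",
    metaDescription = "",
    sourceName = "",
    publishedDate = null,
    author = null,
    category = null,
    keywords = [],
  } = article;

  return {
    title,
    url,
    publishedDate,
    author,
    // Assumption: prefer the meta description, else the first 500 characters.
    contentSummary: metaDescription || text.slice(0, 500),
    // fullText is only populated when includeFullText is enabled.
    fullText: includeFullText ? text : null,
    // Assumption: fall back to the hostname when no source name is given.
    source: sourceName || new URL(url).hostname,
    category,
    keywords,
    scrapedAt: new Date().toISOString(),
  };
}

const exampleItem = buildItem({
  url: "https://example.com/world/story-1",
  title: "Example headline",
  text: "Lorem ipsum ".repeat(100),
});
```

With the default `includeFullText = false`, `exampleItem.fullText` is `null`, `exampleItem.source` is `"example.com"`, and `exampleItem.contentSummary` is capped at 500 characters.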

Usage

Local development

npm install
npm start

Environment variables

  • APIFY_TOKEN — Apify API token (for proxy & dataset access)

Technical stack

  • Node.js (ESM)
  • Crawlee — crawling & request management
  • Playwright — headless browser
  • @mozilla/readability — article extraction
  • jsdom — DOM parsing for metadata
  • Apify SDK — proxy, dataset, platform integration

License

Apache-2.0