# Tech News Article Scraper
A powerful Apify Actor for scraping articles from 14 popular tech news sources or any custom URL. Extracts full article content, metadata, sentiment, word count, and reading time. Supports RSS feeds for fast and reliable scraping.
## Features
- 14 Preset Sources — The Verge, TechCrunch, Wired, Hacker News, BBC Tech, MIT Technology Review, VentureBeat, Dev.to, Product Hunt, and more
- RSS Feed Support — Faster and more reliable than HTML scraping; automatically used for all preset sources
- Custom URLs — Scrape any news site by providing your own URLs
- Smart Extraction — Automatically detects titles, authors, dates, content, images, and tags
- Sentiment Analysis — Labels each article as positive, neutral, or negative with a confidence score
- Date Range Filtering — Filter by date range or relative window (e.g. last 7 days)
- Flexible Filters — Filter by keywords, title text, or description with AND/OR logic
- Word Count & Reading Time — Automatically calculated for every article
- Robust Error Handling — Retry logic, graceful fallbacks, and detailed logging
## Preset Sources

| Key | Source | Focus |
|---|---|---|
| `verge` | The Verge | Tech news and media |
| `techcrunch` | TechCrunch | Startups and technology |
| `wired` | Wired | Technology and culture |
| `cnet` | CNET | Product reviews and news |
| `arstechnica` | Ars Technica | In-depth tech analysis |
| `engadget` | Engadget | Consumer electronics |
| `theguardian-tech` | The Guardian Tech | Tech news from The Guardian |
| `thenextweb` | The Next Web | International tech news |
| `hackernews` | Hacker News | Developer and startup community |
| `bbc-tech` | BBC Technology | Global tech news |
| `mit-tech-review` | MIT Technology Review | Deep tech and AI research |
| `venturebeat` | VentureBeat | AI, startups, and enterprise tech |
| `devto` | Dev.to | Developer articles and tutorials |
| `producthunt` | Product Hunt | Daily startup and product launches |
## Input Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `usePresets` | boolean | `true` | Use preset sources instead of custom URLs |
| `presetSources` | array | `["verge", "techcrunch", "wired"]` | List of preset source keys |
| `customUrls` | array | `[]` | Custom URLs to scrape (when `usePresets` is `false`) |
| `maxArticlesPerSource` | integer | `10` | Max articles per source (1–100) |
| `maxPages` | integer | `1` | Max listing pages to check per source (1–10) |
| `includeContent` | boolean | `true` | Extract full article text |
| `useRssFeeds` | boolean | `true` | Use RSS feeds for preset sources (faster, more reliable) |
| `includeSentiment` | boolean | `true` | Add sentiment analysis to each article |
| `sentimentFilter` | string | `"all"` | Only return articles with this sentiment: `all`, `positive`, `neutral`, `negative` |
| `filterMode` | string | `"any"` | `any`: article passes if it matches at least one filter (OR). `all`: article must pass every filter (AND) |
| `searchKeywords` | array | `[]` | Filter articles containing any of these keywords in the title or summary |
| `titleContains` | string | `""` | Only include articles with this text in the title |
| `descriptionContains` | string | `""` | Only include articles with this text in the summary or content |
| `dateFrom` | string | `""` | Only include articles published on or after this date (YYYY-MM-DD) |
| `dateTo` | string | `""` | Only include articles published on or before this date (YYYY-MM-DD) |
| `publishedWithin` | string | `""` | Only include articles published within this window, e.g. `24h`, `7d`, `30d`, `2w` |
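For reference, a relative window such as `24h`, `7d`, or `2w` can be converted into a cutoff timestamp roughly as below. This is an illustrative sketch, not the actor's actual code; the `window_to_cutoff` helper and its unit table (`h` = hours, `d` = days, `w` = weeks) are assumptions based on the documented examples.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed unit suffixes: h(ours), d(ays), w(eeks).
_UNITS = {"h": "hours", "d": "days", "w": "weeks"}

def window_to_cutoff(window, now=None):
    """Turn a window like '7d' into the earliest allowed publish time."""
    match = re.fullmatch(r"(\d+)([hdw])", window.strip().lower())
    if not match:
        raise ValueError(f"Unsupported window: {window!r}")
    amount, unit = int(match.group(1)), match.group(2)
    now = now or datetime.now(timezone.utc)
    return now - timedelta(**{_UNITS[unit]: amount})
```

An article then passes the filter when its `published_date` is at or after the returned cutoff.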
## Output Format

Each article is saved with the following fields:

```json
{
  "title": "OpenAI Releases GPT-5",
  "url": "https://techcrunch.com/2025/01/15/openai-gpt5",
  "author": "Jane Doe",
  "published_date": "2025-01-15T10:30:00+00:00",
  "content": "Full article text...",
  "summary": "OpenAI today announced GPT-5, its most capable model yet...",
  "source": "TechCrunch",
  "tags": ["AI", "OpenAI", "GPT"],
  "image_url": "https://techcrunch.com/images/gpt5.jpg",
  "word_count": 847,
  "reading_time_minutes": 4,
  "sentiment": { "label": "positive", "score": 0.6249 },
  "scraped_at": "2025-01-15T11:00:00+00:00"
}
```

`sentiment` is only present when `includeSentiment` is `true`.
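The derived fields can be understood with a small sketch. The ~200-words-per-minute reading rate below is an assumption for illustration (it is consistent with the sample above, where 847 words yields 4 minutes), not necessarily what the actor uses.

```python
# Illustrative derivation of word_count and reading_time_minutes.
WORDS_PER_MINUTE = 200  # assumed average reading speed

def word_count(content):
    # Whitespace-separated token count.
    return len(content.split())

def reading_time_minutes(content):
    # Round to the nearest minute, never reporting less than 1.
    return max(1, round(word_count(content) / WORDS_PER_MINUTE))
```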
## Usage Examples
### 1. Morning AI News Briefing

Get the latest AI articles from the last 24 hours across major sources:

```json
{
  "usePresets": true,
  "presetSources": ["techcrunch", "verge", "wired", "mit-tech-review"],
  "maxArticlesPerSource": 20,
  "publishedWithin": "24h",
  "searchKeywords": ["AI", "artificial intelligence", "ChatGPT", "LLM"],
  "filterMode": "any",
  "useRssFeeds": true,
  "includeSentiment": true
}
```
### 2. Brand Monitoring — Negative Press Only

Track negative coverage about a topic:

```json
{
  "usePresets": true,
  "presetSources": ["techcrunch", "verge", "arstechnica", "bbc-tech"],
  "maxArticlesPerSource": 30,
  "searchKeywords": ["Apple", "iPhone"],
  "filterMode": "any",
  "includeSentiment": true,
  "sentimentFilter": "negative",
  "publishedWithin": "7d"
}
```
### 3. Startup Launch Tracker

Track new product launches and funding news:

```json
{
  "usePresets": true,
  "presetSources": ["producthunt", "techcrunch", "venturebeat"],
  "maxArticlesPerSource": 20,
  "searchKeywords": ["launch", "funding", "raises", "Series A"],
  "filterMode": "any",
  "dateFrom": "2025-01-01"
}
```
### 4. Developer Community Digest

Pull community articles from Dev.to and Hacker News:

```json
{
  "usePresets": true,
  "presetSources": ["devto", "hackernews"],
  "maxArticlesPerSource": 50,
  "includeContent": true,
  "useRssFeeds": true
}
```
### 5. Custom Site Scraping

Scrape any site not in the preset list:

```json
{
  "usePresets": false,
  "customUrls": ["https://9to5mac.com", "https://9to5google.com"],
  "maxArticlesPerSource": 15,
  "maxPages": 2,
  "includeContent": true
}
```
### 6. Long-form Articles Only

Use word count to filter out short posts and stubs:

```json
{
  "usePresets": true,
  "presetSources": ["wired", "arstechnica", "mit-tech-review"],
  "maxArticlesPerSource": 20,
  "includeContent": true
}
```

Tip: after export, filter `word_count > 800` in your spreadsheet to keep long-form articles only.
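The same post-export filter can be done in a few lines of Python. A minimal sketch; the `long_form` helper and the `articles.json` path are illustrative:

```python
import json

def long_form(articles, min_words=800):
    """Keep only articles whose word_count exceeds min_words."""
    return [a for a in articles if a.get("word_count", 0) > min_words]

# Usage, assuming you exported the dataset as JSON:
# with open("articles.json") as f:
#     articles = json.load(f)
# print(len(long_form(articles)))
```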
## Filtering Guide

Three filter types are available — `filterMode` controls how they combine:

| filterMode | Behaviour |
|---|---|
| `any` (recommended) | Article passes if it matches at least one active filter |
| `all` | Article must pass every active filter simultaneously |
Example with `filterMode: "any"`:
- If `searchKeywords: ["AI"]` matches, the article is included regardless of the other filters

Example with `filterMode: "all"`:
- The article must match `searchKeywords` AND `titleContains` AND `descriptionContains` — very strict
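The AND/OR combination described above can be sketched in a few lines. Illustrative only — `passes` and `keyword_filter` are hypothetical helpers, not the actor's code; only active (non-empty) filters would participate:

```python
def passes(article, filters, mode="any"):
    """Combine active filter predicates with OR ('any') or AND ('all')."""
    if not filters:
        return True  # no active filters: everything passes
    combine = any if mode == "any" else all
    return combine(f(article) for f in filters)

def keyword_filter(keywords):
    """Predicate: any keyword appears in the title or summary."""
    kws = [k.lower() for k in keywords]
    def check(article):
        text = (article.get("title", "") + " " + article.get("summary", "")).lower()
        return any(k in text for k in kws)
    return check
```

With `mode="any"`, one matching predicate is enough; with `mode="all"`, every predicate must match.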
## Viewing Results
- Dataset Tab — View all articles in a table after the run completes
- Export — Download as JSON, CSV, XML, or Excel
- API — Access results programmatically via the Apify API
- Schedule — Set up automated daily or weekly runs via Apify Scheduler
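For the API route, results can be fetched from Apify's dataset-items endpoint. A minimal sketch — `DATASET_ID` and the token are placeholders you supply from your own run:

```python
def dataset_items_url(dataset_id, fmt="json"):
    """Build the Apify v2 URL for downloading a run's dataset items."""
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?format={fmt}"

# Usage (requires the `requests` package and your Apify API token):
# import requests
# items = requests.get(dataset_items_url("DATASET_ID"),
#                      params={"token": "YOUR_APIFY_TOKEN"}).json()
```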
## Troubleshooting

### Getting 0 articles with multiple filters set
- Switch `filterMode` to `"any"` — AND logic (`"all"`) is very strict
- Increase `maxArticlesPerSource` so more articles are available to filter
- Check the logs — each rejected article shows which filter removed it
### Missing content or short word counts
- RSS feeds provide summaries, not full articles — set `useRssFeeds: false` for full HTML scraping
- Some sites have unique page structures the generic scraper may miss
### Connection timeouts or blocked requests
- Reduce `maxArticlesPerSource` to avoid rate limiting
- Some sites block automated requests — try a different source
### JavaScript-heavy sites not working
- This scraper uses static HTML parsing — JS-rendered pages are not supported
- Use RSS mode (`useRssFeeds: true`) for preset sources to avoid this entirely
## FAQ

**Can I scrape paywalled content?**
No — only publicly accessible content is scraped.

**How fast is it?**
RSS mode: ~10 articles per second. HTML mode: ~1–2 articles per second due to polite rate limiting.

**Can I add my own sources permanently?**
Use `customUrls` for one-off scraping, or open a GitHub issue to request a new preset source.

**Is this legal?**
Web scraping legality depends on the site's terms of service and your jurisdiction. Always check `robots.txt`, avoid overloading servers, and use data responsibly. This tool is for legitimate use cases only.
## License
Provided as-is for educational and legitimate scraping purposes. Always respect website terms of service and robots.txt files.